Advances and Applications of DSmT for Information Fusion. Collected Works. Volume 5


Foreword 


This fifth volume on Advances and Applications of DSmT for Information Fusion collects theoretical 
and applied contributions of researchers working in different fields of applications and in mathematics, 
and is available in open-access. The collected contributions of this volume have either been published 
or presented after disseminating the fourth volume in 2015 (available at fs.unm.edu/DSmT-book4.pdf 
or www.onera.fr/sites/default/files/297/2015-DSmT-Book4.pdf) in international conferences, seminars, 
workshops and journals, or they are new. The contributions of each part of this volume are chronologically 
ordered. 

The first part of this book presents some theoretical advances on DSmT, dealing mainly with modified
Proportional Conflict Redistribution Rules (PCR) of combination with degree of intersection, coarsening
techniques, interval calculus for PCR thanks to set inversion via interval analysis (SIVIA), rough set
classifiers, canonical decomposition of dichotomous belief functions, fast PCR fusion, fast inter-criteria
analysis with PCR, and improved PCR5 and PCR6 rules preserving the (quasi-)neutrality of
(quasi-)vacuous belief assignment in the fusion of sources of evidence with their Matlab codes.

Because more applications of DSmT have emerged in the years since the appearance of the fourth
book of DSmT in 2015, the second part of this volume is about selected applications of DSmT mainly
in building change detection, object recognition, quality of data association in tracking, perception in 
robotics, risk assessment for torrent protection and multi-criteria decision-making, multi-modal image 
fusion, coarsening techniques, recommender system, levee characterization and assessment, human 
heading perception, trust assessment, robotics, biometrics, failure detection, GPS systems, inter-criteria 
analysis, group decision, human activity recognition, storm prediction, data association for autonomous 
vehicles, identification of maritime vessels, fusion of support vector machines (SVM), Silx-Furtif RUST 
code library for information fusion including PCR rules, and network for ship classification. 

Finally, the third part presents interesting contributions related to belief functions in general published
or presented over the years since 2015. These contributions are related to decision-making under
uncertainty, belief approximations, probability transformations, new distances between belief functions, 
non-classical multi-criteria decision-making problems with belief functions, generalization of Bayes 
theorem, image processing, data association, entropy and cross-entropy measures, fuzzy evidence 
numbers, negator of belief mass, human activity recognition, information fusion for breast cancer therapy, 
imbalanced data classification, and hybrid techniques mixing deep learning with belief functions as well. 

We want to thank all the contributors of this fifth volume for their research works and their interest
in the development of DSmT and belief functions. We are grateful as well to other colleagues for
encouraging us to edit this fifth volume, for sharing several ideas with us, and for their questions
and comments on DSmT through the years. We thank the International Society of Information Fusion
(www.isif.org) for disseminating the main research works related to information fusion (including DSmT) in the
international fusion conference series over the years.

Florentin Smarandache is grateful to The University of New Mexico, U.S.A., that many times partially 
sponsored him to attend international conferences, workshops and seminars on Information Fusion. 

Jean Dezert is grateful to the Department of Information Processing and Systems (DTIS) of the 
French Aerospace Lab (Office National d’Etudes et de Recherches Aérospatiales), Palaiseau, France, 
for encouraging him to carry on this research and for its financial support. 

Albena Tchamova is first of all grateful to Dr. Jean Dezert for the opportunity, over
more than 20 years, to follow and share his smart and beautiful visions and ideas in the development
of the powerful Dezert-Smarandache Theory for data fusion. She is also grateful to the Institute of
Information and Communication Technologies, Bulgarian Academy of Sciences, for sponsoring her to 
attend international conferences on Information Fusion. 


The Editors: 


Prof. Florentin Smarandache, Tucson, USA.
http://fs.unm.edu/DSmT.htm
http://fs.unm.edu/FS.htm

Dr. Jean Dezert, Orléans, France.
https://www.onera.fr/fr/staff/jean-dezert
Email: jean.dezert@onera.fr

Dr. Albena Tchamova, Sofia, Bulgaria.
https://sdp.iict.bas.bg/staff/albenaEN.html
Email: albena.tchamova@iict.bas.bg
Part 1:
Theoretical advances 
on DSmT 




Modified PCR Rules of Combination 
with Degrees of Intersections 


Florentin Smarandache (a), Jean Dezert (b)


(a) Department of Mathematics, University of New Mexico, Gallup, NM, USA.
(b) The French Aerospace Lab, ONERA/DTIS, Palaiseau, France.


Emails: smarand@unm.edu, jean.dezert@onera.fr 


Originally published as: F. Smarandache, J.Dezert, Modified PCR Rules of Combination with Degrees 
of Intersections, in Proc. of the 18th Int. Conf. on Information Fusion (Fusion 2015), Washington 
D.C, USA, July 6-9, 2015, and reprinted with permission. 


Abstract—In this paper, we propose a modification of PCR5 
and PCR6 fusion rules with degrees of intersections for taking 
into account the cardinality of focal elements of each source 
of evidence to combine. We show in very simple examples the 
interest of these new fusion rules w.r.t. classical Dempster-Shafer, 
PCR6, Zhang’s and Jaccard’s Center rules of combination. 


Keywords: Information fusion, belief functions, DSmT, 
PCR6, degrees of intersection. 


I. INTRODUCTION 


In this paper, we propose modifications of the Proportional 
Conflict Redistribution rule no. 6 (PCR6) [1] (Vol. 3) for 
the combination of basic belief assignments (BBA’s) which 
integrate the degrees of intersections of focal elements of 
each source of evidence to combine. Because we consider two 
possible definitions of degrees of intersections (i.e. Zhang’s 
and Jaccard’s degrees) and also two normalization methods 
(simplest and sophisticate), we propose four modified versions 
of PCR6 rules¹. After a brief presentation of classical rules
of combination and a detailed presentation of our modified 
PCR6 rules, we evaluate and compare their behaviors in 
different emblematic examples to guide the choice of the most 
interesting one. 


II. BELIEF FUNCTIONS AND CLASSICAL FUSION RULES


Belief functions were introduced by Shafer in 1976
from Dempster's works [2] in Dempster-Shafer's theory (DST)
of evidence. DST is mainly characterized by a frame of
discernment (FoD), sources of evidence represented by basic
belief assignment (BBA), belief (Bel) and plausibility (Pl)
functions, and Dempster's rule of combination, denoted as DS
rule in the sequel². DST has been modified
and extended into Dezert-Smarandache theory [1] (DSmT) to
work with quantitative or qualitative BBA and to combine the
sources of evidence in a more efficient way thanks to new
proportional conflict redistribution (PCR) fusion rules; see
[3]-[6] for discussion and examples.


¹ The methodology proposed in this paper is general and can also be applied
to modify similarly other PCR rules. Since we consider PCR6 rule the most
efficient one [6], we focus our presentation on PCR6 only.

² DS acronym standing for Dempster-Shafer since Dempster's rule has been
widely promoted by Shafer in the development of his mathematical theory of
evidence.
More precisely, let's consider a finite discrete FoD $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$,
with $n > 1$, of the fusion problem under
consideration and its fusion space $G^\Theta$ which can be chosen
either as the power-set $2^\Theta$, the hyper-power set³ $D^\Theta$, or the
super-power set $S^\Theta$ depending on the model that fits with
the problem [1]. A BBA associated with a given source of
evidence is defined as the mapping $m(.) : G^\Theta \to [0,1]$
satisfying $m(\emptyset) = 0$ and $\sum_{A \in G^\Theta} m(A) = 1$. The quantity
$m(A)$ is called mass of belief of $A$ committed by the source
of evidence. Belief and plausibility functions are defined by

$$\mathrm{Bel}(A) = \sum_{\substack{B \subseteq A \\ B \in G^\Theta}} m(B), \quad \text{and} \quad \mathrm{Pl}(A) = \sum_{\substack{B \cap A \neq \emptyset \\ B \in G^\Theta}} m(B). \tag{1}$$


If for some $A \in G^\Theta$, $m(A) > 0$ then $A$ is called a focal element
of the BBA $m(.)$. When all focal elements are singletons
and $G^\Theta = 2^\Theta$ then the BBA $m(.)$ is called a Bayesian BBA [2]
and its corresponding belief function $\mathrm{Bel}(.)$ is homogeneous
to a (possibly subjective) probability measure, and one has
$\mathrm{Bel}(A) = P(A) = \mathrm{Pl}(A)$, otherwise in general one has
$\mathrm{Bel}(A) \leq P(A) \leq \mathrm{Pl}(A)$, $\forall A \in G^\Theta$. The vacuous BBA, or
VBBA for short, representing a totally ignorant source is
defined as $m_v(I_t) = 1$, where the total ignorance is defined as
$I_t \triangleq \theta_1 \cup \theta_2 \cup \ldots \cup \theta_n$, if the FoD is $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$.
Since in Shafer's book [2], the total ignorance $I_t$ is also
denoted $\Theta$, we will adopt this notation in the sequel.
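As an illustration of the definitions of Bel(.) and Pl(.), here is a minimal Python sketch. It assumes Shafer's model and a BBA stored as a dictionary mapping `frozenset` focal elements to masses; this representation, the function names `bel` and `pl`, and the numerical values are chosen here for illustration only.

```python
def bel(m, A):
    """Belief of A: total mass of the non-empty focal elements included in A."""
    return sum(mass for B, mass in m.items() if B and B <= A)


def pl(m, A):
    """Plausibility of A: total mass of the focal elements intersecting A."""
    return sum(mass for B, mass in m.items() if B & A)


# Hypothetical BBA on the frame {a, b, c}; the masses sum to 1.
theta = frozenset("abc")
m = {frozenset("a"): 0.5, frozenset("ab"): 0.3, theta: 0.2}

A = frozenset("ab")
print(bel(m, A), pl(m, A))                            # belief and plausibility of {a, b}
print(bel(m, frozenset("b")), pl(m, frozenset("b")))  # belief and plausibility of {b}
```

For any subset the belief never exceeds the plausibility, which is the interval in which the unknown probability P(A) is expected to lie.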

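Many rules have been proposed in the literature over the
decades (see [1], Vol. 2 for a detailed list of fusion rules) to
combine several distinct sources of evidence represented by
the BBA's $m_1(.), m_2(.), \ldots, m_s(.)$ ($s \geq 2$) defined on same
fusion space $G^\Theta$. In DST, the combination of $s \geq 2$ BBA's
is traditionally accomplished with Dempster-Shafer (DS) rule
[2] defined by $m^{DS}_{1,2,\ldots,s}(\emptyset) = 0$ and for all $X \neq \emptyset$ in $2^\Theta$

$$m^{DS}_{1,2,\ldots,s}(X) \triangleq \frac{\sum_{\substack{X_1,\ldots,X_s \in 2^\Theta \\ X_1 \cap \ldots \cap X_s = X}} \prod_{i=1}^{s} m_i(X_i)}{1 - m_{1,2,\ldots,s}(\emptyset)}, \tag{2}$$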


where the numerator of (2) is the mass of belief on the conjunctive
consensus on $X$. The denominator $1 - m_{1,2,\ldots,s}(\emptyset)$ is a
normalization constant. The total degree of conflict $m_{1,2,\ldots,s}(\emptyset)$
between the $s$ sources of evidence is defined by

$$m_{1,2,\ldots,s}(\emptyset) \triangleq \sum_{\substack{X_1,\ldots,X_s \in 2^\Theta \\ X_1 \cap \ldots \cap X_s = \emptyset}} \prod_{i=1}^{s} m_i(X_i). \tag{3}$$

³ which corresponds to a Dedekind's lattice, see [1] Vol. 1.
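The conjunctive consensus, the total degree of conflict and the DS normalization can be sketched as follows. This is a minimal illustration under Shafer's model, assuming BBA's stored as dictionaries mapping `frozenset` focal elements to masses; the function names and the input values are hypothetical.

```python
from itertools import product


def conjunctive(bbas):
    """Conjunctive consensus; the mass on frozenset() is the total conflict."""
    out = {}
    for combo in product(*(m.items() for m in bbas)):
        inter = frozenset.intersection(*(X for X, _ in combo))
        mass = 1.0
        for _, value in combo:
            mass *= value
        out[inter] = out.get(inter, 0.0) + mass
    return out


def dempster_shafer(bbas):
    """DS rule: conjunctive consensus divided by (1 - total conflict)."""
    conj = conjunctive(bbas)
    conflict = conj.pop(frozenset(), 0.0)
    if 1.0 - conflict < 1e-12:
        raise ValueError("total conflict: DS rule is undefined (0/0)")
    return {X: v / (1.0 - conflict) for X, v in conj.items()}, conflict


# Hypothetical inputs on the frame {a, b}.
A, B, theta = frozenset("a"), frozenset("b"), frozenset("ab")
m1 = {A: 0.6, theta: 0.4}
m2 = {B: 0.5, theta: 0.5}

fused, conflict = dempster_shafer([m1, m2])
print("conflict:", conflict)
for X, mass in fused.items():
    print(sorted(X), round(mass, 4))
```

In this sketch the conflicting product m1(A)m2(B) is removed and the remaining masses are rescaled by the same factor, whatever the cardinalities of the focal elements are.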


DS rule is associative and commutative and preserves the
neutrality of the VBBA. $s$ sources of evidence are said to be in
total conflict if $m_{1,2,\ldots,s}(\emptyset) = 1$. In this case the combination
of the sources by DS rule cannot be done because of the
mathematical 0/0 indeterminacy in (2). In DS rule, $m_{1,2,\ldots,s}(\emptyset)$
is redistributed to all focal elements of the conjunctive operator
only proportionally to their mass (i.e. without taking care of
their cardinalities). So with DS rule and with combination
of 2 BBA's, the product $m_1(X_1)m_2(X_2)$ is transferred to
$X_1 \cap X_2 = X$ only, no matter how the ratio between the
cardinality of $X$ and $X_1 \cup X_2$ varies. This DS principle of
redistribution has been questioned by Zhang in [7] and Fixsen
and Mahler in [8] because it does not discriminate the case
where $X_1 \cup X_2$ is large but $X_1 \cap X_2$ is small with respect to the
case where $X_1 \cup X_2$ is small but $X_1 \cap X_2$ is large. To palliate
this problem, Zhang proposed in 1994 a modified version of
DS rule [7] including a measure of degree of intersection of
focal elements. The general formula of this modified DS rule
is defined by $m^{MDS}_{1,2,\ldots,s}(\emptyset) = 0$ and for all $X \neq \emptyset$ in $2^\Theta$


$$m^{MDS}_{1,2,\ldots,s}(X) = \frac{1}{K^{MDS}_{1,2,\ldots,s}} \sum_{\substack{X_1,\ldots,X_s \in 2^\Theta \\ X_1 \cap \ldots \cap X_s = X}} D(X_1,\ldots,X_s) \prod_{i=1}^{s} m_i(X_i), \tag{4}$$

where $D(X_1,\ldots,X_s)$ denotes a measure of the degree of
intersection between the focal elements $X_1, X_2, \ldots, X_s$, and
$K^{MDS}_{1,2,\ldots,s}$ is a normalization constant allowing to get
$\sum_{X \in 2^\Theta} m^{MDS}_{1,2,\ldots,s}(X) = 1$. Because the measure of degree of
intersection $D(X_1,\ldots,X_s)$ can be defined in different ways,
this yields different versions of the modified DS rule above.
In [7], Zhang suggested to define $D(X_1,\ldots,X_s)$ as


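$$D^Z(X_1,\ldots,X_s) \triangleq \frac{|X_1 \cap X_2 \cap \ldots \cap X_s|}{|X_1| \cdot |X_2| \cdot \ldots \cdot |X_s|}, \tag{5}$$

where $|X_1 \cap X_2 \cap \ldots \cap X_s|$ is the cardinality of the intersection
of the focal elements $X_1, X_2, \ldots, X_s$, and $|X_1|$,
$|X_2|$, ..., $|X_s|$ their cardinalities. Replacing $D(X_1,\ldots,X_s)$ by
$D^Z(X_1,\ldots,X_s)$ in the formula (4) defines Zhang's Center
Rule (ZCR) of combination, denoted $m^{ZCR}_{1,2,\ldots,s}(.)$ in the
sequel. The normalization constant of ZCR is denoted $K^{ZCR}_{1,2,\ldots,s}$.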


If we use Jaccard's index as measure of the degree of
intersection [9] which is defined by

$$D^J(X_1,\ldots,X_s) \triangleq \frac{|X_1 \cap X_2 \cap \ldots \cap X_s|}{|X_1 \cup X_2 \cup \ldots \cup X_s|}, \tag{6}$$

then we obtain Jaccard's center rule (JCR) of combination,
and we denote it $m^{JCR}_{1,2,\ldots,s}(.)$, in replacing $D(X_1,\ldots,X_s)$ by
$D^J(X_1,\ldots,X_s)$ in the formula (4). The normalization constant
of JCR is denoted $K^{JCR}_{1,2,\ldots,s}$. ZCR and JCR rules are particular
instances of Modified DS rule (MDS) proposed by Fixsen
and Mahler in [8]. ZCR and JCR are commutative but not
idempotent. It can be proved that Zhang's degree is associative,
that is $D^Z(X_1, X_2, \ldots, X_s) = D^Z(X_1, D^Z(X_2, \ldots, X_s))$,
whereas Jaccard's degree is not associative. If one combines
three (or more) BBA's, then ZCR is associative whether or not
there is conflicting mass, whereas JCR is not associative in
either case. Zhang's and Jaccard's degrees pose a problem
because ZCR and JCR become strictly equivalent with DS
rule when the cardinality is 1 for all relevant sets, or when
$|X_1 \cap X_2 \cap \ldots \cap X_s| = |X_1| \cdot |X_2| \cdot \ldots \cdot |X_s|$ in the
circumstance of conflicting evidence. Therefore, they inherit the
same limitations as DS rule (see Example 2 in Section V).
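The two degrees of intersection are easy to compare numerically. The following minimal Python sketch assumes non-empty focal elements represented as `frozenset` objects; the function names and the sets used are illustrative only.

```python
from math import prod


def zhang_degree(*sets):
    """Zhang's degree: |intersection| divided by the product of cardinalities."""
    inter = frozenset.intersection(*sets)
    return len(inter) / prod(len(X) for X in sets)


def jaccard_degree(*sets):
    """Jaccard's degree: |intersection| divided by |union|."""
    inter = frozenset.intersection(*sets)
    union = frozenset.union(*sets)
    return len(inter) / len(union)


a, ab, abc = frozenset("a"), frozenset("ab"), frozenset("abc")

print(zhang_degree(a, abc), jaccard_degree(a, abc))    # same value for both degrees
print(zhang_degree(ab, abc), jaccard_degree(ab, abc))  # the two degrees differ here
print(zhang_degree(a, a), jaccard_degree(a, a))        # identical singletons: no discounting
print(zhang_degree(ab, abc, abc))                      # adding the whole frame divides by its size
```

The last line illustrates why Zhang's degree interacts well with the vacuous BBA: appending the whole frame to the list of sets only divides the degree by the cardinality of the frame.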

Doubts about the validity of DS rule were raised
by Zadeh in 1979 [10]-[12] based on a very simple example
with two highly conflicting sources of evidence. Since the 1980's,
many criticisms have been made about the behavior and the
justification of DS rule. More recently, Dezert et al. in
[3], [4], [18] have brought to light other counter-intuitive behaviors
of DS rule even in low conflicting cases and showed serious
flaws in logical foundations of DST [5]. To overcome the
limitations and problems of DS rule of combination, a new
family of PCR rules has been developed in DSmT framework
[1]. In PCR rules, we transfer the conflicting mass only to the
elements involved in the conflict and proportionally to their
individual masses, so that the specificity of the information is
entirely preserved. The general principle of PCR consists of: 1)
applying the conjunctive rule; 2) calculating the total or partial
conflicting masses; 3) redistributing the (total or partial)
conflicting mass proportionally on non-empty sets according
to the integrity constraints one has for the frame $\Theta$. Because
the proportional transfer can be done in different ways, there
exist several versions of PCR rules of combination. PCR6
fusion rule has been proposed by Martin and Osswald in [1]
Vol. 2, Chap. 2, as a serious alternative to PCR5 fusion rule
proposed originally by Smarandache and Dezert in [1] Vol.
2, Chap. 1. When only two BBA's are combined, PCR6 and
PCR5 fusion rules coincide, but they differ in general as soon
as more than two sources have to be combined altogether.
Recently, it has been proved in [6] that only PCR6 rule is
consistent with the averaging fusion rule which allows to
estimate the empirical (frequentist) probabilities involved in
a discrete random experiment, and that is why we recommend
to use it in applications when possible. For Shafer's model of
FoD⁴, the PCR6⁵ combination of two BBA's $m_1(.)$ and $m_2(.)$
is defined by $m^{PCR5/6}_{1,2}(\emptyset) = 0$ and for all $X \neq \emptyset$ in $2^\Theta$


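$$m^{PCR5/6}_{1,2}(X) = \sum_{\substack{X_1, X_2 \in 2^\Theta \\ X_1 \cap X_2 = X}} m_1(X_1) m_2(X_2) + \sum_{\substack{Y \in 2^\Theta \setminus \{X\} \\ X \cap Y = \emptyset}} \left[ \frac{m_1(X)^2 m_2(Y)}{m_1(X) + m_2(Y)} + \frac{m_2(X)^2 m_1(Y)}{m_2(X) + m_1(Y)} \right], \tag{7}$$

⁴ that is when $G^\Theta = 2^\Theta$, and assuming all elements exhaustive and
exclusive.

⁵ which turns to be equal to PCR5 formula in case of fusion of two BBA's
only.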
where all denominators in (7) are different from zero. If a
denominator is zero, that fraction is discarded. All propositions/sets
are in a canonical form [1]. Basic MatLab codes of
PCR rules can be found in [1], [13] or from the toolboxes
repository on the web [14]. The general and concise formula
of PCR6 rule for combining $s \geq 2$ sources of evidence is


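$$m^{PCR6}_{1,2,\ldots,s}(X) = m_{1,2,\ldots,s}(X) + CR^{PCR6}(X), \tag{8}$$

where $m_{1,2,\ldots,s}(X)$ corresponds to the conjunctive consensus
on $X$ between $s$ sources of evidence, which is defined by

$$m_{1,2,\ldots,s}(X) \triangleq \sum_{\substack{X_1,\ldots,X_s \in 2^\Theta \\ X_1 \cap \ldots \cap X_s = X}} \prod_{i=1}^{s} m_i(X_i), \tag{9}$$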


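and where $CR^{PCR6}(X)$ is the part of the conflicting masses
redistributed back to the focal element $X$ according to PCR6
redistribution principle, that is

$$CR^{PCR6}(X) \triangleq \sum_{k=1}^{s-1} \; \sum_{\substack{X_{i_1}, X_{i_2}, \ldots, X_{i_k} \in G^\Theta \setminus X \\ (\cap_{j=1}^{k} X_{i_j}) \cap X = \emptyset}} \; \sum_{(i_1, i_2, \ldots, i_s) \in \mathcal{P}^s(\{1,\ldots,s\})} [m_{i_1}(X) + m_{i_2}(X) + \ldots + m_{i_k}(X)] \cdot \frac{m_{i_1}(X) \ldots m_{i_k}(X)\, m_{i_{k+1}}(X_{i_{k+1}}) \ldots m_{i_s}(X_{i_s})}{m_{i_1}(X) + \ldots + m_{i_k}(X) + m_{i_{k+1}}(X_{i_{k+1}}) + \ldots + m_{i_s}(X_{i_s})}. \tag{10}$$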

In Eq.(10), $\mathcal{P}^s(\{1,\ldots,s\})$ is the set of all permutations of
the elements $\{1,2,\ldots,s\}$. It should be observed that $X_{i_1}$,
$X_{i_2}$, ..., $X_{i_s}$ may be different from each other, or some of
them equal and others different, etc. As discussed and justified
in [6], we focus here and in the sequel on PCR6 rule of
combination rather than PCR5, but the general formula of
PCR5 rule can be found in [1], [6] with examples, and a
concise PCR5 general formula similar to (11) is possible. Like 
the averaging fusion rule, the PCR5 and PCR6 fusion rules are 
commutative but not associative. 
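For the two-source case, the proportional redistribution of formula (7) can be sketched in a few lines of Python. The sketch assumes Shafer's model and BBA's stored as dictionaries mapping `frozenset` focal elements to masses; it is not the MatLab code referenced in [1], [13], [14], and the function name and input values are hypothetical.

```python
def pcr6_two_sources(m1, m2):
    """PCR5/6 fusion of two BBAs given as {frozenset: mass} dictionaries."""
    fused = {}
    for X1, a in m1.items():
        for X2, b in m2.items():
            inter = X1 & X2
            if inter:
                # Non-conflicting product: goes to the intersection.
                fused[inter] = fused.get(inter, 0.0) + a * b
            elif a + b > 0.0:
                # Conflicting product a*b: split between X1 and X2
                # proportionally to the masses a and b.
                fused[X1] = fused.get(X1, 0.0) + a * a * b / (a + b)
                fused[X2] = fused.get(X2, 0.0) + b * b * a / (a + b)
    return fused


# Hypothetical inputs on the frame {a, b}.
A, B, theta = frozenset("a"), frozenset("b"), frozenset("ab")
m1 = {A: 0.6, theta: 0.4}
m2 = {B: 0.5, theta: 0.5}

fused = pcr6_two_sources(m1, m2)
for X, mass in fused.items():
    print(sorted(X), round(mass, 4))
print("total mass:", round(sum(fused.values()), 10))
```

Because each conflicting product is returned entirely to the two elements that generated it, no normalization step is needed and nothing is transferred to the total ignorance.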


III. PCR6 RULE WITH DEGREES OF INTERSECTION 


As presented in the previous section, the original versions 
of PCR5 or PCR6 rules of combination (as well as original 
DS rule) use only part of the whole information available 
(i.e. the values of the masses of belief only), because they 
do not exploit the cardinalities of focal elements entering in 
the fusion process. Because the cardinalities of focal elements 
are fully taken into account in the computation of the measure 
of degree of intersection between sets, we propose to improve 
PCR rules using this measure. The basic idea is to replace 
any conjunctive product by its discounted version thanks to 
the measure of degree of intersection D when the intersection 
of focal elements is not empty. The products of partial (or total)
conflicting masses are not discounted by the measure of degree
of intersection because the degree of intersection between two
(or more) conflicting focal elements always equals zero, that
is if $X \cap Y = \emptyset$, then $D(X,Y) = 0$. Because there are
different ways to define degrees of intersection between sets
(here we consider only Zhang's and Jaccard's degrees), and
there are different ways to make the normalization because
of the weighted conjunctive product involved in formulas, 
we come up with several versions of modified PCR6 rule of 
combination. We consider in fact two main modified versions 
of PCR6. The first modified version uses a classical normal- 
ization step based on the division by a normalization factor. 
The second modified version uses a sophisticate normalization 
step as shown through the general modified PCR6 formulas. 


A. Simplest modified PCR6 rule 


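The simplest modified PCR6 rule including the measure of
degree of intersection between sets is defined for $s \geq 2$ BBA's
by $m^{DPCR6}_{1,2,\ldots,s}(\emptyset) = 0$ and for any non empty $X \in 2^\Theta$, by

$$m^{DPCR6}_{1,2,\ldots,s}(X) = \frac{1}{K^{DPCR6}_{1,2,\ldots,s}} \cdot \left[ m^{D}_{1,2,\ldots,s}(X) + CR^{PCR6}(X) \right], \tag{11}$$

where $K^{DPCR6}_{1,2,\ldots,s}$ is a normalization constant allowing to get
$\sum_{X \in 2^\Theta} m^{DPCR6}_{1,2,\ldots,s}(X) = 1$; $CR^{PCR6}(X)$ is the part of the
conflicting masses redistributed back to the focal element $X$
according to PCR6 redistribution principle and defined by
(10); and $m^{D}_{1,2,\ldots,s}(X)$ is the discounted conjunctive consensus
by the measure of the degree of intersection, defined by

$$m^{D}_{1,2,\ldots,s}(X) \triangleq \sum_{\substack{X_1,\ldots,X_s \in 2^\Theta \\ X_1 \cap \ldots \cap X_s = X}} D(X_1,\ldots,X_s) \prod_{i=1}^{s} m_i(X_i). \tag{12}$$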


A similar general formula holds for the modified PCR5 rule
with degrees of intersection between focal elements. For the
fusion of two BBA's $m_1(.)$ and $m_2(.)$, the modified PCR6 and
PCR5 formulas coincide and reduce to the formula below

$$m^{DPCR5/6}_{1,2}(X) = \frac{1}{K^{DPCR5/6}_{1,2}} \Bigg[ \sum_{\substack{X_1, X_2 \in 2^\Theta \\ X_1 \cap X_2 = X}} D(X_1, X_2)\, m_1(X_1) m_2(X_2) + \sum_{\substack{Y \in 2^\Theta \setminus \{X\} \\ X \cap Y = \emptyset}} \left[ \frac{m_1(X)^2 m_2(Y)}{m_1(X) + m_2(Y)} + \frac{m_2(X)^2 m_1(Y)}{m_2(X) + m_1(Y)} \right] \Bigg]. \tag{13}$$

Depending on the degree of intersection we take (either $D^Z$
or $D^J$), we get two versions of this modified PCR6 rule.
The result of the fusion for each version will be denoted
$m^{ZPCR6}_{1,2,\ldots,s}(.)$ and $m^{JPCR6}_{1,2,\ldots,s}(.)$ in the sequel. ZPCR6 and JPCR6
rules⁶ are commutative but not associative.
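A minimal Python sketch of the two-source simplest modified rule is given below, assuming Shafer's model and BBA's stored as dictionaries mapping `frozenset` focal elements to masses. The function names and the input BBA's are hypothetical; the degree of intersection is passed as a parameter so that the same code yields the ZPCR6 and JPCR6 variants.

```python
def zhang(X1, X2):
    return len(X1 & X2) / (len(X1) * len(X2))


def jaccard(X1, X2):
    return len(X1 & X2) / len(X1 | X2)


def dpcr6_two_sources(m1, m2, degree):
    """Discounted conjunctive part + PCR5/6 redistribution, then normalization."""
    raw = {}
    for X1, a in m1.items():
        for X2, b in m2.items():
            inter = X1 & X2
            if inter:
                # Conjunctive product discounted by the degree of intersection.
                raw[inter] = raw.get(inter, 0.0) + degree(X1, X2) * a * b
            elif a + b > 0.0:
                # Conflicting product: redistributed as in PCR5/6, not discounted.
                raw[X1] = raw.get(X1, 0.0) + a * a * b / (a + b)
                raw[X2] = raw.get(X2, 0.0) + b * b * a / (a + b)
    K = sum(raw.values())  # normalization constant
    return {X: v / K for X, v in raw.items()}


# Hypothetical inputs on the frame {a, b, c}.
A, B, AB, theta = frozenset("a"), frozenset("b"), frozenset("ab"), frozenset("abc")
m1 = {A: 0.6, AB: 0.4}
m2 = {B: 0.5, theta: 0.5}

zpcr6 = dpcr6_two_sources(m1, m2, zhang)
jpcr6 = dpcr6_two_sources(m1, m2, jaccard)
for X in (A, B, AB):
    print(sorted(X), round(zpcr6[X], 4), round(jpcr6[X], 4))
```

With these inputs the two variants differ only through the product m1(A ∪ B)m2(Θ), which Jaccard's degree discounts less than Zhang's degree.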


B. Sophisticate modified PCR6 rule 


We propose here a more sophisticate modified PCR6 rule 
which does not use the normalization by the division with 
a normalization constant but which makes a proportional 
redistribution of the non conflicting mass missing from the 
discounted conjunctive rule (after including a degree of inter- 
section). Before providing the general formula of this sophis- 
ticate modified PCR6 rule, let’s explain how the redistribution 
that we propose is done in the two BBA’s case at first for 
simplicity. 


⁶ ZPCR6 and JPCR6 denote the PCR6 rules modified with Zhang's and
Jaccard's degrees of intersection respectively.
Let's suppose to have only two BBA's $m_1(.)$ and
$m_2(.)$ defined on the same FoD $\Theta$ (assuming Shafer's
model for simplicity). When $X_1 \cap X_2 = X$, then
$(1 - D(X_1, X_2))\, m_1(X_1) m_2(X_2)$ will be transferred back to $X_1$
and $X_2$ proportionally with respect to their masses (following
PCR5/6 principle), that is:

$$\frac{\alpha}{m_1(X_1)} = \frac{\beta}{m_2(X_2)} = \frac{(1 - D(X_1, X_2))\, m_1(X_1) m_2(X_2)}{m_1(X_1) + m_2(X_2)},$$

whence,

$$\alpha = (1 - D(X_1, X_2)) \cdot \frac{m_1(X_1)^2\, m_2(X_2)}{m_1(X_1) + m_2(X_2)},$$

$$\beta = (1 - D(X_1, X_2)) \cdot \frac{m_1(X_1)\, m_2(X_2)^2}{m_1(X_1) + m_2(X_2)}.$$
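As a purely numerical illustration of this proportional transfer, take the assumed values $|X_1| = 1$, $X_2 = \Theta$ with $|\Theta| = 10$, $m_1(X_1) = 0.9$ and $m_2(X_2) = 0.1$, with Zhang's degree so that $D^Z(X_1, X_2) = 1/10$. These numbers are chosen here for illustration only.

```latex
\begin{align*}
(1 - D^Z(X_1,X_2))\, m_1(X_1)\, m_2(X_2) &= 0.9 \times 0.09 = 0.081,\\
\alpha &= 0.9 \cdot \frac{0.9^2 \times 0.1}{0.9 + 0.1} = 0.0729,\\
\beta  &= 0.9 \cdot \frac{0.9 \times 0.1^2}{0.9 + 0.1} = 0.0081,\\
\alpha + \beta &= 0.081 .
\end{align*}
```

The whole missing mass is thus returned to $X_1$ and $X_2$, with the larger share going to the element supported by the larger mass.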


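The formula of this sophisticate modified combination rule,
denoted⁷ SDPCR5/6, is given by $m^{SDPCR5/6}_{1,2}(\emptyset) = 0$ and by

$$m^{SDPCR5/6}_{1,2}(X) \triangleq \sum_{\substack{X_1, X_2 \in 2^\Theta \\ X_1 \cap X_2 = X}} D(X_1, X_2)\, m_1(X_1) m_2(X_2) + \sum_{\substack{Y \in 2^\Theta \setminus \{X\} \\ X \cap Y = \emptyset}} \left[ \frac{m_1(X)^2 m_2(Y)}{m_1(X) + m_2(Y)} + \frac{m_2(X)^2 m_1(Y)}{m_2(X) + m_1(Y)} \right] + \sum_{\substack{Y \in 2^\Theta \\ X \cap Y \neq \emptyset}} (1 - D(X, Y)) \left[ \frac{m_1(X)^2 m_2(Y)}{m_1(X) + m_2(Y)} + \frac{m_2(X)^2 m_1(Y)}{m_2(X) + m_1(Y)} \right]. \tag{14}$$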


The third sum of Eq.(14) represents the non-conflicting mass
missing from the conjunctive rule including a degree of intersection.
As for ZPCR6 or JPCR6 rules, we can choose Zhang's
or Jaccard's degrees (or any other measures of degree of
intersection if preferred). The generalization of this principle
of redistribution of missing discounting conjunctive masses
yields the following general sophisticate modified PCR6 rule
of combination.

$$m^{SDPCR6}_{1,2,\ldots,s}(X) = m^{D}_{1,2,\ldots,s}(X) + CR^{PCR6}(X) + MR^{DPCR6}(X), \tag{15}$$

where $MR^{DPCR6}(X)$ is the part of the missing conjunctive
masses due to discounting back to the focal element involved
in the conjunction which is redistributed according to PCR6
redistribution principle. $MR^{DPCR6}(X)$ is defined by

$$MR^{DPCR6}(X) \triangleq \sum_{k=1}^{s-1} \; \sum_{\substack{X_{i_1}, X_{i_2}, \ldots, X_{i_k} \in 2^\Theta \setminus X \\ (\cap_{j=1}^{k} X_{i_j}) \cap X \neq \emptyset}} \; \sum_{(i_1, i_2, \ldots, i_s) \in \mathcal{P}^s(\{1,\ldots,s\})} \big(1 - D(X_{i_1}, \ldots, X_{i_k}, X)\big) \Big[ \sum_{j=1}^{k} m_{i_j}(X) \Big] \cdot \frac{m_{i_1}(X) \ldots m_{i_k}(X)\, m_{i_{k+1}}(X_{i_{k+1}}) \ldots m_{i_s}(X_{i_s})}{m_{i_1}(X) + \ldots + m_{i_k}(X) + m_{i_{k+1}}(X_{i_{k+1}}) + \ldots + m_{i_s}(X_{i_s})}. \tag{16}$$

SZPCR6 and SJPCR6 rules⁸ are commutative but not associative.


⁷ S letter in this acronym stands for Sophisticate.
⁸ SZPCR6 and SJPCR6 denote the PCR6 rules modified with Zhang's and
Jaccard's degrees of intersection respectively.
IV. ANALYSIS OF THE NEUTRALITY OF VBBA


When there is no conflict between BBA's, DS, PCR5 or
PCR6 rules reduce to the conjunctive rule which preserves the
neutrality of VBBA. When there is conflict between BBA's, only
DS preserves neutrality of VBBA because DS is associative. In
general, PCR5 and PCR6 do not preserve the neutrality of
the VBBA if more than two conflicting BBA's (including the
VBBA) are combined altogether⁹. In general, the VBBA $m_v(.)$
is not a neutral element for the conjunctive rule of combination
discounted with Jaccard's degree of intersection when combining
two (or more) BBA's as shown in the following counter-example.
If we take $\Theta = \{A, B\}$, with $A \cap B = \emptyset$, and $m_1(.)$
defined as $m_1(A) = 0.5$, $m_1(B) = 0.3$ and $m_1(A \cup B) = 0.2$,
then the result of the JCR fusion is $m^{JCR}_{1v}(A) \approx 0.4167$,
$m^{JCR}_{1v}(B) = 0.25$ and $m^{JCR}_{1v}(A \cup B) \approx 0.3333$, which shows
that $m^{JCR}_{1v}(.) \neq m_1(.)$. The VBBA $m_v(.)$ is a neutral element
for the ZCR combination of $m_1(.)$ with the VBBA $m_v(.)$,
because the discounted conjunctive mass for any focal element
$X$ is

$$m^{Z}_{1v}(X) = \frac{|X \cap \Theta|}{|X| \cdot |\Theta|}\, m_1(X)\, m_v(\Theta) = \frac{|X|}{|X| \cdot |\Theta|}\, m_1(X) \cdot 1 = \frac{1}{n}\, m_1(X),$$

where $n = |\Theta|$. The normalization constant equals
$K^{ZCR}_{1v} = \sum_X \frac{1}{n} m_1(X) = 1/n$. Therefore, after dividing by
$K^{ZCR}_{1v}$, we always get $m^{ZCR}_{1v}(X) = m_1(X)$ for any focal
element $X$ of $m_1(.)$. The same property holds if we combine
three (or more) BBA's with the VBBA, whether or not these
BBA's are in conflict, because $D^Z(X_1, \ldots, X_n, \Theta) =
D^Z(X_1, \ldots, X_n)/|\Theta|$ and $m_v(\Theta) = 1$, so that the constant $|\Theta|$
always simplifies in the normalization step of ZCR, and because
the conjunctive rule and Zhang's degree are associative. In the
general case, ZPCR6, SZPCR6, JPCR6 and SJPCR6 do not
preserve the neutrality of the VBBA. This can be verified using
the simple example of footnote 9. More precisely, the
combination $[m_1 \oplus m_2 \oplus \ldots \oplus m_n \oplus m_v](.)$ is not equal to
$[m_1 \oplus m_2 \oplus \ldots \oplus m_n](.)$. In the very specific case when there
is no conflict between the BBA's, only ZPCR6 rule preserves
the neutrality of VBBA because it coincides with ZCR.
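The counter-example of this section can be reproduced with a short sketch of the center rules. It assumes Shafer's model and BBA's stored as dictionaries mapping `frozenset` focal elements to masses; the function `center_rule` and the helper names are hypothetical.

```python
from itertools import product
from math import prod


def zhang(sets):
    return len(frozenset.intersection(*sets)) / prod(len(X) for X in sets)


def jaccard(sets):
    return len(frozenset.intersection(*sets)) / len(frozenset.union(*sets))


def center_rule(bbas, degree):
    """Discount each conjunctive product by `degree`, then normalize."""
    raw = {}
    for combo in product(*(m.items() for m in bbas)):
        sets = [X for X, _ in combo]
        inter = frozenset.intersection(*sets)
        if inter:
            mass = degree(sets) * prod(v for _, v in combo)
            raw[inter] = raw.get(inter, 0.0) + mass
    K = sum(raw.values())  # normalization constant
    return {X: v / K for X, v in raw.items()}


A, B, theta = frozenset("a"), frozenset("b"), frozenset("ab")
m1 = {A: 0.5, B: 0.3, theta: 0.2}
vbba = {theta: 1.0}  # vacuous BBA

zcr = center_rule([m1, vbba], zhang)
jcr = center_rule([m1, vbba], jaccard)
for X in (A, B, theta):
    print(sorted(X), m1[X], round(zcr[X], 4), round(jcr[X], 4))
```

The ZCR column reproduces the input masses, whereas the JCR column moves mass from A and B towards A ∪ B.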


V. EXAMPLES 


Here we analyze the behavior of the different rules (DS,
PCR6, ZCR, JCR, ZPCR6, JPCR6, SZPCR6 and SJPCR6) in
emblematic examples to determine which one presents the
best interest for the combination of BBA's.


Example 1: (No conflicting case) 

Let's consider the FoD $\Theta = \{A_1, A_2, \ldots, A_{10}\}$ with
Shafer's model, and the following two BBA's to combine
$m_1(A_1) = 0.9$, $m_1(\Theta) = 0.1$, $m_2(X) = 0.9$ and $m_2(\Theta) = 0.1$
where the focal element $X$ of $m_2(.)$ can take the values
$A_1$, $A_1 \cup A_2$, $A_1 \cup A_2 \cup A_3$, ..., or $\Theta$.

In this case, the DS and PCR5/6 rules coincide with
the conjunctive rule of combination because there is no
conflicting mass to redistribute ($m_{12}(\emptyset) = 0$). If
$X = A_1$, then $m^{DS}_{12}(A_1) = m^{PCR5/6}_{12}(A_1) = m_1(A_1)m_2(A_1) +
m_1(A_1)m_2(\Theta) + m_1(\Theta)m_2(A_1) = 0.99$ and $m^{DS}_{12}(\Theta) =
m^{PCR5/6}_{12}(\Theta) = m_1(\Theta)m_2(\Theta) = 0.01$, which is a reasonable
result since the belief in $A_1$ is reinforced because each source
does strongly support the same hypothesis $A_1$. When $X \supset A_1$
and $|X| > 1$, the behavior of the conjunctive rule becomes
questionable because one always gets

$$\begin{aligned}
m^{DS}_{12}(A_1) &= m^{PCR5/6}_{12}(A_1) = m_1(A_1)(m_2(X) + m_2(\Theta)) = 0.9,\\
m^{DS}_{12}(X) &= m^{PCR5/6}_{12}(X) = m_1(\Theta)m_2(X) = 0.09,\\
m^{DS}_{12}(\Theta) &= m^{PCR5/6}_{12}(\Theta) = m_1(\Theta)m_2(\Theta) = 0.01.
\end{aligned}$$

⁹ For example, if one considers $\Theta = \{A, B\}$ with Shafer's model, and
the BBA's $\{m_1(A) = a_1, m_1(B) = b_1, m_1(\Theta) = c_1\}$, $\{m_2(A) =
a_2, m_2(B) = b_2, m_2(\Theta) = c_2\}$, $\{m_v(\Theta) = 1\}$. Then $[m_1 \oplus m_2](.) \neq
[m_1 \oplus m_2 \oplus m_v](.)$ (where $\oplus$ denotes the PCR5 or PCR6 fusion rule)
because in $m_1 \oplus m_2$ nothing from the redistribution of the conflicting mass
goes to ignorance, contrarily to what happens in $[m_1 \oplus m_2 \oplus m_v](.)$.


When $X \to \Theta$, $m_2(.)$ tends to become a fully ignorant source
of evidence, and the combination of $m_1(.)$ with $m_2(.)$ tends
towards $m_1(.)$ because $m_2(.)$ brings no useful information
at all in this limit case. This behavior of the conjunctive rule
conforms with what we intuitively expect. However,
when $|X|$ decreases from $r = 10$ to $r = 2$, the behavior of
conjunctive rule (and in this case DS and PCR6 rules also) is
not very satisfactory, because we obtain the same results on the
mass of $A_1$ whatever the cardinality of $X$ is. In fact, it is rather
intuitively expected that after the combination, the mass of $A_1$
should substantially increase if the cardinality of $X$ decreases
because $m_2(.)$ becomes more and more specific (and focused
towards $A_1$). When $m_2(.)$ is more in agreement with $m_1(.)$,
the combination of $m_1(.)$ with $m_2(.)$ should reinforce the
belief on $A_1$ when $|X|$ decreases, which is not what happens
with the pure (strict) conjunctive rule.

Let's examine how ZCR, JCR rules work in this example.
Let $|X| = r \geq 1$, and $r \leq 10$. Also $|\Theta| = |A_1 \cup A_2 \cup
\ldots \cup A_{10}| = 10$. If we compute the (unnormalized) discounted
conjunctive fusion with Zhang's degree of intersection, we get


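$$\begin{aligned}
m^{Z}_{12}(A_1) &= \frac{|A_1 \cap X|}{|A_1| \cdot |X|}\, m_1(A_1) m_2(X) + \frac{|A_1 \cap \Theta|}{|A_1| \cdot |\Theta|}\, m_1(A_1) m_2(\Theta)\\
&= \frac{1}{r}(0.9)(0.9) + \frac{1}{10}(0.9)(0.1) = \frac{0.81}{r} + 0.009,\\
m^{Z}_{12}(X) &= \frac{|\Theta \cap X|}{|\Theta| \cdot |X|}\, m_1(\Theta) m_2(X) = \frac{1}{10}(0.1)(0.9) = 0.009,\\
m^{Z}_{12}(\Theta) &= \frac{|\Theta \cap \Theta|}{|\Theta| \cdot |\Theta|}\, m_1(\Theta) m_2(\Theta) = \frac{1}{10}(0.1)(0.1) = 0.001.
\end{aligned}$$

If we compute the (unnormalized) discounted conjunctive
fusion with Jaccard's degree of intersection, we get

$$\begin{aligned}
m^{J}_{12}(A_1) &= \frac{|A_1 \cap X|}{|A_1 \cup X|}\, m_1(A_1) m_2(X) + \frac{|A_1 \cap \Theta|}{|A_1 \cup \Theta|}\, m_1(A_1) m_2(\Theta)\\
&= \frac{1}{r}(0.9)(0.9) + \frac{1}{10}(0.9)(0.1) = \frac{0.81}{r} + 0.009,\\
m^{J}_{12}(X) &= \frac{|\Theta \cap X|}{|\Theta \cup X|}\, m_1(\Theta) m_2(X) = \frac{r}{10}(0.1)(0.9) = 0.009 \cdot r,\\
m^{J}_{12}(\Theta) &= \frac{|\Theta \cap \Theta|}{|\Theta \cup \Theta|}\, m_1(\Theta) m_2(\Theta) = \frac{10}{10}(0.1)(0.1) = 0.01.
\end{aligned}$$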


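After normalization of $m^{Z}_{12}(.)$ by $K^{Z}_{12} = \frac{0.81}{r} + 0.019$, and
$m^{J}_{12}(.)$ by $K^{J}_{12} = \frac{0.81}{r} + 0.009 \cdot r + 0.019$ we get the result
of ZCR and JCR rules, which are

$$\begin{aligned}
m^{ZCR}_{12}(A_1) &= \left[\tfrac{0.81}{r} + 0.009\right] / K^{Z}_{12}, & m^{JCR}_{12}(A_1) &= \left[\tfrac{0.81}{r} + 0.009\right] / K^{J}_{12},\\
m^{ZCR}_{12}(X) &= 0.009 / K^{Z}_{12}, & m^{JCR}_{12}(X) &= 0.009 \cdot r / K^{J}_{12},\\
m^{ZCR}_{12}(\Theta) &= 0.001 / K^{Z}_{12}, & m^{JCR}_{12}(\Theta) &= 0.01 / K^{J}_{12}.
\end{aligned}$$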
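In the limit case when $r = 1$ we get

$$\begin{aligned}
m^{ZCR}_{12}(A_1) &= 0.9988, & m^{JCR}_{12}(A_1) &= 0.988,\\
m^{ZCR}_{12}(\Theta) &= 0.0012, & m^{JCR}_{12}(\Theta) &= 0.012.
\end{aligned}$$

In the limit case when $r = 10$ we get

$$\begin{aligned}
m^{ZCR}_{12}(A_1) &= 0.90, & m^{JCR}_{12}(A_1) &= 0.4737,\\
m^{ZCR}_{12}(\Theta) &= 0.10, & m^{JCR}_{12}(\Theta) &= 0.5263.
\end{aligned}$$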
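The behavior of both center rules over the whole range of $r$ can be tabulated with a few lines of Python. The sketch below simply evaluates the unnormalized expressions of this example and normalizes them by their sum; the function name `example1_center` and the string keys are hypothetical, and the two limit cases are handled by adding the mass of $X$ to the element it coincides with.

```python
def example1_center(r, degree):
    """Normalized ZCR ('zhang') or JCR ('jaccard') masses of Example 1 for |X| = r."""
    a1 = 0.81 / r + 0.009                           # unnormalized mass of A1
    x = 0.009 if degree == "zhang" else 0.009 * r   # unnormalized mass of X
    th = 0.001 if degree == "zhang" else 0.01       # unnormalized mass of Theta
    k = a1 + x + th                                 # normalization constant
    masses = {"A1": a1 / k, "X": x / k, "Theta": th / k}
    if r == 1:    # X coincides with A1
        masses["A1"] += masses.pop("X")
    if r == 10:   # X coincides with Theta
        masses["Theta"] += masses.pop("X")
    return masses


for r in range(1, 11):
    z = example1_center(r, "zhang")
    j = example1_center(r, "jaccard")
    print(r, round(z["A1"], 4), round(j["A1"], 4))
```

Printing the mass of $A_1$ for $r = 1, \ldots, 10$ shows how each rule reacts when $m_2(.)$ becomes less specific.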


Clearly, one sees that both ZCR and JCR now have the
expected behavior when $|X|$ decreases, but only ZCR also provides
a good behavior when $r = 10$ because in this case one
gets $m^{ZCR}_{12}(.) = m_1(.)$ which is normal because $m_2(.)$ is the
VBBA (fully ignorant source). With JCR, the result we obtain
when $|X| = r = 10$ is not good because $m^{JCR}_{12}(.) \neq m_1(.)$.
Because there is no conflict, ZPCR6 rule coincides with ZCR
rule in this example, and JPCR6 rule coincides with JCR rule.
Therefore, JPCR6 rule does not work well (at least for this
example) as explained previously. The evaluation of masses
of $A_1$ and of $\Theta$ after the combination of $m_1(.)$ with $m_2(.)$ for
the different rules and for different values of $r = |X|$ is shown
in Fig. 1 and Fig. 2 respectively.


Figure 1. $m(A_1)$ after combination of $m_1(.)$ with $m_2(.)$ (mass of belief versus $r = \mathrm{Card}(X)$, for DS (=PCR5), ZCR, JCR, SZPCR6 and SJPCR6).

Figure 2. $m(\Theta)$ after combination of $m_1(.)$ with $m_2(.)$ (mass of belief, for DS (=PCR5), ZCR, JCR, SZPCR6 and SJPCR6).
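If we apply sophisticate normalization procedures we obtain¹⁰
with SZPCR6 and SJPCR6

$$\begin{aligned}
m^{SZPCR6}_{12}(A_1) &= 0.0819 + \frac{0.81}{r} + 0.405 \cdot \frac{r-1}{r},\\
m^{SZPCR6}_{12}(X) &= 0.0819 + 0.405 \cdot \frac{r-1}{r},\\
m^{SZPCR6}_{12}(\Theta) &= 0.0262,\\
m^{SJPCR6}_{12}(A_1) &= 0.0819 + \frac{0.81}{r} + 0.405 \cdot \frac{r-1}{r},\\
m^{SJPCR6}_{12}(X) &= 0.009 \cdot r + 0.405 \cdot \frac{r-1}{r} + (10 - r) \cdot 0.0081,\\
m^{SJPCR6}_{12}(\Theta) &= 0.0181 + (10 - r) \cdot 0.0009.
\end{aligned}$$

In the limit case, when $r = 1$ we get

$$\begin{aligned}
m^{SZPCR6}_{12}(A_1) &= 0.9738, & m^{SJPCR6}_{12}(A_1) &= 0.9738,\\
m^{SZPCR6}_{12}(\Theta) &= 0.0262, & m^{SJPCR6}_{12}(\Theta) &= 0.0262.
\end{aligned}$$

In the limit case, when $r = 10$ we get

$$\begin{aligned}
m^{SZPCR6}_{12}(A_1) &= 0.5274, & m^{SJPCR6}_{12}(A_1) &= 0.5274,\\
m^{SZPCR6}_{12}(\Theta) &= 0.4726, & m^{SJPCR6}_{12}(\Theta) &= 0.4726.
\end{aligned}$$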
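The two-source sophisticate rule (14) can be checked numerically on this example with the minimal sketch below. It assumes Shafer's model and BBA's stored as dictionaries mapping `frozenset` focal elements to masses; the function names are hypothetical, and the sketch is restricted to $1 \leq r \leq 9$ so that $X$ and $\Theta$ remain distinct focal elements of $m_2(.)$.

```python
def zhang(X1, X2):
    return len(X1 & X2) / (len(X1) * len(X2))


def jaccard(X1, X2):
    return len(X1 & X2) / len(X1 | X2)


def sdpcr6_two_sources(m1, m2, degree):
    """Two-source sophisticate modified PCR5/6 rule, without normalization constant."""
    fused = {}

    def add(X, mass):
        fused[X] = fused.get(X, 0.0) + mass

    for X1, a in m1.items():
        for X2, b in m2.items():
            inter = X1 & X2
            d = degree(X1, X2)  # equals 0 when X1 and X2 are disjoint
            if inter:
                add(inter, d * a * b)  # discounted conjunctive product
            # The remaining part (1 - d) * a * b goes back to X1 and X2
            # proportionally to a and b (the whole product if d = 0).
            add(X1, (1.0 - d) * a * a * b / (a + b))
            add(X2, (1.0 - d) * b * b * a / (a + b))
    return fused


def example1(r, degree):
    """BBAs of Example 1 with |X| = r, for 1 <= r <= 9."""
    theta = frozenset(range(1, 11))
    A1 = frozenset({1})
    X = frozenset(range(1, r + 1))
    fused = sdpcr6_two_sources({A1: 0.9, theta: 0.1}, {X: 0.9, theta: 0.1}, degree)
    return fused, A1, X, theta


for r in (1, 2, 5, 9):
    fz, A1, X, theta = example1(r, zhang)
    fj, _, _, _ = example1(r, jaccard)
    print(r, round(fz[A1], 4), round(fz[theta], 4), round(fj[A1], 4), round(fj[theta], 4))
```

In every case the fused masses sum to one without any division by a normalization constant, which is the purpose of the sophisticate redistribution.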


This result shows clearly that SZPCR6 and SJPCR6 rules
behave better than conjunctive rule (and so better than DS
and PCR6 rules) in the limit case when $X = A_1$, because
after the combination the mass committed to $A_1$ is reinforced
(as it is naturally expected). But the reinforcement of mass
of $A_1$ is lower than with ZPCR6 or JPCR6 rules¹¹ based on
simple normalization because the sophisticate normalization
procedure degrades the specificity of the information. In
the other limit case when $r = 10$ (i.e. $X = \Theta$, and $m_2(.)$
equals the VBBA), SZPCR6 and SJPCR6 rules do not work
well because clearly one has $m^{SJPCR6}_{12}(.) \neq m_1(.)$ and
$m^{SZPCR6}_{12}(.) \neq m_1(.)$ also. So we have shown at least one
example where SZPCR6 and SJPCR6 are not very efficient
and consequently, we do not recommend to use them. In
summary, only ZCR and ZPCR6 (equivalent to ZCR in this
example) allow to get an acceptable behavior for combining
the two BBA's $m_1(.)$ and $m_2(.)$ for any focal element
$X \supseteq A_1$.


Example 2 (Zadeh [10], [12]): (Conflicting case) 

Let's consider $\Theta = \{A, B, C\}$ with Shafer's model, and the two
BBA's to combine $m_1(A) = 0.9$, $m_1(C) = 0.1$, $m_2(B) = 0.9$
and $m_2(C) = 0.1$.

In this case, Shafer’s conflict is ™m1,2(@) = m1(A)(m2(B)+ 
m(C)) + m1(C)m2(B) = 0.9 + 0.1- 0.9 = 0.99. If we use 
DS rule (2), we get m3 (C) = 1. The discounted conjunctive 
consensus D(C, C)m1(C)m2(C) (with Zhang’s or Jaccard’s 
degree) is always equal to the un-discounted conjunctive con- 


sensus m1(C)m2(C) = 0.01 because D4(C,C) = eat = = 
1 and D7(C,C) = a = 1. Therefore the degree of 


intersection does not impact the conjunctive combination result 


'OHere there is no conflicting mass to redistribute which makes the 
derivation more easier. 

'lwhich coincide here with ZCR and JCR rule because there is no 
conflicting mass to redistribute. 
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and ZCR and JCR rules (4) give same counter-intuitive result 
as DS rule, that is mZSR(C) = =e (6) = mp3 (C) = 1. 
Because the degree of intersection does not impact the conjunctive combination part of the PCR6 rule in this example, the modified PCR6 rules (ZPCR6, JPCR6, SZPCR6 and SJPCR6) give the same result as the PCR6 rule, which is $m^{PCR6}_{1,2}(A) = 0.486$, $m^{PCR6}_{1,2}(B) = 0.486$ and $m^{PCR6}_{1,2}(C) = 0.028$.


In summary, the ZCR and JCR rules do not help to modify the result obtained by the DS rule in Zadeh's example and cannot be viewed as real alternatives to the DS rule for this example. Conversely, the ZPCR6, JPCR6, SZPCR6 and SJPCR6 rules (which coincide with the PCR6 rule in this example) remain good alternatives to the DS rule.
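
The PCR6 figures quoted for Zadeh's example can be checked numerically. The following Python sketch is only an illustration, not the authors' code: it assumes the usual two-source PCR6 principle (keep the conjunctive masses, and send each partial conflict $m_1(X) m_2(Y)$ back to $X$ and $Y$ proportionally to $m_1(X)$ and $m_2(Y)$), and the function name `pcr6_two_sources` and the dictionary representation of a BBA are hypothetical choices made here.

```python
from itertools import product


def pcr6_two_sources(m1, m2):
    """Two-source PCR6 sketch; a BBA is a dict mapping frozenset -> mass."""
    fused = {}
    for (x, mx), (y, my) in product(m1.items(), m2.items()):
        inter = x & y
        prod_mass = mx * my
        if inter:
            # Non-empty intersection: conjunctive consensus.
            fused[inter] = fused.get(inter, 0.0) + prod_mass
        elif mx + my > 0:
            # Partial conflict: redistribute proportionally to mx and my.
            fused[x] = fused.get(x, 0.0) + mx * prod_mass / (mx + my)
            fused[y] = fused.get(y, 0.0) + my * prod_mass / (mx + my)
    return fused


A, B, C = frozenset("A"), frozenset("B"), frozenset("C")
m1 = {A: 0.9, C: 0.1}  # first source of Zadeh's example
m2 = {B: 0.9, C: 0.1}  # second source of Zadeh's example
result = pcr6_two_sources(m1, m2)
for name, key in (("A", A), ("B", B), ("C", C)):
    print(name, round(result[key], 3))
```

With these inputs the sketch returns masses of 0.486, 0.486 and 0.028 for $A$, $B$ and $C$, which agrees with the values stated in the text.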


Example 3 (Voorbraak [15]): (Conflicting case) 

Let's consider the FoD $\Theta = \{A, B, C\}$ with Shafer's model, and the following two BBAs to combine: $m_1(A) = 0.5$, $m_1(B \cup C) = 0.5$, $m_2(C) = 0.5$, and $m_2(A \cup B) = 0.5$.

One has $m_{1,2}(\emptyset) = m_1(A) m_2(C) = 0.25$, and the DS rule gives $m^{DS}_{1,2}(A) = m^{DS}_{1,2}(B) = m^{DS}_{1,2}(C) = 1/3$. As reported by Voorbraak [15], this result is counterintuitive, since intuitively $B$ seems to share twice a probability mass of 0.5, while both $A$ and $C$ only have to share once 0.5 with $B$ and are once assigned 0.5 individually. This counterintuitive result comes from the fact that the DS rule implicitly assumes that all possible pairs of focal elements are equally confirmed by the combined evidence, while intuitively, in this example, $B = (B \cup C) \cap (A \cup B)$ is less confirmed than $A = A \cap (A \cup B)$ and $C = (B \cup C) \cap C$. With the ZCR and JCR rules, we get

$$m^{ZCR}_{1,2}(A) = 0.40, \quad m^{JCR}_{1,2}(A) = 0.375,$$
$$m^{ZCR}_{1,2}(B) = 0.20, \quad m^{JCR}_{1,2}(B) = 0.250,$$
$$m^{ZCR}_{1,2}(C) = 0.40, \quad m^{JCR}_{1,2}(C) = 0.375.$$


Contrary to the DS rule, with the ZCR or JCR rules one sees that the mass committed to $B$ is less than that of $A$ and of $C$, which is a more reasonable result. Applying the PCR6 rule also circumvents this problem because we get from Eq. (13) $m^{PCR6}_{1,2}(A) = 0.375$, $m^{PCR6}_{1,2}(B) = 0.25$ and $m^{PCR6}_{1,2}(C) = 0.375$ (the same as the JCR result for this particular example).
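
The ZCR and JCR values above can be reproduced with a few lines of Python. This is a hedged sketch rather than the reference implementation: it assumes that ZCR and JCR are the conjunctive rule in which each product $m_1(X) m_2(Y)$ is discounted by Zhang's degree $|X \cap Y| / (|X| \cdot |Y|)$ or Jaccard's degree $|X \cap Y| / |X \cup Y|$ and the result is then normalized, which is consistent with the numbers of this example. The function names are hypothetical.

```python
from itertools import product


def zhang(x, y):
    """Zhang's degree of intersection |X n Y| / (|X| * |Y|)."""
    return len(x & y) / (len(x) * len(y))


def jaccard(x, y):
    """Jaccard's degree of intersection |X n Y| / |X u Y|."""
    return len(x & y) / len(x | y)


def discounted_conjunctive(m1, m2, degree):
    """Degree-discounted conjunctive rule followed by simple normalization."""
    raw = {}
    for (x, mx), (y, my) in product(m1.items(), m2.items()):
        inter = x & y
        if inter:  # conflicting products are simply dropped here
            raw[inter] = raw.get(inter, 0.0) + degree(x, y) * mx * my
    total = sum(raw.values())
    return {focal: mass / total for focal, mass in raw.items()}


# BBAs of Voorbraak's example
m1 = {frozenset("A"): 0.5, frozenset("BC"): 0.5}
m2 = {frozenset("C"): 0.5, frozenset("AB"): 0.5}
zcr = discounted_conjunctive(m1, m2, zhang)
jcr = discounted_conjunctive(m1, m2, jaccard)
```

Here `zcr` gives 0.40, 0.20, 0.40 and `jcr` gives 0.375, 0.25, 0.375 for $A$, $B$, $C$ respectively, as in the text.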

With the ZPCR6 rule, we first compute the following (unnormalized) discounted conjunctive masses, to which the proportional conflict redistribution is added:

$$m^{Z}_{1,2}(A) = \frac{|A \cap (A \cup B)|}{|A| \cdot |A \cup B|} m_1(A) m_2(A \cup B) + \frac{1}{2} m_{1,2}(\emptyset) = 0.25,$$
$$m^{Z}_{1,2}(B) = \frac{|(B \cup C) \cap (A \cup B)|}{|B \cup C| \cdot |A \cup B|} m_1(B \cup C) m_2(A \cup B) = 0.0625,$$
$$m^{Z}_{1,2}(C) = \frac{|(B \cup C) \cap C|}{|B \cup C| \cdot |C|} m_1(B \cup C) m_2(C) + \frac{1}{2} m_{1,2}(\emptyset) = 0.25.$$

After a simple normalization (dividing by $K^{ZPCR6}_{1,2} = 0.25 + 0.0625 + 0.25 = 0.5625$), we finally get

$$m^{ZPCR6}_{1,2}(A) = 0.25/0.5625 \approx 0.4444,$$
$$m^{ZPCR6}_{1,2}(B) = 0.0625/0.5625 \approx 0.1111,$$
$$m^{ZPCR6}_{1,2}(C) = 0.25/0.5625 \approx 0.4444.$$

Similarly, if we apply the JPCR6 rule based on Jaccard's index and the simple normalization step, we get the following result

$$m^{JPCR6}_{1,2}(A) = [(0.25/2) + 0.125] / K^{JPCR6}_{1,2} \approx 0.4286,$$
$$m^{JPCR6}_{1,2}(B) = (0.25/3) / K^{JPCR6}_{1,2} \approx 0.1429,$$
$$m^{JPCR6}_{1,2}(C) = [(0.25/2) + 0.125] / K^{JPCR6}_{1,2} \approx 0.4286,$$

where the normalization factor equals $K^{JPCR6}_{1,2} = (0.25/2) + 0.125 + (0.25/3) + (0.25/2) + 0.125 \approx 0.5833$.


These results show that ZPCR6 and JPCR6 rules diminish 
substantially the mass committed to B (as expected) and 
reinforce more strongly the masses of A and C' than with 
ZCR, JCR or PCR6 rules. 


If we apply the sophisticated normalization for SZPCR6, the lost discounted mass $\left(1 - \frac{|A \cap (A \cup B)|}{|A| \cdot |A \cup B|}\right) m_1(A) m_2(A \cup B) = 0.125$ is redistributed to $A$ and to $A \cup B$ proportionally¹² to $m_1(A) = 0.5$ and $m_2(A \cup B) = 0.5$. Similarly, the second lost discounted mass $\left(1 - \frac{|(B \cup C) \cap (A \cup B)|}{|B \cup C| \cdot |A \cup B|}\right) m_1(B \cup C) m_2(A \cup B) = 0.1875$ is redistributed to $B \cup C$ and to $A \cup B$ proportionally to $m_1(B \cup C) = 0.5$ and $m_2(A \cup B) = 0.5$, and the third lost discounted mass $\left(1 - \frac{|(B \cup C) \cap C|}{|B \cup C| \cdot |C|}\right) m_1(B \cup C) m_2(C) = 0.125$ is redistributed to $B \cup C$ and to $C$ proportionally to $m_1(B \cup C) = 0.5$ and $m_2(C) = 0.5$. Similar computations are done for SJPCR6 by replacing Zhang's degree with Jaccard's degree of intersection. Finally, we obtain with SZPCR6 and SJPCR6 the following combined masses:

$$m^{SZPCR6}_{1,2}(A) = 0.3125, \quad m^{SZPCR6}_{1,2}(A \cup B) = 0.15625,$$
$$m^{SZPCR6}_{1,2}(B) = 0.0625, \quad m^{SZPCR6}_{1,2}(B \cup C) = 0.15625,$$
$$m^{SZPCR6}_{1,2}(C) = 0.3125,$$

and

$$m^{SJPCR6}_{1,2}(A) = 0.3125, \quad m^{SJPCR6}_{1,2}(A \cup B) = 0.14583,$$
$$m^{SJPCR6}_{1,2}(B) = 0.0833, \quad m^{SJPCR6}_{1,2}(B \cup C) = 0.14583,$$
$$m^{SJPCR6}_{1,2}(C) = 0.3125.$$

Of course, these results are a bit less specific than with ZPCR6 and JPCR6, which is normal. As shown, the SZPCR6 and SJPCR6 rules also diminish the mass committed to $B$ (as expected) but reinforce the masses of $A$ and $C$ less strongly, because the specificity of the result is degraded: one gets positive masses committed to the new uncertainties $A \cup B$ and $B \cup C$. For this example, ZPCR6 and JPCR6 are the most interesting rules for combining the BBAs $m_1(.)$ and $m_2(.)$.

¹² equally in fact in this case.
Example 4 (Dezert et al. [3]): (Conflicting case)

This emblematic example is very interesting to analyze because in this case the DS rule does not respond to the level of conflict between the sources. This anomaly has been analyzed and discussed in detail in [3].

Let's consider the FoD $\Theta = \{A, B, C\}$ with Shafer's model, and the following two BBAs to combine

$$m_1(A) = 0.9, \quad m_1(A \cup B) = 0.1,$$
$$m_2(A \cup B) = 0.1, \quad m_2(C) = 0.7, \quad m_2(A \cup B \cup C) = 0.2.$$

In this example, the two sources are not vacuous (they are truly informative) and they are in conflict because $m_{1,2}(\emptyset) = 0.7$, but the DS rule does not respond to the level of conflict because one gets $m^{DS}_{1,2}(.) = m_1(.)$. In fact, the second source has no impact in the DS fusion, as if it were equivalent to the VBA.

If we apply the PCR6 rule of combination, the first partial conflict $m_1(A) m_2(C) = 0.63$ is redistributed to $A$ and $C$ proportionally to $m_1(A)$ and $m_2(C)$, and the second partial conflict $m_1(A \cup B) m_2(C) = 0.07$ is redistributed to $A \cup B$ and to $C$ proportionally to $m_1(A \cup B)$ and $m_2(C)$. So with the PCR6 rule (7), we obtain $m^{PCR6}_{1,2}(A) = 0.6244$, $m^{PCR6}_{1,2}(A \cup B) = 0.0388$ and $m^{PCR6}_{1,2}(C) = 0.3369$. One sees that the PCR6 fusion result now reacts to the second source because $m^{PCR6}_{1,2}(.) \neq m_1(.)$, which makes sense if both sources are equireliable, truly informative and in some disagreement.

In discounting with Zhang's degree, one gets the (unnormalized) discounted conjunctive BBA

$$m^{Z}_{1,2}(A) = \frac{1}{1 \cdot 2}(0.9)(0.1) + \frac{1}{1 \cdot 3}(0.9)(0.2) = 0.1050,$$
$$m^{Z}_{1,2}(A \cup B) = \frac{2}{2 \cdot 2}(0.1)(0.1) + \frac{2}{2 \cdot 3}(0.1)(0.2) \approx 0.0117.$$

After the normalization by the factor $K^{ZCR}_{1,2} = 0.1050 + 0.0117 = 0.1167$, we finally get $m^{ZCR}_{1,2}(A) = 0.1050/0.1167 \approx 0.9$ and $m^{ZCR}_{1,2}(A \cup B) = 0.0117/0.1167 \approx 0.1$. Therefore, as with the DS rule, we get with the ZCR rule $m^{ZCR}_{1,2}(.) = m_1(.)$, as if the second informative source did not count in the fusion process, which is abnormal.


If we use Jaccard's degree, one gets

$$m^{J}_{1,2}(A) = \frac{1}{2}(0.9)(0.1) + \frac{1}{3}(0.9)(0.2) = 0.1050,$$
$$m^{J}_{1,2}(A \cup B) = \frac{2}{2}(0.1)(0.1) + \frac{2}{3}(0.1)(0.2) \approx 0.0233.$$

After the normalization by the factor $K^{JCR}_{1,2} = 0.1050 + 0.0233 \approx 0.1283$, we finally get $m^{JCR}_{1,2}(A) \approx 0.8182$ and $m^{JCR}_{1,2}(A \cup B) \approx 0.1818$. One sees that the JCR fusion result is not equal to the BBA $m_1(.)$, which means that $m_2(.)$ has had some impact in the fusion process with JCR (as expected). However, it is not clear whether such a JCR result really makes sense or not. Because we have already shown in Example 1 that JCR may not work well, we have serious doubts about the interest of using the JCR result in such an emblematic example.

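With the ZPCR6 rule of combination, we obtain

$$m^{ZPCR6}_{1,2}(A) = \frac{1}{K^{ZPCR6}_{1,2}} [0.1050 + x(A)] = 0.5625,$$
$$m^{ZPCR6}_{1,2}(A \cup B) = \frac{1}{K^{ZPCR6}_{1,2}} [0.0117 + x(A \cup B)] = 0.0250,$$
$$m^{ZPCR6}_{1,2}(C) = \frac{1}{K^{ZPCR6}_{1,2}} [x_1(C) + x_2(C)] = 0.4125,$$

where $K^{ZPCR6}_{1,2}$ is the normalization constant, and where $x(A) = m_1(A) \frac{m_1(A) m_2(C)}{m_1(A) + m_2(C)} = 0.354375$ is the part of the conflicting mass $m_1(A) m_2(C) = 0.63$ transferred to $A$; $x_1(C) = m_2(C) \frac{m_1(A) m_2(C)}{m_1(A) + m_2(C)} = 0.275625$ is the part of the conflicting mass $m_1(A) m_2(C) = 0.63$ transferred to $C$; $x(A \cup B) = m_1(A \cup B) \frac{m_1(A \cup B) m_2(C)}{m_1(A \cup B) + m_2(C)} = 0.00875$ is the part of the conflicting mass $m_1(A \cup B) m_2(C) = 0.07$ transferred to $A \cup B$; and $x_2(C) = m_2(C) \frac{m_1(A \cup B) m_2(C)}{m_1(A \cup B) + m_2(C)} = 0.06125$ is the part of the conflicting mass $m_1(A \cup B) m_2(C) = 0.07$ transferred to $C$.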

With the JPCR6 rule of combination, we obtain $m^{JPCR6}_{1,2}(A) \approx 0.55458$, $m^{JPCR6}_{1,2}(A \cup B) \approx 0.03873$ and $m^{JPCR6}_{1,2}(C) \approx 0.40669$, which is close to the ZPCR6 result. Compared with PCR6, we diminish the mass of belief committed to $A$ and to $A \cup B$ and we reinforce the mass committed to $C$ using the ZPCR6 and JPCR6 rules. We do not give results with SZPCR6 and SJPCR6 due to space constraints and because we know that these rules do not perform so well, as shown in the previous examples.


Example 5 (Sebbak [16]): (Conflicting case with 3 sources) 
Let's consider the FoD $\Theta = \{A, B, C\}$ with Shafer's model, and the following three BBAs to combine


m2(A) = 0.1, m2(C) = 0.9, 


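The conjunctive rule gives

$$m_{1,2,3}(A) = m_1(A) m_2(A) m_3(A) + m_1(A) m_2(A) m_3(\Theta) + m_1(\Theta) m_2(A) m_3(\Theta) + m_1(\Theta) m_2(A) m_3(A) = 0.10,$$
$$m_{1,2,3}(C) = m_1(\Theta) m_2(C) m_3(\Theta) = 0.108,$$

with the total conflicting mass

$$m_{1,2,3}(\emptyset) = m_1(A) m_2(C) m_3(A) + m_1(A) m_2(C) m_3(\Theta) + m_1(\Theta) m_2(C) m_3(A) = 0.792.$$

With the DS rule we get $m^{DS}_{1,2,3}(A) \approx 0.4808$ and $m^{DS}_{1,2,3}(C) \approx 0.5192$, and with the PCR5 and PCR6 rules [17]

$$m^{PCR5}_{1,2,3}(A) = 0.3450, \quad m^{PCR6}_{1,2,3}(A) = 0.4340,$$
$$m^{PCR5}_{1,2,3}(C) = 0.5327, \quad m^{PCR6}_{1,2,3}(C) = 0.4437,$$
$$m^{PCR5}_{1,2,3}(\Theta) = 0.1223, \quad m^{PCR6}_{1,2,3}(\Theta) = 0.1223.$$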


Note that with PCR5 one gets $0.4247/0.7920 \approx 53.62\%$ of the total conflicting mass redistributed to $C$, but not almost all of the conflicting mass. Using PCR6, $C$ actually gained from the total conflicting mass only $0.3357/0.7920 \approx 42.3864\%$, not even half of it, and not almost all of the conflicting mass (the majority) as the authors wrongly claimed in [16].

With the ZCR and JCR rules, one gets

$$m^{ZCR}_{1,2,3}(A) = 0.8125, \quad m^{JCR}_{1,2,3}(A) = 0.6032,$$
$$m^{ZCR}_{1,2,3}(C) = 0.1875, \quad m^{JCR}_{1,2,3}(C) = 0.3968.$$

With the ZPCR6, JPCR6, SZPCR6 and SJPCR6 rules one gets

$$m^{ZPCR6}_{1,2,3}(A) = 0.4511, \quad m^{JPCR6}_{1,2,3}(A) = 0.4405,$$
$$m^{ZPCR6}_{1,2,3}(C) = 0.4061, \quad m^{JPCR6}_{1,2,3}(C) = 0.4210,$$
$$m^{ZPCR6}_{1,2,3}(\Theta) = 0.1428, \quad m^{JPCR6}_{1,2,3}(\Theta) = 0.1385,$$

$$m^{SZPCR6}_{1,2,3}(A) = 0.4102955, \quad m^{SJPCR6}_{1,2,3}(A) = 0.412699,$$
$$m^{SZPCR6}_{1,2,3}(C) = 0.3984240, \quad m^{SJPCR6}_{1,2,3}(C) = 0.409718,$$
$$m^{SZPCR6}_{1,2,3}(\Theta) = 0.1912805, \quad m^{SJPCR6}_{1,2,3}(\Theta) = 0.177616.$$

One sees that $C$ gained $(0.4061 - 0.108)/0.7920 \approx 37.64\%$ of the total conflicting mass using ZPCR6, $(0.4210 - 0.108)/0.7920 \approx 39.52\%$ using JPCR6, $(0.398424 - 0.108)/0.7920 \approx 36.67\%$ using SZPCR6, and $(0.409718 - 0.108)/0.7920 \approx 38.10\%$ using SJPCR6.


VI. CONCLUSIONS 


The modifications of the PCR6 rule of combination presented here exploit judiciously Zhang's and Jaccard's degrees of intersection of focal elements. Our analysis shows that the ZPCR6 rule is in fact the most interesting modified PCR6 rule because it behaves well in all emblematic examples, contrary to the other rules. The SZPCR6 and SJPCR6 rules are more complicated to implement and they generally increase the non-specificity of the result, which does not help decision-making. So we do not recommend them for applications. None of these rules is associative, and none preserves the neutrality of the VBA when some sources are in conflict.
Abstract: In many applications involving epistemic uncertainties usually modeled by belief functions, it is often necessary to approximate general (non-Bayesian) basic belief assignments (BBAs) by subjective probabilities (called Bayesian BBAs). This necessity occurs if one needs to embed the fusion result in a system based on the probabilistic framework and Bayesian inference (e.g. tracking systems), or if one wants to use classical decision theory to make a decision. Several methods (probabilistic transforms) already exist to approximate any general BBA by a Bayesian BBA. From a fusion standpoint, two approaches are usually adopted: 1) one can first approximate each BBA by a subjective probability measure and use Bayes fusion rule to get the final Bayesian BBA, or 2) one can fuse all the BBAs with a fusion rule, typically Dempster-Shafer's rule or the PCR6 rule (which is very costly in computations), and convert the combined BBA into a subjective probability measure. The former method is the simplest, but it generates a high loss of the information included in the original BBAs, whereas the latter is intractable for high-dimension problems. This paper presents a new method to achieve this task based on hierarchical decomposition (coarsening) of the frame of discernment, which can be seen as an intermediary approach between the two aforementioned methods. After the presentation of this new method, we show through simulations how it performs with respect to other methods.

Keywords: Information fusion, belief functions, DST, DSmT, PCR6 rule, coarsening.


I. INTRODUCTION 


The theory of belief functions, known as Dempster-Shafer Theory (DST), was developed by Shafer [1] in 1976 from Dempster's works [2]. Belief functions allow one to model epistemic uncertainty and they have been used in many applications since the 1990's [3], mainly those related to expert systems, decision-making support and information fusion. To palliate some limitations of DST, Dezert and Smarandache have proposed an extended mathematical framework of belief functions with new efficient quantitative and qualitative rules of combination, which is called DSmT (Dezert and Smarandache Theory) in the literature [4], [5], with applications listed in [6]. One of the major drawbacks of DST and DSmT is their high computational complexity, as soon as the fusion space (i.e. the frame of discernment, FoD) and the number of sources to combine are large¹.


¹ DSmT is more complex than DST, and the Proportional Conflict Redistribution rule #6 (PCR6 rule) becomes computationally intractable in the worst case as soon as the frame of discernment has at least six elements.
To reduce the computational cost of operations with belief functions when the number of focal elements is very large, several approaches have been proposed by different authors. Basically, the existing approaches rely either on efficient implementations of computations, as proposed for instance in [7], [8], or on approximation techniques of the original Basic Belief Assignments (BBAs) to combine [9]-[12], or both. In many applications involving epistemic uncertainties usually modeled by belief functions, it is often necessary to approximate general (non-Bayesian) BBAs by subjective probabilities (called Bayesian BBAs). This necessity occurs if one needs to embed the fusion result in a system based on the probabilistic framework and Bayesian inference (e.g. tracking systems), or if one wants to use classical decision theory to make a decision. From a fusion standpoint, two approaches are usually adopted: 1) one can first approximate each BBA by a subjective probability measure and use Bayes fusion rule to get the final Bayesian BBA, or 2) one can fuse all the BBAs with a fusion rule, typically Dempster-Shafer's rule or the PCR6 rule (which is very costly in computations), and convert the combined BBA into a subjective probability measure. The former method is the simplest, but it generates a high loss of the information included in the original BBAs, whereas the latter direct method is intractable for high-dimension problems. This paper presents a new method to achieve this task based on hierarchical decomposition (coarsening) of the frame of discernment, which can be seen as an intermediary approach between the two aforementioned methods.

This paper presents a new approach to fuse BBAs into a Bayesian BBA in order to reduce the computational burden and keep the fusion tractable even for large-dimension problems. This method is based on a hierarchical decomposition (coarsening) framework which keeps as much information of the original BBAs as possible while preserving a lower complexity. The main contributions of this paper are:


1) the presentation of the FoD bintree decomposition on which the BBA approximations will be done;

2) the presentation of the fusion of approximate BBAs from the bintree representation.


This hierarchical structure encompasses the bintree decomposition and the BBA approximations on it to obtain the final approximate fused Bayesian BBA.

This paper is organized as follows. In section II, we recall 
some basics of DST and DSmT that are relevant to the new 
method presented in this paper. More details with examples 
can easily be found in [1], [5]. We will also briefly recall 
our preliminary works about hierarchical coarsening of FoD. 
Section III presents the novel hierarchical flexible (adaptive) 
coarsening method which can be regarded as the extension of 
our previous works. Two simple examples are given in section 
IV to illustrate the detailed calculation steps. Simulation 
experiments are presented in section V to show the rationality 
of this new approach. Finally, Sect.VI concludes the paper 
with future works perspectives. 


II. MATHEMATICAL BACKGROUND


This section provides a brief reminder of the basics of DST and DSmT, and of the original hierarchical coarsening method, which are necessary for the presentation and the understanding of the more general flexible coarsening approximate method of section III.


A. Basics of DST and DSmT 


In the DST framework, the frame of discernment² $\Theta \triangleq \{\theta_1, \ldots, \theta_n\}$ ($n \geq 2$) is a set of exhaustive and exclusive elements (hypotheses) which represent the possible solutions of the problem under consideration, and thus Shafer's model assumes $\theta_i \cap \theta_j = \emptyset$ for $i \neq j$ in $\{1, \ldots, n\}$. A basic belief assignment (BBA) $m(\cdot)$ is defined by the mapping $2^\Theta \mapsto [0, 1]$, verifying $m(\emptyset) = 0$ and $\sum_{A \in 2^\Theta} m(A) = 1$. In DSmT, one can abandon Shafer's model (if Shafer's model does not fit the problem) and refute the principle of the third excluded middle³. Instead of defining the BBAs on the power set $2^\Theta \triangleq (\Theta, \cup)$ of the FoD, the BBAs are defined on the so-called hyper-power set (or Dedekind's lattice) denoted $D^\Theta \triangleq (\Theta, \cup, \cap)$, whose cardinality follows Dedekind's numbers sequence; see [5], Vol. 1 for details and examples. A (generalized) BBA, called a mass function, $m(\cdot)$ is defined by the mapping $D^\Theta \mapsto [0, 1]$, verifying $m(\emptyset) = 0$ and $\sum_{A \in D^\Theta} m(A) = 1$. The DSmT framework encompasses the DST framework because $2^\Theta \subset D^\Theta$. In DSmT we can also take into account a set of integrity constraints on the FoD (if known), by specifying all the pairs of elements which are really disjoint. Stated otherwise, Shafer's model is a specific DSm model where all elements are known to be disjoint. $A \in D^\Theta$ is called a focal element of $m(.)$ if $m(A) > 0$. A BBA is called a Bayesian BBA if all of its focal elements are singletons and Shafer's model is assumed; otherwise it is called non-Bayesian [1]. A full ignorance source is represented by the vacuous BBA $m_v(\Theta) = 1$. The belief (or credibility) and plausibility functions are respectively defined by

$$Bel(X) \triangleq \sum_{Y \in D^\Theta | Y \subseteq X} m(Y) \quad \text{and} \quad Pl(X) \triangleq \sum_{Y \in D^\Theta | Y \cap X \neq \emptyset} m(Y).$$

$BI(X) \triangleq [Bel(X), Pl(X)]$ is called the belief interval of $X$. Its length $U(X) \triangleq Pl(X) - Bel(X)$ measures the degree of uncertainty of $X$.

² We use the symbol $\triangleq$ to mean equals by definition.

³ The third excluded middle principle assumes the existence of the complement for any elements/propositions belonging to the power set $2^\Theta$.
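
The definitions of $Bel(X)$, $Pl(X)$ and $U(X)$ translate directly into code. The sketch below is illustrative only: it assumes Shafer's model, represents a BBA as a dictionary from `frozenset` focal elements to masses, and uses hypothetical function names. The BBA chosen for the demonstration is $m(A) = 0.5$, $m(B \cup C) = 0.5$.

```python
def belief(m, x):
    """Bel(X): total mass of the non-empty focal elements included in X."""
    return sum(mass for y, mass in m.items() if y and y <= x)


def plausibility(m, x):
    """Pl(X): total mass of the focal elements intersecting X."""
    return sum(mass for y, mass in m.items() if y & x)


def belief_interval(m, x):
    """BI(X) = [Bel(X), Pl(X)] returned as a tuple."""
    return belief(m, x), plausibility(m, x)


m = {frozenset("A"): 0.5, frozenset("BC"): 0.5}
bel_b, pl_b = belief_interval(m, frozenset("B"))
uncertainty_b = pl_b - bel_b  # U(B) = Pl(B) - Bel(B)
print(bel_b, pl_b, uncertainty_b)
```

For this BBA, no focal element is included in $B$, so $Bel(B) = 0$, while $B \cup C$ intersects $B$, so $Pl(B) = 0.5$ and $U(B) = 0.5$.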

In 1976, Shafer proposed Dempster's rule⁴ to combine BBAs in the DST framework. The DS rule is defined by $m_{DS}(\emptyset) = 0$ and $\forall A \in 2^\Theta \setminus \{\emptyset\}$,

$$m_{DS}(A) = \frac{\sum_{B, C \in 2^\Theta | B \cap C = A} m_1(B) m_2(C)}{1 - \sum_{B, C \in 2^\Theta | B \cap C = \emptyset} m_1(B) m_2(C)}. \quad (1)$$

The DS rule formula is commutative and associative and can easily be extended to the fusion of $S > 2$ BBAs. Unfortunately, the DS rule has been highly disputed during the last decades by many authors because of its counter-intuitive behavior in high or even low conflict situations, and that is why many rules of combination have been proposed in the literature to combine BBAs [13]. To palliate the drawbacks of the DS rule, the very interesting PCR6 (Proportional Conflict Redistribution rule #6) has been proposed in DSmT and it is usually adopted⁵ in recent applications of DSmT. The fusion of two BBAs $m_1(.)$ and $m_2(.)$ by the PCR6 rule is obtained by $m_{PCR6}(\emptyset) = 0$ and $\forall A \in D^\Theta \setminus \{\emptyset\}$

$$m_{PCR6}(A) = m_{12}(A) + \sum_{B \in D^\Theta \setminus \{A\} | A \cap B = \emptyset} \left[ \frac{m_1(A)^2 m_2(B)}{m_1(A) + m_2(B)} + \frac{m_2(A)^2 m_1(B)}{m_2(A) + m_1(B)} \right], \quad (2)$$

where $m_{12}(A) = \sum_{B, C \in D^\Theta | B \cap C = A} m_1(B) m_2(C)$ is the conjunctive operator, and each element $A$ and $B$ is expressed in its disjunctive normal form. If the denominator involved in a fraction is zero, then this fraction is discarded. The general PCR6 formula for combining more than two BBAs altogether is given in [5], Vol. 3. We adopt the generic notation $m^{PCR6}_{1,2}(.) = PCR6(m_1(.), m_2(.))$ to denote the fusion of $m_1(.)$ and $m_2(.)$ by the PCR6 rule. PCR6 is not associative, and the PCR6 rule can also be applied in the DST framework (with Shafer's model of the FoD) by replacing $D^\Theta$ by $2^\Theta$ in Eq. (2).


B. Hierarchical coarsening for fusion of Bayesian BBAs 


Here, we briefly recall the principle of hierarchical coarsen- 
ing of FoD to reduce the computational complexity of PCR6 
combination of original Bayesian BBAs. The fusion of original 
non-Bayesian BBAs will be presented in the next section. 

This principle was called rigid grouping in our previous works [17]-[19]. The goal of this coarsening is to replace the original (refined) Frame of Discernment (FoD) $\Theta$ by a set of coarsened ones to make the computation of the PCR6 rule tractable. Because we consider here only Bayesian BBAs to combine, their focal elements are only singletons of the FoD $\Theta \triangleq \{\theta_1, \ldots, \theta_n\}$, with $n \geq 2$, and we assume Shafer's model of the FoD $\Theta$.

A coarsening of the FoD $\Theta$ means replacing it with another, less specific FoD of smaller dimension $\Omega = \{\omega_1, \ldots, \omega_k\}$ with $k < n$, built from the elements of $\Theta$. This can be done in many ways depending on the problem under consideration. Generally, the elements of $\Omega$ are singletons of $\Theta$ and disjunctions of elements of $\Theta$. For example, if $\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4\}$, then the possible coarsened frames built from $\Theta$ could be, for instance, $\Omega = \{\omega_1 = \theta_1, \omega_2 = \theta_2, \omega_3 = \theta_3 \cup \theta_4\}$, or $\Omega = \{\omega_1 = \theta_1 \cup \theta_2, \omega_2 = \theta_3 \cup \theta_4\}$, etc. When dealing with Bayesian BBAs, the projection⁶ $m^\Omega(.)$ of the original BBA $m^\Theta(.)$ is simply obtained by taking

$$m^\Omega(\omega_i) = \sum_{\theta_j \subseteq \omega_i} m^\Theta(\theta_j). \quad (3)$$

⁴ We use the DS index to refer to Dempster-Shafer's rule (DS rule) because Shafer did really promote Dempster's rule in his milestone book [1].

⁵ The PCR6 rule coincides with PCR5 when combining only two BBAs [5].

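The hierarchical coarsening process (or rigid grouping) is a simple dichotomous approach of coarsening obtained as follows:

• If $n = |\Theta|$ is an even number:
The disjunction of the $n/2$ first elements $\theta_1$ to $\theta_{n/2}$ of $\Theta$ defines the element $\omega_1$ of $\Omega$, and the disjunction of the last $n/2$ elements $\theta_{n/2+1}$ to $\theta_n$ of $\Theta$ defines the element $\omega_2$ of $\Omega$, that is

$$\Omega \triangleq \{\omega_1 = \theta_1 \cup \ldots \cup \theta_{n/2}, \; \omega_2 = \theta_{n/2+1} \cup \ldots \cup \theta_n\},$$

and based on (3), one has

$$m^\Omega(\omega_1) = \sum_{j=1,\ldots,n/2} m^\Theta(\theta_j), \quad (4)$$
$$m^\Omega(\omega_2) = \sum_{j=n/2+1,\ldots,n} m^\Theta(\theta_j). \quad (5)$$

For example, if $\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4\}$, and one considers the Bayesian BBA $m^\Theta(\theta_1) = 0.1$, $m^\Theta(\theta_2) = 0.2$, $m^\Theta(\theta_3) = 0.3$ and $m^\Theta(\theta_4) = 0.4$, then $\Omega = \{\omega_1 = \theta_1 \cup \theta_2, \omega_2 = \theta_3 \cup \theta_4\}$ and $m^\Omega(\omega_1) = 0.1 + 0.2 = 0.3$ and $m^\Omega(\omega_2) = 0.3 + 0.4 = 0.7$.

• If $n = |\Theta|$ is an odd number:
In this case, the element $\omega_1$ of the coarsened frame $\Omega$ is the disjunction of the $[n/2 + 1]$⁷ first elements of $\Theta$, and the element $\omega_2$ is the disjunction of the other elements of $\Theta$. That is

$$\Omega \triangleq \{\omega_1 = \theta_1 \cup \ldots \cup \theta_{[n/2+1]}, \; \omega_2 = \theta_{[n/2+1]+1} \cup \ldots \cup \theta_n\},$$

and based on (3), one has

$$m^\Omega(\omega_1) = \sum_{j=1,\ldots,[n/2+1]} m^\Theta(\theta_j), \quad (6)$$
$$m^\Omega(\omega_2) = \sum_{j=[n/2+1]+1,\ldots,n} m^\Theta(\theta_j). \quad (7)$$

For example, if $\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4, \theta_5\}$, and one considers the Bayesian BBA $m^\Theta(\theta_1) = 0.1$, $m^\Theta(\theta_2) = 0.2$, $m^\Theta(\theta_3) = 0.3$, $m^\Theta(\theta_4) = 0.3$ and $m^\Theta(\theta_5) = 0.1$, then $\Omega = \{\omega_1 = \theta_1 \cup \theta_2 \cup \theta_3, \omega_2 = \theta_4 \cup \theta_5\}$ and $m^\Omega(\omega_1) = 0.1 + 0.2 + 0.3 = 0.6$ and $m^\Omega(\omega_2) = 0.3 + 0.1 = 0.4$.


⁶ For clarity and convenience, we put explicitly as upper index the FoD to which the belief mass refers.

⁷ The notation $[x]$ means the integer part of $x$.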
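
The dichotomous split and the projection (3) can be sketched in a few lines of Python. This is an illustrative sketch under the assumptions stated in the text (Bayesian BBAs, ordered elements); the names `rigid_split` and `project`, and the string labels used for the elements, are hypothetical.

```python
def rigid_split(theta):
    """Split an ordered list of elements into the two groups (omega_1, omega_2)."""
    n = len(theta)
    # n/2 first elements if n is even, integer part of n/2 + 1 if n is odd
    k = n // 2 if n % 2 == 0 else int(n / 2 + 1)
    return theta[:k], theta[k:]


def project(m, group):
    """Mass of a coarsened element: sum of the masses of its singletons, Eq. (3)."""
    return sum(m[t] for t in group)


# Odd case from the text (n = 5)
theta = ["t1", "t2", "t3", "t4", "t5"]
m = {"t1": 0.1, "t2": 0.2, "t3": 0.3, "t4": 0.3, "t5": 0.1}
omega1, omega2 = rigid_split(theta)
m_omega = (project(m, omega1), project(m, omega2))
print(omega1, omega2, m_omega)
```

For the five-element example, the split yields the groups $\{\theta_1, \theta_2, \theta_3\}$ and $\{\theta_4, \theta_5\}$ with masses 0.6 and 0.4 (up to floating-point rounding).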
Of course, the same coarsening applies to all original BBAs $m_s^\Theta(.)$, $s = 1, \ldots, S$ of the $S > 1$ sources of evidence in order to work with the less specific BBAs $m_s^\Omega(.)$, $s = 1, \ldots, S$. The less specific BBAs (called coarsened BBAs by abuse of language) can then be combined with the PCR6 rule of combination according to formula (2). This dichotomous coarsening method is repeated iteratively $l$ times, as schematically represented by a bintree⁸. The last step of this hierarchical process is to calculate the combined (Bayesian) BBA of all focal elements according to the connection weights of the bintree structure, where the number of iterations (or layers) $l$ of the tree depends on the cardinality $|\Theta|$ of the original FoD $\Theta$. Specifically, the assignment of each focal element is updated according to the connection weights of the link paths from the root to the terminal nodes. This principle is illustrated in detail in the following example.


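Example 1: Let's consider $\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4, \theta_5\}$, and the following three Bayesian BBAs


Focal elem. | $m_1^\Theta(.)$ | $m_2^\Theta(.)$ | $m_3^\Theta(.)$
$\theta_1$ | 0.1 | 0.4 | 0
$\theta_2$ | 0.2 | 0 | 0.1
$\theta_3$ | 0.3 | 0.1 | 0.5
$\theta_4$ | 0.3 | 0.1 | 0.4
$\theta_5$ | 0.1 | 0.4 | 0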

The hierarchical coarsening and fusion of BBAs is obtained 
from the following steps: 

Step 1: We define the bintree structure based on iterative 
half split of FoD as shown in Fig. 1. 


Figure 1: Fusion of Bayesian BBAs using bintree coarsening for Example 1.


The connecting weights are denoted as $\lambda_1, \ldots, \lambda_8$. The elements of the frames $\Omega_l$ are defined as follows:
• At layer $l = 1$: $\Omega_1 = \{\omega_1 \triangleq \theta_1 \cup \theta_2 \cup \theta_3, \; \omega_2 \triangleq \theta_4 \cup \theta_5\}$
• At layer $l = 2$: $\Omega_2 = \{\omega_{11} \triangleq \theta_1 \cup \theta_2, \; \omega_{12} \triangleq \theta_3, \; \omega_{21} \triangleq \theta_4, \; \omega_{22} \triangleq \theta_5\}$
• At layer $l = 3$: $\Omega_3 = \{\omega_{111} \triangleq \theta_1, \; \omega_{112} \triangleq \theta_2\}$

⁸ Here we consider a bintree only for simplicity, which means that the coarsened frame $\Omega$ consists of two elements only. Of course, a similar method can be used with a tri-tree, quad-tree, etc.
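Step 2: The BBAs of the elements of the (sub-)frames $\Omega_l$ are obtained as follows:
• At layer $l = 1$, we use (6)-(7) because $|\Theta| = 5$ is an odd number. Therefore, we get

Focal elem. | $m_1^{\Omega_1}(.)$ | $m_2^{\Omega_1}(.)$ | $m_3^{\Omega_1}(.)$
$\omega_1 \triangleq \theta_1 \cup \theta_2 \cup \theta_3$ | 0.6 | 0.5 | 0.6
$\omega_2 \triangleq \theta_4 \cup \theta_5$ | 0.4 | 0.5 | 0.4

• At layer $l = 2$: We work with the two subframes $\Omega_{21} \triangleq \{\omega_{11}, \omega_{12}\}$ and $\Omega_{22} \triangleq \{\omega_{21}, \omega_{22}\}$ of $\Omega_2$ with the BBAs:

Focal elem. | $m_1^{\Omega_{21}}(.)$ | $m_2^{\Omega_{21}}(.)$ | $m_3^{\Omega_{21}}(.)$
$\omega_{11} \triangleq \theta_1 \cup \theta_2$ | 1/2 | 4/5 | 1/6
$\omega_{12} \triangleq \theta_3$ | 1/2 | 1/5 | 5/6

Focal elem. | $m_1^{\Omega_{22}}(.)$ | $m_2^{\Omega_{22}}(.)$ | $m_3^{\Omega_{22}}(.)$
$\omega_{21} \triangleq \theta_4$ | 3/4 | 1/5 | 1
$\omega_{22} \triangleq \theta_5$ | 1/4 | 4/5 | 0

These mass values are obtained by the proportional redistribution of the mass of each focal element with respect to the mass of its parent focal element in the bintree. For example, the value $m_2^{\Omega_{21}}(\omega_{11}) = 4/5$ is derived by taking

$$m_2^{\Omega_{21}}(\omega_{11}) = \frac{m_2^\Theta(\theta_1) + m_2^\Theta(\theta_2)}{m_2^\Theta(\theta_1) + m_2^\Theta(\theta_2) + m_2^\Theta(\theta_3)} = \frac{0.4}{0.5} = \frac{4}{5}.$$

Other mass values are computed similarly using this proportional redistribution method.
• At layer $l = 3$: We use again the proportional redistribution method, which gives us

Focal elem. | $m_1^{\Omega_3}(.)$ | $m_2^{\Omega_3}(.)$ | $m_3^{\Omega_3}(.)$
$\omega_{111} \triangleq \theta_1$ | 1/3 | 1 | 0
$\omega_{112} \triangleq \theta_2$ | 2/3 | 0 | 1

Step 3: The connection weights $\lambda_i$ are computed from the assignments of the coarsening elements. In each layer $l$, we fuse sequentially⁹ the three BBAs using the PCR6 formula (2). More precisely, we first compute $m^{PCR6,\Omega_l}_{12}(.) = PCR6(m_1^{\Omega_l}(.), m_2^{\Omega_l}(.))$ and then $m^{PCR6,\Omega_l}_{(12)3}(.) = PCR6(m^{PCR6,\Omega_l}_{12}(.), m_3^{\Omega_l}(.))$. Hence, we obtain the following connecting weights in the bintree:
• At layer $l = 1$:

$$\lambda_1 = m^{PCR6,\Omega_1}_{(12)3}(\omega_1) = 0.6297,$$
$$\lambda_2 = m^{PCR6,\Omega_1}_{(12)3}(\omega_2) = 0.3703.$$

• At layer $l = 2$:

$$\lambda_3 = m^{PCR6,\Omega_{21}}_{(12)3}(\omega_{11}) = 0.4137,$$
$$\lambda_4 = m^{PCR6,\Omega_{21}}_{(12)3}(\omega_{12}) = 0.5863,$$
$$\lambda_5 = m^{PCR6,\Omega_{22}}_{(12)3}(\omega_{21}) = 0.8121,$$
$$\lambda_6 = m^{PCR6,\Omega_{22}}_{(12)3}(\omega_{22}) = 0.1879.$$

• At layer $l = 3$:

$$\lambda_7 = m^{PCR6,\Omega_3}_{(12)3}(\omega_{111}) = 0.3103,$$
$$\lambda_8 = m^{PCR6,\Omega_3}_{(12)3}(\omega_{112}) = 0.6897.$$

⁹ Because PCR6 fusion is not associative, we should apply the general PCR6 formula to get the best results. Here we use sequential fusion to reduce the computational complexity, even if the fusion result is approximate.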
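
The layer weights of Step 3 can be checked with a small script. The sketch below is an illustration under stated assumptions, not the authors' Matlab code: it specializes the two-source PCR6 formula (2) to Bayesian BBAs defined on the same set of mutually exclusive elements (so every pair of distinct elements is in conflict), and fuses the sources sequentially as described in the text. The names `pcr6_bayesian` and `sequential_pcr6` and the labels `w1`, `w2` are hypothetical.

```python
def pcr6_bayesian(m1, m2):
    """Two-source PCR6 for Bayesian BBAs over the same exclusive elements."""
    fused = {w: m1[w] * m2[w] for w in m1}  # conjunctive part
    for x in m1:
        for y in m2:
            if x == y:
                continue
            a, b = m1[x], m2[y]
            if a + b > 0:  # a zero denominator means the fraction is discarded
                fused[x] += a * a * b / (a + b)
                fused[y] += b * b * a / (a + b)
    return fused


def sequential_pcr6(bbas):
    """Fuse a list of BBAs one after the other (approximate for more than 2 sources)."""
    fused = bbas[0]
    for m in bbas[1:]:
        fused = pcr6_bayesian(fused, m)
    return fused


# Layer 1 of Example 1: the three coarsened BBAs on {w1, w2}
layer1 = [
    {"w1": 0.6, "w2": 0.4},
    {"w1": 0.5, "w2": 0.5},
    {"w1": 0.6, "w2": 0.4},
]
weights = sequential_pcr6(layer1)
print({w: round(v, 4) for w, v in weights.items()})
```

For layer 1 this gives approximately 0.6297 for $\omega_1$ and 0.3703 for $\omega_2$, i.e. the weights $\lambda_1$ and $\lambda_2$; the other layers can be checked the same way with their own BBAs.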


Step 4: The final assignment of belief mass to the elements of the original FoD $\Theta$ is calculated using the product of the connection weights of the link paths from the root (top) node to the terminal nodes (leaves). We finally get the following resulting combined and normalized Bayesian BBA


$$m^\Theta(\theta_1) = \lambda_1 \cdot \lambda_3 \cdot \lambda_7 = 0.6297 \cdot 0.4137 \cdot 0.3103 = 0.0808,$$
$$m^\Theta(\theta_2) = \lambda_1 \cdot \lambda_3 \cdot \lambda_8 = 0.6297 \cdot 0.4137 \cdot 0.6897 = 0.1797,$$
$$m^\Theta(\theta_3) = \lambda_1 \cdot \lambda_4 = 0.6297 \cdot 0.5863 = 0.3692,$$
$$m^\Theta(\theta_4) = \lambda_2 \cdot \lambda_5 = 0.3703 \cdot 0.8121 = 0.3007,$$
$$m^\Theta(\theta_5) = \lambda_2 \cdot \lambda_6 = 0.3703 \cdot 0.1879 = 0.0696.$$
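
The last step amounts to multiplying the weights along each root-to-leaf path. The short Python sketch below illustrates this with the weights of Example 1; the dictionary layout and the names `weights`, `paths` and `leaf_masses` are hypothetical choices made for the illustration, and the paths are the ones implied by the layer definitions of this example.

```python
from math import prod

# Connection weights lambda_1 ... lambda_8 of Example 1
weights = {1: 0.6297, 2: 0.3703, 3: 0.4137, 4: 0.5863,
           5: 0.8121, 6: 0.1879, 7: 0.3103, 8: 0.6897}

# Root-to-leaf paths of the bintree, given as lists of weight indices
paths = {
    "theta1": [1, 3, 7],
    "theta2": [1, 3, 8],
    "theta3": [1, 4],
    "theta4": [2, 5],
    "theta5": [2, 6],
}

# Final Bayesian BBA: product of the weights along each path
leaf_masses = {leaf: prod(weights[i] for i in path) for leaf, path in paths.items()}
for leaf, mass in leaf_masses.items():
    print(leaf, round(mass, 4))
```

Because the two weights leaving every node sum to one, the leaf masses sum to one without any further normalization.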


III. NEW HIERARCHICAL FLEXIBLE COARSENING METHOD 


Contrary to the (rigid) hierarchical coarsening method presented in section II, in our new flexible coarsening approach the elements $\theta_i$, $i = 1, \ldots, n$ of the FoD $\Theta$ are not half split to build the coarsening focal elements $\omega_j$, $j = 1, \ldots, k$ of the FoD $\Omega_l$. In the hierarchical flexible (adaptive) coarsening method, the elements $\theta_i$ chosen to belong to the same group are determined using the consensus information drawn from the BBAs provided by the sources. Specifically, the degrees of disagreement between the provided sources on the decisions $(\theta_1, \theta_2, \ldots, \theta_n)$ are first calculated using the belief-interval based distance $d_{BI}$ [16], [20] to obtain the disagreement vector. Then, the k-means algorithm is applied for clustering the elements $\theta_i$, $i = 1, \ldots, n$ based on their corresponding values in the disagreement vector. It is worth noting that the values of disagreement reflect the preferences of independent sources of evidence for the same focal element. If they are small, it means that all sources have a consistent opinion and these elements should be clustered in the same group. Conversely, if the disagreement values are large, it means that the sources strongly disagree on these focal elements, and these focal elements need to be clustered in another group.


A. Calculating the disagreement vector 

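Let us consider several BBAs $m_s^\Theta(\cdot)$, $(s = 1, \ldots, S)$ defined on the same FoD $\Theta$ of cardinality $|\Theta| = n$. The specific BBAs $m_{\theta_i}(.)$, $i = 1, \ldots, n$ entirely focused on $\theta_i$ are defined by $m_{\theta_i}(\theta_i) = 1$, and for $X \neq \theta_i$, $m_{\theta_i}(X) = 0$. The disagreement of opinions of two sources about $\theta_i$ is defined as the $L_1$-distance between the $d_{BI}$ distances of the BBAs $m_s^\Theta(.)$, $s = 1, 2$ to $m_{\theta_i}(.)$, which is expressed by

$$D_{1-2}(\theta_i) \triangleq |d_{BI}(m_1^\Theta(\cdot), m_{\theta_i}(\cdot)) - d_{BI}(m_2^\Theta(\cdot), m_{\theta_i}(\cdot))|. \quad (8)$$

The disagreement of opinions of $S \geq 3$ sources about $\theta$ is defined as

$$D_{1-S}(\theta) \triangleq \frac{1}{2} \sum_{i=1}^{S} \sum_{j=1}^{S} |d_{BI}(m_i^\Theta(\cdot), m_\theta(\cdot)) - d_{BI}(m_j^\Theta(\cdot), m_\theta(\cdot))|, \quad (9)$$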
where the $d_{BI}$ distance is defined by¹⁰ [20]

$$d^{E}_{BI}(m_1, m_2) \triangleq \sqrt{n_c \cdot \sum_{i=1}^{2^n - 1} [d^I(BI_1(\theta_i), BI_2(\theta_i))]^2}. \quad (10)$$

Here, $n_c = 1/2^{n-1}$ is the normalization constant and $d^I([a, b], [c, d])$ is Wasserstein's distance defined by

$$d^I([a, b], [c, d]) = \sqrt{\left[\frac{a + b}{2} - \frac{c + d}{2}\right]^2 + \frac{1}{3}\left[\frac{b - a}{2} - \frac{d - c}{2}\right]^2},$$

and $BI(\theta_i) = [Bel(\theta_i), Pl(\theta_i)]$.

The disagreement vector $D_{1-S}$ is defined by

$$D_{1-S} \triangleq [D_{1-S}(\theta_1), \ldots, D_{1-S}(\theta_n)]. \quad (11)$$
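
Formulas (8)-(11) can be prototyped as follows. This is a hedged sketch, not the authors' Matlab implementation: it assumes Shafer's model, represents a BBA as a dictionary from `frozenset` to mass, and takes the sum in (10) over all $2^n - 1$ non-empty subsets of the frame. All function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def nonempty_subsets(frame):
    """All 2^n - 1 non-empty subsets of the frame, as frozensets."""
    items = sorted(frame)
    return [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]


def belief_interval(m, x):
    """[Bel(X), Pl(X)] of a BBA m given as dict frozenset -> mass."""
    bel = sum(v for y, v in m.items() if y and y <= x)
    pl = sum(v for y, v in m.items() if y & x)
    return bel, pl


def interval_distance(i1, i2):
    """Wasserstein distance between the intervals [a, b] and [c, d]."""
    (a, b), (c, d) = i1, i2
    return sqrt(((a + b) / 2 - (c + d) / 2) ** 2
                + ((b - a) / 2 - (d - c) / 2) ** 2 / 3)


def d_bi(m1, m2, frame):
    """Belief-interval based distance of Eq. (10), with n_c = 1 / 2^(n-1)."""
    total = sum(interval_distance(belief_interval(m1, x),
                                  belief_interval(m2, x)) ** 2
                for x in nonempty_subsets(frame))
    return sqrt(total / 2 ** (len(frame) - 1))


def disagreement_vector(bbas, frame):
    """Disagreement of the sources about each singleton, Eqs. (9) and (11)."""
    vector = {}
    for theta in sorted(frame):
        focused = {frozenset([theta]): 1.0}  # BBA entirely focused on theta
        dist = [d_bi(m, focused, frame) for m in bbas]
        vector[theta] = 0.5 * sum(abs(x - y) for x in dist for y in dist)
    return vector


frame = {"a", "b"}
m_a = {frozenset("a"): 1.0}
m_b = {frozenset("b"): 1.0}
print(d_bi(m_a, m_b, frame))
print(disagreement_vector([m_a, m_b], frame))
```

With two sources, the double sum of (9) counts each pair twice, so the factor 1/2 makes it coincide with the two-source definition (8). In the toy run above, the two categorical BBAs are at distance 1 from each other, and each singleton gets a disagreement of 1.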


B. Clustering focal elements 


Once $D_{1-S}$ is derived, a clustering algorithm is used to 
coarsen focal elements according to their corresponding values 
in $D_{1-S}$. In this paper, we have used the k-means algorithm$^{11}$ 
to cluster focal elements. For each source $s = 1, \ldots, S$, the 
mass assignments of focal elements in two$^{12}$ different clusters 
are added up according to formulas (12)-(13). 

$$m_s^{\Omega_l}(\omega_1) = \sum_{\theta_i \in \omega_1} m_s^{\Theta}(\theta_i), \quad (12)$$

$$m_s^{\Omega_l}(\omega_2) = \sum_{\theta_i \in \omega_2} m_s^{\Theta}(\theta_i). \quad (13)$$
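
The summations (12)-(13) amount to adding the singleton masses that fall in each cluster. The sketch below is a minimal Python illustration under two assumptions that are not stated in the text: a Bayesian BBA is stored as a dictionary from singleton labels to masses, and the two sums are renormalized on the subframe (a no-op at the first layer, where they already add up to one). The helper name `coarsen` and the uniform fallback for a massless subframe are hypothetical.

```python
def coarsen(bba, clusters):
    """Add up singleton masses per cluster and renormalize on the subframe."""
    raw = [sum(bba[theta] for theta in cluster) for cluster in clusters]
    total = sum(raw)
    if total == 0:
        # Assumption: a subframe carrying no mass gets a uniform assignment.
        return [1 / len(raw)] * len(raw)
    return [r / total for r in raw]


# Bayesian BBA of the first source in Example 2 (after the DSmP transform).
m1 = {1: 0.1908, 2: 0.2804, 3: 0.3387, 4: 0.0339, 5: 0.1562}
layer1 = coarsen(m1, [[1], [2, 3, 4, 5]])
layer2 = coarsen(m1, [[2, 3], [4, 5]])
```

Applied to the first source of Example 2, the two calls give approximately (0.1908, 0.8092) and (0.7651, 0.2349), the values listed for layers 1 and 2 of that example.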


C. Combination of the BBAs 


Based on the disagreement vector and the k-means algorithm, a 
new adaptive bintree structure based on this flexible coarsening 
decomposition is obtained (see the example in the next section), 
and the elements in FoD $\Theta$ are grouped more reasonably 
in each layer of the decomposition. Once the adaptive bintree 
structure is derived, the other steps (multiplications of link 
weights) are identical to those of the hierarchical (rigid) 
coarsening method presented in section II to get the 
final combined Bayesian BBA. 


D. Summary of the method 


The fusion method of BBAs to get a combined Bayesian 
BBA based on hierarchical flexible decomposition of the FoD 
consists of the five steps below, illustrated in Fig. 2. 

• Step 1 (pre-processing): At first, all input BBAs to 
combine are approximated to Bayesian BBAs with the DSmP 
transform. 

• Step 2 (disagreement vector): $D_{1-S}(\cdot)$ is calculated using 
$d_{BI}$ distances to estimate the degree of disagreement 
of BBAs $m_1^{\Theta}, \ldots, m_S^{\Theta}$ on potential decisions $\theta_1, \ldots, \theta_n$. 

• Step 3 (adaptive bintree): The adaptive bintree decomposition 
of the FoD $\Theta$ is obtained using the k-means 
algorithm to get the elements of the subframes $\Omega_l$. 

• Step 4 (assignments and connection weights): For 
each source $m_s^{\Theta}(\cdot)$ to combine, the mass assignment of 
each element of subframe $\Omega_l$ is computed by (12)-(13). 
The weights of links between two layers of the bintree 
decomposition are obtained with the PCR6 rule$^{13}$. 

• Step 5 (fusion): The final result (combined Bayesian 
BBA) is computed by the product of weights of link paths 
from root to terminal nodes. 


$^{10}$ For simplicity, we assume Shafer's model so that $|2^{\Theta}| = 2^n$, otherwise 
the number of elements in the summation of (10) should be $|D^{\Theta}| - 1$ with 
another normalization constant $n_c$. 

$^{11}$ which is implemented in Matlab™. 

$^{12}$ because we use here the bisection decomposition. 


Figure 2: Hierarchical flexible decomposition of FoD for 
fusion (from the input BBAs, through the product of path link weights, to the final combined Bayesian BBA). 


IV. TWO SIMPLE EXAMPLES 
A. Example I (fusion of Bayesian BBAs) 


Let us revisit Example 1 presented in section II-B. It can be 
verified by applying formula (9) that the disagreement vector 
$D_{1-3}$ for this example is equal to 

$$D_{1-3} = [0.4085, 0.2156, 0.3753, 0.2507, 0.4086].$$

The derivation of $D_{1-3}(\theta_1)$ is given below for convenience. 

$$D_{1-3}(\theta_1) = |d_{BI}(m_1^{\Theta}(\cdot), m_{\theta_1}(\cdot)) - d_{BI}(m_2^{\Theta}(\cdot), m_{\theta_1}(\cdot))|$$
$$+ |d_{BI}(m_2^{\Theta}(\cdot), m_{\theta_1}(\cdot)) - d_{BI}(m_3^{\Theta}(\cdot), m_{\theta_1}(\cdot))|$$
$$+ |d_{BI}(m_1^{\Theta}(\cdot), m_{\theta_1}(\cdot)) - d_{BI}(m_3^{\Theta}(\cdot), m_{\theta_1}(\cdot))| = 0.4085.$$


Based on the disagreement vector and the k-means algorithm, a 
new adaptive bintree structure is obtained and shown in Fig. 3. 
Compared to Fig. 1, the elements in FoD $\Theta$ are grouped more 
reasonably. In vector $D_{1-3}$, $\theta_1$ and $\theta_5$ have similar degrees of 
disagreement, so they are put in the same group, and similarly 
for $\theta_2$ and $\theta_4$. Element $\theta_3$, however, stands apart and is 
put alone at the beginning of the flexible coarsening. Once this 
adaptive bintree decomposition is obtained, the other steps 
are identical to those of the hierarchical coarsening 
method of section II to get the final combined BBA. 

Figure 3: Example 1: Flexible bintree decomposition of FoD. 

$^{13}$ General formula preferred, or applied sequentially to reduce complexity. 

The flexible coarsening and fusion of BBAs is obtained from 
the following steps: 


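Step 1: According to Fig. 3, the elements of the frames $\Omega_l$ 
are defined as follows: 

• At layer $l = 1$: $\Omega_1 = \{\omega_1 \triangleq \theta_3, \omega_2 \triangleq \theta_1 \cup \theta_2 \cup \theta_4 \cup \theta_5\}$ 
• At layer $l = 2$: $\Omega_2 = \{\omega_{21} \triangleq \theta_1 \cup \theta_5, \omega_{22} \triangleq \theta_2 \cup \theta_4\}$ 
• At layer $l = 3$: $\Omega_3 = \{\omega_{211} \triangleq \theta_1, \omega_{212} \triangleq \theta_5, \omega_{221} \triangleq \theta_2, \omega_{222} \triangleq \theta_4\}$ 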
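Step 2: The BBAs of elements of the (sub-)frames $\Omega_l$ are 
obtained as follows: 


• At layer $l = 1$, we use (12)-(13) and we get 


Focal elem. | $m_1^{\Omega_1}(.)$ | $m_2^{\Omega_1}(.)$ | $m_3^{\Omega_1}(.)$ 
$\omega_2 \triangleq \theta_1 \cup \theta_2 \cup \theta_4 \cup \theta_5$ | 0.7 | 0.9 | 0.5 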


e At layer | = 2: We use again the proportional redistribu- 
tion method which gives us: 


Focal elem. | m2 (.) | ms? (.) | ms?(.) 
wai = 0, U5 3 4 . 
wee = 02 U O4 ; 3 . 


e At layer | = 3: We work with the two subframes 03; 4 
{we11, we12} and O39 4 {w21, W222} of 3 with the 


BBAs 
Focal elm m1(.) | mis.) | mZs*(.) 
Pa va 2 2 2 
Wo12 = O5 3 5 3 
Focal elem, m2.) | mfs2(.) | m222(.) 
Woai = ve) 5 0 5 
wo22 = 04 . 1 5 


Step 3: The connection weights $\lambda_i$ are computed from the 
assignments of coarsening elements. Hence, we obtain the 
following connecting weights in the bintree: 

• At layer $l = 1$: $\lambda_1 = 0.2226$; $\lambda_2 = 0.7774$. 
• At layer $l = 2$: $\lambda_3 = 0.2200$; $\lambda_4 = 0.7800$. 
• At layer $l = 3$: $\lambda_5 = 0.5$; $\lambda_6 = 0.5$; $\lambda_7 = 0.0669$; $\lambda_8 = 0.9331$. 


Step 4: We finally get the following resulting combined and 
normalized Bayesian BBA 

$$m^{\Theta}(\cdot) = \{0.0855, 0.0406, 0.2226, 0.5658, 0.0855\}.$$
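
The last step is a product of link weights along each root-to-leaf path. The following Python sketch replays it on the weights listed for Example 1, assuming the bintree of Fig. 3 ($\theta_3$ isolated at layer 1, then $\{\theta_1, \theta_5\}$ and $\{\theta_2, \theta_4\}$). The node labels and the `combine` helper are hypothetical names introduced only for this illustration.

```python
from math import prod

# Link weights of Example 1, keyed by the bintree node they lead to.
weights = {
    "w1": 0.2226, "w2": 0.7774,            # layer 1
    "w21": 0.2200, "w22": 0.7800,          # layer 2
    "w211": 0.5, "w212": 0.5,              # layer 3, under w21
    "w221": 0.0669, "w222": 0.9331,        # layer 3, under w22
}

# Path of nodes from the root down to each singleton.
paths = {
    "theta1": ["w2", "w21", "w211"],
    "theta2": ["w2", "w22", "w221"],
    "theta3": ["w1"],
    "theta4": ["w2", "w22", "w222"],
    "theta5": ["w2", "w21", "w212"],
}


def combine(weights, paths):
    """Mass of each leaf = product of the link weights on its path."""
    return {leaf: prod(weights[node] for node in path)
            for leaf, path in paths.items()}


fused = combine(weights, paths)
```

Because the weights of sibling links sum to one, the resulting masses sum to one without any further normalization.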


B. Example 2 (with non-Bayesian BBAs) 


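Example 1bis: Let's consider $\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4, \theta_5\}$, and the 
following BBAs given by 


Focal elem. | $m_1^{\Theta}(.)$ | $m_2^{\Theta}(.)$ | $m_3^{\Theta}(.)$ 
$\theta_1$ | 0.1 | 0.4 | 0 
$\theta_2$ | 0.2 | 0 | 0 
$\theta_3$ | 0.3 | 0.05 | 0 
$\theta_4$ | 0.03 | 0.05 | 0 
$\theta_5$ | 0.1 | 0.04 | 0 
$\theta_1 \cup \theta_2$ | 0.1 | 0.04 | 0 
$\theta_2 \cup \theta_3 \cup \theta_5$ | 0 | 0.02 | 0.1 
$\theta_3 \cup \theta_4$ | 0.02 | 0.1 | 0.2 
$\theta_1 \cup \theta_5$ | 0.1 | 0.3 | 0.2 
$\Theta$ | 0.05 | 0 | 0.5 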


Step 1 (Pre-Processing): All three BBAs are transformed 
into Bayesian BBAs with the DSmP transform, and the 
generated BBAs are 


Focal elem. | $m_1^{\Theta}(.)$ | $m_2^{\Theta}(.)$ | $m_3^{\Theta}(.)$ 
$\theta_1$ | 0.1908 | 0.7127 | 0.2000 
$\theta_2$ | 0.2804 | 0 | 0.1334 
$\theta_3$ | 0.3387 | 0.1111 | 0.2333 
$\theta_4$ | 0.0339 | 0.1 | 0.2000 
$\theta_5$ | 0.1562 | 0.0761 | 0.2333 
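
The pre-processing step can be illustrated with a short Python sketch of the DSmP transform under Shafer's model: the mass of each non-singleton focal element is split among its singletons in proportion to their own masses, with a small tuning parameter that avoids division by zero. The function name `dsmp`, the dictionary-of-`frozenset` representation and the value `eps=1e-9` are assumptions of this sketch; the value of the tuning parameter actually used for the table is not stated in the text.

```python
def dsmp(bba, frame, eps=1e-9):
    """DSmP_eps under Shafer's model; bba maps frozenset -> mass."""
    singles = {t: bba.get(frozenset([t]), 0.0) for t in frame}
    prob = dict.fromkeys(frame, 0.0)
    for focal, mass in bba.items():
        # Total singleton mass inside this focal element, regularized by eps.
        denom = sum(singles[t] for t in focal) + eps * len(focal)
        for t in focal:
            prob[t] += mass * (singles[t] + eps) / denom
    return prob


F = frozenset
frame = [1, 2, 3, 4, 5]
m2 = {F([1]): 0.4, F([3]): 0.05, F([4]): 0.05, F([5]): 0.04,
      F([1, 2]): 0.04, F([2, 3, 5]): 0.02, F([3, 4]): 0.1, F([1, 5]): 0.3}
m3 = {F([2, 3, 5]): 0.1, F([3, 4]): 0.2, F([1, 5]): 0.2,
      F([1, 2, 3, 4, 5]): 0.5}
p2, p3 = dsmp(m2, frame), dsmp(m3, frame)
```

For the second and third sources of this example the sketch agrees with the tabulated Bayesian BBAs to within about $10^{-3}$.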


It can be verified by applying formula (9) that the disagreement 
vector $D_{1-3}$ for this example is equal to 

$$D_{1-3} = [0.5385, 0.3632, 0.3453, 0.2305, 0.2827].$$


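Step 2: According to the clustering algorithm, the elements 
of the frames $\Omega_l$ are defined as follows: 

• At layer $l = 1$: $\Omega_1 = \{\omega_1 \triangleq \theta_1, \omega_2 \triangleq \theta_2 \cup \theta_3 \cup \theta_4 \cup \theta_5\}$ 
• At layer $l = 2$: $\Omega_2 = \{\omega_{21} \triangleq \theta_2 \cup \theta_3, \omega_{22} \triangleq \theta_4 \cup \theta_5\}$ 
• At layer $l = 3$: $\Omega_3 = \{\omega_{211} \triangleq \theta_2, \omega_{212} \triangleq \theta_3, \omega_{221} \triangleq \theta_4, \omega_{222} \triangleq \theta_5\}$ 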
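
The layering above can be reproduced with a deterministic stand-in for k-means with two clusters: for one-dimensional data, the split minimizing the within-cluster sum of squares is always a cut of the sorted values, so every cut can simply be tried. The Python sketch below applies this recursively to the disagreement vector of this example. It is an illustration only (the names `best_bisection` and `bintree` are hypothetical), and k-means with random initialization, such as the Matlab routine used by the authors, may return different groupings on other inputs.

```python
def best_bisection(values):
    """Optimal 2-cluster split of 1-D values (minimum within-cluster SSE)."""
    order = sorted(values, key=values.get)

    def sse(keys):
        mean = sum(values[k] for k in keys) / len(keys)
        return sum((values[k] - mean) ** 2 for k in keys)

    cut = min(range(1, len(order)),
              key=lambda c: sse(order[:c]) + sse(order[c:]))
    # Clusters are returned in a canonical order (by their smallest label).
    return tuple(sorted([sorted(order[:cut]), sorted(order[cut:])]))


def bintree(values):
    """Recursively bisect until every leaf is a single element."""
    if len(values) == 1:
        return next(iter(values))
    left, right = best_bisection(values)
    return [bintree({k: values[k] for k in left}),
            bintree({k: values[k] for k in right})]


D = {1: 0.5385, 2: 0.3632, 3: 0.3453, 4: 0.2305, 5: 0.2827}
tree = bintree(D)
```

On this vector the first cut isolates $\theta_1$ and the second separates $\{\theta_2, \theta_3\}$ from $\{\theta_4, \theta_5\}$, which matches the three layers listed above.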
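Step 3: The BBAs of elements of the (sub-)frames $\Omega_l$ are 
obtained as follows: 


• At layer $l = 1$, we use (12)-(13) and we get 


Focal elem. | $m_1^{\Omega_1}(.)$ | $m_2^{\Omega_1}(.)$ | $m_3^{\Omega_1}(.)$ 
$\omega_1 \triangleq \theta_1$ | 0.1908 | 0.7127 | 0.2000 
$\omega_2 \triangleq \theta_2 \cup \theta_3 \cup \theta_4 \cup \theta_5$ | 0.8092 | 0.2873 | 0.8000 


• At layer $l = 2$: We use again the proportional redistribution 
method which gives us: 

Focal elem. | $m_1^{\Omega_2}(.)$ | $m_2^{\Omega_2}(.)$ | $m_3^{\Omega_2}(.)$ 
$\omega_{21} \triangleq \theta_2 \cup \theta_3$ | 0.7651 | 0.3867 | 0.4584 
$\omega_{22} \triangleq \theta_4 \cup \theta_5$ | 0.2349 | 0.6133 | 0.5416 


• At layer $l = 3$: We work with the two subframes $\Omega_{31} \triangleq \{\omega_{211}, \omega_{212}\}$ 
and $\Omega_{32} \triangleq \{\omega_{221}, \omega_{222}\}$ of $\Omega_3$ with the BBAs: 

Focal elem. | $m_1^{\Omega_{31}}(.)$ | $m_2^{\Omega_{31}}(.)$ | $m_3^{\Omega_{31}}(.)$ 
$\omega_{211} \triangleq \theta_2$ | 0.4529 | 0 | 0.3638 
$\omega_{212} \triangleq \theta_3$ | 0.5471 | 1 | 0.6362 

Focal elem. | $m_1^{\Omega_{32}}(.)$ | $m_2^{\Omega_{32}}(.)$ | $m_3^{\Omega_{32}}(.)$ 
$\omega_{221} \triangleq \theta_4$ | 0.1783 | 0.5679 | 0.4616 
$\omega_{222} \triangleq \theta_5$ | 0.8217 | 0.4321 | 0.5384 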

Step 4: The connection weights $\lambda_i$ are computed from the 
assignments of coarsening elements. Hence, we obtain the 
following connecting weights in the bintree: 

• At layer $l = 1$: $\lambda_1 = 0.2345$; $\lambda_2 = 0.7655$. 
• At layer $l = 2$: $\lambda_3 = 0.5533$; $\lambda_4 = 0.4467$. 
• At layer $l = 3$: $\lambda_5 = 0.1606$; $\lambda_6 = 0.8394$; $\lambda_7 = 0.3349$; $\lambda_8 = 0.6651$. 
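
The link weights of a layer can be checked with a small Python sketch. It assumes the sequential option mentioned in footnote 13: the Bayesian BBAs of a two-element subframe are combined two at a time with the two-source proportional conflict redistribution rule (for which PCR5 and PCR6 coincide), in the order of the sources. The function names `pcr_pair` and `link_weights` are hypothetical; the general S-source PCR6 formula is not implemented here and would in general give different numbers.

```python
from functools import reduce


def pcr_pair(a, b):
    """Two-source PCR5/PCR6 on a binary frame; a and b are (mass1, mass2)."""
    (p1, q1), (p2, q2) = a, b

    def share(x, y):
        # Part of the conflicting product x*y returned to the element
        # that carries the mass x (zero when there is no conflict).
        return x * x * y / (x + y) if x + y > 0 else 0.0

    m1 = p1 * p2 + share(p1, q2) + share(p2, q1)
    m2 = q1 * q2 + share(q1, p2) + share(q2, p1)
    return m1, m2


def link_weights(bbas):
    """Sequential combination of the sources, taken in the given order."""
    return reduce(pcr_pair, bbas)


layer1 = link_weights([(0.1908, 0.8092), (0.7127, 0.2873), (0.2000, 0.8000)])
layer2 = link_weights([(0.7651, 0.2349), (0.3867, 0.6133), (0.4584, 0.5416)])
layer3a = link_weights([(0.4529, 0.5471), (0.0, 1.0), (0.3638, 0.6362)])
layer3b = link_weights([(0.1783, 0.8217), (0.5679, 0.4321), (0.4616, 0.5384)])
```

Under these assumptions the four calls return, to about three decimals, the pairs (0.2345, 0.7655), (0.5533, 0.4467), (0.1606, 0.8394) and (0.3349, 0.6651).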


Step 5: We finally get the following resulting combined and 
normalized Bayesian BBA 

$$m^{\Theta}(\cdot) = \{0.2345, 0.0681, 0.3555, 0.1145, 0.2274\}.$$


V. SIMULATION RESULTS AND PERFORMANCES 
A. Flexible Grouping of Singletons 


1) Similarity$^{14}$: Assuming that $\Theta = \{\theta_1, \theta_2, \ldots, \theta_{15}\}$, 
we first randomly generate 2 BBAs, denoted as $m_1^{\Theta}(\cdot)$ and 
$m_2^{\Theta}(\cdot)$, which can be seen in Table I. 


Table I: BBAs for Two Sources $m_1^{\Theta}(\cdot)$ and $m_2^{\Theta}(\cdot)$ 


 | $\theta_1$ | $\theta_2$ | $\theta_3$ | $\theta_4$ | $\theta_5$ 
$m_1^{\Theta}(\cdot)$ | 0.1331 | 0.0766 | 0.0175 | 0.0448 | 0.0229 
$m_2^{\Theta}(\cdot)$ | 0.1020 | 0.0497 | 0.1094 | 0.0612 | 0.0612 

 | $\theta_6$ | $\theta_7$ | $\theta_8$ | $\theta_9$ | $\theta_{10}$ 
$m_1^{\Theta}(\cdot)$ | 0.1142 | 0.0023 | 0.2254 | 0.1583 | 3.4959e-04 
$m_2^{\Theta}(\cdot)$ | 0.0069 | 0.0070 | 0.0128 | 0.0833 | 0.0338 

 | $\theta_{11}$ | $\theta_{12}$ | $\theta_{13}$ | $\theta_{14}$ | $\theta_{15}$ 
$m_1^{\Theta}(\cdot)$ | 0.0075 | 0.0514 | 0.1121 | 0.0314 | 0.0021 
$m_2^{\Theta}(\cdot)$ | 0.1180 | 0.1202 | 0.1351 | 0.0686 | 0.0309 


In order to fully verify the similarity between the hierarchical 
flexible coarsening method and PCR6 in DSmT, a new strict 
distance metric between two BBAs, denoted $d_{BI}^{E}$, was recently 
proposed in [20], [16] and it will be used in this paper. 

In this paper, we regard $d_{BI}^{E}$ as one criterion for evaluating 
the degree of similarity between the fusion results obtained 
from flexible coarsening and PCR6. 

Figure 4: Structure of Hierarchical Flexible Coarsening. 

$^{14}$ Similarity represents the approximate degree between fusion results using 
flexible coarsening and PCR6. 

Based on (8) and (10), the disagreement vector D(-) is 
obtained: 


D(-) = (0.0032, 0.0020, 0.0290, 0.0092, 0.0147, 0.0228, 
0.0059, 0.0537, 0.0154, 0.0131, 0.0338, 0.0235, 
0.0118, 0.0145, 0.0120). 


Thus, the bintree structure of hierarchical flexible coarsening is 
illustrated in Fig. 4, and the similarity between the fusion results of 
hierarchical flexible coarsening and PCR6 is 0.9783, whereas the 
similarity between the hierarchical coarsening method and PCR6 
is 0.9120. In particular, the terminal nodes (the small red boxes 
in Fig. 4) of the flexible grouping are not in accordance with the 
original order $\theta_1, \theta_2, \ldots, \theta_{15}$. This is quite different from 
the original hierarchical coarsening method. 

From a statistical point of view, 100 BBAs are randomly 
generated to be fused with three methods: hierarchical flexible 
coarsening, hierarchical coarsening and PCR6. Comparisons 
are made in Fig. 5, which show the superiority of the 
new approach proposed in this paper (the average similarity of the new 
method is 97%, against 93.5% for the old method). 


B. Flexible Grouping of Conflicting Focal Elements 


Assuming that there are five sources of evidence 
m9 (-),mP(-),m$(-), mP(-), mY(-), and the restricted hype- 
power set D° = {61, A2, 43, 04, 5, 96, 97, 98, 99, 910, 01 M 
02,051 06M 07,01 005 NA9N O10}. And then we randomly 
generate 1000 BBAs for each source to calculate the similarity 
using (10). From Fig. 6, we can find that hierarchical flexible 
coarsening method can also maintain high degree of similarity 
which performs better than hierarchical coarsening. 
Similarity Comparison Between Hierarchical Flexible Coarsening (HFC) And Hierarchical Coarsening (HC). 
Figure 6: Comparisons Between HFC and HC (Singletons and 
Conflicting Focal Elements). 


C. Flexible Grouping of Uncertain and Hybrid Focal Elements 


We can also deal with uncertain and hybrid focal el- 
ements. Assuming that there are also five sources of 
evidence m(-),m9(-),m$(-), mP(-),mQ(-) and D® 
{61, Ao, 3, 04, Os, 6, 07, Ag, A, 910, (cal U O2, Os U 6 U 07, A4 U 
05U 09 U O19}; DP = {01, 2,43, 94, 95,96, 07,8, 99, O10, 429 
04 U 06,01 U 03 M 05 U 07 M O9}'°. And then we respectively 
and randomly generate 1000 BBAs for these two cases $D_1^{\Theta}$ 
and $D_2^{\Theta}$. Finally, we calculate the average similarity degree of 
HFC and HC with PCR6 in Table II, which illustrates that HFC 
performs better than the old method. However, HFC incurs an extra 
time cost compared to HC due to the clustering steps 
in the coarsening process. 


Table II: Similarity Comparisons 


 | Hierarchical Flexible Coarsening | Hierarchical Coarsening 
$D_1^{\Theta}$ | 98% | 91% 
$D_2^{\Theta}$ | 97% | 93% 


VI. CONCLUSION AND PERSPECTIVES 


A novel hierarchical flexible approximate method in DSmT 
is proposed here. Compared to the original hierarchical coarsening, 
the flexible strategy guarantees higher similarity with the PCR6 
rule in the fusion process. Besides, whether the focal elements in the 
hyper-power set are singletons, conflicting focal elements, 
uncertain or even hybrid focal elements, the new method 
works well. In future work, we will focus on the general 
framework of hierarchical coarsening, which could generate 
final non-Bayesian BBAs in order to avoid loss of information. 
Furthermore, other advantages or disadvantages of the 
proposed method, such as computational efficiency and time 
consumption, need to be further investigated. 

$^{15}$ In this case, $D_1^{\Theta}$ represents uncertain focal elements and $D_2^{\Theta}$ represents 
hybrid focal elements. 
Abstract: The dynamical systems in various science and 
engineering problems are often governed by nonlinear equations 
(differential equations). Due to insufficient and incomplete 
system information, the parameters in such equations may 
be uncertain. Interval analysis serves as an efficient tool for 
handling uncertainties in terms of closed intervals. One of the 
major problems with interval analysis is handling the "dependency 
problem" when computing the tightest range of the solution enclosure, 
or exact enclosure. Such dependency problems are often observed 
while dealing with complex nonlinear equations. In this regard, 
two test problems comprising interval nonlinear 
equations are first considered. The Set Inversion via Interval Analysis 
(SIVIA) along with a Monte-Carlo approach is used to compute the 
exact enclosure of the test problems. Further, the efficiency of the 
proposed approach is also verified for solving nonlinear 
differential equations (Van der Pol oscillator) subject to interval 
initial conditions. 


Keywords: uncertain nonlinear equations, nonlinear oscillator, 
dependency problem, SIVIA Monte-Carlo, contractor. 


I. INTRODUCTION 


Various vibration problems in science and engineering 
disciplines, viz. structural mechanics, control theory, seismology, 
physics, biology etc., may be expressed in terms of nonlinear 
equations, systems of nonlinear equations and nonlinear 
differential equations. Generally, the parameters in such equations 
are treated as precise variables. However, insufficient and 
incomplete system information often leads to parameters or 
variables with imprecision or uncertainty. For instance, let us 
consider a nonlinear damped spring-mass system as given in 
Fig. 1 governed by the equation, 


mz + ct +az? + kx + Bx? = f(t), (1) 


where $m$, $c$ and $k$ are respectively the mass, damping and stiffness 
of the nonlinear system. Here, the external force applied on 
the system is f(t) with damping force fy = ct + ax? and 
spring force f, = ka + Bx. 
Figure 1. Damped spring-mass system. 


The uncertainty of the material properties in Eq. (1) leads to an 
uncertain nonlinear differential equation. Such uncertainties 
may be modeled using a probabilistic approach, interval 
computation or fuzzy set theory. When sufficient experimental 
data are not available, probabilistic methods may not be 
able to deliver reliable results. Moreover, in fuzzy set theory a 
fuzzy number is expressed in terms of closed intervals through the 
$\alpha$-cut approach. As such, interval analysis has emerged as a 
powerful tool for handling uncertainties in various practical 
problems. 

In the early 1960s, the pioneering concepts related to interval 
computations, functions, matrices, integral and differential 
equations were introduced by R. E. Moore [12]-[14]. Systems of 
equations, algebraic eigenvalue problems, and second-order initial 
and boundary value problems have been discussed by Alefeld 
and Herzberger [3]. Guaranteed interval computations with 
respect to set approximations, parameter and state estimation 
with applications in robust control and robotics are addressed 
by Jaulin et al. [10]. While dealing with interval computations, 
one of the major obstacles is to handle the 'dependency 
problem' effectively so that the tightest enclosure of the solution 
bound may be obtained. Such dependency problems often 
occur in systems governed by complex nonlinear 
equations, and they often lead to over-estimation of the solution 
bound. The dependency problem due to overestimation (wrapping 
effect) has been studied by Kramer [11] with respect 
to the generalized interval arithmetic proposed by Hansen [9]. 
Other approaches for reducing overestimation while handling the 
dependency problem include contractors [10], 
affine arithmetic [15] and/or parametric forms. 

The remainder of the present work is organized as follows. The 
preliminaries of classical arithmetic of Interval Analysis (IA) 
along with its application to two complex nonlinear equations 
comprising imprecise variables are considered in Section 
II. The Set Inversion via Interval Analysis (SIVIA) along 
with a Monte-Carlo approach is then used to compute the exact 
enclosure of the two test problems in Section III. Further, 
the proposed approach is also verified for computing the 
validated enclosure of nonlinear differential equations (Van der 
Pol oscillator) subject to interval initial conditions in Section 
IV. 


II. CLASSICAL INTERVAL COMPUTATIONS 


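Interval analysis deals with interval computations on the set 
of closed intervals $\mathbb{IR}$ of the real line $\mathbb{R}$, in order to obtain the 
tightest bound or enclosure for uncertain systems. A closed 
interval $[x] \in \mathbb{IR}$ is denoted by $[x] = [\underline{x}, \overline{x}]$ such that 

$$[x] = [\underline{x}, \overline{x}] = \{t \mid \underline{x} \leq t \leq \overline{x}, \text{ where } \underline{x}, \overline{x} \in \mathbb{R}\}.$$

Here, $\underline{x} = \inf[x]$ is the infimum or lower bound of $[x]$ and 
$\overline{x} = \sup[x]$ is the supremum or upper bound of $[x]$. The width 
and center of $[x]$ may be referred to as $[x]^w = \overline{x} - \underline{x}$ and 
$[x]^c = \frac{\underline{x} + \overline{x}}{2}$ respectively. 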

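Basic operations using classical interval arithmetic given in 
Moore et al. [14] are illustrated as follows: 

• Addition: 
$$[x] + [y] = [\underline{x} + \underline{y}, \overline{x} + \overline{y}],$$
• Subtraction: 
$$[x] - [y] = [\underline{x} - \overline{y}, \overline{x} - \underline{y}],$$
• Multiplication: 
$$[x] \cdot [y] = [\min\{S_{\cdot}([x],[y])\}, \max\{S_{\cdot}([x],[y])\}],$$
where $S_{\cdot}([x],[y]) = \{\underline{x}\,\underline{y}, \underline{x}\,\overline{y}, \overline{x}\,\underline{y}, \overline{x}\,\overline{y}\}$. 
• Division: 
$$[x]/[y] = [\underline{x}, \overline{x}] \cdot \left[\frac{1}{\overline{y}}, \frac{1}{\underline{y}}\right], \quad 0 \notin [\underline{y}, \overline{y}],$$
• Power: 

- If $n > 0$ is an odd number, then 
$$[x]^n = [\underline{x}^n, \overline{x}^n],$$
- If $n > 0$ is an even number, then 
$$[x]^n = \begin{cases} [\underline{x}^n, \overline{x}^n], & [x] > 0, \\ [\overline{x}^n, \underline{x}^n], & [x] < 0, \\ [0, \max\{\underline{x}^n, \overline{x}^n\}], & 0 \in [x]. \end{cases}$$

Two test examples implementing basic interval arithmetic 
are given in Examples 1 and 2. 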
Example 1: Compute the bound $[z_1]$ satisfying the constraint 

$$z_1 = x_1 y_1 + x_1 y_3 + x_3 y_1 \quad (2)$$

such that $x_1 + x_2 + x_3 = 1$ and $y_1 + y_2 + y_3 = 1$. 
Here, $x_1 \in [x_1] = [0.2, 0.3]$, $x_2 \in [x_2] = [0.1, 0.2]$, 
$y_1 \in [y_1] = [0.4, 0.6]$ and $y_2 \in [y_2] = [0.2, 0.3]$. 

Using classical IA, the bounds $[x_3]$ and $[y_3]$ are 
initially estimated as $[x_3] \approx 1 - [x_1] - [x_2] = [0.5, 0.7]$ and 
$[y_3] \approx 1 - [y_1] - [y_2] = [0.1, 0.4]$ respectively with respect to 
the constraints $x_1 + x_2 + x_3 = 1$ and $y_1 + y_2 + y_3 = 1$. Then, 
the bound $[z_1]$ is obtained as 

$$[z_1]^{IA} \approx [x_1] \cdot [y_1] + [x_1] \cdot [y_3] + [x_3] \cdot [y_1] = [0.30, 0.72]. \quad (3)$$


Further, we have considered a more complicated nonlinear 
constraint in Example 2, related to problems of multi-criteria 
decision-making under imprecise scores given in Dezert et al. [7]. 

Example 2: [7] Compute the bound $[z_2]$ satisfying the constraint 
given by the imprecise proportional conflict redistribution 
(PCR) fusion rule 

$$z_2 = z_1 + \frac{x_1^2 y_2}{x_1 + y_2} + \frac{y_1^2 x_2}{y_1 + x_2} \quad (4)$$

such that $x_1 \in [0.2, 0.3]$, $x_2 \in [0.1, 0.2]$, $y_1 \in [0.4, 0.6]$ and 
$y_2 \in [0.2, 0.3]$. 

Here, the bound of $[z_2]$ is obtained as 

$$[z_2]^{IA} \approx [z_1]^{IA} + \frac{[x_1]^2 [y_2]}{[x_1] + [y_2]} + \frac{[y_1]^2 [x_2]}{[y_1] + [x_2]} = [0.3333, 0.9315]. \quad (5)$$
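
The enclosures (3) and (5) follow directly from the interval operations listed earlier. A minimal Python sketch is given below; the `Interval` class is a hypothetical helper written for this illustration, it works in ordinary floating point without the outward rounding that a validated interval library would apply, and it supports only positive integer powers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, o):
        return Interval(self.lo + o.lo, self.hi + o.hi)

    def __sub__(self, o):
        return Interval(self.lo - o.hi, self.hi - o.lo)

    def __mul__(self, o):
        s = [self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi]
        return Interval(min(s), max(s))

    def __truediv__(self, o):
        if o.lo <= 0 <= o.hi:
            raise ZeroDivisionError("divisor interval contains zero")
        return self * Interval(1 / o.hi, 1 / o.lo)

    def __pow__(self, n):
        a, b = self.lo ** n, self.hi ** n
        if n % 2 == 1 or self.lo >= 0:
            return Interval(a, b)          # odd power, or [x] >= 0
        if self.hi <= 0:
            return Interval(b, a)          # even power, [x] <= 0
        return Interval(0.0, max(a, b))    # even power, 0 inside [x]


ONE = Interval(1.0, 1.0)
x1, x2 = Interval(0.2, 0.3), Interval(0.1, 0.2)
y1, y2 = Interval(0.4, 0.6), Interval(0.2, 0.3)
x3, y3 = ONE - x1 - x2, ONE - y1 - y2

z1 = x1 * y1 + x1 * y3 + x3 * y1                          # constraint (2)
z2 = z1 + (x1 ** 2 * y2) / (x1 + y2) + (y1 ** 2 * x2) / (y1 + x2)  # constraint (4)
```

Evaluating `z1` and `z2` gives approximately [0.30, 0.72] and [0.3333, 0.9315].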
The enclosures obtained in Eqs. (3) and (5) have been compared 
with enclosures obtained using Monte-Carlo simulation 
in Table I. 


Table I 
INTERVAL BOUNDS OF $z_1$ AND $z_2$. 


$i$ | $[z_i]^{IA}$ | $[z_i]^{MC}$ 
1 | [0.30, 0.72] | [0.3850, 0.5935] 
2 | [0.3333, 0.9315] | [0.4617, 0.6825] 


Here, the Monte-Carlo simulation approach using 100000 uniformly 
distributed independent random sample values of the 
variables $x_1$, $x_2$, $y_1$ and $y_2$ has been considered, where $x_1 \sim U([x_1])$, 
$x_2 \sim U([x_2])$, $y_1 \sim U([y_1])$ and $y_2 \sim U([y_2])$. 
From Table I, it is worth mentioning that the bounds for 
$i = 1, 2$ satisfy 

$$[z_i]^{MC} \subseteq [z_i]^{IA}.$$
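
The Monte-Carlo enclosure of $z_1$ can be sketched as follows, using only the Python standard library. The sample size, the seed and the helper name `monte_carlo_enclosure` are choices of this illustration, not of the paper; $x_3$ and $y_3$ are eliminated through the two sum constraints, so each sampled point satisfies them exactly.

```python
import random


def monte_carlo_enclosure(f, boxes, samples=20000, seed=0):
    """Smallest and largest value of f over uniform samples of the boxes."""
    rng = random.Random(seed)
    values = [f(*(rng.uniform(lo, hi) for lo, hi in boxes))
              for _ in range(samples)]
    return min(values), max(values)


def z1(x1, x2, y1, y2):
    # x3 and y3 follow from x1 + x2 + x3 = 1 and y1 + y2 + y3 = 1.
    x3, y3 = 1 - x1 - x2, 1 - y1 - y2
    return x1 * y1 + x1 * y3 + x3 * y1


boxes = [(0.2, 0.3), (0.1, 0.2), (0.4, 0.6), (0.2, 0.3)]
lo, hi = monte_carlo_enclosure(z1, boxes)
```

Since every sample is an attainable value of $z_1$, the sampled range is an inner approximation: it always lies inside the exact range, and hence inside the IA enclosure [0.30, 0.72].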


With more sample values, the Monte-Carlo simulation 
may yield a better interval enclosure with respect to 
the constraints (2) and (4), but such an approach is inefficient 
with respect to computational time. So, we may consider the 
problem in handling interval computations as to interpret the 
tightest or the exact enclosure $[z_i]$ of $z_i$ that satisfies 

$$[z_i]^{MC} \subseteq [z_i] \subseteq [z_i]^{IA}, \quad (6)$$

such that 

$$\inf [z_i]^{IA} \leq \inf [z_i] \leq \inf [z_i]^{MC},$$
$$\sup [z_i]^{MC} \leq \sup [z_i] \leq \sup [z_i]^{IA}. \quad (7)$$
a IA < y < yMC 
fete ® 


Although interval arithmetic looks simple and appealing for 
basic operations with intervals in the above computations, 
the "dependency problem" is a major obstacle 
when complicated expressions have to be computed in order 
to find the tightest enclosure. In this regard, the dependency effect 
is discussed in detail in the next section. 


A. Dependency Problem in IA 


Variable or parameter dependency in IA is generally 
exhibited when an imprecise parameter occurs more than once 
in the governing constraint. For instance, 
in the nonlinear constraint 

$$z = x^2 + y^2 \quad \text{for } x \in [0.1, 0.5] \text{ and } y \in [-0.6, 0.1],$$

each imprecise variable $x$ and $y$ occurs only once. 
The enclosure computed for the constraint $z = x^2 + y^2$ 
using classical IA is $[z]^{IA} = [0.01, 0.61]$, 
which coincides with the enclosure $[z]^{MC} = [0.01, 0.61]$ obtained 
from a Monte-Carlo simulation with $x \sim U([0.1, 0.5])$, 
$y \sim U([-0.6, 0.1])$ for 100000 sample values. The difficulty arises 
with complex nonlinear constraints such as those of 
Examples 1 and 2, where the dependency effect is exhibited 
due to multiple occurrences of the imprecise variables. 

The dependency effect may be reduced by replacing the 
constraint given in Eq. (2) with an equivalent simpler constraint 
having fewer (or no) repeated variables. For instance, 
the equivalent constraint 

$$z_1 = (1 - x_2) y_1 + x_1 y_3 \quad (9)$$

results in a better enclosure approximation $[z_1]^{IA} = [0.34, 0.66]$. 
Here, the interval bound [0.34, 0.66] is contained 
in the bound [0.30, 0.72] obtained using the constraint 
given in Eq. (2). On the other hand, the equivalent 
constraint 

$$z_1 = (1 - y_2) x_1 + (1 - x_2) y_1 - x_1 y_1 \quad (10)$$

results in an overestimated bound $[z_1]^{IA} = [0.28, 0.70]$. 
Due to such dependency, the interval bounds often 
overestimate the tightest enclosure. A similar dependency 
effect is exhibited while computing $[z_2]^{IA}$ for constraints 


a=at(a + +) + + 


T1Y2 


TyY2 
rit+y2 


Yt @2 
Yit+x2 


and 2g = 2+ 


-1 
TEE + Z) with respect to (4). As such, identification of 
1 


constraint yielding tightest enclosure is cumbersome. In this 
regard, the problem formulation for reduction of dependency 
effect has been carried out in the next section. 
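
The dependency effect on $z_1$ can be made concrete with a few lines of Python. The sketch below evaluates three algebraically equivalent expressions of $z_1$ with naive interval arithmetic on `(lo, hi)` tuples: the original constraint (2), the regrouped form $(1 - x_2) y_1 + x_1 y_3$, and the expanded form $(1 - y_2) x_1 + (1 - x_2) y_1 - x_1 y_1$. The helper names are hypothetical and no outward rounding is applied.

```python
def add(a, b):
    return (a[0] + b[0], a[1] + b[1])


def sub(a, b):
    return (a[0] - b[1], a[1] - b[0])


def mul(a, b):
    p = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(p), max(p))


ONE = (1.0, 1.0)
x1, x2, y1, y2 = (0.2, 0.3), (0.1, 0.2), (0.4, 0.6), (0.2, 0.3)
x3 = sub(sub(ONE, x1), x2)   # [0.5, 0.7]
y3 = sub(sub(ONE, y1), y2)   # [0.1, 0.4]

# Three equivalent ways of writing z1, evaluated with the same operations.
original = add(add(mul(x1, y1), mul(x1, y3)), mul(x3, y1))
regrouped = add(mul(sub(ONE, x2), y1), mul(x1, y3))
expanded = sub(add(mul(sub(ONE, y2), x1), mul(sub(ONE, x2), y1)), mul(x1, y1))
```

The three results are approximately [0.30, 0.72], [0.34, 0.66] and [0.28, 0.70]: the same function yields different enclosures depending only on how often each interval variable appears in the expression.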
1) Problem Formulation: The main aim in the present work 
is to compute the tightest enclosure $[\underline{z}_i, \overline{z}_i]$, or exact enclosure, such 
that $[z_i]^{MC} \approx [z_i]^{IA}$ or 

$$\underline{z}_i^{IA} \leq \underline{z}_i \leq \underline{z}_i^{MC}, \qquad \overline{z}_i^{MC} \leq \overline{z}_i \leq \overline{z}_i^{IA}, \quad (11)$$

associated with some nonlinear constraint $z_i = f(x_1, x_2, y_1, y_2)$, 
where $x_j \in [x_j]$ and $y_j \in [y_j]$ for 
$j = 1, 2$. In this regard, the SIVIA Monte-Carlo approach, based 
on set inversion via interval computations and Monte-Carlo 
simulation, is proposed in the next section to estimate the exact 
bounds. 


III. SIVIA MONTE-CARLO APPROACH 


Initially, the general procedure of SIVIA is presented 
in Section III-A, followed by contractors in Section III-B. 
Finally, the combination of SIVIA with the Monte-Carlo approach 
(i.e. SIVIA-MC) is described in Section III-C. 


A. SIVIA 


Set inversion of a typical set $X \subset \mathbb{R}^m$ with respect to a 
function $f : \mathbb{R}^m \to \mathbb{R}^n$ is expressed as 

$$X = f^{-1}(Y) = \{x \in \mathbb{R}^m \mid f(x) \in Y\},$$

where $Y \subset \mathbb{R}^n$. In case of SIVIA [10], an initial search 
set $[x_0]$ is assumed containing the required set $X$. Then, 
using sub-pavings as given in Fig. 2, the desired enclosure of the 
solution set $X$ is obtained based on the inclusion properties: 
1) Case I: $[f]([x]) \subset Y \Rightarrow [x] \subset X$, then $[x]$ is a 
solution, 
2) Case II: $[f]([x]) \cap Y = \emptyset \Rightarrow [x] \cap X = \emptyset$, then $[x]$ 
is not a solution, 
3) Case III: $[f]([x]) \cap Y \neq \emptyset$ and $[f]([x]) \not\subset Y$, then $[x]$ 
is an undetermined solution. 


Figure 2. Set inversion via interval analysis. 


The detailed illustration of set computation using SIVIA 
based on regular sub-pavings, bisections etc. may be found in 
[10]. The sub-pavings in SIVIA may be improved with the 
usage of contractors discussed in next section. 
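
The three cases translate into a short bisection loop. The Python sketch below is a generic illustration rather than the PyIbex implementation: it inverts the assumed test function $f(x, y) = x^2 + y^2$ over the target $Y = [1, 4]$ (a ring), bisects undetermined boxes along their widest side, and stops at a width `eps`. All names are hypothetical, and plain floating point is used without outward rounding.

```python
def sq(iv):
    """Interval square, using the even-power rule."""
    lo, hi = iv
    a, b = lo * lo, hi * hi
    if lo >= 0:
        return (a, b)
    if hi <= 0:
        return (b, a)
    return (0.0, max(a, b))


def f_box(box):
    """Inclusion function [f] for f(x, y) = x^2 + y^2."""
    xs, ys = sq(box[0]), sq(box[1])
    return (xs[0] + ys[0], xs[1] + ys[1])


def sivia(box, target, eps):
    """Return (boxes proven inside X, undetermined boxes narrower than eps)."""
    inside, boundary, stack = [], [], [box]
    while stack:
        b = stack.pop()
        lo, hi = f_box(b)
        if target[0] <= lo and hi <= target[1]:
            inside.append(b)                 # Case I: [f]([x]) inside Y
        elif hi < target[0] or lo > target[1]:
            continue                         # Case II: no intersection with Y
        else:                                # Case III: undetermined
            widths = [iv[1] - iv[0] for iv in b]
            k = widths.index(max(widths))
            if widths[k] < eps:
                boundary.append(b)
            else:
                mid = (b[k][0] + b[k][1]) / 2
                left, right = list(b), list(b)
                left[k], right[k] = (b[k][0], mid), (mid, b[k][1])
                stack += [tuple(left), tuple(right)]
    return inside, boundary


def area(boxes):
    return sum((x[1] - x[0]) * (y[1] - y[0]) for x, y in boxes)


inside, boundary = sivia(((-3.0, 3.0), (-3.0, 3.0)), (1.0, 4.0), eps=0.05)
```

The inner boxes under-approximate the ring and the inner plus boundary boxes over-approximate it, so the true area $3\pi$ is bracketed by `area(inside)` and `area(inside) + area(boundary)`.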




B. Contractor 

Contractor ([4], [10]): A contractor $\mathcal{C}$ associated with a 
set $X \subset \mathbb{R}^n$ over domain $D$ is an operator 

$$\mathcal{C} : \mathbb{IR}^n \to \mathbb{IR}^n$$

satisfying the following properties: 

• Contraction: $\mathcal{C}([x]) \subseteq [x]$, $\forall [x] \in \mathbb{IR}^n$, 

• Completeness: $\mathcal{C}([x]) \cap X = [x] \cap X$, $\forall [x] \in \mathbb{IR}^n$. 

The pictorial representation of the implementation of a contractor 
over a set $X \subset \mathbb{R}^2$ is illustrated in Fig. 3. 


Figure 3. Contraction of $[x]$. 


There exist various types of contractors, viz. fixed-point, 
forward-backward, Newton, Gauss-Seidel contractors etc. 
Contractor based set inversion of the lemniscate curve $(x^2 + y^2)^2 + 
a^2(x^2 - y^2) = 0$ having width $a \in [2, 3]$ has been obtained 
based on the PyIbex library [6] and depicted in Fig. 4, where 
the initial search set is $[-4, 4] \times [-4, 4]$. 


Figure 4. SIVIA of lemniscate curve with width [2, 3]. 


In order to perform the SIVIA Monte-Carlo approach, we have 
used forward-backward and fixed-point contractors. The detailed 
implementation of forward-backward and fixed-point contractors 
is given in the Appendix. 
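
As a generic illustration of the idea (not the Appendix implementation and not the PyIbex API), the Python sketch below builds a forward-backward contractor for the simple assumed constraint $x + y = z$ and iterates it to a fixed point. Intervals are `(lo, hi)` tuples, an empty intersection is signalled by `None`, and all function names are hypothetical.

```python
def intersect(a, b):
    """Intersection of two intervals, or None when it is empty."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def contract_sum(x, y, z):
    """Forward-backward contractor for the constraint x + y = z."""
    z = intersect(z, (x[0] + y[0], x[1] + y[1]))         # forward: z = x + y
    if z is None:
        return None
    x = intersect(x, (z[0] - y[1], z[1] - y[0]))         # backward: x = z - y
    if x is None:
        return None
    y = intersect(y, (z[0] - x[1], z[1] - x[0]))         # backward: y = z - x
    if y is None:
        return None
    return x, y, z


def fixed_point(x, y, z, tol=1e-12):
    """Repeat the contractor until the boxes stop shrinking."""
    while True:
        result = contract_sum(x, y, z)
        if result is None:
            return None
        change = max(abs(a - b)
                     for old, new in zip((x, y, z), result)
                     for a, b in zip(old, new))
        x, y, z = result
        if change < tol:
            return x, y, z


box = fixed_point((0.0, 10.0), (0.0, 10.0), (3.0, 4.0))
```

Starting from $x, y \in [0, 10]$ and $z \in [3, 4]$, the contractor shrinks $x$ and $y$ to $[0, 4]$ without discarding any point that satisfies the constraint.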
C. SIVIA Monte-Carlo 


SIVIA Monte-Carlo (or SIVIA-MC) is a two-fold iterative 
methodology that alternates SIVIA, implemented using 
contractor programming, with Monte-Carlo simulation until 
the exact enclosure satisfying (11) is obtained. In this regard, 
the iterative procedure is given in Algorithm 1 with 
respect to the constraint $z = f(x_1, x_2, \ldots, x_n)$ such that each 
$x_i \in [x_i] \in \mathbb{IR}$ for $i = 1, 2, \ldots, n$. Here, the initial search set 
containing the exact enclosure is assumed as $[z_0]$. 


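Algorithm 1: Implementation of SIVIA-MC approach 
Input: $[x_i]$ for $i = 1, 2, \ldots, n$; Initial domain 
$X = [x_1], [x_2], \ldots, [x_n]$; 
Initial search set $[z_0]$ 

Step 1: Compute enclosure using Monte-Carlo 
$\underline{z}^{MC} = mcl(X)$ and $\overline{z}^{MC} = mcu(X)$ 
Step 2: Compute enclosure using contractors 
$\underline{z}^{IA} = Ctcl(X, [z_0])$ and $\overline{z}^{IA} = Ctcu(X, [z_0])$ 
Step 3: Improve lower and upper range of $z$ 
$\underline{z} \in [\underline{z}^{IA}, \underline{z}^{MC}]$ and $\overline{z} \in [\overline{z}^{MC}, \overline{z}^{IA}]$ 
Step 4: Compute improved lower $\underline{X} = [f]^{-1}([\underline{z}^{IA}, \underline{z}^{MC}])$ and 
upper $\overline{X} = [f]^{-1}([\overline{z}^{MC}, \overline{z}^{IA}])$ domains using SIVIA 
$\underline{X}, \overline{X} = SIVIA(X, [f], [z_0], \epsilon)$ 
Step 5: Repeat steps 1 to 3 for domains $\underline{X}$ and $\overline{X}$ 
Step 6: Repeat step 4 for different domains $\underline{X}$ and $\overline{X}$ 
Step 7: Iterate steps 4 and 5 till $\underline{z}^{IA} \approx \underline{z}^{MC}$ and $\overline{z}^{MC} \approx \overline{z}^{IA}$ 

Output: $[\underline{z}, \overline{z}]$ 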


In Algorithm 1, $mcl(\cdot)$ and $mcu(\cdot)$ are functions that compute 
the minimum and maximum function value with respect to the 
domain $X$. Then, $Ctcl(\cdot)$ and $Ctcu(\cdot)$ use the forward-backward 
contractor along with the fixed-point contractor for computing the interval 
enclosure based on classical IA. Further, $SIVIA(\cdot)$ computes 
the set inversion for domain $X$ based on the constraint function $f$ 
with precision $\epsilon$. 

Let us again consider Examples 1 and 2 in order to 
compute the exact enclosure using SIVIA Monte-Carlo in 
Example 3. 


Example 3: Compute the interval bounds for the constraints 

$$z_1 = x_1 y_1 + x_1 y_3 + x_3 y_1 \quad \text{and} \quad z_2 = z_1 + \frac{x_1^2 y_2}{x_1 + y_2} + \frac{y_1^2 x_2}{y_1 + x_2}$$

using SIVIA Monte-Carlo such that $x_1 + x_2 + x_3 = 1$ and 
$y_1 + y_2 + y_3 = 1$. Again, $x_1 \in [x_1] = [0.2, 0.3]$, $x_2 \in [x_2] = 
[0.1, 0.2]$, $y_1 \in [y_1] = [0.4, 0.6]$ and $y_2 \in [y_2] = [0.2, 0.3]$. 
Using Algorithm 1 with SIVIA precision $\epsilon = 0.001$ and 
different sample sizes, viz. 100000, 1000, 100 and 10, the tightest 
enclosures with respect to the constraints on $z_1$ and $z_2$ 
are obtained and reported in Tables II and III respectively. 
Table II 
INTERVAL ENCLOSURE OF $z_1$ 
(SIVIA with 0.001 precision and Monte Carlo samples) 


Iterations | 100000 samples: $\underline{z}_1 \in$ | $\overline{z}_1 \in$ | 1000 samples: $\underline{z}_1 \in$ | $\overline{z}_1 \in$ 
1 | [0.3796, 0.385] | [0.5935, 0.6007] | [0.3796, 0.3850] | [0.5935, 0.6007] 
2 | [0.3796, 0.3807] | [0.5993, 0.6007] | [0.3796, 0.3822] | [0.5971, 0.6007] 
3 | [0.3796, 0.3801] | [0.5999, 0.6006] | [0.3796, 0.3808] | [0.5987, 0.6008] 
4 | - | - | [0.3797, 0.3803] | [0.5995, 0.6007] 
$[z_1]$ | [0.38, 0.6] | | [0.38, 0.6] | 
Time (s) | 5.1388 | | 5.9936 | 

Iterations | 100 samples: $\underline{z}_1 \in$ | $\overline{z}_1 \in$ | 10 samples: $\underline{z}_1 \in$ | $\overline{z}_1 \in$ 
1 | [0.3796, 0.385] | [0.5935, 0.6007] | [0.3796, 0.385] | [0.5935, 0.6007] 
2 | [0.3796, 0.3833] | [0.5965, 0.6007] | [0.3796, 0.384] | [0.5945, 0.6007] 
3 | [0.3797, 0.3817] | [0.5982, 0.6007] | [0.3796, 0.3836] | [0.595, 0.6007] 
4 | [0.3797, 0.3814] | [0.5989, 0.6007] | [0.3797, 0.3829] | [0.5971, 0.6007] 
5 | [0.3797, 0.3808] | [0.5995, 0.6006] | [0.3797, 0.3811] | [0.5977, 0.6007] 
6 | [0.3797, 0.3805] | [0.5996, 0.6006] | [0.3797, 0.3808] | [0.5978, 0.6006] 
7 | - | - | [0.3797, 0.3805] | [0.5988, 0.6006] 
$[z_1]$ | [0.38, 0.6] | | [0.38, 0.6] | 
Time (s) | 6.0616 | | 9.511 | 
Table III 
INTERVAL ENCLOSURE OF $z_2$ 
(SIVIA with 0.001 precision and Monte Carlo samples) 


Iterations | 100000 samples: $\underline{z}_2 \in$ | $\overline{z}_2 \in$ | 1000 samples: $\underline{z}_2 \in$ | $\overline{z}_2 \in$ 
1 | [0.4565, 0.4617] | [0.6825, 0.6889] | [0.4565, 0.4617] | [0.6825, 0.6889] 
2 | [0.4565, 0.4581] | [0.6869, 0.6887] | [0.4565, 0.4591] | [0.6859, 0.6887] 
3 | [0.4566, 0.4575] | [0.6872, 0.6886] | [0.4565, 0.4579] | [0.6864, 0.6887] 
4 | - | - | [0.4565, 0.4577] | [0.6867, 0.6887] 
5 | - | - | [0.4565, 0.4575] | [0.6869, 0.6887] 
6 | - | - | [0.4565, 0.4574] | [0.687, 0.6889] 
$[z_2]$ | [0.46, 0.69] | | [0.46, 0.69] | 
Time (s) | 6.7797 | | 7.5464 | 

Iterations | 100 samples: $\underline{z}_2 \in$ | $\overline{z}_2 \in$ | 10 samples: $\underline{z}_2 \in$ | $\overline{z}_2 \in$ 
1 | [0.4565, 0.4617] | [0.6825, 0.6889] | [0.3796, 0.385] | [0.5935, 0.6007] 
2 | [0.4565, 0.4609] | [0.6826, 0.6887] | [0.4565, 0.4609] | [0.6854, 0.6887] 
3 | [0.4565, 0.4599] | [0.6858, 0.6887] | [0.4565, 0.4601] | [0.6828, 0.6886] 
4 | [0.4566, 0.4578] | [0.6863, 0.6885] | [0.4565, 0.4589] | [0.6836, 0.6886] 
5 | [0.4566, 0.4577] | [0.6864, 0.6891] | [0.4565, 0.4584] | [0.6839, 0.6885] 
6 | [0.4566, 0.4576] | [0.6866, 0.689] | [0.4565, 0.4581] | [0.685, 0.6885] 
7 | [0.4566, 0.4574] | [0.6867, 0.689] | [0.4565, 0.4577] | [0.6856, 0.6885] 
8 | - | - | [0.4565, 0.4575] | [0.686, 0.6885] 
9 | - | - | [0.4566, 0.4574] | [0.6865, 0.6884] 
$[z_2]$ | [0.46, 0.69] | | [0.46, 0.69] | 
Time (s) | 9.2454 | | 24.847 | 


It may be observed from Table II that the SIVIA Monte-Carlo 
method iteratively converges to the exact enclosure [0.38, 0.6] 
(up to two decimals) even for small sample sizes, viz. 100 
and 10. The iterative enclosures converge to the exact bound, 
although the computational time increases from 5.1388 to 
9.511 seconds as the sample size decreases from 100000 to 10. 
The proposed method thus seems appealing, since the exact 
solution bound is reached even with few samples. 
Many practical application problems do not yield 
sufficient data, and sometimes gathering large data sets is 
costly; in such cases the proposed method may be used to 
obtain the exact enclosure, and the increase in computational time 
may be neglected. 


Similar observations of convergence to the exact enclosure may 
be made in Table III with respect to different sample values. 
Moreover, due to the complexity of the constraint (4), the 
computational time of 24.847 seconds required for $[z_2]$ is comparatively 
higher than the 9.511 seconds required for $[z_1]$. Further, 
a nonlinear differential equation arising in dynamic 
problems is considered in the next section to verify 
the effectiveness of the SIVIA Monte-Carlo approach. 


IV. NONLINEAR OSCILLATOR 


Sometimes, dynamic problems are governed by $m\ddot{x} + c\dot{x} + 
kx = f(t)$ having nonlinear stiffness $(k_1 x + k_2 x^2 + \ldots)$, which 
results in nonlinear differential equations (nonlinear oscillators). 
In case of uncertain nonlinear oscillators, the SIVIA 
Monte-Carlo method has been implemented using the nonlinear 
equations obtained from the fourth-order Runge-Kutta method [5], [8]. 
As such, the enclosure obtained in the present section is a 
validated enclosure rather than the tightest bound. There exist 
several validated interval methods and solvers, viz. the DynIbex 
[16] and CAPD [1] libraries, for obtaining validated bounds. 


Example 4: Consider the Van der Pol equation (crisp or precise case given in Akbari et al. [2]),

$$\ddot{x}(t) + 0.15\,(1 - x^2)\,\dot{x} + 1.44\,x = 0, \qquad (12)$$

subject to uncertain initial conditions $x(0) \in [0.1, 0.3]$ and $\dot{x}(0) = 0$.

The system of first-order differential equations corresponding to (12) is obtained as

$$\dot{u} = v = f_u(t, u, v), \qquad \dot{v} = 0.15\,(u^2 - 1)\,v - 1.44\,u = f_v(t, u, v),$$

subject to initial conditions $u(0) \in [0.1, 0.3]$ and $v(0) = 0$. Using Runge-Kutta fourth-order (RK4), the nonlinear constraints involved in the computation of (12) are

$$u_{n+1} = u_n + \frac{h}{6}\,(k_1 + 2k_2 + 2k_3 + k_4), \qquad (13)$$

$$v_{n+1} = v_n + \frac{h}{6}\,(l_1 + 2l_2 + 2l_3 + l_4), \qquad (14)$$

where,

$$k_1 = h f_u(t_n, u_n, v_n), \qquad l_1 = h f_v(t_n, u_n, v_n),$$

$$k_2 = h f_u\left(t_n + \frac{h}{2},\, u_n + \frac{k_1}{2},\, v_n + \frac{l_1}{2}\right), \qquad l_2 = h f_v\left(t_n + \frac{h}{2},\, u_n + \frac{k_1}{2},\, v_n + \frac{l_1}{2}\right),$$

$$k_3 = h f_u\left(t_n + \frac{h}{2},\, u_n + \frac{k_2}{2},\, v_n + \frac{l_2}{2}\right), \qquad l_3 = h f_v\left(t_n + \frac{h}{2},\, u_n + \frac{k_2}{2},\, v_n + \frac{l_2}{2}\right),$$

$$k_4 = h f_u(t_n + h,\, u_n + k_3,\, v_n + l_3), \qquad l_4 = h f_v(t_n + h,\, u_n + k_3,\, v_n + l_3).$$


Using Algorithm 1 with respect to constraints (13) and (14), the validated enclosure of $x(t)|_{t=T}$ is obtained and reported in Table IV and Fig. 5.


Table IV
INSTANTANEOUS SOLUTION ENCLOSURE OF $x(t)|_{t=T}$.

T | Enclosure $[x](T) = [u](T)$ | Enclosure $[v](T)$
0.1 | [0.0992, 0.2978] | [-0.0428, -0.0143]
0.2 | [0.0971, 0.2917] | [-0.0843, -0.0282]
0.3 | [0.0898, 0.2828] | [-0.1242, -0.0414]


Figure 5. Enclosure of $x(t)$ for $t \in [0, 1]$.
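
As a rough numerical cross-check of Table IV, the sketch below integrates the first-order system of Example 4 from the two endpoints of the initial interval $[0.1, 0.3]$. It is an illustration under stated assumptions, not the SIVIA Monte-Carlo procedure of Algorithm 1: it uses the textbook RK4 form in which the slopes are not pre-multiplied by $h$, it follows two point trajectories only, and the pair of values it returns is not a validated enclosure. The function names and the step size `h = 0.01` are hypothetical choices made for this sketch.

```python
def f_u(t, u, v):
    """Right-hand side of du/dt for the Van der Pol system of Example 4."""
    return v


def f_v(t, u, v):
    """Right-hand side of dv/dt for the Van der Pol system of Example 4."""
    return 0.15 * (u * u - 1.0) * v - 1.44 * u


def rk4_step(t, u, v, h):
    """One classical RK4 step (slopes k, l are plain derivatives)."""
    k1, l1 = f_u(t, u, v), f_v(t, u, v)
    k2 = f_u(t + h / 2, u + h * k1 / 2, v + h * l1 / 2)
    l2 = f_v(t + h / 2, u + h * k1 / 2, v + h * l1 / 2)
    k3 = f_u(t + h / 2, u + h * k2 / 2, v + h * l2 / 2)
    l3 = f_v(t + h / 2, u + h * k2 / 2, v + h * l2 / 2)
    k4 = f_u(t + h, u + h * k3, v + h * l3)
    l4 = f_v(t + h, u + h * k3, v + h * l3)
    u_next = u + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    v_next = v + h * (l1 + 2 * l2 + 2 * l3 + l4) / 6
    return u_next, v_next


def integrate(u0, v0, T, h=0.01):
    """Integrate from t = 0 to t = T with a fixed step h (assumed value)."""
    t, u, v = 0.0, u0, v0
    for _ in range(round(T / h)):
        u, v = rk4_step(t, u, v, h)
        t += h
    return u, v


if __name__ == "__main__":
    for T in (0.1, 0.2, 0.3):
        lo = integrate(0.1, 0.0, T)
        hi = integrate(0.3, 0.0, T)
        print(T, lo, hi)
```

The endpoint trajectories stay close to the interval bounds listed in Table IV, which is what one would expect for short horizons where the flow is monotone in the initial condition; for longer horizons endpoint propagation alone gives no guarantee.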

V. CONCLUSION


Generally, dynamical systems occurring in various science and engineering problems are governed by nonlinear equations or nonlinear differential equations. An iterative procedure based on set inversion via interval analysis and the Monte-Carlo method has been proposed for the computation of exact enclosures of nonlinear equations having imprecise or uncertain variables. The effectiveness of the SIVIA Monte-Carlo method has been verified on the considered test problems, which yield exact enclosures even with very few samples. So, the method may be used to compute exact enclosures of various nonlinear equations irrespective of the dependency problem. Further, the method has also been implemented to compute a validated enclosure in the case of the Van der Pol oscillator. Accordingly, the method may be applied to other practical nonlinear systems of equations involving uncertain parameters.


APPENDIX 


Forward-backward contractor: The forward-backward contractor is based on the constraint $f(x) = 0$, where $x \in [x]$ and $[x] \in \mathbb{IR}^n$, and is illustrated using an example problem.


Example A1: Perform the forward-backward contractor subject to the constraint $w = 2u + v$, where $[w] = [3, 20]$, $[u] = [-10, 5]$ and $[v] = [0, 4]$.

Here, the constraint $w = 2u + v$ may be expressed in terms of a function $f$ as $f(u, v, w) = w - 2u - v$. Further, the possible different forms of the constraint are:

$$u = \frac{w - v}{2}, \qquad v = w - 2u, \qquad w = 2u + v.$$

The forward-backward steps are then followed with respect to the classical interval computations mentioned in Section II as:

$$[u] \cap \frac{[w] - [v]}{2} = [-10, 5] \cap \frac{[3, 20] - [0, 4]}{2} = [-0.5, 5],$$

$$[v] \cap ([w] - 2[u]) = [0, 4] \cap ([3, 20] - 2[-0.5, 5]) = [0, 4],$$

$$[w] \cap (2[u] + [v]) = [3, 20] \cap (2[-0.5, 5] + [0, 4]) = [3, 14].$$

As such, the new interval bounds are $[w] = [3, 14]$, $[u] = [-0.5, 5]$ and $[v] = [0, 4]$.
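
The three contraction steps of Example A1 can be sketched in a few lines of interval arithmetic. The code below is a minimal illustration, assuming intervals are represented as `(lower, upper)` tuples and taking $[w] = [3, 20]$, the value for which the arithmetic of the example is consistent; the helper names (`intersect`, `add`, `sub`, `scale`, `forward_backward`) are hypothetical and are not taken from any interval library.

```python
def intersect(a, b):
    """Intersection of two intervals given as (lower, upper) tuples."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    if lo > hi:
        raise ValueError("empty intersection: the constraint is infeasible")
    return (lo, hi)


def add(a, b):
    """Interval sum [a] + [b]."""
    return (a[0] + b[0], a[1] + b[1])


def sub(a, b):
    """Interval difference [a] - [b]."""
    return (a[0] - b[1], a[1] - b[0])


def scale(c, a):
    """Product of a positive scalar c with the interval [a]."""
    return (c * a[0], c * a[1])


def forward_backward(u, v, w):
    """One pass of the contractor for the constraint w = 2u + v."""
    u = intersect(u, scale(0.5, sub(w, v)))   # u = (w - v) / 2
    v = intersect(v, sub(w, scale(2.0, u)))   # v = w - 2u
    w = intersect(w, add(scale(2.0, u), v))   # w = 2u + v
    return u, v, w


if __name__ == "__main__":
    print(forward_backward((-10, 5), (0, 4), (3, 20)))
```

Repeating `forward_backward` until the three intervals stop changing gives the fixed interval; for this example a single pass already reaches it.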


Fixed-point contractor: A fixed-point contraction associated with $\psi$ is implemented with respect to the constraint $f(x) = 0$ written as $x = \psi(x)$, where $x \in [x] \in \mathbb{IR}^n$. The fixed-point contractor with respect to the constraint $u^2 + 2u + 1 = 0$ is performed as

$$u \in [u] \text{ and } u = \psi(u) \;\Rightarrow\; u \in [u] \text{ and } u \in \psi([u]) \;\Rightarrow\; u \in [u] \cap \psi([u]).$$

Implementing the forward-backward contractor together with the fixed-point contractor amounts to repeating the forward-backward contraction until a fixed interval is reached.


Abstract: Classifiers based on rough sets are widely used in pattern recognition. However, the implementation of rough set-based classifiers always involves uncertainty. Generally, the information decision table in Rough Set Theory (RST) contains many attributes, and the classification performance of each attribute is different. It is necessary to determine which attributes should be used according to the specific problem. In RST, this is regarded as the attribute reduction problem, which aims to select proper candidates. Therefore, the choice of attributes introduces uncertainty into the classification. In addition, a voting strategy is usually adopted to determine the category of the target concept in the final decision making. However, some classes of targets cannot be determined when multiple categories cannot be easily distinguished (for example, when different classes receive the same number of votes). Thus, the choice of classes also introduces uncertainty into the classification. In this paper, we use the theory of belief functions to address these two uncertainties in rough set classification, and a rough set classifier based on Dezert-Smarandache Theory (DSmT) is proposed. It is experimentally verified that the proposed approach deals efficiently with the uncertainty in rough set classifiers.


Keywords: Classification, rough set, uncertainty, evidence reasoning, DSmT, belief functions.


I. INTRODUCTION 


a) Motivation: In recent years, we have witnessed the rapid development of Rough Set Theory (RST) [1]. There are many practical applications of this theory [2], [3], [4], [5]. Among these, the Rough Set Classifier (RSC) has been widely used in real classification problems [6], [7], [8], [9].

b) Challenges: However, in the practical use of RSC, there always exists uncertainty. In the literature [10], [11], the discussion of uncertainty in RST mainly focuses on the following points of view: Chen [10] proposed several uncertainty measures of neighborhood granules, namely neighborhood accuracy, information quantity, neighborhood entropy and information granularity in the neighborhood RST; Zheng [11] estimated the uncertainty of a rough set originating from the two parts of its boundary region. Although the uncertainties discussed in the above literature are of certain significance, the uncertainties discussed in this paper arise in two aspects:

1) The choice of attributes: for example, in the decision information table, some attributes are not significant in a representation, and deleting these attributes has no real impact on the classification results. However, this notion of significance is relative: for different problems, the role of each attribute is quite different. Thus, attribute selection is always ad hoc and depends on the user's preference. Obviously, different attribute selections correspond to different strategies, which generally yield different results. For example, in [12], the authors attempted to select the most information-rich attributes from a dataset by incorporating a controlled degree of misclassification into approximations of rough sets. Gao et al. [13] proposed a new uncertainty measure, named maximum decision entropy, for attribute reduction in the decision-theoretic rough set model. Although many robust and efficient reduction algorithms have been proposed, most of them concentrate on the properties of the data or on user preference in the definition of attribute reduction, which makes it difficult to choose appropriate attribute reductions for specific applications. For the same data, different users can define different reductions and obtain the results of interest to them according to their applications. Jia et al. [14] reviewed nearly twenty-two different attribute reduction methods, but the design of a robust attribute reduction method is not the focus of this paper. We emphasize the uncertainty caused by the choice of attributes, which is not discussed in detail in the recent development of RST. To this end, rather than choosing attributes, we propose to emphasize the importance of each attribute for the specific problem.

2) The choice of classes: in RST, the category of the target concept is determined according to the element composition of its corresponding approximate set: if the number of elements belonging to one class is the largest, the target concept is labelled with this class. However, this kind of voting method often leads to uncertainty in making decisions, which affects the final precision of RSC. We illustrate this problem through Figure 1: in case one, the approximate set of the target concept (red five-pointed star) has four elements (plus) belonging to class 1, three elements (plus) belonging to class 2 and two elements (plus) belonging to class 3. Thus, in case one, we can easily conclude that the target belongs to class 1. However, in case two or case three, the target cannot be labelled with a single category because some classes (class 1 and class 2 in case two; class 1, class 2 and class 3 in case three) have the same number of votes. Moreover, if the approximate set of such a target is the empty set (case four), which category should be allocated to the target concept? As mentioned above, RSCs thus face two neglected uncertainty issues. The theory of belief functions [15] is widely used in uncertainty management and uncertainty reasoning for decision-making. In this paper, we attempt to use it to model and manage the uncertainty incorporated in RSCs.


Figure 1: Uncertainty in Voting Strategy (panels: case one to case four; legend: Class 1, Class 2, Class 3).


c) Contributions: A given attribute may be unable to distinguish items in one problem and yet be discriminative in another. Thus, according to the classification performance of each attribute, the weights of all attributes in the information decision table are calculated and used as an index of the importance of each attribute. At the same time, we do not directly delete unimportant attributes, as the classical reduction algorithms do. Instead, we consider all the attributes in the final classification, since we consider that every existing attribute must play a role in the decision. To handle the uncertainty of the voting strategy in traditional RSC, we do not count the votes of each class in the approximate sets. Instead, we first calculate the coordinate of each class with respect to each attribute, and then compute the distance between the target concept and each class for every attribute, in order to calculate the Basic Belief Assignment (BBA) of the target for each attribute. Then, we use the classical combination rule proposed in DSmT [16] (PCR5 is used in this paper) to sequentially¹ combine all BBAs (each attribute has a corresponding BBA). Finally, according to the principle of maximum belief mass, we obtain the final class of the target concept.

This paper is organized as follows. Section II reviews some basic concepts of Dempster-Shafer Theory (DST) and DSmT. The new rough set classifier based on DSmT (RSCD) is proposed in Section III. Section IV gives the summary of the proposed classifier. In Section V, we give some experimental results to show the performance of our new method. Some discussions about the extension of RSCD are given in Section VI. Section VII concludes the paper with a summary and directions for future work.


¹The PCR5 rule is not associative, which means that the fusion result depends on the chosen order. Here, our default way of combination is to combine BBAs in order of increasing index. For example, if there are three BBAs $m_1$, $m_2$, $m_3$, the fusion proceeds as $m_{12} = PCR5(m_1, m_2) \rightarrow m_{123} = PCR5(m_{12}, m_3) \rightarrow m_{fusion} = m_{123}$.


II. PRELIMINARIES 


This section provides a brief reminder of the basics of 
DST and DSmT, which is necessary for the presentation and 
understanding of the more general fusion of evidence. 

In the DST framework, the Frame of Discernment (FoD)² $\Theta \triangleq \{\theta_1, \ldots, \theta_n\}$ ($n \geq 2$) is a set of exhaustive and exclusive elements (hypotheses) which represent the possible solutions of the problem under consideration, and thus Shafer's model assumes $\theta_i \cap \theta_j = \emptyset$ for $i \neq j$ in $\{1, \ldots, n\}$. A BBA $m(\cdot)$ is defined by the mapping $2^\Theta \mapsto [0, 1]$, verifying $m(\emptyset) = 0$ and $\sum_{A \in 2^\Theta} m(A) = 1$. In DSmT, one can abandon Shafer's model (if Shafer's model doesn't fit with the problem) and refute the principle of the third excluded middle. The third excluded middle principle assumes the existence of the complement for any element/proposition belonging to the power set $2^\Theta$. Instead of defining the BBAs on the power set $2^\Theta = (\Theta, \cup)$ of the FoD, the BBAs are defined on the so-called hyper-power set (or Dedekind's lattice), denoted $D^\Theta = (\Theta, \cup, \cap)$, whose cardinalities follow Dedekind's numbers sequence; see [17], Vol. 1 for details and examples. A (generalized) BBA, called a mass function, $m(\cdot)$ is defined by the mapping $D^\Theta \mapsto [0, 1]$, verifying $m(\emptyset) = 0$ and $\sum_{A \in D^\Theta} m(A) = 1$. The DSmT framework encompasses the DST framework because $2^\Theta \subset D^\Theta$. In DSmT, we can also take into account a set of integrity constraints on the FoD (if known), by specifying all the pairs of elements which are really disjoint. Stated otherwise, Shafer's model is a specific DSm model where all elements are deemed to be disjoint. $A \in D^\Theta$ is called a focal element of $m(\cdot)$ if $m(A) > 0$. A BBA is called a Bayesian BBA if all of its focal elements are singletons and Shafer's model is assumed; otherwise it is called non-Bayesian [18]. A full ignorance source is represented by the vacuous BBA $m_v(\Theta) = 1$. The belief (or credibility) and plausibility functions are respectively defined by $Bel(X) = \sum_{Y \in D^\Theta | Y \subseteq X} m(Y)$ and $Pl(X) = \sum_{Y \in D^\Theta | Y \cap X \neq \emptyset} m(Y)$. $BI(X) \triangleq [Bel(X), Pl(X)]$ is called the belief interval of $X$. Its length $U(X) = Pl(X) - Bel(X)$ measures the degree of uncertainty of $X$.
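
The definitions of $Bel$, $Pl$ and the belief interval can be illustrated with a short sketch. It assumes Shafer's model, so that focal elements are plain subsets of $\Theta$ and can be stored as `frozenset` keys of a dictionary; the function names and the example masses are hypothetical and serve only as an illustration.

```python
def bel(m, X):
    """Belief of X: total mass of the non-empty focal elements included in X."""
    return sum(mass for Y, mass in m.items() if Y and Y <= X)


def pl(m, X):
    """Plausibility of X: total mass of the focal elements intersecting X."""
    return sum(mass for Y, mass in m.items() if Y & X)


def belief_interval(m, X):
    """Return (Bel(X), Pl(X), U(X)) with U(X) = Pl(X) - Bel(X)."""
    b, p = bel(m, X), pl(m, X)
    return b, p, p - b


if __name__ == "__main__":
    theta = frozenset({"t1", "t2", "t3"})
    # Hypothetical BBA: two singletons plus the whole frame (ignorance).
    m = {frozenset({"t1"}): 0.5, frozenset({"t2"}): 0.2, theta: 0.3}
    print(belief_interval(m, frozenset({"t1"})))
```

For the vacuous BBA $m_v(\Theta) = 1$, this sketch returns $Bel(X) = 0$ and $Pl(X) = 1$ for every proper non-empty subset $X$, i.e. the widest possible belief interval.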

In 1976, Shafer proposed Dempster's rule to combine BBAs in the DST framework (we use the index DS to refer to Dempster-Shafer's rule (DS rule) because Shafer really promoted Dempster's rule in his milestone book [18]). The DS rule for combining two distinct sources of evidence characterized by BBAs $m_1(\cdot)$ and $m_2(\cdot)$ is defined by $m_{DS}(\emptyset) = 0$ and $\forall A \in 2^\Theta \setminus \{\emptyset\}$:

$$m_{DS}(A) = \frac{\sum_{B, C \in 2^\Theta | B \cap C = A} m_1(B)\, m_2(C)}{1 - \sum_{B, C \in 2^\Theta | B \cap C = \emptyset} m_1(B)\, m_2(C)}. \qquad (1)$$


²Here, we use the symbol $\triangleq$ to mean equals by definition.
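
Eq.(1) translates directly into a few lines of code. The following is a minimal sketch of the DS rule for two BBAs, assuming Shafer's model with focal elements stored as `frozenset` keys; the function name `dempster` and the example masses are hypothetical.

```python
from itertools import product


def dempster(m1, m2):
    """DS rule of Eq.(1): conjunctive combination followed by normalization."""
    combined = {}
    conflict = 0.0
    for (B, mb), (C, mc) in product(m1.items(), m2.items()):
        A = B & C
        if A:
            combined[A] = combined.get(A, 0.0) + mb * mc
        else:
            conflict += mb * mc          # mass committed to the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: DS rule is undefined")
    return {A: mass / (1.0 - conflict) for A, mass in combined.items()}


if __name__ == "__main__":
    a, b = frozenset({"a"}), frozenset({"b"})
    print(dempster({a: 0.6, b: 0.4}, {a: 0.3, b: 0.7}))
```

The normalization step divides by one minus the conflicting mass, which is the step whose behavior is questioned in the discussion that follows.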

The DS rule formula is commutative and associative and can be easily extended to the fusion of $S > 2$ BBAs. Unfortunately, DS rule has been highly disputed during the last decades by many authors because of its counter-intuitive behavior in high or even low conflict situations, which is why many rules of combination have been proposed in the literature to combine BBAs [19]. To palliate the drawbacks of DS rule, the PCR5 rule was proposed in DSmT, and it is usually adopted³ in recent applications of DSmT. The fusion of two BBAs $m_1(\cdot)$ and $m_2(\cdot)$ by the PCR5 rule is obtained by $m_{PCR5}(\emptyset) = 0$ and $\forall A \in D^\Theta \setminus \{\emptyset\}$

$$m_{PCR5}(A) = m_{12}(A) + \sum_{B \in D^\Theta \setminus \{A\} \,|\, A \cap B = \emptyset} \left[ \frac{m_1^2(A)\, m_2(B)}{m_1(A) + m_2(B)} + \frac{m_2^2(A)\, m_1(B)}{m_2(A) + m_1(B)} \right], \qquad (2)$$

where $m_{12}(A) = \sum_{B, C \in D^\Theta | B \cap C = A} m_1(B)\, m_2(C)$ is the conjunctive operator, and each element $A$ and $B$ is expressed in its disjunctive normal form. If the denominator involved in a fraction is zero, then this fraction is discarded. The general PCR5 formula for combining more than two BBAs altogether is given in [17], Vol. 3. We adopt the generic notation $m_{1,2}^{PCR5}(\cdot) = PCR5(m_1(\cdot), m_2(\cdot))$ to denote the fusion of $m_1(\cdot)$ and $m_2(\cdot)$ by the PCR5 rule. PCR5 is not associative, and the PCR5 rule can also be applied in the DST framework (with Shafer's model of the FoD) by replacing $D^\Theta$ by $2^\Theta$ in Eq.(2).
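
To make Eq.(2) concrete, the sketch below implements the two-source PCR5 rule under Shafer's model (focal elements as `frozenset` keys, so $D^\Theta$ is replaced by $2^\Theta$). Each conflicting product $m_1(B)\,m_2(C)$ with $B \cap C = \emptyset$ is redistributed back to $B$ and $C$ in proportion to the masses involved, which is equivalent to the bracketed sum in Eq.(2). The helper `fuse_sequentially`, which chains pairwise fusions in the given order, and the example masses are hypothetical illustrations.

```python
from functools import reduce


def pcr5(m1, m2):
    """Two-source PCR5 fusion of Eq.(2) under Shafer's model."""
    fused = {}
    for B, mb in m1.items():
        for C, mc in m2.items():
            if mb == 0.0 or mc == 0.0:
                continue                     # zero product, nothing to assign
            A = B & C
            if A:
                # Conjunctive consensus m12(A).
                fused[A] = fused.get(A, 0.0) + mb * mc
            else:
                # Partial conflict mb*mc goes back to B and C proportionally.
                fused[B] = fused.get(B, 0.0) + mb * mb * mc / (mb + mc)
                fused[C] = fused.get(C, 0.0) + mc * mc * mb / (mb + mc)
    return fused


def fuse_sequentially(bbas):
    """Chain pairwise PCR5 fusions in list order (the result is order dependent)."""
    return reduce(pcr5, bbas)


if __name__ == "__main__":
    a, b = frozenset({"a"}), frozenset({"b"})
    print(pcr5({a: 0.6, b: 0.4}, {a: 0.3, b: 0.7}))
```

Because the whole conflicting mass is redistributed, the fused masses sum to one without any normalization step, in contrast with the DS rule.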


III. NEW ROUGH SET CLASSIFIER BASED ON DSMT 
(RSCD) 


A. Weights of each attribute 


RST is a mathematical tool to deal with vagueness and uncertainty [1]; it can effectively analyse incomplete information and does not need additional data beyond the prior information. Next, we briefly give several relevant definitions to show how to calculate the weights of attributes:

Definition 1: An information decision system $S$ is $S = (U, A, D)$, where $U = \{x_1, x_2, \cdots, x_n\}$ is a non-empty finite set of samples, $A = \{a_1, a_2, \cdots, a_m\}$ is a non-empty finite set of attributes, and $D$ is a non-empty finite set of decision classes.

Definition 2: Each attribute $a \in A$ defines an information function $f_a : U \rightarrow V_a$, where $V_a$ is the set of values of the attribute $a$. Extending this notation to a set of attributes $B \subseteq A$, an indiscernibility relation $Ind(B)$ can be defined as follows:


$$Ind(B) = \{(x_i, x_j) \in U^2 \mid f_i(a) = f_j(a), \forall a \in B\}, \qquad (3)$$

where $x_i$ and $x_j$ are indiscernible when $(x_i, x_j) \in Ind(B)$. Equivalence classes, or elementary sets, are generated by $Ind(B)$. The elementary set of $x_i$ is represented by $[x_i]_B$. Any finite union of elementary sets is called a $B$-definable set. For pattern classification, elements having the same class label constitute a concept $X$, so that $X \in U/D$, where $U/D = \{[x_i]_D \mid x_i \in U\}$ and $[x_i]_D$ represents the elementary set of $x_i$ with respect to the decision attribute $D$. Sometimes $X \subseteq U$ is not $B$-definable. In other words, there exist elements that are in the same elementary set but have different class labels, so that $X$ becomes a vague concept. For this, we give the following definitions of approximation sets of such a vague concept:


³Recently, a new combination rule, PCR6, was proposed to combine all the BBAs altogether in a single fusion step; it can be found in [20]. Because the PCR6 rule coincides with PCR5 when combining only two BBAs [17], we just use the PCR5 rule to combine BBAs in this paper.

Definition 3: The $B$-upper approximation $\overline{B}X$ and the $B$-lower approximation $\underline{B}X$ of the vague concept $X$ are defined as follows:

$$\underline{B}X = \{x_i \in U \mid [x_i]_B \subseteq X\}, \qquad (4)$$

$$\overline{B}X = \{x_i \in U \mid [x_i]_B \cap X \neq \emptyset\}. \qquad (5)$$

$\underline{B}X \subseteq \overline{B}X$, and $\underline{B}X$ consists of elements that certainly belong to $X$, whereas $\overline{B}X$ consists of elements that possibly belong to $X$. The set $BN_B(X) = \overline{B}X - \underline{B}X$ is called the $B$-boundary region of $X$, and thus consists of those objects that we cannot decisively classify into $X$ on the basis of the knowledge in $B$.

Definition 4: $POS_B(D)$ is the positive region of the partition $U/D$ with respect to $B$ and is defined as follows:

$$POS_B(D) = \bigcup_{X \in U/D} \underline{B}X \qquad (6)$$

$$= \bigcup \{Y \mid Y \subseteq X,\; Y \in U/B,\; X \in U/D\}. \qquad (7)$$

Definition 5: The degree of support of the condition attributes $B$ with respect to the decision attribute $D$ is defined as follows:

$$\varsigma_B = \frac{|POS_B(D)|}{|U|}. \qquad (8)$$

Here, $\varsigma$ is regarded as the degree of importance of each attribute in the information decision table $S$. In order to illustrate how to calculate the weight of a particular attribute based on the aforementioned five definitions, we give a simple example below:


Example 1: Table I is an information decision table with $U = \{x_1, x_2, \cdots, x_{12}\}$, $A = \{a_1, a_2, a_3, a_4\}$ and $D = \{d_1 = 1, d_2 = 2, d_3 = 3\}$. According to the decision attribute $d$ and Eq.(3), if $x_i$ ranges over $U$ and $B$ is equal to $d$, we get $[x_i]_d = [U]_d = U/D = \{\{x_1, x_4, x_7, x_8, x_{12}\}, \{x_2, x_3, x_9, x_{10}, x_{11}\}, \{x_5, x_6\}\}$. Meanwhile, we can also partition $U$ by using each attribute $a_i$, $i = 1, \cdots, m$, based on the indiscernibility relation $Ind(B)$; the results are given in Table II.

Thus, each element $X$ in $[U]_d$ can be approximated by each condition attribute $a_i$, $i = 1, \cdots, m$, and we obtain $\underline{a_i}X$ in Table III according to Definition 3. Based on Eq.(7), we get the positive domain of $D$ with respect to each attribute $a_i$, which is given in Table IV.

In order to explain in detail how positive domains are calculated, we take $POS_{a_1}(D)$ as an example: $U/D = [U]_D = \{\{x_1, x_4, x_7, x_8, x_{12}\}, \{x_2, x_3, x_9, x_{10}, x_{11}\}, \{x_5, x_6\}\}$ and $U/a_1 = [U]_{a_1} = \{\{x_1, x_4\}, \{x_2\}, \{x_3\}, \{x_5\}, \{x_6\}, \{x_7\}, \{x_8, x_9\}, \{x_{10}\}, \{x_{11}\}, \{x_{12}\}\}$. For any element $Y \in U/a_1$, if $Y$ meets the condition $Y \subseteq X$ for some $X \in U/D$, then $Y$ belongs to the domain $POS_{a_1}(D)$. For example, when $Y = \{x_1, x_4\}$ and $X = \{x_1, x_4, x_7, x_8, x_{12}\}$, it satisfies $Y \subseteq X$, so $\{x_1, x_4\}$ belongs to $POS_{a_1}(D)$. However, if $Y = \{x_8, x_9\}$, $Y$ is not a subset of any element of $U/D$, so $\{x_8, x_9\}$ does not belong to $POS_{a_1}(D)$. Thus, according to Eq.(8), we obtain the degree of support of $a_i$ with respect to the decision attribute $D$ in Table IV, which is regarded as the weight of each attribute in the classification problem.


Table I: Information decision table.

U | a1 | a2 | a3 | a4 | d
x1 | 5.1 | 3.5 | 1.4 | 0.2 | 1
x2 | 6.6 | 2.9 | 4.6 | 1.3 | 2
x3 | 5.2 | 2.7 | 3.9 | 1.4 | 2
x4 | 5.1 | 3.8 | 1.5 | 0.3 | 1
x5 | 6.4 | 2.7 | 5.3 | 1.9 | 3
x6 | 6.8 | 3.0 | 5.5 | 2.1 | 3
x7 | 5.5 | 4.2 | 1.4 | 0.2 | 1
x8 | 5.0 | 3.3 | 1.4 | 0.2 | 1
x9 | 5.0 | 2.0 | 3.5 | 1.0 | 2
x10 | 5.9 | 3.0 | 4.2 | 1.5 | 2
x11 | 5.7 | 2.6 | 3.5 | 1.0 | 2
x12 | 4.6 | 3.6 | 1.0 | 0.2 | 1


Table II: Results of partitioning the domain U using each attribute.

Attribute | The partitioning domain
a1 | {{x1, x4}, {x2}, {x3}, {x5}, {x6}, {x7}, {x8, x9}, {x10}, {x11}, {x12}}
a2 | {{x1}, {x2}, {x3, x5}, {x4}, {x6, x10}, {x7}, {x8}, {x9}, {x11}, {x12}}
a3 | {{x1, x7, x8}, {x2}, {x3}, {x4}, {x5}, {x6}, {x9, x11}, {x10}, {x12}}
a4 | {{x1, x7, x8, x12}, {x2}, {x3}, {x4}, {x5}, {x6}, {x9, x11}, {x10}}


Table III: The lower approximation of elements in [U]_d using each attribute.

Lower approximation of X | The B-lower approximation
a1 of {x1, x4, x7, x8, x12} | {{x1, x4}, {x7}, {x12}}
a1 of {x2, x3, x9, x10, x11} | {{x2}, {x3}, {x10}, {x11}}
a1 of {x5, x6} | {{x5}, {x6}}
a2 of {x1, x4, x7, x8, x12} | {{x1}, {x4}, {x7}, {x8}, {x12}}
a2 of {x2, x3, x9, x10, x11} | {{x2}, {x9}, {x11}}
a2 of {x5, x6} | (empty)
a3 of {x1, x4, x7, x8, x12} | {{x1, x7, x8}, {x4}, {x12}}
a3 of {x2, x3, x9, x10, x11} | {{x2}, {x3}, {x9, x11}, {x10}}
a3 of {x5, x6} | {{x5}, {x6}}
a4 of {x1, x4, x7, x8, x12} | {{x1, x7, x8, x12}, {x4}}
a4 of {x2, x3, x9, x10, x11} | {{x2}, {x3}, {x9, x11}, {x10}}
a4 of {x5, x6} | {{x5}, {x6}}


Table IV: The positive domain of [U]_d with respect to each attribute and weights of each attribute according to Eq.(8).

Attribute | Domain | ς
POS_a1(D) | {x1, x2, x3, x4, x5, x6, x7, x10, x11, x12} | 10/12
POS_a2(D) | {x1, x2, x4, x7, x8, x9, x11, x12} | 8/12
POS_a3(D) | {x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12} | 12/12
POS_a4(D) | {x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12} | 12/12
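
The computation of the positive region and of the degree of support can be sketched as follows. The code applies Eq.(3), Eq.(7) and Eq.(8) to the twelve samples of Table I, one condition attribute at a time. It is an illustrative sketch, not the authors' implementation: the names `TABLE_I`, `partition`, `positive_region` and `weight` are hypothetical, and the $a_1$ value of $x_{11}$ is taken here as 5.7, the value consistent with the class-2 coordinate 5.68 reported later in Table V.

```python
from collections import defaultdict
from fractions import Fraction

# Table I rows: sample -> (a1, a2, a3, a4, d). The a1 value of x11 is an
# assumption (5.7), chosen to agree with the class-2 mean 5.68 of Table V.
TABLE_I = {
    "x1": (5.1, 3.5, 1.4, 0.2, 1), "x2": (6.6, 2.9, 4.6, 1.3, 2),
    "x3": (5.2, 2.7, 3.9, 1.4, 2), "x4": (5.1, 3.8, 1.5, 0.3, 1),
    "x5": (6.4, 2.7, 5.3, 1.9, 3), "x6": (6.8, 3.0, 5.5, 2.1, 3),
    "x7": (5.5, 4.2, 1.4, 0.2, 1), "x8": (5.0, 3.3, 1.4, 0.2, 1),
    "x9": (5.0, 2.0, 3.5, 1.0, 2), "x10": (5.9, 3.0, 4.2, 1.5, 2),
    "x11": (5.7, 2.6, 3.5, 1.0, 2), "x12": (4.6, 3.6, 1.0, 0.2, 1),
}
DECISION = 4  # column index of the decision attribute d


def partition(col):
    """Elementary sets of U for one column: samples sharing the same value."""
    blocks = defaultdict(set)
    for name, row in TABLE_I.items():
        blocks[row[col]].add(name)
    return list(blocks.values())


def positive_region(col):
    """Eq.(7): union of elementary sets Y contained in some decision class X."""
    classes = partition(DECISION)
    pos = set()
    for Y in partition(col):
        if any(Y <= X for X in classes):
            pos |= Y
    return pos


def weight(col):
    """Eq.(8): degree of support |POS_B(D)| / |U| for a single attribute."""
    return Fraction(len(positive_region(col)), len(TABLE_I))


if __name__ == "__main__":
    for j in range(4):
        print(f"a{j + 1}", sorted(positive_region(j)), weight(j))
```

Exact value matching is used to form the elementary sets because the example treats attribute values as discrete symbols; for continuous data one would normally discretize first, which this sketch does not attempt.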


B. Construction of BBA of Target Concept 


As discussed in the introduction, the traditional voting decision causes uncertainty when using RSC and directly affects the final classification accuracy. Evidence theory handles uncertainty well, and it generally describes uncertainty through BBAs. However, the BBAs in evidence theory are usually given by experts depending on their own experience and cannot be obtained directly in practical problems. Thus, when solving such problems, the corresponding BBAs must first be constructed and calculated before they are used to make decisions. Referring to the construction methods of BBAs in [21], [22], [23], we propose in this paper a new construction method for the BBA based on the so-called attribute polygon in RST. Each polygon represents an attribute, and each vertex of a polygon represents one category. That is to say, for a two-class problem the attribute polygon is a line segment; similarly, for a three-class problem the polygon is a triangle, and so on. Figure 2 illustrates the four polygons corresponding to two-, three-, four- and five-class problems.


Figure 2: Attribute polygon (panels: Two Classification, Three Classification, Four Classification, Five Classification). The pentagram represents the test example; for example, in three classification, the distances (dotted lines) are calculated between the value of one attribute of the pentagram and the vertices of one attribute triangle.


Besides, the coordinates of all vertices in all attribute polygons are calculated according to $[U]_d = \{\{x_1, x_4, x_7, x_8, x_{12}\}, \{x_2, x_3, x_9, x_{10}, x_{11}\}, \{x_5, x_6\}\}$. Then, the Euclidean distance is used to calculate the distance between the test example and each attribute polygon. Finally, we obtain the belief mass of this example belonging to each class with respect to one attribute by using Eq.(9) and Eq.(10).


sf (9) 
(10) 


where $\alpha$, $\gamma_s$ and $\beta$ are tuning parameters; following the recommendations given in [24], these parameters are set to $\alpha = 0.95$, $\gamma_s = -2$ and $\beta = 1$. Besides, $d$ is the distance between the vertices of the $a_i$ attribute polygon and the corresponding attribute value of the test example $x^*$. Next, we show how to calculate BBAs through Example 1.


Example 1 revisited: 


According to the decision attribute $d$ in Table I, we know that this simple example is a three-class problem because $D = \{d_1, d_2, d_3\}$, so we need to construct triangles. Because the decision table has four condition attributes, we need to construct four triangles. In order to show how to calculate the coordinates of the vertices of each attribute triangle, we give the calculation steps as follows. Based on the partition of the decision attribute $d$: $\{\{x_1, x_4, x_7, x_8, x_{12}\}, \{x_2, x_3, x_9, x_{10}, x_{11}\}, \{x_5, x_6\}\}$, we obtain the coordinates of each category with respect to attribute $a_1$:

• the coordinate of class one with respect to $a_1$:

$$\frac{\sum_{x \in X(a_1)} f(x, a_1)}{|X(a_1)|} = 5.06, \qquad (11)$$

where $X(a_1) = \{x_1, x_4, x_7, x_8, x_{12}\}$ and $|\cdot|$ denotes the cardinality;

• the coordinate of class two with respect to $a_1$:

$$\frac{\sum_{x \in X(a_1)} f(x, a_1)}{|X(a_1)|} = 5.6800,$$

where $X(a_1) = \{x_2, x_3, x_9, x_{10}, x_{11}\}$;

• the coordinate of class three with respect to $a_1$:

$$\frac{\sum_{x \in X(a_1)} f(x, a_1)}{|X(a_1)|} = 6.6000,$$

where $X(a_1) = \{x_5, x_6\}$.

Here, $f(x_i, a_j)$ is the value of the cell of Table I corresponding to sample $x_i$ and attribute $a_j$.

Similarly, we can calculate all the coordinates of the three classes for the four attributes, which are given in Table V.


Table V: All coordinates of three classes in each attribute.

Attribute | Class 1 | Class 2 | Class 3
a1 | 5.0600 | 5.6800 | 6.6000
a2 | 3.6800 | 2.6400 | 2.8500
a3 | 1.3400 | 3.9400 | 5.4000
a4 | 0.2200 | 1.2400 | 2.0000


Then, we randomly select a test example, which is denoted as $x^* = \{5.1000, 3.5000, 1.4000, 0.2000\}$. Based on the Euclidean distance⁴, the corresponding distances between $x^*$ and each attribute polygon are given in Table VI.


Table VI: Distances between target x* and all vertices of attribute polygons.

Distance | Class 1 | Class 2 | Class 3
a1 & x* | 0.0400 | 0.5800 | 1.5000
a2 & x* | 0.1800 | 0.8600 | 0.6500
a3 & x* | 0.0600 | 2.5400 | 4.0000
a4 & x* | 0.0200 | 1.0400 | 1.8000

Based on Eq.(9) and Eq.(10), we transform these distances into belief masses so as to obtain the BBA of each attribute, as given in Table VII.


Table VII: BBAs of x* with respect to each attribute.

m(·) | Class 1 | Class 2 | Class 3 | Θ
m1(·) | 0.7778 | 0.0523 | 0.0005 | 0.1694
m2(·) | 0.3862 | 0.0129 | 0.0368 | 0.5641
m3(·) | 0.7038 | 0.0000 | 0.0000 | 0.2962
m4(·) | 0.8596 | 0.0052 | 0.0043 | 0.1309


Finally, we use the PCR5 formula Eq.(2) to combine the weight of each attribute and the BBAs of each attribute so as to obtain the final BBA of $x^*$⁵:

$m_{fusion}(\theta_1) = 0.8827$; $m_{fusion}(\theta_2) = 0.0009$; $m_{fusion}(\theta_3) = 0.0007$; $m_{fusion}(\Theta) = 0.1157$.

According to the fusion result, we conclude that $x^*$ belongs to class 1 based on the maximum of belief mass principle, which is consistent with the label of $x^*$ in the original dataset.


IV. THE SUMMARY OF RSCD 


Algorithm 1 gives a brief pseudo-code of RSCD. Because RSCD in this paper is a data-driven model, we first need to divide the original dataset into training datasets and test samples (the experiments in this paper use ten-fold cross validation). Afterwards, the training datasets are used to construct the attribute polygons and calculate the weights of the attributes. Finally, we obtain the corresponding BBAs of each test sample by calculating the distances between the test examples and the attribute polygons.


V. SIMULATIONS 


We have tested the different classifiers on real datasets 
given in the machine learning repository of the University of 
California Irvine (UCI) [25] and listed in Table VIII. 

⁴The Euclidean distance $d_{ij} = d(x_i, x_j) = \sqrt{(x_i - x_j)^T (x_i - x_j)}$ is used here.

⁵In the final BBA, for the sake of convenience, $\theta_1$, $\theta_2$, $\theta_3$ and $\Theta$ represent class 1, class 2, class 3 and unknown; and $m_{fusion}(\cdot) = [(m_1(\cdot) \oplus m_2(\cdot)) \oplus m_3(\cdot)] \oplus m_4(\cdot)$, where $\oplus$ denotes the PCR5 rule.

Algorithm 1 Solving classification problem by RSCD
Input: Dataset, $\alpha = 0.95$, $\gamma_s = -2$, and $\beta = 1$
Output: The final BBA of test data: $m_{fusion}(\cdot)$

1) Calculate the weights of attributes $w_B$, by
   $POS_B(D) = \bigcup_{X \in U/D} \underline{B}X$; $w_B = \varsigma_B = |POS_B(D)| / |U|$
2) Calculate the BBA of each attribute, $m_{a_i}^{x^*}(\Theta)$ and $m_{a_i}^{x^*}(\theta_s)$, by Eq.(9) and Eq.(10)
3) Combine all BBAs of attributes sequentially, by
   $m_{fusion}(\cdot) = 1$; $i = 1$
4) while $i < m$ do
   $m_{fusion}(\cdot) = PCR5(m_{fusion}(\cdot), w_i \cdot m_i(\cdot))$
   Normalization($m_{fusion}(\cdot)$).


Table VIII: UCI datasets used in the experiments.

Datasets | Class Num. | Feature Dimension | Sample Num.
Iris | 3 | 4 | 150
Wine | 3 | 13 | 178
Pima | 2 | 8 | 768
Bupa | 2 | 6 | 345
Ionosphere | 2 | 34 | 351


In our tests, we do not deal with the missing data problem: all the samples with missing values have been eliminated. Features of the samples are normalized by their means and standard deviations before classification. As with the artificial datasets, we have evaluated the nearest neighbor (NN) classifier, the nearest class centroid (NC) classifier, two k-NN classifiers (one with a large k (k = 40) and the other with a small k (k = 5)), and the ER-NN-NC classifier (both with the DS+BetP option and with the PCR5+DSmP option) [26]. The results are listed in Table IX. As shown in Table IX, RSCD performs better on three datasets (Iris, Pima and Bupa), and its classification results are close to those of ER-NN-NC on the other two datasets (Wine and Ionosphere).


Figure 3: The principle of expanded attribute polygon. 

VI. DISCUSSIONS


In this paper, the Frame of Discernment (FoD) is $\Theta = \{\theta_1, \theta_2, \cdots, \theta_n\}$, where $\theta_i$ represents a category, and here we just consider singletons without compound focal elements⁶. Actually, some examples are difficult to assign to a single class, and they may belong to two or several categories at the same time. On the basis of the attribute polygons constructed in this paper, we can easily extend the above principle to more complex circumstances, so that a particular target can belong to several classes simultaneously. The principle is illustrated in Figure 3, where we briefly describe the expanded principle using the three-class problem (triangle). In this triangle, the three vertices (light blue and solid frame) represent single classes, which are denoted by $\theta_1$, $\theta_2$ and $\theta_3$. The difference is that the centers of the three edges of the triangle and the center of gravity of the triangle are defined as compound focal elements. Specifically, the center of $\theta_1$ and $\theta_2$ is denoted as $\theta_1 \cap \theta_2$; in turn, we can define the centers of all edges of this triangle. Besides, the center of gravity of this triangle is defined as $\theta_1 \cap \theta_2 \cap \theta_3$. Then, we can calculate the coordinates of all these centers and the corresponding distances so as to obtain the BBAs of all attributes. To illustrate the principle of the expanded attribute polygon, we revisit Example 1 as follows. Since the extension method is mainly aimed at constructing BBAs, there is no impact on the calculation of the attribute weights, so the following steps are only for the calculation of BBAs.


• Step 1: Calculate all relevant points in the expanded polygon,
which are given in Table X. In Table X, $\theta_1$, $\theta_2$ and $\theta_3$
represent Class 1, Class 2 and Class 3; $\theta_1 \cap \theta_2$ corresponds
to the hypothesis for which the target belongs to two
categories simultaneously, and so on. The coordinates of
$\theta_1 \cap \theta_2$ and $\theta_1 \cap \theta_2 \cap \theta_3$ are calculated, for example, by:

$$a_1(\theta_1 \cap \theta_2) = \frac{a_1(\theta_1) + a_1(\theta_2)}{2} = 5.37,$$

$$a_1(\theta_1 \cap \theta_2 \cap \theta_3) = \frac{a_1(\theta_1) + a_1(\theta_2) + a_1(\theta_3)}{3} = 5.78.$$

• Step 2: Based on the Euclidean distance, we can obtain the
corresponding distances between the target concept $x^*$
and all relevant points in the expanded polygon, which are
given in Table XI.

• Step 3: According to Eq. (9) and Eq. (10), the BBAs of $x^*$
with respect to each attribute are shown in Table XII.

• Step 4: Sequentially combine all four BBAs with the PCR5


* Here, we do not regard $\Theta$ in Eq. (10) as a compound focal element even
though $\Theta$ can be defined as $\Theta = \theta_1 \cup \theta_2 \cup \cdots \cup \theta_n$, because $\Theta$ represents
the ignorance about the category of the target concept, whereas compound
focal elements here mean that the target belongs to two or three
categories at the same time.

Table IX: Classification results (%) on the UCI datasets used in the experiments.


Classifiers             | Iris (%) | Wine (%) | Pima (%) | Bupa (%) | Ionosphere (%)
NN                      | 93.84    | 94.76    | 69.04    | 60.46    | 84.41
NN (Center)             | 92.09    | 95.68    | 72.70    | 56.54    | 79.25
ER-NN-NC (DSmT + DSmP)  | 95.15    | 96.42    | 73.38    | 60.96    | 87.76
k-NN (k = 40)           | 89.43    | 95.28    | 71.60    | 61.99    | 67.63
k-NN (k = 5)            | 95.65    | 95.28    | 72.10    | 59.57    | 82.41
RSCD                    | 98.00    | 94.17    | 74.50    | 62.87    | 84.11

rule and then, we can get the final BBA as follows.

$m_{fusion}(\theta_1) = 0.8763$; $m_{fusion}(\theta_2) = 0.0001$;
$m_{fusion}(\theta_3) = 0.0000$; $m_{fusion}(\theta_1 \cap \theta_2) = 0.0240$;
$m_{fusion}(\theta_2 \cap \theta_3) = 0.0000$; $m_{fusion}(\theta_1 \cap \theta_3) = 0.0006$;
$m_{fusion}(\theta_1 \cap \theta_2 \cap \theta_3) = 0.0004$; $m_{fusion}(\Theta) = 0.0985$.


Thus, we can also get the result that $x^*$ belongs to Class
1 ($\theta_1$). The biggest difference between the extension method
and RSCD is that the possible category of the target is
further divided so as to reduce the uncertainty in the classification
problem, which can be seen in $m_2(\cdot)$ in Table VII and
Table XII. In RSCD, the mass assigned by $x^*$ to $\Theta$ with respect
to $a_2$ is 0.5640 (see the BBA $m_2(\cdot)$ of Table VII), which
means the class of $x^*$ cannot be determined if the principle
of maximum belief mass is applied. However, in the expanded
strategy, $\Theta$ is further divided into $\theta_1 \cap \theta_2$, $\theta_2 \cap \theta_3$, $\theta_1 \cap \theta_3$ and
$\theta_1 \cap \theta_2 \cap \theta_3$, which ensures that the target can be labelled with the
correct class.


VII. CONCLUSION


In this paper, a new rough set classifier based on DSmT has
been proposed to manage uncertainties using belief function
theory. Our simulation results show clearly that RSCD performs
well and that its implementation is relatively simple since
the attribute reduction of traditional rough sets is avoided. In the
implementation of RSCD, different types of combination rules
can be used, which gives some flexibility to the users. In this
paper, only one combination rule of DSmT (PCR5) has been
tested. Of course many more could be implemented and tested,
especially globally combining all BBAs in a single fusion step
with the PCR6 rule, which is left for future investigations. Also,
the way the attribute weights and BBAs are calculated in
RSCD is an open question; we plan to investigate
this question and to evaluate the robustness of RSCD in future
research works.


VIII. ACKNOWLEDGMENT


This work was supported in part by the National Natural
Science Foundation of China under Grant 61573097 and
91748106, in part by Key Laboratory of Integrated Automation
of Process Industry (PAL-N201704), in part by the Fundamental
Research Funds for the Central Universities (3208008401),
in part by the Qing Lan Project and Six Major Top-talent Plan,
and in part by the Priority Academic Program Development
of Jiangsu Higher Education Institutions.


REFERENCES

[1] Z. Pawlak, Rough sets: Theoretical aspects of reasoning about data, Springer Science & Business Media, Vol. 9, 2012.
[2] P. Maji, S. Paul, Rough set based maximum relevance-maximum significance criterion and gene selection from microarray data, IJAR, Vol. 52(3), pp. 408-426, 2011.
[3] S. Trabelsi, Z. Elouedi, P. Lingras, Classification systems based on rough sets under the belief function framework, IJAR, Vol. 52(9), pp. 1409-1432, 2011.
[4] M. He, W. Ren, Attribute reduction with rough set in context-aware collaborative filtering, Chinese Journal of Electronics, Vol. 26(5), pp. 973-980, 2017.
[5] D. Liang, Z. Xu, D. Liu, A new aggregation method-based error analysis for decision-theoretic rough sets and its application in hesitant fuzzy information systems, IEEE Trans. on Fuzzy Systems, Vol. 25(6), pp. 1685-1697, 2017.
[6] U. Kumar, H. Inbarani, PSO-based feature selection and neighborhood rough set-based classification for BCI multiclass motor imagery task, Neural Computing and Applications, Vol. 28(11), pp. 3239-3258, 2017.
[7] Y.-S. Chen, C.-H. Cheng, Assessing mathematics learning achievement using hybrid rough set classifiers and multiple regression analysis, Applied Soft Computing, Vol. 13(2), pp. 1183-1192, 2013.
[8] Y.-S. Chen, C.-H. Cheng, Evaluating industry performance using extracted RGR rules based on feature selection and rough sets classifier, Expert Systems with Applications, Vol. 36(5), pp. 9448-9456, 2009.
[9] Y.-S. Chen, C.-H. Cheng, Hybrid models based on rough set classifiers for setting credit rating decision rules in the global banking industry, Knowledge-Based Systems, Vol. 39, pp. 224-239, 2013.
[10] Y. Chen, Y. Xue, Y. Ma, F. Xu, Measures of uncertainty for neighborhood rough sets, Knowledge-Based Systems, Vol. 120(C), pp. 226-235, 2017.
[11] T. Zheng, L. Zhu, Uncertainty measures of neighborhood system-based rough sets, Knowledge-Based Systems, Vol. 86(C), pp. 57-65, 2015.
[12] D. Chen, Y. Yang, Z. Dong, An incremental algorithm for attribute reduction with variable precision rough sets, Applied Soft Computing, Vol. 45, pp. 129-149, 2016.
[13] C. Gao, Z. Lai, J. Zhou, C. Zhao, D. Miao, Maximum decision entropy-based attribute reduction in decision-theoretic rough set model, Knowledge-Based Systems, 2017.
[14] X. Jia, L. Shang, B. Zhou, Y. Yao, Generalized attribute reduct in rough set theory, Knowledge-Based Systems, Vol. 91, pp. 204-218, 2016.
[15] J. Dezert, F. Smarandache, An introduction to DSmT, CoRR, Vol. abs/0903.0279, 2009. http://arxiv.org/abs/0903.0279
[16] F. Smarandache, J. Dezert, On the consistency of PCR6 with the averaging rule and its application to probability estimation, in Proc. of Fusion 2013 Int. Conference on Information Fusion, pp. 1119-1126, 2013.
[17] F. Smarandache, J. Dezert (Editors), Advances and applications of DSmT for information fusion, Vols. 1-4, American Research Press, Gallup, NM, USA, 2004-2015. https://www.onera.fr/staff/jean-dezert/references
[18] G. Shafer, A mathematical theory of evidence, Princeton University Press, 1976.
[19] P. Smets, Analyzing the combination of conflicting belief functions, Information Fusion, Vol. 8(4), pp. 387-412, 2007.
[20] A. Martin, C. Osswald, A new generalization of the proportional conflict redistribution rule stable in terms of decision, in Advances and Applications of DSmT for Information Fusion: Collected Works (Volume 2), F. Smarandache & J. Dezert (Ed.), pp. 69-88, 2006.


Abstract—In this paper, we prove that any dichotomous basic
belief assignment (BBA) $m$ can be expressed as the combination
of two simple belief assignments $m_p$ and $m_c$, called respectively
the pros and cons BBAs, thanks to the proportional conflict
redistribution rule no. 5 (PCR5). This decomposition always exists
and is unique, and we call it the canonical decomposition of the
BBA $m$. We also show that canonical decompositions do not exist
in general if we use the conjunctive rule, the disjunctive rule,
Dempster’s rule, Dubois and Prade’s or Yager’s rules, or even the
averaging rule of combination. We give some numerical examples
of canonical decompositions and discuss the potential interest
of this canonical decomposition for applications in information
fusion.


Keywords: Belief Functions, Contra-evidence, Pro-evidence,
PCR5, Canonical Decomposition.


I. INTRODUCTION 


The belief functions (BF) introduced by Shafer in the
mid-1970s [1] from Dempster’s works are well known
and used in the artificial intelligence community to model
epistemic uncertainty and to reason with it for information
fusion. In Dempster-Shafer theory, the combination of basic
belief assignments (BBAs) provided by distinct sources of
evidence is done with Dempster’s rule of combination, which
suffers from serious drawbacks in high-conflict situations as
discussed by Zadeh [16], [17], but also in very low-conflict
situations [4]. As a matter of fact, many rules of combination
have been proposed in the literature [2] (Vol. 2). Among
them, the combination of two sources of evidence based on
the proportional conflict redistribution principle no. 5 (PCR5
rule) [8] has been shown successful in applications and is well
justified theoretically. However, its complexity remains one of
its limitations and prevents its use in large fusion problems.

In this study, we show how the fusion of dichotomous
BBAs could be done thanks to their PCR5-based canonical
decomposition, which is always possible. Such a decomposition
of a dogmatic or nondogmatic BBA has never been presented in
the literature so far. Only a canonical decomposition based on the
conjunctive rule involving improper BBAs has been proposed
by Smets in 1995 [3] and extended later by Denœux [12] to
develop the cautious rule of combination. Here the canonical
decomposition we present is done differently, and we show
that any dichotomous BBA is always the result of the PCR5
fusion of a simple proper pro-evidence BBA $m_p$ with a
simple proper contra-evidence BBA $m_c$, and we show that
this decomposition is unique.

This paper is organized as follows. After a brief recall
of the basics of belief functions in Section II, we present the
canonical decomposition problem (CDP) in Section III and
we show the impossibility of realizing the CDP of a nondogmatic
BBA with the conjunctive rule, the disjunctive rule, Yager’s
and Dubois & Prade’s rules, and even with the averaging rule
of combination. In Section IV, we analyze the CDP based on
Dempster’s rule of combination and we show that it cannot
be done for a dogmatic BBA. In Section V, we prove that
the canonical decomposition based on the PCR5 rule always exists
in all cases. In Section VI, we present some particular
decompositions of a dichotomous BBA (including dogmatic
BBA). Some numerical examples are presented in Section
VII, and potential interests of this PCR5-based canonical
decomposition are discussed in Section VIII. The last section
concludes this paper and opens a challenging question for
the application of this new approach.


II. BASICS OF BELIEF FUNCTIONS 


BF have been introduced by Shafer in [1] to model epistemic
uncertainty. We assume that the answer$^1$ of the problem
under concern belongs to a known (or given) finite discrete
frame of discernment (FoD) $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$, with
$n > 1$, and where all elements of $\Theta$ are mutually exclusive$^2$.
The set of all subsets of $\Theta$ (including the empty set $\emptyset$ and $\Theta$)
is the power-set of $\Theta$ denoted by $2^\Theta$. A proper Basic Belief
Assignment (BBA) associated with a given source of evidence
is defined [1] as a mapping $m(\cdot) : 2^\Theta \to [0,1]$ satisfying
$m(\emptyset) = 0$ and $\sum_{A \in 2^\Theta} m(A) = 1$. In some BF related
frameworks, like in Smets’ Transferable Belief Model (TBM)
[3], $m(\emptyset)$ is allowed to take a positive value. In this case, $m(\cdot)$
is said to be improper because it does not satisfy Shafer’s definition
[1]. The quantity $m(A)$ is called the mass of $A$ committed by
the source of evidence. Belief and plausibility functions are
respectively defined from a proper BBA $m(\cdot)$ by

$$Bel(A) = \sum_{B \in 2^\Theta | B \subseteq A} m(B), \tag{1}$$

and

$$Pl(A) = \sum_{B \in 2^\Theta | A \cap B \neq \emptyset} m(B) = 1 - Bel(\bar{A}), \tag{2}$$

where $\bar{A}$ is the complement of $A$ in $\Theta$.

$^1$ That is, the solution, or the decision to take.
$^2$ This is the so-called Shafer’s model of FoD [2].


$Bel(A)$ and $Pl(A)$ are usually interpreted, respectively, as
lower and upper bounds of an unknown (subjective) probability
measure $P(A)$. $A$ is called a focal element (FE) of $m(\cdot)$
if $m(A) > 0$. When all FEs are singletons, $m(\cdot)$ is called
a Bayesian BBA [1]; its corresponding $Bel(\cdot)$ function is
equal to $Pl(\cdot)$, and they are homogeneous to a (subjective)
probability measure $P(\cdot)$. The vacuous BBA, or VBBA for
short, representing a totally ignorant source is defined as$^3$
$m_v(\Theta) = 1$. A dogmatic BBA is a BBA such that $m(\Theta) = 0$.
If $m(\Theta) > 0$ the BBA $m(\cdot)$ is nondogmatic. A simple BBA is
a BBA that has at most two focal sets, one of them being $\Theta$.
A dichotomous nondogmatic mass of belief is a BBA having
three focal elements $A$, $\bar{A}$ and $A \cup \bar{A}$ with $A$ and $\bar{A}$ subsets
of $\Theta$.
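As a small illustration of definitions (1) and (2), the following Python sketch evaluates $Bel$ and $Pl$ for a dichotomous BBA. The dictionary representation of a BBA (subsets of the FoD stored as `frozenset` keys) and the element names `"A"` and `"notA"` are assumptions made for this example only; they are not part of the original paper.

```python
def bel(m, subset):
    """Bel(subset): sum of the masses of the non-empty focal elements included in subset."""
    return sum(mass for focal, mass in m.items() if focal and focal <= subset)


def pl(m, subset):
    """Pl(subset): sum of the masses of the focal elements intersecting subset."""
    return sum(mass for focal, mass in m.items() if focal & subset)


# Hypothetical dichotomous frame {A, notA} and a nondogmatic BBA on it.
theta = frozenset({"A", "notA"})
A = frozenset({"A"})
not_A = theta - A
m = {A: 0.6, not_A: 0.3, theta: 0.1}

print("Bel(A) =", bel(m, A))
print("Pl(A)  =", pl(m, A))
print("1 - Bel(notA) =", 1.0 - bel(m, not_A))
```

With this BBA, the plausibility of $A$ coincides (up to rounding) with $1 - Bel(\bar{A})$, as stated in (2).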

In his Mathematical Theory of Evidence [1], Shafer proposed
to combine $s \geq 2$ distinct sources of evidence represented
by BBAs $m_1(\cdot), \ldots, m_s(\cdot)$ over the same FoD $\Theta$
with Dempster’s rule (i.e. the normalized conjunctive rule).
The justification and behavior of Dempster’s rule have been
disputed over the years from many counter-examples involving
high and low conflicting sources (from both theoretical and
practical standpoints) as reported in [4]-[7].

Many rules of combination exist in the literature$^4$. Among
them, we recommend the rule based on the proportional
conflict redistribution principle no. 5 (PCR5 rule) [8], which
has been shown successful in applications and is well justified
theoretically. That is why we analyze it in detail for solving
the BF canonical decomposition problem (BF-CDP). PCR5
transfers the conflicting mass only to the elements involved
in the conflict and proportionally to their individual masses,
so that the specificity of the information is entirely preserved
in this fusion process (see [2], Vol. 2 and Vol. 3 for full
justification and examples). The PCR5 combination of two
BBAs $m_1$ and $m_2$ defined on the same FoD $\Theta$, denoted
by $m_{PCR5} = PCR5(m_1, m_2)$, is mathematically defined as
$m_{PCR5}(\emptyset) = 0$, and $\forall X \in 2^\Theta \setminus \{\emptyset\}$

$$m_{PCR5}(X) = \sum_{\substack{X_1, X_2 \in 2^\Theta \\ X_1 \cap X_2 = X}} m_1(X_1) m_2(X_2) + \sum_{\substack{X_2 \in 2^\Theta \\ X_2 \cap X = \emptyset}} \left[ \frac{m_1(X)^2 m_2(X_2)}{m_1(X) + m_2(X_2)} + \frac{m_2(X)^2 m_1(X_2)}{m_2(X) + m_1(X_2)} \right], \tag{3}$$

where all denominators in (3) are different from zero. If a
denominator is zero, that fraction is discarded. The properties
of PCR5 can be found in [9]. An extension of PCR5 for
combining qualitative BBAs can be found in [2], Vol. 2 and
3. All propositions/sets are in a canonical form. A variant
of PCR5, called PCR6, has been proposed by Martin and
Osswald in [2], Vol. 2, for combining s > 2 sources. The
general formulas for the PCR5 and PCR6 rules are also given in
[2], Vol. 2. PCR6 coincides with PCR5 when one combines
two sources. The difference between PCR5 and PCR6 lies
in the way the proportional conflict redistribution is done as
soon as three (or more) sources are involved in the fusion.
From the implementation point of view, PCR6 is simpler
to implement than PCR5. For convenience, very basic (not
optimized) Matlab codes of the PCR5 and PCR6 fusion rules can
be found in [2], [10] and from the toolboxes repository on
the web [11]. In the sequel we work with the PCR5 rule because
only two BBAs are involved in the canonical decomposition
process we present.

$^3$ The complete ignorance is denoted $\Theta$ in Shafer’s book [1].
$^4$ See [2], Vol. 2 for a detailed list of fusion rules.
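To make formula (3) concrete, here is a minimal Python sketch of the PCR5 fusion of two BBAs. It is not the Matlab code referenced in [2], [10], [11]; the dictionary representation (focal elements as `frozenset` keys) and the function name `pcr5` are assumptions of this example. Each product $m_1(X_1) m_2(X_2)$ goes to $X_1 \cap X_2$ when the intersection is not empty; otherwise it is split between $X_1$ and $X_2$ proportionally to $m_1(X_1)$ and $m_2(X_2)$.

```python
def pcr5(m1, m2):
    """PCR5 fusion of two BBAs given as {frozenset: mass} dictionaries (Shafer's model)."""
    fused = {}
    for x1, v1 in m1.items():
        for x2, v2 in m2.items():
            product = v1 * v2
            if product == 0.0:
                continue  # nothing to transfer; also avoids a zero denominator
            common = x1 & x2
            if common:
                # Non-conflicting product: conjunctive consensus on the intersection.
                fused[common] = fused.get(common, 0.0) + product
            else:
                # Conflicting product: redistributed to x1 and x2 only,
                # proportionally to the masses m1(x1) and m2(x2).
                fused[x1] = fused.get(x1, 0.0) + v1 * product / (v1 + v2)
                fused[x2] = fused.get(x2, 0.0) + v2 * product / (v1 + v2)
    return fused


# Hypothetical pro and contra simple BBAs on the frame {A, notA}.
theta = frozenset({"A", "notA"})
A = frozenset({"A"})
not_A = theta - A
m_pro = {A: 0.75, theta: 0.25}
m_contra = {not_A: 0.5, theta: 0.5}

result = pcr5(m_pro, m_contra)
for focal, mass in result.items():
    print(sorted(focal), round(mass, 6))
```

For two sources this sketch stores at most one entry per focal element of the result, so the total mass stays equal to one.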


III. THE CANONICAL DECOMPOSITION PROBLEM

We consider a dichotomous (simplest) FoD $\Theta$ made of only
two exclusive elements $A$ and $\bar{A}$, that is $\Theta = \{A, \bar{A}\}$, and we
consider a given proper$^5$ BBA $m(\cdot) : 2^\Theta \to [0,1]$ of the form

$$m(A) = a, \quad m(\bar{A}) = b, \quad m(A \cup \bar{A}) = 1 - a - b, \tag{4}$$

with $0 < a < 1$, $0 < b < 1$, and $a + b < 1$.

The conditions $0 < a < 1$ and $0 < b < 1$ mean that $A$ and
$\bar{A}$ are FEs of the BBA. The restriction $a + b < 1$ means that the
BBA is nondogmatic. This assumption of nondogmaticity of
the BBA $m(\cdot)$ can be justified because most (if not all) states
of belief, being based on imperfect and not entirely conclusive
evidence, should be represented by nondogmatic BFs, even if
the mass $m(\Theta)$ is very small, as argued by Denœux in [12]
(p. 240). In fact, we can always slightly modify a dogmatic
BBA $m(\cdot)$ into a nondogmatic BBA by discounting it with some
small discount rate $\epsilon > 0$, and letting $\epsilon$ tend towards 0 [3].
The case of dogmatic belief, as well as degenerate cases with
$a = 0$ and $b = 0$, will be discussed in Section VI. Note that this
assumption of nondogmaticity of the BBA $m(\cdot)$ is necessary
for Smets’ canonical decomposition [3], but it is not essential
for our PCR5-based canonical decomposition because it also
works with a dogmatic BBA, as discussed in Section VI.


The belief function canonical decomposition problem can
be expressed as follows:


Given a nondogmatic BBA $m(\cdot)$ as in (4) and a chosen rule
of combination, find the two following simple proper BBAs
$m_p$ and $m_c$ of the form

$$m_p(A) = x, \quad m_p(A \cup \bar{A}) = 1 - x, \tag{5}$$
$$m_c(\bar{A}) = y, \quad m_c(A \cup \bar{A}) = 1 - y, \tag{6}$$

with $(x, y) \in [0,1] \times [0,1]$, such that $m = Fusion(m_p, m_c)$,
for a chosen rule of combination denoted $Fusion(\cdot,\cdot)$.
$m_p(\cdot)$ is called the pro-BBA (or pro-evidence) of $A$, and
$m_c(\cdot)$ the contra-BBA (or contra-evidence) of $A$. In Section
V we prove that, with the PCR5 rule, this decomposition is always
possible and unique, and we call it the (PCR5-based) canonical
decomposition of the BBA $m(\cdot)$. The BBA $m_p(\cdot)$ is interpreted as
a source of evidence providing uncertain evidence in favor
of $A$, whereas $m_c(\cdot)$ is interpreted as a source of evidence
providing uncertain evidence against $A$. The BBA $m(\cdot)$ can
be interpreted as the result of the PCR5 fusion of these two
(pros and cons) aspects of evidence about $A$.

$^5$ Which means that $m(\emptyset) = 0$.

It is worth noting that this BF-CDP must not be confused
with the canonical decomposition problem addressed by Smets in
[3] in his TBM framework, which is based on the conjunctive
rule of combination and which involves, in general, improper
BBAs, called generalized simple BBAs (GSBBA) in Smets’
terminology.


A. Impossibility of decompositions by some well-known rules


Here we analyze briefly the impossibility of a canonical
decomposition for some well-known rules of combination.


1) Conjunctive rule: We consider $x > 0$ and $y > 0$ so that
the two BBAs are really informative (otherwise they become
vacuous and useless from a decision-making standpoint). In this
case we always have a conflict between $m_p(\cdot)$ and $m_c(\cdot)$
resulting from the conjunctive rule of combination, that is

$$m_{Conj}(\emptyset) = m_p(A) m_c(\bar{A}) = x \cdot y > 0.$$

Hence $m_{Conj}(\emptyset) \neq 0$ is incompatible with the constraint
$m(\emptyset) = 0$. Therefore, the canonical decomposition of the
BBA $m(\cdot)$ expressed as the conjunctive fusion of the pros and
cons BBAs $m_p(\cdot)$ and $m_c(\cdot)$ is impossible to get in general$^6$,
except in the very degenerate cases where $a = 0$, or $b = 0$, or
$a = 0$ and $b = 0$, which involve vacuous BBAs in the
decomposition and are of course useless.


2) Disjunctive rule: If we consider the disjunctive rule
of combination of $m_p(\cdot)$ and $m_c(\cdot)$ we will always obtain
the vacuous BBA because $m_p(A) m_c(\bar{A})$, $m_p(A) m_c(A \cup \bar{A})$,
$m_p(A \cup \bar{A}) m_c(\bar{A})$ and $m_p(A \cup \bar{A}) m_c(A \cup \bar{A})$ will all be
committed to the uncertainty $A \cup \bar{A}$. Therefore the combination
result is nothing but the vacuous belief assignment $m_v$, that
is $Disj(m_p, m_c) = m_v$. In conclusion, we cannot make a
decomposition of the BBA $m(\cdot)$ based on the disjunctive rule
in general because if $m(\cdot)$ is informative (i.e. not vacuous)
one always has $a + b > 0$ so that $m(A \cup \bar{A}) < 1$, whereas
the disjunctive rule of $m_p(\cdot)$ and $m_c(\cdot)$ will always provide
$m(A \cup \bar{A}) = 1$.


3) Yager’s and Dubois & Prade’s rules: Due to the
particular simple form of the BBAs $m_p(\cdot)$ and $m_c(\cdot)$, Yager’s
rule [13] and the Dubois-Prade rule [14] coincide. Based on these
rules we are searching $x$ and $y$ in $[0,1]$ such that

$$m(A) = a = x(1 - y), \tag{8}$$
$$m(\bar{A}) = b = (1 - x)y, \tag{9}$$
$$m(A \cup \bar{A}) = 1 - a - b = (1 - x)(1 - y) + xy. \tag{10}$$

Because the third equation depends on the first two,
we only have to solve the following system of equations:
$x - xy = a$ and $y - xy = b$. Assuming$^7$ $y < 1$, one gets from
the first equation $x = \frac{a}{1 - y}$. By replacing $x$ by its expression
in the second equation $y - xy = b$, we have to find $y$ in $[0, 1)$
such that (after basic algebraic simplifications)

$$y^2 + (a - b - 1)y + b = 0. \tag{11}$$

This second-order equation admits one or two real solutions
$y_1$ and $y_2$ if and only if the discriminant is null or positive
respectively, that is if $(a - b - 1)^2 - 4b \geq 0$. However this
discriminant can become negative depending on the values of
$a$ and $b$. For instance, for $a = 0.4$ and $b = 0.5$, we have
$(a - b - 1)^2 - 4b = -0.79$, which means that there is no real
solution for the equation $y^2 - 1.1 \cdot y + 0.5 = 0$. Therefore, in
general, the canonical decomposition of the BBA $m(\cdot)$ cannot
be accomplished from Yager’s and Dubois & Prade’s rules of
combination.

$^6$ That is, for any $a$ and $b$ values of mass of the FEs $A$ and $\bar{A}$ of the BBA $m(\cdot)$.
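The existence test on Eq. (11) is easy to reproduce numerically. The following Python sketch (the function name `yager_dp_candidates` is an assumption of this example) returns the discriminant $(a - b - 1)^2 - 4b$ together with the real roots of (11) when they exist, and recovers $x = a/(1 - y)$ for a feasible pair of masses.

```python
import math


def yager_dp_candidates(a, b):
    """Return (discriminant, real roots y) of y^2 + (a - b - 1) y + b = 0, i.e. Eq. (11)."""
    p = a - b - 1.0
    disc = p * p - 4.0 * b
    if disc < 0.0:
        return disc, []  # no real solution: no decomposition with these rules
    root = math.sqrt(disc)
    return disc, [(-p - root) / 2.0, (-p + root) / 2.0]


# Case discussed in the text: a = 0.4, b = 0.5 has a negative discriminant.
disc_bad, roots_bad = yager_dp_candidates(0.4, 0.5)
print(disc_bad, roots_bad)

# A feasible case chosen for illustration: a = 0.2, b = 0.1.
a, b = 0.2, 0.1
disc_ok, roots_ok = yager_dp_candidates(a, b)
for y in roots_ok:
    x = a / (1.0 - y)  # from x - xy = a, valid because y < 1 here
    print(x, y, x * (1.0 - y), (1.0 - x) * y)
```

The second case only shows that decompositions may exist for some masses; the first case is enough to rule out a decomposition valid for all $a$ and $b$.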


4) Averaging rule: Suppose we combine $m_p(\cdot)$ and $m_c(\cdot)$
with the averaging rule. Then we are searching $x$ and $y$ in
$[0, 1]$ such that

$$m(A) = a = (x + 0)/2, \tag{12}$$
$$m(\bar{A}) = b = (0 + y)/2, \tag{13}$$
$$m(A \cup \bar{A}) = 1 - a - b = ((1 - x) + (1 - y))/2. \tag{14}$$

This means that $x = 2a$ and $y = 2b$ with $x$ and $y$ in $[0, 1]$.
So, if $a > 0.5$ or $b > 0.5$ the canonical decomposition is
impossible to make with the averaging rule of combination.
Therefore, in general, the averaging rule is not able to provide
a canonical decomposition of the BBA $m(\cdot)$.


IV. DECOMPOSITION BASED ON DEMPSTER’S RULE


Let us consider a nondogmatic BBA $m(A) = a$, $m(\bar{A}) = b$ and
$m(A \cup \bar{A}) = 1 - a - b$ with $0 < a, b < 1$ and $1 - a - b > 0$,
and let us see if a decomposition of $m(\cdot)$ is possible based on
Dempster’s rule of combination [1]. For this, we are searching
$x$ and $y$ in $[0,1]$ such that $xy \neq 1$ and

$$m(A) = a = \frac{x(1 - y)}{1 - xy}, \tag{15}$$
$$m(\bar{A}) = b = \frac{y(1 - x)}{1 - xy}, \tag{16}$$
$$m(A \cup \bar{A}) = 1 - a - b = \frac{(1 - x)(1 - y)}{1 - xy}. \tag{17}$$

Because the third equality is redundant with the first two,
we just have to solve the system of two equations expressed
as

$$(1 - xy)a = x(1 - y), \tag{18}$$
$$(1 - xy)b = y(1 - x). \tag{19}$$

That is, one should have

$$x - xy + axy = a, \tag{20}$$
$$y - xy + bxy = b, \tag{21}$$

$^7$ Taking $y = 1$ would mean that $x(1 - y) = 0$ but $m(A) = a$ with $a \neq 0$
in general, so that the choice of $y = 1$ is not possible.




with the constraints $0 < x < 1$ and $0 < y < 1$. So one must
have

$$x = \frac{a}{1 - y + ay}, \qquad y \neq \frac{1}{1 - a}, \tag{22}$$

and solve the equation $y - xy + bxy = b$ with $x$ expressed as a
function of $y$ as above. We get the equation, for $a \neq 1$,

$$(a - 1)y^2 + (1 + b - a)y - b = 0, \tag{23}$$

whose solutions have the form

$$y = \frac{-(1 + b - a) \pm \sqrt{\Delta}}{2(a - 1)}, \tag{24}$$

where the discriminant $\Delta$ is given by

$$\Delta = (1 + b - a)^2 - 4(1 - a)b$$
$$= 1 + b^2 + a^2 + 2b - 2a - 2ab + 4ab - 4b$$
$$= a^2 + b^2 + 1 - 2b + 2ab - 2a$$
$$= (a + b - 1)^2 = (1 - a - b)^2.$$

One sees that $\Delta$ is strictly positive because $a + b < 1$ ($m$
being a nondogmatic BBA). So, there exist two real solutions
$y_1$ and $y_2$ of (23) of the form

$$y_1 = \frac{-(1 + b - a) + \sqrt{\Delta}}{2(a - 1)} = \frac{b}{1 - a}, \tag{25}$$

$$y_2 = \frac{-(1 + b - a) - \sqrt{\Delta}}{2(a - 1)} = \frac{1 - a}{1 - a} = 1. \tag{26}$$

For the case $a \neq 1$, the second “solution” $y_2 = 1$ implies
$x_2 = \frac{a}{1 - y_2 + a y_2} = \frac{a}{a} = 1$, which is not an acceptable solution$^8$
because one must have $xy \neq 1$. The solution $(x, y)$ of the
decomposition problem for $a \neq 1$ is actually given by the first
solution $y_1$, that is

$$y = y_1 = \frac{b}{1 - a} \in [0, 1), \tag{27}$$

$$x = \frac{a}{1 - y + ay} = \frac{a}{1 - b} \in [0, 1). \tag{28}$$

The case $a = 1$ corresponding to the dogmatic BBA given
by $m(A) = a = 1$, $m(\bar{A}) = b = 0$, $m(A \cup \bar{A}) = 1 - a - b = 0$
is analyzed in detail in Section VI (see the lemma right after
Theorem 4).

In summary, the unique solution of the decomposition of a
nondogmatic BBA with $0 < a < 1$, $0 < b < 1$ and $a + b < 1$
using Dempster’s rule is $x = \frac{a}{1 - b}$ and $y = \frac{b}{1 - a}$.


Example 1: Consider $m(A) = a = 0.6$, $m(\bar{A}) = b = 0.2$
and $m(A \cup \bar{A}) = 1 - a - b = 0.2$. The solution $(x, y)$ of
the decomposition of $m(\cdot)$ based on Dempster’s rule is
$x = \frac{a}{1 - b} = \frac{0.6}{1 - 0.2} = 0.75$ and $y = \frac{b}{1 - a} = \frac{0.2}{1 - 0.6} = 0.5$. Therefore
$m_p(A) = x = 0.75$, $m_p(A \cup \bar{A}) = 1 - x = 0.25$ and
$m_c(\bar{A}) = y = 0.5$, $m_c(A \cup \bar{A}) = 1 - y = 0.5$. It can be
verified that $m_p \oplus m_c = m$, where $\oplus$ represents symbolically
Dempster’s rule of combination [1].
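The closed-form solution (27)-(28) and the check of Example 1 can be reproduced with a few lines of Python. This is only a sketch under the assumptions of this section ($0 < a < 1$, $0 < b < 1$, $a + b < 1$); the function names are hypothetical.

```python
def dempster_decomposition(a, b):
    """Return (x, y) = (a / (1 - b), b / (1 - a)), the solution (27)-(28)."""
    if not (0.0 < a < 1.0 and 0.0 < b < 1.0 and a + b < 1.0):
        raise ValueError("expected 0 < a < 1, 0 < b < 1 and a + b < 1")
    return a / (1.0 - b), b / (1.0 - a)


def dempster_pros_cons(x, y):
    """Dempster's rule applied to m_p(A) = x and m_c(not A) = y, Eqs. (15)-(17).

    Returns the triple (m(A), m(not A), m(A or not A)).
    """
    norm = 1.0 - x * y  # one minus the conflicting mass x * y
    return x * (1.0 - y) / norm, y * (1.0 - x) / norm, (1.0 - x) * (1.0 - y) / norm


# Example 1 of the text: a = 0.6, b = 0.2.
x, y = dempster_decomposition(0.6, 0.2)
print(x, y)
print(dempster_pros_cons(x, y))
```

Recombining the two simple BBAs gives back the original masses up to floating-point rounding.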


$^8$ Otherwise the denominators of Eqs. (15)-(17) will be equal to zero.

V. DECOMPOSITION BASED ON PCR5 RULE


In this section we prove that the decomposition of a
dichotomous nondogmatic BBA $m(\cdot)$ based on the PCR5 rule
of combination is always possible and unique. Suppose we
combine $m_p(\cdot)$ and $m_c(\cdot)$ with the PCR5 rule of combination.
Then we are searching $(x, y) \in [0,1]^2$ satisfying

$$m(A) = a = x(1 - y) + \frac{x^2 y}{x + y} = \frac{x^2 + xy - xy^2}{x + y}, \tag{29}$$

$$m(\bar{A}) = b = (1 - x)y + \frac{x y^2}{x + y} = \frac{y^2 + xy - x^2 y}{x + y}, \tag{30}$$

$$m(A \cup \bar{A}) = 1 - a - b = 1 - x - y + xy, \tag{31}$$

under the constraints $(a, b) \in [0,1]^2$, and $0 < a + b < 1$.
The equations (29) and (30) can be rewritten as

$$x - \frac{x y^2}{x + y} = a, \tag{32}$$

$$y - \frac{x^2 y}{x + y} = b, \tag{33}$$

from which (31) is redundant because (29) + (30) gives

$$x + y - xy = a + b. \tag{34}$$

Therefore $(1 - x)(1 - y) = 1 - (a + b)$ and that is why the
constraint $a + b < 1$ is necessary$^9$ for the existence of the
solution $(x, y)$.

With $x$ and $y$ in $[0, 1]$ the solutions of (32) and (33) verify

$$x \geq a, \tag{35}$$
$$y \geq b. \tag{36}$$

Moreover, the equality (34) implies

$$x(1 - y) = a + b - y \;\Rightarrow\; y \leq a + b, \tag{37}$$
$$y(1 - x) = a + b - x \;\Rightarrow\; x \leq a + b. \tag{38}$$

For $x \neq 1$, from (34) one gets $y = \frac{a + b - x}{1 - x}$ and from (32) one
has

$$x^2 + xy - xy^2 = ax + ay. \tag{39}$$

Putting this expression of $y$ in (39) yields the equation

$$x^2 + (x - a)\left(\frac{a + b - x}{1 - x}\right) - x\left(\frac{a + b - x}{1 - x}\right)^2 - ax = 0, \tag{40}$$

which can be expressed after elementary algebraic calculation
as

$$x^4 + (-a - 2)x^3 + (2a + b)x^2 + (a + b - ab - b^2)x + (-a^2 - ab) = 0. \tag{41}$$

This equation of degree 4 has at most four real solutions.
We have to take only the solution $x$ from the open interval
$(0,1)$ and $y = (a + b - x)/(1 - x)$ with $y \in (0, 1]$.

The general expression of the solutions of this quartic
equation [15] is very complicated to obtain analytically even

$^9$ In fact we use the constraint $a + b < 1$ because in this section we consider
only nondogmatic BBA. The canonical decomposition of a dichotomous
dogmatic BBA will be analyzed in Section VI.

with modern symbolic computing systems like Maple™ or
Mathematica™, but the solutions can be easily calculated
numerically by these computing systems, and even with the
Matlab™ system (thanks to the fsolve command), as soon
as numerical values are assigned to $a$ and to $b$.
Another method to make the decomposition consists in solving
numerically the system of equations $\frac{x^2 + xy - xy^2}{x + y} = a$ and
$\frac{y^2 + xy - x^2 y}{x + y} = b$ for numerical values assigned to $a$ and $b$,
thanks to the Mathematica™, Maple™ or Matlab™ computing
systems for instance. Of course the solutions provided by the
two methods are the same.


Example 2: Let us consider $m(A) = 0.6$, $m(\bar{A}) = 0.3$ and
$m(A \cup \bar{A}) = 0.1$, therefore $a = 0.6$ and $b = 0.3$. The quartic
equation (41) becomes

$$x^4 - 2.6x^3 + 1.5x^2 + 0.63x - 0.54 = 0. \tag{42}$$

The four solutions of this quartic equation provided by the
computing system$^{10}$ are approximately

$x_1 \approx 0.7774780438$,
$x_2 \approx 0.9297589637$,
$x_3 \approx 1.419151582$,
$x_4 \approx -0.5263885898$,

which are shown on the graph of Figure 1, obtained easily from the
Desmos online tool$^{11}$.


Figure 1. Plot of the quartic function.

Clearly $x_3$ and $x_4$ are not acceptable solutions because they
do not belong to $[0,1]$. If we take $x_1 \approx 0.7774780438$ then
we get $y_1 = (a + b - x_1)/(1 - x_1) = (0.9 - x_1)/(1 - x_1) \approx
0.5506061437$, so the pair $(x_1, y_1) \in [0,1]^2$ is a solution of
the decomposition problem of the BBA $m(\cdot)$. If we take $x_2 \approx
0.9297589637$ then we get $y_2 = (a + b - x_2)/(1 - x_2) =
(0.9 - x_2)/(1 - x_2) \approx -0.4236692006$, so we see that $y_2 \notin
[0,1]$ and therefore the pair $(x_2, y_2)$ cannot be a solution of
the decomposition problem of the BBA $m(\cdot)$. Therefore the
canonical masses $m_p(\cdot)$ and $m_c(\cdot)$ are given by

$m_p(A) \approx 0.7774780438$, $m_p(A \cup \bar{A}) \approx 0.2225219562$,
$m_c(\bar{A}) \approx 0.5506061437$, $m_c(A \cup \bar{A}) \approx 0.4493938563$.


$^{10}$ We also obtained the same solutions with Maple™ and with
Matlab™.
$^{11}$ https://www.desmos.com/calculator

It can be verified that the PCR5 combination of the BBAs $m_p$
and $m_c$, denoted $PCR5(m_p, m_c)$, is equal to the BBA $m(\cdot)$.
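The numerical decomposition of Example 2 can also be sketched in plain Python without a symbolic or root-finding package. The sketch below does not reproduce the fsolve-based approach mentioned above; as a swapped-in technique it applies bisection to the quartic (41) on the interval $[a, a + b]$, relying on the sign change of the quartic between $a$ and $a + b$ that is established in the proof of Theorem 1. All function names are assumptions of this example.

```python
def quartic(a, b, x):
    """Left-hand side of Eq. (41) evaluated at x."""
    return (x ** 4 - (a + 2.0) * x ** 3 + (2.0 * a + b) * x ** 2
            + (a + b) * (1.0 - b) * x - a * (a + b))


def pcr5_canonical_decomposition(a, b, tol=1e-12):
    """Bisection on [a, a + b] for the root x of (41), then y from Eq. (34)."""
    if not (0.0 < a < 1.0 and 0.0 < b < 1.0 and a + b < 1.0):
        raise ValueError("expected 0 < a < 1, 0 < b < 1 and a + b < 1")
    lo, hi = a, a + b  # the quartic is negative at lo and positive at hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if quartic(a, b, mid) < 0.0:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    y = (a + b - x) / (1.0 - x)
    return x, y


def pcr5_pros_cons(x, y):
    """PCR5 fusion of m_p(A) = x and m_c(not A) = y, following Eqs. (29)-(31)."""
    s = x + y
    return (x * (1.0 - y) + x * x * y / s,
            (1.0 - x) * y + x * y * y / s,
            (1.0 - x) * (1.0 - y))


# Example 2 of the text: a = 0.6, b = 0.3.
x, y = pcr5_canonical_decomposition(0.6, 0.3)
print(x, y)
print(pcr5_pros_cons(x, y))
```

Recombining the pair returned by the bisection with `pcr5_pros_cons` should give back the masses of the original BBA up to the chosen tolerance.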
The following important theorem holds. 


Theorem 1: Consider a dichotomous FoD © = {A, A} with 
A # Oand A ¥ 0) and a nondogmatic BBA m(-) : 2° = [0,1] 
defined on © by m(A) = a, m(A) = 6, and m(AU A) = 
1—a-—b, where a,b € [0,1] and a+ 6 < 1. Then the BBA 
m(-) has a unique canonical decomposition using PCRS rule 
of combination of the form m = PC'R5(m,,m-) with pro- 
evidence m,(A) = x, m,(AUA) = 1—a2 and contra-evidence 
m,-(A) = y, m-(AU A) =1-—y, where z, y € [0,1]. 


Proof: Based on (29)-(30), we have to prove that the following system $S_{a,b}$ of equations always admits one and only one solution $(x, y) \in [0,1] \times [0,1]$:

$$S_{a,b}: \quad h(x,y) = a, \quad h(y,x) = b, \qquad (43)$$

with $h(x,y) = \frac{x^2 + xy - xy^2}{x + y}$. The function $h$ can be prolonged in $(0,0)$ by continuity by setting $h(0,0) = 0$. One has to prove the existence of a unique $x \in [a, a+b] \subset [0,1]$ and $y \in [b, a+b] \subset [0,1]$ solutions of $S_{a,b}$, or equivalently solutions of $y = \frac{a+b-x}{1-x}$ and of (41), $P(x) = 0$, with

$$P(x) = x^4 + (-a-2)x^3 + (2a+b)x^2 + (a+b)(1-b)x - a(a+b). \qquad (44)$$

Because¹² $\lim_{x \to -\infty} P(x) = +\infty$ and $P(a)$¹³ $< 0$, there exists $x_1 \in (-\infty, a)$ such that $P(x_1) = 0$. The solution $x_1$ is not acceptable because $x_1 \notin [a, a+b]$. Because¹⁴ $P(1) < 0$ and $\lim_{x \to +\infty} P(x) = +\infty$, there exists also $x_4 \in (1, +\infty)$ such that $P(x_4) = 0$. The solution $x_4$ is not acceptable because $x_4 \notin [a, a+b]$. For $a + b \neq 1$, one has¹⁵ $P(a+b) > 0$ and $P(1) < 0$. Therefore there exists $x_3 \in (a+b, 1)$ such that $P(x_3) = 0$, but this solution $x_3$ is also not acceptable because $x_3 \notin [a, a+b]$. Because $P(a) < 0$ and $P(a+b) > 0$ there exists $x_2 \in [a, a+b]$ such that $P(x_2) = 0$, which is the only satisfactory solution. The value $y_2$ is given by $y_2 = \frac{a+b-x_2}{1-x_2}$, and one has $y_2 \geq 0$ because $x_2 \leq a+b$, and $y_2 < 1$ because $a + b < 1$. Moreover, from (33), $y_2 - b = \frac{x_2^2 y_2}{x_2 + y_2}$ which is always positive, therefore $y_2 \geq b$, and from (34), $y_2 - (a+b) = x_2(y_2 - 1)$ which is always negative, therefore $y_2 \leq a+b$. This completes the proof of Theorem 1.

¹² $P(x)$ being polynomial, it is continuous, and if $P(c)P(d) < 0$ there exists at least one solution in $[c, d]$. Therefore, we are not sure a priori that there is only one solution in $[c, d]$. In our case, the signs of $P(x)$ for $x = -\infty, a, a+b, 1, +\infty$ are respectively $+$, $-$, $+$, $-$ and $+$. But because one has four intervals, it is not possible to have more than one solution in any interval (otherwise we would get five or more solutions, while this equation has at most four real solutions). Therefore in each interval there exists only one real solution.
¹³ because $P(a) = a^2 b - ab(a+b) = -ab^2$.
¹⁴ because $P(1) = -1 + a + b + (a+b)(1-b-a) = -(a+b-1)^2$.
¹⁵ because from (40), $P(a+b)/(1-a-b)^2 = (a+b)^2 - a(a+b) \Rightarrow P(a+b) = b(a+b)(1-a-b)^2 > 0$.
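The sign pattern used in the proof can be checked numerically. The sketch below is only an illustration, assuming the quartic (44); the helper names `quartic_P` and `sign_pattern` are hypothetical, and a large finite value stands in for the limits at infinity.

```python
def quartic_P(x, a, b):
    """Quartic P(x) of (44) for a dichotomous BBA with m(A) = a, m(not A) = b."""
    return (x**4 + (-a - 2) * x**3 + (2 * a + b) * x**2
            + (a + b) * (1 - b) * x - a * (a + b))


def sign_pattern(a, b, far=1e3):
    """Signs of P at -far, a, a+b, 1 and +far (far is a stand-in for infinity)."""
    points = [-far, a, a + b, 1.0, far]
    return ["+" if quartic_P(t, a, b) > 0 else "-" for t in points]


# Example with (a, b) = (0.6, 0.2): one sign change per interval, so one root each.
pattern = sign_pattern(0.6, 0.2)
print(pattern)
```

For this example the closed forms of footnotes 13 to 15 give $P(a) = -ab^2$, $P(a+b) = b(a+b)(1-a-b)^2$ and $P(1) = -(a+b-1)^2$, which the function reproduces up to rounding.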
VI. PARTICULAR CASES OF DECOMPOSITIONS


Here we examine the canonical decomposition of particular 
cases, including dogmatic BBA. 


A. Dogmatic BBA: a+b=1 

Theorem 2: Any dogmatic BBA defined by $m(A) = a$ and $m(\bar{A}) = b$, where $a, b \in [0,1]$ and $a + b = 1$, has a canonical decomposition using the PCR5 rule of combination of the form $m = PCR5(m_p, m_c)$ with $m_p(A) = x$, $m_p(A \cup \bar{A}) = 1 - x$ and $m_c(\bar{A}) = y$, $m_c(A \cup \bar{A}) = 1 - y$, where $x, y \in [0,1]$.


Proof: Any solution of $S_{a,b}$ verifies

$$x - a = \frac{xy^2}{x+y}, \qquad (45)$$

$$y - b = \frac{x^2 y}{x+y}, \qquad (46)$$

and therefore from (45)-(46) one has

$$x - y - (a - b) = \frac{xy^2 - x^2 y}{x+y}, \qquad (47)$$

which can be rewritten as

$$(x - y)\left(1 + \frac{xy}{x+y}\right) = (a - b). \qquad (48)$$

This means that the differences $(x - y)$ and $(a - b)$ have the same sign. Moreover, from (34) with $a + b = 1$ one has $x + y - xy = 1$, or equivalently $(1-x)(1-y) = 0$, which is satisfied if $x = 1$, or if $y = 1$, or if both equal one. We must distinguish three cases as follows:

• If $a < b$ then¹⁶ $x < y$, therefore $y = 1$ and $h(x,1) = a$. Solving $h(x,1) = a$ is equivalent to solving $x^2 - ax - a = 0$, which admits only one positive solution $x \in [a, a+b = 1]$ given by $x = \frac{a + \sqrt{a^2 + 4a}}{2}$. Note that if $a + b = 1$ and $a < b$, then necessarily $a < 0.5$.

• If $a > b$ then $x > y$, therefore $x = 1$ and $h(y,1) = b$. Solving $h(y,1) = b$ is equivalent to solving $y^2 - by - b = 0$, which admits only one positive solution $y \in [b, a+b = 1]$ given by $y = \frac{b + \sqrt{b^2 + 4b}}{2}$. Note that if $a + b = 1$ and $a > b$, then necessarily $b < 0.5$.

• If $a = b$ and $a + b = 1$ then $a = b = 0.5$ and $x = y = 1$.

So we have proved that a decomposition based on PCR5 always exists and that it is also unique for any dogmatic dichotomous BBA. Therefore, this decomposition of a dogmatic dichotomous BBA is canonical, which completes the proof of Theorem 2.
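The closed forms of the three cases can be sketched and checked by recombining the factors with PCR5. This is a minimal illustration, assuming the dichotomous PCR5 expressions $m(A) = x(1-y) + \frac{x^2 y}{x+y}$ and $m(\bar{A}) = y(1-x) + \frac{xy^2}{x+y}$; the function names are hypothetical.

```python
import math


def pcr5_dichotomous(x, y):
    """PCR5 fusion of m_p = (x, 0, 1-x) with m_c = (0, y, 1-y).

    Returns (m(A), m(not A), m(A u not A))."""
    if x + y == 0:  # both factors vacuous: no conflict to redistribute
        return 0.0, 0.0, 1.0
    conflict = x * y
    m_a = x * (1 - y) + x * conflict / (x + y)
    m_not_a = y * (1 - x) + y * conflict / (x + y)
    return m_a, m_not_a, (1 - x) * (1 - y)


def decompose_dogmatic(a):
    """Canonical factors (x, y) of the dogmatic BBA m(A) = a, m(not A) = 1 - a."""
    b = 1 - a
    if a < b:
        return (a + math.sqrt(a * a + 4 * a)) / 2, 1.0
    if a > b:
        return 1.0, (b + math.sqrt(b * b + 4 * b)) / 2
    return 1.0, 1.0


x, y = decompose_dogmatic(0.9)
print(x, y, pcr5_dichotomous(x, y))
```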


Theorem 3: Any dogmatic BBA $m(A) = a$, $m(\bar{A}) = b$ with $a + b = 1$ and $0 < a < 1$ is not decomposable from Yager's rule and Dubois-Prade rule of combination.


Proof: We have the following system of equations to solve

$$x - xy = a, \qquad (49)$$

$$y - xy = b. \qquad (50)$$


¹⁶ because $(x - y)$ and $(a - b)$ have the same sign.
From (49) and (50), we get $a - b = x - xy - (y - xy) = x - y$, so $y = x - a + b$. After replacing this expression of $y$ into (49) and algebraic manipulations, we have to solve

$$x^2 - 2ax + a = 0,$$

whose solutions are of the form

$$x = a \pm \sqrt{a(a-1)}.$$

For $0 < a < 1$ the system has no real solutions because $a(a-1) < 0$, which completes the proof of Theorem 3.
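The "algebraic manipulations" step can be spelled out as follows. This is a reconstruction from (49), (50) and $a + b = 1$, offered as a reading aid rather than as the authors' own derivation.

```latex
\begin{aligned}
x - x\,(x - a + b) &= a
  && \text{(substitute } y = x - a + b \text{ into (49))} \\
x^2 - (1 + a - b)\,x + a &= 0 \\
x^2 - 2a\,x + a &= 0
  && \text{(since } b = 1 - a \text{ gives } 1 + a - b = 2a\text{)} \\
\Delta = 4a^2 - 4a &= 4a(a - 1) < 0
  && \text{for } 0 < a < 1 .
\end{aligned}
```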


Theorem 4: Any dogmatic BBA $m(A) = a$, $m(\bar{A}) = b$ with $a + b = 1$ is not decomposable from Dempster's rule of combination for the case when $(a,b) \neq (1,0)$ and $(a,b) \neq (0,1)$.


Proof: We have the following system of equations to solve with $0 \leq x, y \leq 1$ and $1 - xy \neq 0$

$$\frac{x - xy}{1 - xy} = a, \qquad (51)$$

$$\frac{y - xy}{1 - xy} = b. \qquad (52)$$

After adding the two equations (51) and (52) and because $a + b = 1$, we obtain $\frac{x + y - 2xy}{1 - xy} = a + b = 1$, whence $x + y - 2xy = 1 - xy$, or $x + y - xy = 1$, or $x + y(1-x) = 1$, or $y(1-x) = 1 - x$, or $y = \frac{1-x}{1-x} = 1$ when $x \neq 1$. From (52), one should have $\frac{y - xy}{1 - xy} = b$ with $y = 1$, that is $\frac{1-x}{1-x} = b$, or $1 = b$, which is false because if $0 < a < 1$ then $b = 1 - a \neq 1$. This completes the proof of Theorem 4.


Lemma: The dogmatic BBAs $m(A) = 1$, $m(\bar{A}) = 0$ (case $(a,b) = (1,0)$), or $m(A) = 0$, $m(\bar{A}) = 1$ (case $(a,b) = (0,1)$) have infinitely many decompositions based on Dempster's rule of combination.


Proof: For the case $(a,b) = (1,0)$ one has to solve with $0 \leq x, y \leq 1$ and $1 - xy \neq 0$ the system of equations

$$\frac{x - xy}{1 - xy} = 1, \quad \text{and} \quad \frac{y - xy}{1 - xy} = 0. \qquad (53)$$

This system is satisfied for $x = 1$ and $y \in [0,1)$, that is, any value in $[0,1)$ can be chosen for $y$.

For the case $(a,b) = (0,1)$ one has to solve with $0 \leq x, y \leq 1$ and $1 - xy \neq 0$ the system of equations

$$\frac{y - xy}{1 - xy} = 1, \quad \text{and} \quad \frac{x - xy}{1 - xy} = 0. \qquad (54)$$

This system is satisfied for $y = 1$ and $x \in [0,1)$, that is, any value in $[0,1)$ can be chosen for $x$. Therefore one sees that for the case $(a,b) = (1,0)$ and the case $(a,b) = (0,1)$ there is no unique decomposition of these BBAs from Dempster's rule of combination, which completes the proof of the lemma.
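The non-uniqueness stated by the lemma is easy to observe numerically. The sketch below assumes the dichotomous form of Dempster's rule used in (51)-(52); `dempster_dichotomous` is a hypothetical helper name.

```python
def dempster_dichotomous(x, y):
    """Dempster's rule for m_p = (x, 0, 1-x) and m_c = (0, y, 1-y).

    Returns (m(A), m(not A), m(A u not A)); undefined when x * y = 1."""
    k = 1 - x * y  # normalization constant (one minus the conflict)
    if k == 0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return x * (1 - y) / k, y * (1 - x) / k, (1 - x) * (1 - y) / k


# Case (a, b) = (1, 0): x = 1 works with any y in [0, 1).
results = [dempster_dichotomous(1.0, y) for y in (0.0, 0.25, 0.5, 0.9)]
print(results)
```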
B. Case when a = 0 and b = 0 (i.e., m is the vacuous BBA)


This is the most degenerate case, where the BBA $m(\cdot)$ corresponds to the vacuous BBA. For the averaging rule, the conjunctive rule, Yager's, Dubois-Prade's, Dempster's and PCR5 rules one has $x = 0$ and $y = 0$ (the conflict between canonical masses is zero). In fact the vacuous BBA $m(\cdot)$ can always be interpreted as the fusion of $m_p$ and $m_c$, where $m_p$ and $m_c$ are also vacuous BBAs. This degenerate case has no particular interest in practice except to model the totally ignorant state of knowledge.


C. Case when a = 0, or b=0 


In the case $a = 0$ and $0 < b < 1$, for the conjunctive rule, Yager's, Dubois-Prade's, Dempster's and PCR5 rules one has $x = 0$ and $y = b$ (the conflict between canonical masses is zero), and $m(\cdot)$ corresponds to the fusion of the vacuous pro-evidence $m_p = m_v$ with the contra-evidence $m_c = m$. In the case $0 < a < 1$ and $b = 0$, for the conjunctive rule, Yager's, Dubois-Prade's, Dempster's and PCR5 rules one has $x = a$ and $y = 0$ (the conflict between canonical masses is zero), and $m(\cdot)$ corresponds to the fusion of the pro-evidence $m_p = m$ with the vacuous contra-evidence $m_c = m_v$. These cases have no particular interest because they can be seen just as the combination of a pros (or cons) BBA with the vacuous BBA.


D. Case when $a = b \in (0, 0.5)$


Theorem 5: In the case $a = b \in (0, 0.5)$, the BBA $m(A) = m(\bar{A}) = a$ and $m(A \cup \bar{A}) = 1 - 2a$ can be canonically decomposed from the PCR5 rule with the BBAs $m_p(A) = 1 - \sqrt{1 - 2a}$, $m_p(A \cup \bar{A}) = \sqrt{1 - 2a}$ and $m_c(\bar{A}) = 1 - \sqrt{1 - 2a}$, $m_c(A \cup \bar{A}) = \sqrt{1 - 2a}$.


Proof: From (29) and (30), one has $\frac{x^2 + xy - xy^2}{x+y} = a$ and one has also in this case $\frac{y^2 + xy - x^2 y}{x+y} = b = a$. Therefore $x^2 + xy - xy^2 = y^2 + xy - x^2 y$, or $x^2 - xy^2 - y^2 + x^2 y = 0$, or $(x - y)(x + y + xy) = 0$. One has $x \geq 0$ and $y \geq 0$ because they represent masses, therefore $x + y + xy \geq 0$. The sum $x + y + xy = 0$ if and only if $x = y = 0$, but this produces the degenerate case corresponding to $a = b = 0$ (i.e. the vacuous BBA). Yet, in the hypothesis of the theorem we assumed $a, b \in (0, 0.5)$, so $a > 0$ and $b > 0$. Therefore $x + y + xy > 0$, hence $x = y$. Therefore the canonical BBAs must be of the form $m_p(A) = x$, $m_p(A \cup \bar{A}) = 1 - x$ and $m_c(\bar{A}) = x$, $m_c(A \cup \bar{A}) = 1 - x$. So one must solve the equation¹⁷ $x - x^2 + \frac{x^3}{2x} = m(A) = a$, or equivalently $\frac{1}{2}x^2 - x + a = 0$, whose solutions are $x_1 = 1 + \sqrt{1 - 2a}$ and $x_2 = 1 - \sqrt{1 - 2a}$. For $0 < a < 0.5$, the solution $x_1 > 1$ is not admissible because $x_1 \notin [0,1]$. The solution $x_2$ is acceptable because if $0 < a < 0.5$, then $0 < 2a < 1$, or $-1 < -1 + 2a < 0$, or (multiplying the inequalities by $-1$) $1 > 1 - 2a > 0$, or $0 < 1 - 2a < 1$, or $\sqrt{0} < \sqrt{1 - 2a} < \sqrt{1}$, or $0 > -\sqrt{1 - 2a} > -1$, or $1 > 1 - \sqrt{1 - 2a} > 0$, hence $x_2 \in (0,1)$. This completes the proof of Theorem 5.

¹⁷ In fact, we have also the second equation $x - x^2 + \frac{x^3}{2x} = m(\bar{A}) = b = a$ to solve, which is the same as the first one.
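The closed form of Theorem 5 can be checked against the PCR5 expression of $m(A)$. A minimal sketch, with hypothetical function names, assuming $m(A) = x(1-y) + \frac{x^2 y}{x+y}$:

```python
import math


def symmetric_factor(a):
    """x = y = 1 - sqrt(1 - 2a) for m(A) = m(not A) = a, with 0 < a < 0.5."""
    if not 0 < a < 0.5:
        raise ValueError("Theorem 5 assumes 0 < a < 0.5")
    return 1 - math.sqrt(1 - 2 * a)


def pcr5_mass_of_A(x, y):
    """m(A) after PCR5 fusion of m_p(A) = x with m_c(not A) = y (x + y > 0)."""
    return x * (1 - y) + x * x * y / (x + y)


x = symmetric_factor(0.4)
print(x, pcr5_mass_of_A(x, x))
```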
VII. EXAMPLES


We give in Tables I-IX some numerical examples of PCR5-based canonical decompositions of the BBA $m(\cdot)$ for different sampled values of $a$ and $b$. These numerical examples may be useful for researchers working with belief functions and interested in this new type of decomposition for their own examples. The values have been approximated at the 10th digit.


(a,b) | x | y
(0.1,0.1) | 0.1055728059 | 0.1055728059
(0.1,0.2) | 0.1155063468 | 0.2085867463
(0.1,0.3) | 0.1283308324 | 0.3116654549
(0.1,0.4) | 0.1445620975 | 0.4155040377
(0.1,0.5) | 0.1653570911 | 0.5207531320
(0.1,0.6) | 0.1926613985 | 0.6284087006
(0.1,0.7) | 0.2298437881 | 0.7403124237
(0.1,0.8) | 0.2834628414 | 0.8604398965
(0.1,0.9) | 0.3701562119 | 1

Table I
DECOMPOSITION OF BBA WHEN m(A) = 0.1.


(a,b) | x | y
(0.2,0.1) | 0.2085867463 | 0.1155063468
(0.2,0.2) | 0.2254033308 | 0.2254033308
(0.2,0.3) | 0.2477759456 | 0.3353044255
(0.2,0.4) | 0.2763932022 | 0.4472135955
(0.2,0.5) | 0.3133633342 | 0.5630877072
(0.2,0.6) | 0.3628331876 | 0.6861104563
(0.2,0.7) | 0.4339764332 | 0.8233289109
(0.2,0.8) | 0.5582575695 | 1

Table II
DECOMPOSITION OF BBA WHEN m(A) = 0.2.


(a,b) | x | y
(0.3,0.1) | 0.3116654549 | 0.1283308324
(0.3,0.2) | 0.3353044255 | 0.2477759456
(0.3,0.3) | 0.3675444680 | 0.3675444680
(0.3,0.4) | 0.4098895428 | 0.4916206002
(0.3,0.5) | 0.4669657064 | 0.6247896197
(0.3,0.6) | 0.5506061437 | 0.7774780438
(0.3,0.7) | 0.7178908346 | 1

Table III
DECOMPOSITION OF BBA WHEN m(A) = 0.3.


(a,b) | x | y
(0.4,0.1) | 0.4155040377 | 0.1445620975
(0.4,0.2) | 0.4472135955 | 0.2763932022
(0.4,0.3) | 0.4916206002 | 0.4098895428
(0.4,0.4) | 0.5527864045 | 0.5527864045
(0.4,0.5) | 0.6442577571 | 0.7188975951
(0.4,0.6) | 0.8633249581 | 1

Table IV
DECOMPOSITION OF BBA WHEN m(A) = 0.4.


(a,b) | x | y
(0.5,0.1) | 0.5207531320 | 0.1653570911
(0.5,0.2) | 0.5630877072 | 0.3133633342
(0.5,0.3) | 0.6247896197 | 0.4669657064
(0.5,0.4) | 0.7188975951 | 0.6442577571
(0.5,0.5) | 1 | 1

Table V
DECOMPOSITION OF BBA WHEN m(A) = 0.5.


(a,b) | x | y
(0.6,0.1) | 0.6284087006 | 0.1926613985
(0.6,0.2) | 0.6861104563 | 0.3628331876
(0.6,0.3) | 0.7774780438 | 0.5506061437
(0.6,0.4) | 1 | 0.8633249581

Table VI
DECOMPOSITION OF BBA WHEN m(A) = 0.6.


(a,b) | x | y
(0.7,0.1) | 0.7403124237 | 0.2298437881
(0.7,0.2) | 0.8233289109 | 0.4339764332
(0.7,0.3) | 1 | 0.7178908346

Table VII
DECOMPOSITION OF BBA WHEN m(A) = 0.7.


(a,b) | x | y
(0.8,0.1) | 0.8604398965 | 0.2834628414
(0.8,0.2) | 1 | 0.5582575695

Table VIII
DECOMPOSITION OF BBA WHEN m(A) = 0.8.


(a,b) | x | y
(0.9,0.1) | 1 | 0.3701562119

Table IX
DECOMPOSITION OF BBA WHEN m(A) = 0.9.


Figures 2 and 3 show the shapes of the pro-evidence $x = f(a,b)$ and the contra-evidence $y = g(a,b)$ surfaces, proving graphically the existence of the canonical decomposition based on PCR5 at the sampling rate of 0.025. The values $(a,b)$ for which $a + b > 1$ are not acceptable, and $f(a,b)$ and $g(a,b)$ have been set to zero for them in the figures.


Figure 2. Plot of $x = f(a,b)$ pro-evidence surface.


VIII. INTEREST OF CANONICAL DECOMPOSITION 
The canonical decomposition based on PCR5 offers several practical interests and advantages that are briefly listed here.


1) From the theoretical standpoint, one has proved that the canonical decomposition based on the PCR5 rule always exists in all the cases for nondogmatic or dogmatic BBAs, contrary to other rules of combination that only work in some restrictive cases. Therefore this decomposition is more general and mathematically well justified.


Figure 3. Plot of $y = g(a,b)$ contra-evidence surface.


2) This canonical decomposition of any dichotomous BBA $m(\cdot)$ into the pro-evidence $m_p(\cdot)$ and the contra-evidence $m_c(\cdot)$ allows to define now the notion of internal conflict of a (dichotomous) source of evidence, denoted $K_{int}(m)$, by

$$K_{int}(m) = m_p(A)\, m_c(\bar{A}), \qquad (55)$$

where $m_p(A) = x$ and $m_c(\bar{A}) = y$ are the canonical factors of the BBA $m(\cdot)$ based on the PCR5 rule of combination. It is worth noting that the BBA $m(\cdot)$ has no internal conflict if and only if at least one of its factors is the vacuous belief mass, that is if $x = 0$ or $y = 0$, or both, which makes sense. For instance the BBA $m(A) = 0.3$ and $m(A \cup \bar{A}) = 0.7$ does not carry internal conflict because $m_p = m$ and $m_c = m_v$ (the vacuous BBA), so that its internal conflict is $K_{int}(m) \triangleq m_p(A)\, m_c(\bar{A}) = 0.3 \cdot 0 = 0$. In fact in this example the BBA $m(\cdot)$ carries only uncertain pro-evidence, and vacuous contra-evidence. This internal conflict measure should contribute somehow to the definition of the information content carried by a (dichotomous) source of evidence. This aspect however is not detailed in this paper and is left for future research works. It is clear that the maximum of internal conflict $K_{int}(m) = 1$ is obtained for the dogmatic BBA $m(A) = m(\bar{A}) = 0.5$, whose canonical decomposition by PCR5 is $m_p(A) = 1$ and $m_c(\bar{A}) = 1$, which shows the full conflict between the pro-evidence $m_p(\cdot)$ and the contra-evidence $m_c(\cdot)$ of the source. Of course, there is no internal conflict for the vacuous BBA. More precisely, $K_{int}(m_v) = 0$ because if $a = b = 0$ then one has $x = y = 0$ calculated from the PCR5-based decomposition. Figure 4 shows the internal conflict $K_{int}(m)$ of a dichotomous BBA $m$.


Figure 4. Internal conflict $K_{int}(m)$.


3) This canonical decomposition allows also to define the notion of level of uncertainty $U(m)$ of a dichotomous source of evidence $m(\cdot)$ as the conjunction of the uncertainties of pro and contra evidences, that is

$$U(m) = m_p(A \cup \bar{A})\, m_c(A \cup \bar{A}) = (1-x)(1-y) = 1 - x - y + xy = 1 - x - y + K_{int}(m). \qquad (56)$$

Because of the PCR5-based decomposition one gets (as already shown in (31)) $U(m) = 1 - a - b$, which always belongs to $[0,1]$. The formula (56) is interesting because it clearly shows the link between the pro-evidence value $x$, the contra-evidence value $y$ and the internal conflict $K_{int}(m) = xy$. Clearly, if $x = 0$ and $y = 0$, then $K_{int}(m) = 0$ and the uncertainty is maximal (i.e. $U(m) = 1$) because the dichotomous BBA $m$ is the vacuous BBA $m(A \cup \bar{A}) = 1$. It can be verified that a dichotomous BBA $m$ has no uncertainty ($U(m) = 0$) if and only if $x = 1$, or $y = 1$, or both, which means that $m(\cdot)$ is a Bayesian dichotomous BBA.


4) The canonical decomposition allows also to adjust/revise easily a dichotomous source of evidence (if needed) according to the knowledge one has on it. For instance, suppose one knows that the source which provides the BBA $m(\cdot)$ usually overestimates, with a reinforcement factor of $\beta_p = 20\%$, the belief mass committed to hypothesis $A$ but is always fair (unbiased) when committing its mass to $\bar{A}$. Under this condition, we make the canonical decomposition of $m(\cdot)$ to get $m_p(\cdot)$ and $m_c(\cdot)$, and we have to discount¹⁸ the pro-evidence $m_p(\cdot)$ with the discounting rate of $\alpha_p = 1/(1 + \beta_p)$ to get the new unbiased BBA $m'_p(\cdot)$ and keep the contra-evidence $m_c(\cdot)$ unchanged, so that the corrected (unbiased) BBA $m'(\cdot)$ will be obtained by the PCR5 combination of $m'_p(\cdot)$ with $m_c(\cdot)$. Of course similar principles can be applied to discount (or reinforce) $m_c(\cdot)$ as we prefer (and when necessary) by choosing the adequate discounting (or reinforcing) factors.


5) This canonical decomposition opens the door to new rules of combination for the fusion of $S > 2$ (dichotomous) distinct¹⁹ BBAs $m_s(\cdot)$, $s = 1, 2, \ldots, S$. After making their canonical decompositions to get $S$ pro-evidences $m_{p,s} = (m_{p,s}(A), m_{p,s}(\bar{A}), m_{p,s}(A \cup \bar{A}))$ equal to $(x_s, 0, 1 - x_s)$, and $S$ contra-evidences $m_{c,s} = (m_{c,s}(A), m_{c,s}(\bar{A}), m_{c,s}(A \cup \bar{A}))$ equal to $(0, y_s, 1 - y_s)$ for $s = 1, 2, \ldots, S$, one can for instance combine the $S$ informative non-conflicting pro-evidences $m_{p,s}$ altogether by the conjunctive rule (or any rule one prefers) to get the combined pro-evidence $m_p(\cdot)$, and do similarly to combine altogether the non-conflicting contra-evidences $m_{c,s}$ to get the combined contra-evidence $m_c(\cdot)$. Once $m_p(\cdot)$ and $m_c(\cdot)$ are calculated, we combine them with PCR5 to get the final resulting BBA. Processing this way will greatly simplify the combination of many dichotomous BBAs. Once the decomposition of each dichotomous BBA is done, we could also consider applying some importance discounting [10] with rates $\beta_s$ to combine separately the set of BBAs $\{m_{p,s}, s = 1, \ldots, S\}$ and the set of BBAs $\{m_{c,s}, s = 1, \ldots, S\}$ before making their PCR5 combination.
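The quantities discussed in the items of this list can be sketched in a few lines, taking the canonical factors $(x, y)$ as given. This is an illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the discounting rate $\alpha_p$ is assumed to multiply the focal mass $x$ (the remainder going to $A \cup \bar{A}$), and the pro- and contra-evidences are assumed to be pooled with the conjunctive rule, which for simple BBAs sharing one focal element gives $1 - \prod_s (1 - x_s)$.

```python
from math import prod


def pcr5_dichotomous(x, y):
    """PCR5 fusion of m_p = (x, 0, 1-x) with m_c = (0, y, 1-y)."""
    if x + y == 0:  # both vacuous: nothing to redistribute
        return 0.0, 0.0, 1.0
    conflict = x * y
    m_a = x * (1 - y) + x * conflict / (x + y)
    m_not_a = y * (1 - x) + y * conflict / (x + y)
    return m_a, m_not_a, (1 - x) * (1 - y)


def internal_conflict(x, y):
    """K_int(m) = x * y, as in (55)."""
    return x * y


def uncertainty(x, y):
    """U(m) = (1 - x)(1 - y) = 1 - x - y + K_int(m), as in (56)."""
    return (1 - x) * (1 - y)


def correct_overestimated_pro(x, y, beta_p):
    """Discount the pro-evidence by alpha_p = 1/(1 + beta_p), keep the
    contra-evidence, and recombine with PCR5 (assumed reading of alpha_p)."""
    alpha_p = 1.0 / (1.0 + beta_p)
    return pcr5_dichotomous(alpha_p * x, y)


def fuse_canonical_factors(factors):
    """Fuse S dichotomous BBAs given as canonical factors [(x_s, y_s), ...]:
    conjunctive pooling of pros and of cons, then one PCR5 step."""
    x = 1 - prod(1 - xs for xs, _ in factors)
    y = 1 - prod(1 - ys for _, ys in factors)
    return pcr5_dichotomous(x, y)


# Factors of (a, b) = (0.6, 0.2) taken from Table VI.
x, y = 0.6861104563, 0.3628331876
print(internal_conflict(x, y), uncertainty(x, y))
print(correct_overestimated_pro(0.6, 0.0, 0.2))
print(fuse_canonical_factors([(0.5, 0.0), (0.0, 0.5)]))
```

With this organisation the cost of the final step does not depend on the number of sources: whatever $S$ is, only one dichotomous PCR5 combination is performed.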


IX. CONCLUSIONS 


In this study, we have proved that any dichotomous basic belief assignment (nondogmatic, or dogmatic) can be decomposed into two simpler proper belief assignments, called the pro-evidence and the contra-evidence, that can be combined with the PCR5 rule to retrieve the original BBA. This canonical decomposition is unique and is always possible. No simple explicit form of the expression of the solution exists, but the solution can be found quite easily with numerical solvers (Matlab, Maple, etc). We have also shown that the decomposition of any dichotomous basic belief assignment cannot be done in all the cases with other well-known rules of combination, which reinforces the interest of the PCR5 principle for BF combination. This PCR5-based canonical decomposition also allows to establish the notion of internal conflict of a dichotomous source of evidence, which could be helpful in some applications. It offers the possibility to combine several dichotomous sources of evidence based on the fusion of their canonical components. This will be presented in detail in a forthcoming publication. The open challenging question is how to extend this notion of canonical decomposition for working with more general basic belief assignments to make their combination more effective (if possible), and how we could define a measure of (uncertain) information thanks to this canonical decomposition.

¹⁸ We use the classical Shafer's discounting method [1].
¹⁹ i.e., cognitively independent.
Abstract: We present a new methodology for decision-making support based on belief functions thanks to a new theoretical canonical decomposition of dichotomous basic belief assignments (BBAs) that has been developed recently. This decomposition, based on the proportional conflict redistribution rule no 5 (PCR5), always exists and is unique. This new PCR5-based decomposition method circumvents the exponential complexity of the direct fusion of BBAs with the PCR5 rule, and it allows to fuse quickly many sources of evidence. The method we propose in this paper provides both a decision and an estimation of the quality of the decision made, which is appealing for decision-making support systems.


Keywords: Decision-Making, Belief Functions, PCR5.


I. INTRODUCTION 


This paper deals with the decision-making support problem from many sources of evidence characterized by belief functions (BF) defined over the same frame of discernment. Belief functions introduced by Shafer [1] are appealing to model epistemic uncertainty. They are well known and used in the artificial intelligence community to fuse uncertain information and to make a decision. However, many debates in the scientific community, started with Zadeh's criticism [2], [3] (see additional references in [4]), have bloomed on the validity of Dempster's rule of combination and its counter-intuitive behavior (not only in high conflicting situations, but also in low conflicting situations). That is why many rules of combination have been developed by different researchers [5] (Vol. 2) over the last decades. In this work we consider only the rule based on the proportional conflict redistribution principle no 5 (PCR5 rule) to combine basic belief assignments (BBAs). This choice is motivated not only by its conflict redistribution principle, but also by its ability to generate a unique canonical decomposition of any dichotomous BBA that will be convenient for decision-making from many sources of evidence.

This paper is organized as follows. After a brief recall of basics of belief functions in Section II, we present succinctly the canonical decomposition of a (dichotomous) BBA in Section III based on [6]. Then we propose a new decision-making support methodology that exploits this canonical decomposition in Section IV for working in a general framework with many (non dichotomous) sources of evidence, with basic illustrative examples. Conclusions are given in Section V.
II. BASICS OF BELIEF FUNCTIONS
A. Definitions 


The answer¹ of the problem under concern is supposed to belong to a given finite discrete frame of discernment (FoD) $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$, with $n > 1$. All elements of $\Theta$ are mutually exclusive². The set of all subsets of $\Theta$ (including the empty set $\emptyset$ and $\Theta$) is the power-set of $\Theta$ denoted by $2^\Theta$. A Basic Belief Assignment (BBA) given by a source of evidence is defined [1] as $m(\cdot) : 2^\Theta \to [0,1]$ satisfying $m(\emptyset) = 0$ and $\sum_{A \in 2^\Theta} m(A) = 1$. The quantity $m(A)$ is the mass of belief of $A$. Belief and plausibility functions are respectively defined from $m(\cdot)$ by

$$Bel(A) = \sum_{B \in 2^\Theta \,|\, B \subseteq A} m(B), \qquad (1)$$

and

$$Pl(A) = \sum_{B \in 2^\Theta \,|\, A \cap B \neq \emptyset} m(B) = 1 - Bel(\bar{A}), \qquad (2)$$

where $\bar{A}$ is the complement of $A$ in $\Theta$.


$Bel(A)$ and $Pl(A)$ are usually interpreted respectively as lower and upper bounds of an unknown (subjective) probability measure $P(A)$. $A$ is called a Focal Element (FE) of $m(\cdot)$ if $m(A) > 0$. When all focal elements are singletons then $m(\cdot)$ is called a Bayesian BBA [1], and its corresponding $Bel(\cdot)$ function is equal to $Pl(\cdot)$ and they are homogeneous to a (subjective) probability measure $P(\cdot)$. The vacuous BBA (VBBA for short) representing a totally ignorant source is defined as³ $m_v(\Theta) = 1$. A dogmatic BBA is a BBA such that $m(\Theta) = 0$. If $m(\Theta) > 0$ the BBA $m(\cdot)$ is nondogmatic. A simple BBA is a BBA that has at most two focal sets and one of them is $\Theta$. A FoD is a dichotomous FoD if it has only two elements, say $\Theta = \{A, \bar{A}\}$ with $A \neq \emptyset$ and $\bar{A} \neq \emptyset$. A dichotomous BBA is a BBA defined over a dichotomous FoD.
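Definitions (1) and (2) translate directly into code when a BBA is stored as a mapping from subsets to masses. The sketch below is one possible representation (a dictionary keyed by `frozenset`); the names `bel`, `pl` and the element labels are hypothetical.

```python
def bel(m, A):
    """Bel(A): total mass of the non-empty focal sets B included in A, as in (1)."""
    return sum(mass for B, mass in m.items() if B and B <= A)


def pl(m, A):
    """Pl(A): total mass of the focal sets B that intersect A, as in (2)."""
    return sum(mass for B, mass in m.items() if B & A)


# Dichotomous BBA with m(A) = 0.6, m(not A) = 0.2, m(Theta) = 0.2.
A = frozenset({"A"})
not_A = frozenset({"notA"})
theta = A | not_A
m = {A: 0.6, not_A: 0.2, theta: 0.2}
print(bel(m, A), pl(m, A))
```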


B. PCR5 Rule of Combination


The combination of distinct sources of evidence characterized by their BBAs is done by Dempster's rule of combination in Shafer's mathematical theory of evidence [1]. The justification and behavior of Dempster's rule (corresponding to the normalized conjunctive rule) have been disputed from many counter-examples involving high and low conflicting sources (from both theoretical and practical standpoints) as reported in [4]. Many alternatives to Dempster's rule are now available [5], Vol. 2. Among them, we consider in the sequel the PCR5 rule, which transfers the conflicting mass only to the elements involved in the conflict and proportionally to their individual masses, so that a more sophisticated and precise redistribution is done with the PCR5 fusion process. The PCR5 rule is presented in detail (with justification and examples) in [5], Vol. 2 and Vol. 3. We only briefly recall for convenience its formula for the fusion of two BBAs, which is symbolically noted as $m_{PCR5} = PCR5(m_1, m_2)$, where $PCR5(\cdot,\cdot)$ represents the PCR5 fusion rule for two BBAs. With this PCR5 rule, one has $m_{PCR5}(\emptyset) = 0$, and $\forall X \in 2^\Theta \setminus \{\emptyset\}$

$$m_{PCR5}(X) = m_{Conj}(X) + \sum_{\substack{Y \in 2^\Theta \setminus \{X\} \\ X \cap Y = \emptyset}} \left[ \frac{m_1(X)^2\, m_2(Y)}{m_1(X) + m_2(Y)} + \frac{m_2(X)^2\, m_1(Y)}{m_2(X) + m_1(Y)} \right], \qquad (3)$$

where $m_{Conj}(X) = \sum_{X_1, X_2 \in 2^\Theta \,|\, X_1 \cap X_2 = X} m_1(X_1)\, m_2(X_2)$ is the conjunctive rule, and where all denominators in (3) are different from zero. If a denominator is zero, that fraction is discarded. Extensions of PCR5 for combining qualitative BBAs can be found in [5], Vols. 2 & 3. All propositions/sets are in a canonical form. A variant of PCR5, called PCR6, has been proposed by Martin and Osswald in [5], Vol. 2, for combining $s > 2$ sources. The general formulas for PCR5 and PCR6 rules are also given in [5], Vol. 2. PCR6 coincides with PCR5 when one combines two sources. The difference between PCR5 and PCR6 lies in the way the proportional conflict redistribution is done as soon as three (or more) sources are involved in the fusion.

¹ I.e. the solution, or the decision to take.
² This is the so-called Shafer's model of FoD [5].
³ The complete ignorance is denoted $\Theta$ in Shafer's book [1].
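For two sources, the rule recalled above can be sketched as a double loop over focal elements: compatible pairs feed the conjunctive part, and each conflicting pair returns its product to its two elements proportionally to their masses. This is a minimal illustration under Shafer's model (elements represented as `frozenset`s), not a reference implementation; the name `pcr5` is hypothetical.

```python
from itertools import product


def pcr5(m1, m2):
    """PCR5 fusion of two BBAs given as {frozenset: mass} dictionaries."""
    fused = {}
    for (x1, w1), (x2, w2) in product(m1.items(), m2.items()):
        common = x1 & x2
        if common:
            # Compatible pair: conjunctive consensus on the intersection.
            fused[common] = fused.get(common, 0.0) + w1 * w2
        elif w1 + w2 > 0:
            # Conflicting pair: redistribute w1 * w2 back to x1 and x2
            # proportionally to w1 and w2 (zero denominators are skipped).
            fused[x1] = fused.get(x1, 0.0) + w1 * w1 * w2 / (w1 + w2)
            fused[x2] = fused.get(x2, 0.0) + w2 * w2 * w1 / (w1 + w2)
    return fused


A = frozenset({"A"})
not_A = frozenset({"notA"})
theta = A | not_A
m_p = {A: 0.75, theta: 0.25}       # a pro-evidence
m_c = {not_A: 0.5, theta: 0.5}     # a contra-evidence
fused = pcr5(m_p, m_c)
print(fused)
```

In this example the conflict $0.75 \cdot 0.5 = 0.375$ is split as $0.225$ to $A$ and $0.15$ to $\bar{A}$, so the fused masses are $0.6$, $0.275$ and $0.125$.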


III. CANONICAL DECOMPOSITION OF A DICHOTOMOUS 
BASIC BELIEF ASSIGNMENT 


Because the canonical decomposition of a dichotomous BBA has been presented in detail in [6], we only make a succinct presentation here. A FoD is a dichotomous FoD if it is made of only two elements, say $\Theta = \{A, \bar{A}\}$ with $A \cup \bar{A} = \Theta$ and $A \cap \bar{A} = \emptyset$. $A$ is different from $\Theta$ and from the empty set because we want to work with an informative FoD. A dichotomous BBA $m(\cdot) : 2^\Theta \to [0,1]$ has the general form

$$m(A) = a, \quad m(\bar{A}) = b, \quad m(A \cup \bar{A}) = 1 - a - b, \qquad (4)$$

with $a, b \in [0,1]$ and $a + b \leq 1$.

The canonical decomposition problem consists in finding the two following simpler BBAs $m_p$ and $m_c$ of the form

$$m_p(A) = x, \quad m_p(A \cup \bar{A}) = 1 - x, \qquad (5)$$

$$m_c(\bar{A}) = y, \quad m_c(A \cup \bar{A}) = 1 - y, \qquad (6)$$

with $(x, y) \in [0,1] \times [0,1]$, such that $m = Fusion(m_p, m_c)$, for a chosen rule of combination denoted by $Fusion(\cdot,\cdot)$. The simple BBA $m_p(\cdot)$ is called the pro-BBA (or pro-evidence) of $A$, and the simple BBA $m_c(\cdot)$ the contra-BBA (or contra-evidence) of $A$. The BBA $m_p(\cdot)$ is interpreted as a source of evidence providing an uncertain evidence in favor of $A$, whereas $m_c(\cdot)$ is interpreted as a source of evidence providing an uncertain contrary evidence about $A$. In [6], we proved that this decomposition always exists and is unique if we use the PCR5 fusion rule. In the vacuous BBA case when $a = 0$ and $b = 0$, the BBA $m(\cdot)$ can be interpreted as the PCR5 fusion of two degenerate pro- and contra-evidence BBAs $m_p(\cdot)$ and $m_c(\cdot)$ which coincide with the vacuous BBA with $x = 0$ and $y = 0$. Hence any (Bayesian, or non Bayesian) dichotomous BBA $m(\cdot)$ can always be interpreted as the result of the PCR5 fusion of these two (pros and cons) aspects of evidence about $A$. It is worth noting that this type of canonical decomposition is different from Smets' canonical decomposition problem [7], which needs to work with generalized simple BBAs that are not stricto sensu valid BBAs as defined by Shafer [1].

For the case of a dichotomous dogmatic BBA, the expressions of the solutions $x$ and $y$ of the canonical decomposition are as follows [6]:

• if $a = b$ and $a + b = 1$ then $a = b = 0.5$ and $x = y = 1$;

• if $a < b$ then $x < y$, and we have $y = 1$ and $x = \frac{a + \sqrt{a^2 + 4a}}{2}$;

• if $a > b$ then $x > y$, and we have $x = 1$ and $y = \frac{b + \sqrt{b^2 + 4b}}{2}$.


For the case of a dichotomous non-dogmatic BBA, the solutions $x$ and $y$ of the canonical decomposition do not have a simple analytical expression because one has to find $x$ and $y$ solutions of the system

$$a = x(1-y) + \frac{x^2 y}{x+y} = \frac{x^2 + xy - xy^2}{x+y},$$

$$b = (1-x)y + \frac{xy^2}{x+y} = \frac{y^2 + xy - x^2 y}{x+y},$$

under the constraints $(a,b) \in [0,1]^2$, and $0 < a + b < 1$. In fact, we have proved in [6] that $x \in [a, a+b] \subset [0,1]$ and $y \in [b, a+b] \subset [0,1]$, but the explicit expressions of $x$ and $y$ are very complicated to obtain analytically (even with modern symbolic computing systems like Mathematica™ or Maple™) because, after algebraic calculation and for $x \neq 1$, one has to solve the following quartic equation, which has at most four real solutions with only one valid solution in $[a, a+b]$:

$$x^4 + (-a-2)x^3 + (2a+b)x^2 + (a + b - ab - b^2)x + (-a^2 - ab) = 0, \qquad (9)$$

and then compute $y$ by $y = (a + b - x)/(1 - x)$.
Once numerical values are assigned to $a$ and to $b$, the numerical (approximate) solutions $x$ and then $y$ can be easily obtained by a standard numerical solver. For instance, with Matlab™ we can use the fsolve command, and this is what we use to make the canonical decomposition of a dichotomous non-dogmatic BBA.
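For readers without Matlab, the same computation can be sketched with a plain bisection on the quartic (9) over $[a, a+b]$, where the valid root is known to lie. This swaps the authors' fsolve call for bisection and is only an illustration; the function names and the iteration count are assumptions.

```python
def quartic(x, a, b):
    """Left-hand side of the quartic equation (9)."""
    return (x**4 + (-a - 2) * x**3 + (2 * a + b) * x**2
            + (a + b - a * b - b * b) * x + (-a * a - a * b))


def canonical_decomposition(a, b, iterations=100):
    """PCR5-based canonical factors (x, y) of a non-dogmatic dichotomous BBA.

    Bisection keeps quartic(lo) <= 0 <= quartic(hi) on [a, a+b]."""
    if a < 0 or b < 0 or a + b >= 1:
        raise ValueError("expected a, b >= 0 and a + b < 1")
    lo, hi = a, a + b
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if quartic(mid, a, b) <= 0:
            lo = mid
        else:
            hi = mid
    x = (lo + hi) / 2
    y = (a + b - x) / (1 - x)
    return x, y


x, y = canonical_decomposition(0.6, 0.2)
print(x, y)
```

For $(a,b) = (0.6, 0.2)$ the result can be compared with the values $x \approx 0.6861$ and $y \approx 0.3628$ quoted in the text.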


A. Canonical Decompositions From Other Well-Known Rules 


In [6] we proved that this type of canonical decomposition cannot be obtained by the conjunctive rule only, because if $m_p$ and $m_c$ exist and if $x > 0$ and $y > 0$ then $m_{Conj}(\emptyset) = xy > 0$, which means that $m = Conj(m_p, m_c)$ is not a proper BBA as defined by Shafer. If we use the disjunctive rule of combination we will always obtain the vacuous BBA as the result of $Disj(m_p, m_c)$⁴ because $m_p(A)m_c(\bar{A})$, $m_p(A)m_c(A \cup \bar{A})$, $m_p(A \cup \bar{A})m_c(\bar{A})$ and $m_p(A \cup \bar{A})m_c(A \cup \bar{A})$ will all be committed to the uncertainty $A \cup \bar{A}$. So for any choice of $m_p$ and $m_c$ we always get the same result (the vacuous BBA) when using the disjunctive rule, making the canonical decomposition of a non-vacuous dichotomous BBA $m$ just impossible. Due to the particular simple form of the BBAs $m_p(\cdot)$ and $m_c(\cdot)$, Yager's rule [8] and Dubois-Prade rule [9] coincide, and we have to search $x$ and $y$ in $[0,1]$ such that $m(A) = a = x(1-y)$ and $m(\bar{A}) = b = (1-x)y$. Assuming⁵ $y < 1$, one gets from the first equation $x = a/(1-y)$. By replacing $x$ by its expression in the second equation $y - xy = b$ we have to find $y$ in $[0,1)$ such that (after basic algebraic simplifications) $y^2 + (a - b - 1)y + b = 0$. This second-order equation admits one or two real solutions $y_1$ and $y_2$ if and only if the discriminant is null or positive respectively, that is if $(a - b - 1)^2 - 4b \geq 0$. However this discriminant can become negative depending on the values of $a$ and $b$. For instance, for $a = 0.3$ and $b = 0.6$, we have $(a - b - 1)^2 - 4b = -0.71$, which means that there is no real solution for the equation $y^2 - 1.3\,y + 0.6 = 0$. Therefore, in general (that is for all possible values $a$ and $b$ of the BBA $m$), the canonical decomposition of the BBA $m(\cdot)$ cannot be obtained from Yager's and Dubois-Prade rules of combination. If we use the averaging rule, we are searching $x$ and $y$ in $[0,1]$ such that $m(A) = a = (x + 0)/2$ and $m(\bar{A}) = b = (0 + y)/2$, which means that $x = 2a$ and $y = 2b$ with $x$ and $y$ in $[0,1]$. So, if $a > 0.5$ or $b > 0.5$ the canonical decomposition is impossible to make with the averaging rule of combination. Therefore, in general, the averaging rule is not able to provide a canonical decomposition of the BBA $m(\cdot)$.
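The two failure modes described above can be reproduced with a short sketch. It assumes the equations $a = x(1-y)$, $b = (1-x)y$ for Yager's (and Dubois-Prade) rule and $x = 2a$, $y = 2b$ for the averaging rule; the function names are hypothetical.

```python
import math


def yager_decomposition(a, b):
    """All pairs (x, y) in [0,1] x [0,1) with a = x(1-y) and b = (1-x)y.

    Solves y^2 + (a - b - 1) y + b = 0, then x = a / (1 - y)."""
    disc = (a - b - 1) ** 2 - 4 * b
    if disc < 0:
        return []  # no real root: the BBA is not decomposable this way
    roots = {(-(a - b - 1) + s * math.sqrt(disc)) / 2 for s in (1, -1)}
    pairs = []
    for y in sorted(roots):
        if 0 <= y < 1:
            x = a / (1 - y)
            if 0 <= x <= 1:
                pairs.append((x, y))
    return pairs


def averaging_decomposition(a, b):
    """(x, y) = (2a, 2b) when both stay in [0,1], otherwise None."""
    x, y = 2 * a, 2 * b
    return (x, y) if x <= 1 and y <= 1 else None


print(yager_decomposition(0.3, 0.6))      # negative discriminant
print(averaging_decomposition(0.6, 0.2))  # x = 1.2 is out of range
```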


If we consider the canonical decomposition of a dichotomous non-dogmatic BBA ($a + b < 1$) using Dempster's rule of combination [1], denoted $DS(m_p, m_c)$, we have to obtain $x$ and $y$ in $[0,1]$ such that⁶ $xy \neq 1$ and

$$m(A) = a = \frac{x(1-y)}{1 - xy}, \qquad (10)$$

$$m(\bar{A}) = b = \frac{y(1-x)}{1 - xy}, \qquad (11)$$

with the constraints $0 < x < 1$ and $0 < y < 1$. Therefore,

$$x = \frac{a}{1 - y + ay}, \qquad (12)$$

and we solve the equation $y - xy + bxy = b$ with $x$ expressed as a function of $y$ as above. We get the equation, for $a \neq 1$,

$$(a - 1)y^2 + (1 + b - a)y - b = 0, \qquad (13)$$

whose two solutions are $y_1 = b/(1-a)$ and $y_2 = 1$ (see [6] for details).

For the case $a \neq 1$, the second "solution" $y_2 = 1$ implies $x = \frac{a}{1 - y_2 + a y_2} = \frac{a}{a} = 1$, which is not an acceptable solution⁷ because one must have $xy \neq 1$. The solution $(x, y)$ of the decomposition problem for $a \neq 1$ is actually given by the first solution $y_1$, that is

$$y = y_1 = \frac{b}{1 - a} \in (0,1), \qquad (14)$$

$$x = \frac{a}{1 - y + ay} = \frac{a}{1 - b} \in (0,1). \qquad (15)$$

The analysis of the case $a = 1$, corresponding to the dogmatic BBA given by $m(A) = a = 1$, $m(\bar{A}) = b = 0$, $m(A \cup \bar{A}) = 1 - a - b = 0$, shows that this BBA is not canonically decomposable by Dempster's rule. Why? Because one has to solve with $0 \leq x, y \leq 1$ and $1 - xy \neq 0$ the system of equations $(x - xy)/(1 - xy) = 1$ and $(y - xy)/(1 - xy) = 0$, which is satisfied for $x = 1$ and $y \in [0,1)$, that is, any value in $[0,1)$ can be chosen for $y$. Similarly, for the case $(a,b) = (0,1)$ one has to solve with $0 \leq x, y \leq 1$ and $1 - xy \neq 0$ the system of equations $(x - xy)/(1 - xy) = 0$ and $(y - xy)/(1 - xy) = 1$, which is satisfied for $y = 1$ and $x \in [0,1)$, that is, any value of $x$ in $[0,1)$ can be chosen. Therefore one sees that for the case $(a,b) = (1,0)$ and the case $(a,b) = (0,1)$ there is no unique decomposition of these dogmatic BBAs from Dempster's rule of combination. More generally, any dogmatic BBA $m(A) = a$, $m(\bar{A}) = b$ with $a + b = 1$ is not decomposable from Dempster's rule of combination for the case when $(a,b) \neq (1,0)$ and $(a,b) \neq (0,1)$; see Theorem 4 with its proof in [6].


In summary, the canonical decomposition based on Dempster's rule of combination is possible only for a nondogmatic BBA with $0 < a < 1$, $0 < b < 1$ and $a + b < 1$, and we have $x = \frac{a}{1-b}$ and $y = \frac{b}{1-a}$. Dempster's rule does not allow to obtain a canonical decomposition if the BBA is a Bayesian (dogmatic) dichotomous BBA.

⁴ $Disj(m_p, m_c)$ denotes the disjunctive fusion of $m_p$ with $m_c$.
⁵ Taking $y = 1$ would mean that $x(1-y) = 0$, but $m(A) = a$ with $a \neq 0$ in general, so the choice of $y = 1$ is not possible.
⁶ The third equality $m(A \cup \bar{A}) = 1 - a - b = \frac{(1-x)(1-y)}{1 - xy}$, being redundant with (10) and (11), is useless.
⁷ Otherwise the denominator of (10) and (11) will equal zero.
Example where Dempster's canonical decomposition is possible

Consider $m(A) = a = 0.6$, $m(\bar{A}) = b = 0.2$ and
$m(A \cup \bar{A}) = 1 - a - b = 0.2$. The solution $(x, y)$ of the decomposition
of $m(\cdot)$ based on Dempster's rule is

$$x = \frac{a}{1-b} = \frac{0.6}{1-0.2} = 0.75 \quad \text{and} \quad y = \frac{b}{1-a} = \frac{0.2}{1-0.6} = 0.5.$$


Therefore, the pro- and contra-evidential BBAs $m_p$ and $m_c$
are given by

$$m_p(A) = x = 0.75, \quad m_p(A \cup \bar{A}) = 1 - x = 0.25,$$
$$m_c(\bar{A}) = y = 0.50, \quad m_c(A \cup \bar{A}) = 1 - y = 0.50.$$

It can be verified that $DS(m_p, m_c) = m$.

If we make the PCR5-based canonical decomposition, we
will obtain in this example $x \approx 0.6861$ and $y \approx 0.3628$.
Therefore, the pro- and contra-evidential BBAs $m_p$ and $m_c$
based on the PCR5-based canonical decomposition are

$$m_p(A) = x \approx 0.6861, \quad m_p(A \cup \bar{A}) = 1 - x \approx 0.3139,$$
$$m_c(\bar{A}) = y \approx 0.3628, \quad m_c(A \cup \bar{A}) = 1 - y \approx 0.6372.$$

It can be verified that $PCR5(m_p, m_c) = m$.
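Both verifications quoted in this example can be reproduced numerically. The following Python sketch is only illustrative: the function names are hypothetical, and the two fusion functions are the dichotomous special cases of Dempster's rule and of PCR5 for a pro-evidence $m_p(A) = x$ and a contra-evidence $m_c(\bar{A}) = y$, whose conjunctive masses are $x(1-y)$, $y(1-x)$, $(1-x)(1-y)$ with conflict $xy$.

```python
def conjunctive(x, y):
    """Conjunctive masses for m_p(A)=x and m_c(not A)=y.

    Returns (mass on A, mass on not A, mass on A-or-not-A, conflicting mass).
    """
    return x * (1 - y), y * (1 - x), (1 - x) * (1 - y), x * y


def dempster(x, y):
    """Dempster's rule: normalize the conjunctive masses by 1 - conflict."""
    m_a, m_not_a, m_u, k = conjunctive(x, y)
    return m_a / (1 - k), m_not_a / (1 - k), m_u / (1 - k)


def pcr5(x, y):
    """PCR5: split the conflict xy between A and not A proportionally to x and y."""
    m_a, m_not_a, m_u, k = conjunctive(x, y)
    if k > 0:
        m_a += x * k / (x + y)      # equals x^2 y / (x + y)
        m_not_a += y * k / (x + y)  # equals x y^2 / (x + y)
    return m_a, m_not_a, m_u


def dempster_decomposition(a, b):
    """Closed-form Dempster-based decomposition, valid only when a + b < 1."""
    return a / (1 - b), b / (1 - a)


a, b = 0.6, 0.2
x, y = dempster_decomposition(a, b)
print("Dempster-based (x, y):", (x, y))
print("DS(m_p, m_c):", dempster(x, y))
print("PCR5 of the quoted PCR5-based (x, y):", pcr5(0.6861, 0.3628))
```

Since the PCR5-based values are quoted to four decimals only, the last line reproduces $m$ up to rounding.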




In the case where Dempster's rule can be applied for making
the canonical decomposition (that is when $a + b < 1$) we see
that the canonical values (parameters) $x$ and $y$ can be very
different from those obtained with the PCR5 rule, as shown in the
previous example. This is normal because the principles of
conflicting information redistribution of Dempster's rule and
PCR5 rule are very different, and there is no link between the
parameters $x$ and $y$ obtained with Dempster's rule and those
obtained from PCR5. In the PCR5 rule the conflict is a refined
conflict, i.e. the conflict is split into partial conflicts, so in
PCR5 the total conflict is more accurately redistributed than
in Dempster's rule because each partial conflict is redistributed
only to the elements involved in it, while in Dempster's rule
the total conflict is redistributed to all focal elements, therefore
even the elements that were not involved in the conflict receive
conflicting mass, which is inaccurate.

It is worth noting that the internal conflict
of $m$ based on Dempster's rule will be in this
example $xy = 0.75 \cdot 0.5 = 0.375$, whereas the internal
conflict of $m$ based on the PCR5 rule will be only
$xy \approx 0.6861 \cdot 0.3628 \approx 0.2489$. In fact we can attest
that the internal conflict obtained from the PCR5-based
canonical decomposition is always less than (or equal to)
the internal conflict obtained from the Dempster-based canonical
decomposition. Although such a claim cannot be proved
algebraically⁸, we can always make a fine sampling of $(a, b)$
values in $[0, 1)$ satisfying $a + b < 1$ to evaluate numerically
$x$ and $y$ and compare the internal conflict $xy$ to the internal
conflict, denoted $x'y' = \frac{a}{1-b} \cdot \frac{b}{1-a}$, obtained with the Dempster-based
canonical decomposition. In doing this we see that
the difference $\Delta = x'y' - xy$ is always greater than (or equal
to) zero, as clearly shown in Figure 1. This means that the
PCR5-based canonical decomposition is more efficient than the
Dempster-based canonical decomposition because it always
yields pro- and contra-evidences which are less conflicting
when using the PCR5 rule than when using Dempster's rule,
which is normal.

⁸Because there are no simple analytical expressions for the solutions $x$ and $y$
of the PCR5-based canonical decomposition.
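The sampling experiment described above can be sketched in a few lines of Python. This is an illustrative sketch, not the code used to produce Figure 1: the function names are hypothetical, the PCR5-based solution is found by bisection (an assumption of this sketch, not a method stated in the text), and the grid step of 0.05 is arbitrary. The bisection uses the relation $(1-x)(1-y) = 1-a-b$, i.e. $y = (a+b-x)/(1-x)$, and searches $x$ in $[a, a+b]$, because $y \ge 0$ requires $x \le a+b$ and PCR5 never gives $A$ more mass than $x$.

```python
def pcr5_mass_of_a(x, y):
    """Mass given to A by PCR5(m_p, m_c) with m_p(A)=x and m_c(not A)=y."""
    conflict = x * y
    redistributed = x * conflict / (x + y) if conflict > 0 else 0.0
    return x * (1 - y) + redistributed


def pcr5_decomposition(a, b, tol=1e-13):
    """Bisection for the PCR5-based (x, y); assumes a, b >= 0 and a + b < 1."""
    s = a + b
    lo, hi = a, s
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        y_mid = (s - mid) / (1 - mid)
        if pcr5_mass_of_a(mid, y_mid) <= a:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    return x, (s - x) / (1 - x)


def conflict_gap(a, b):
    """Delta = x'y' - xy (Dempster-based minus PCR5-based internal conflict)."""
    x, y = pcr5_decomposition(a, b)
    x_ds, y_ds = a / (1 - b), b / (1 - a)
    return x_ds * y_ds - x * y


grid = [k / 20 for k in range(1, 19)]          # 0.05, 0.10, ..., 0.90
gaps = [conflict_gap(a, b) for a in grid for b in grid if a + b < 0.999]
print("smallest sampled Delta:", min(gaps))
print("Delta for (a, b) = (0.6, 0.2):", conflict_gap(0.6, 0.2))
```

A finer grid only changes the resolution of the sampled surface; the sketch does not replace a proof of $\Delta \ge 0$.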


[Figure 1 plot omitted; plot title: "Difference between internal conflicts based on Dempster's and PCR5 decompositions".]

Figure 1. Plot of $\Delta = x'y' - xy$ as a function of $a$ and $b$.


It is important to keep in mind that Dempster-based canon- 
ical decomposition is only possible for non-dogmatic BBAs 
(when a+b < 1) but cannot be obtained with dogmatic BBAs, 
whereas PCR5-based canonical decomposition works for all 
types of dichotomous BBAs (dogmatic and non-dogmatic 
ones). 


B. Simple Example of PCR5-Based Canonical Decomposition

Let us consider $m(A) = 0.3$, $m(\bar{A}) = 0.4$ and $m(A \cup \bar{A}) =
1 - m(A) - m(\bar{A}) = 0.3$, therefore $a = 0.3$ and $b = 0.4$. The
quartic equation (9) becomes

$$x^4 - 2.3x^3 + x^2 + 0.42x - 0.21 = 0. \qquad (16)$$


The four solutions of this quartic equation are approximately⁹

$$x_1 \approx 1.5203, \quad x_2 \approx -0.4243, \quad x_3 \approx 0.7942, \quad x_4 \approx 0.4099.$$

One sees that $x_1$ and $x_2$ are not acceptable solutions because
they do not belong to $[0, 1]$. If we take $x_3 \approx 0.7942$ then
$y_3 = (a + b - x_3)/(1 - x_3) \approx -0.4576$. We see that $y_3 \notin [0, 1]$ and therefore the pair
$(x_3, y_3)$ cannot be a solution of the PCR5-based canonical
decomposition problem for the BBA $m(\cdot)$ of this example. If
we take $x_4 \approx 0.4099$ then we get $y_4 = (a + b - x_4)/(1 - x_4) =
(0.7 - x_4)/(1 - x_4) \approx 0.4916$, which belongs to $[0, 1]$. So the
pair $(x_4, y_4) \in [0, 1]^2$ is the unique solution of the canonical
decomposition problem. Therefore the canonical masses $m_p(\cdot)$
and $m_c(\cdot)$ are given by

$$m_p(A) \approx 0.4099, \quad m_p(A \cup \bar{A}) \approx 0.5901,$$

and

$$m_c(\bar{A}) \approx 0.4916, \quad m_c(A \cup \bar{A}) \approx 0.5084.$$

It can be verified that $PCR5(m_p, m_c) = m$.

⁹The solutions can be easily obtained with the roots command of Matlab™.

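The root selection of this example can also be sketched without a polynomial solver. In the Python sketch below, the general quartic coefficients are not copied from Eq. (9); they were re-derived here from the PCR5 formula together with $y = (a+b-x)/(1-x)$, and they reproduce Eq. (16) for $a = 0.3$, $b = 0.4$. The function names are hypothetical, and restricting the search to $[a, a+b]$ (where the quartic changes sign, and where $y \in [0,1]$ is guaranteed) is an assumption of this sketch for nondogmatic BBAs.

```python
def quartic_coefficients(a, b):
    """Coefficients (highest degree first) of the quartic in x.

    Re-derived for this sketch:
    x^4 - (2+a) x^3 + (2a+b) x^2 + (a+b)(1-b) x - a(a+b) = 0.
    """
    s = a + b
    return [1.0, -(2 + a), 2 * a + b, s * (1 - b), -a * s]


def horner(coeffs, x):
    """Evaluate a polynomial given by its coefficients at x."""
    value = 0.0
    for c in coeffs:
        value = value * x + c
    return value


def pcr5_decomposition(a, b, tol=1e-13):
    """Bisection on [a, a+b]; assumes a, b >= 0 and a + b < 1."""
    coeffs = quartic_coefficients(a, b)
    lo, hi = a, a + b
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if horner(coeffs, mid) <= 0:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    y = (a + b - x) / (1 - x)
    return x, y


coeffs = quartic_coefficients(0.3, 0.4)
x, y = pcr5_decomposition(0.3, 0.4)
print("quartic coefficients:", coeffs)
print("selected root x and its y:", x, y)
print("residual at the rejected root 0.7942:", horner(coeffs, 0.7942))
```

The rejected root $x_3 \approx 0.7942$ lies outside $[a, a+b] = [0.3, 0.7]$, which is why the bracketed search never returns it.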
C. Advantages and Limitation of PCR5-Based Decomposition

The PCR5-based canonical decomposition offers the following
advantages:

1) It is well justified theoretically.

2) It gives us access to the simpler pro- and contra-evidences
$m_p(\cdot)$ and $m_c(\cdot)$ which are unique and always
exist for any possible (dogmatic, or non-dogmatic) dichotomous
BBA $m(\cdot)$.

3) It allows one to define clearly the notion of internal conflict
of a dichotomous source of evidence simply as
$K_{int}(m) = m_p(A) m_c(\bar{A})$.

4) It always provides less conflicting pro- and contra-evidences
than what we would obtain with Dempster's
rule when considering a non-dogmatic dichotomous BBA
$m(\cdot)$. This proves the superiority of the PCR5-based canonical
decomposition over the Dempster-based canonical decomposition
in general.

5) It also allows one to adjust or revise¹⁰ quite easily a dichotomous
source of evidence (if needed) according to the
knowledge one has on it by reinforcing or discounting
its pro- or contra-evidential BBA.

6) It can be easily achieved with classical off-the-shelf
numerical solvers.

7) The decomposition can be done off-line for many sampled
$(a, b)$ values at any precision we want, and stored in
computer memory for working directly with $m_p(\cdot)$ and
$m_c(\cdot)$ instead of making the decomposition on the fly.
This is of prime importance for real-time applications
where this method could be used.

8) It allows one to establish an efficient fast¹¹ suboptimal PCR5
fusion scheme, see [10] for details, examples and evaluations.
The only important limitation of this PCR5-based canonical
decomposition is that it applies only to dichotomous
BBAs, and it seems very difficult (maybe impossible) to
use or to extend it for making directly some new canonical
decomposition of non dichotomous BBAs. Because of this
limitation the use of the PCR5-based canonical decomposition
appears, at first glance, quite restrictive for being really useful
in applications involving non dichotomous BBAs. Of course
in applications working with dichotomous BBAs (like those
in robotics or for autonomous vehicle navigation using belief-based
perception based on grid occupancy) this PCR5-based
canonical decomposition may be of great interest. In fact we
have already used it for belief-based inter-criteria analysis in
[11] and that is why we do not present these results in this
work. Nevertheless we will show in the next section how this
PCR5-based canonical decomposition could be used for
decision-making support in a more general context involving
many non-dichotomous BBAs. This is a problem which has
not been addressed in [6].

¹⁰This point is not detailed here because it is out of the scope of this paper.
¹¹Where the complexity is linear with the number of dichotomous BBAs
to fuse.


IV. DECISION-MAKING USING PCR5-BASED 
DECOMPOSITION 


In this section we propose a new simple general decision-making
scheme based on the PCR5-based canonical decomposition
of dichotomous BBAs. We consider $S \geq 2$ distinct sources
of evidence characterized by their BBAs¹² $m_s^{\Theta}(\cdot)$ defined over
the same (possibly non dichotomous) FoD $\Theta = \{\theta_1, \ldots, \theta_n\}$,
with $n > 1$.

Can we exploit the PCR5-based canonical decomposition in
this context to make a decision? How? We answer positively to
the first question and explain in detail how we can proceed.
For this, we need to express the problem in the framework
of dichotomous BBAs that has been presented in the previous
section. More precisely, suppose one has a BBA $m^{\Theta}(\cdot)$ defined
on $2^{\Theta}$ with $|\Theta| \geq 2$, then based on the Bel and Pl formulas (1)-(2),
it is always possible to calculate $Bel^{\Theta}(X)$ and $Pl^{\Theta}(X)$
for any $X \in 2^{\Theta}$. From $Bel^{\Theta}(X)$ and $Pl^{\Theta}(X)$ one can
always build a simpler coarsened dichotomous BBA on the
dichotomous (coarsened) FoD $\Theta_X = \{X, \bar{X}\}$ if $X \neq \emptyset$ and
$X \neq \Theta$ as follows

$$m^{\Theta_X}(X) = Bel^{\Theta}(X), \qquad (17)$$
$$m^{\Theta_X}(\bar{X}) = 1 - Pl^{\Theta}(X), \qquad (18)$$
$$m^{\Theta_X}(X \cup \bar{X}) = Pl^{\Theta}(X) - Bel^{\Theta}(X). \qquad (19)$$

Hence, $Bel^{\Theta_X}(X) = m^{\Theta_X}(X) = Bel^{\Theta}(X)$ and $Pl^{\Theta_X}(X) =
m^{\Theta_X}(X) + m^{\Theta_X}(X \cup \bar{X}) = Bel^{\Theta}(X) + Pl^{\Theta}(X) -
Bel^{\Theta}(X) = Pl^{\Theta}(X)$. This dichotomous BBA $m^{\Theta_X}(\cdot)$ can
always be decomposed canonically into its pro- and contra-evidences
$m_p^{\Theta_X}(\cdot)$ and $m_c^{\Theta_X}(\cdot)$.
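The coarsening of Eqs. (17)-(19) is straightforward to implement. The Python sketch below is illustrative only: a BBA is represented as a dictionary mapping focal elements (frozensets) to masses, and both this representation and the three-element BBA used as input are hypothetical choices made for the example.

```python
def bel(m, X):
    """Belief of X: total mass of the focal elements included in X."""
    return sum(v for F, v in m.items() if F <= X)


def pl(m, X):
    """Plausibility of X: total mass of the focal elements intersecting X."""
    return sum(v for F, v in m.items() if F & X)


def coarsen(m, X):
    """Coarsened dichotomous BBA on {X, not X}, following Eqs. (17)-(19).

    Returns (m(X), m(not X), m(X or not X)).
    """
    b, p = bel(m, X), pl(m, X)
    return b, 1.0 - p, p - b


# Hypothetical BBA on Theta = {A, B, C}, used only to exercise the functions.
theta = frozenset("ABC")
m = {frozenset("A"): 0.5, frozenset("AB"): 0.2, frozenset("C"): 0.1, theta: 0.2}

a, b, u = coarsen(m, frozenset("A"))
print("coarsened masses for X = {A}:", a, b, u)
```

The three returned masses sum to one by construction, and $m^{\Theta_X}(\bar{X}) = 1 - Pl^{\Theta}(X)$ coincides with $Bel^{\Theta}(\bar{X})$.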

Therefore, instead of combining $S > 1$ non dichotomous
BBAs $m_s^{\Theta}(\cdot)$ for $s = 1, 2, \ldots, S$ altogether, from which a
decision is classically drawn, we propose to make the decision
from the set of all combined coarsened BBAs relative to
each possible dichotomous frame of discernment $\Theta_X$. Of
course this decision scheme is only suboptimal because the
whole information is not processed (combined) altogether, but
separately using only the coarsened (less informative) BBAs
$m_s^{\Theta_X}(\cdot)$. However, this method allows one to use the fast suboptimal
PCR5 fusion of $m_s^{\Theta_X}(\cdot)$ thanks to the PCR5-based canonical
decomposition as presented in [10], which can be applied with
many (hundreds or even thousands of) sources of dichotomous
BBAs. With this simple suboptimal decision scheme we can
easily restrict the domain $D$ on which the decisions can be
made, for instance $D$ can be chosen as the set of singletons
of $2^{\Theta}$, or any other subset of $2^{\Theta}$ depending on the application
under concern, as will be shown in the next section. The
generic steps of the method we propose are as follows:

¹²For clarity, we need to introduce in the notations a superscript to indicate
the FoD we are working on.


• Inputs: BBAs $m_s^{\Theta}(\cdot)$, $s = 1, \ldots, S$, and the decision domain
$D \subseteq 2^{\Theta}$.

• Step 1: For $s = 1, \ldots, S$, coarsening of $m_s^{\Theta}(\cdot)$ into the
dichotomous BBA $m_s^{\Theta_X}(\cdot)$, for each $X \in D$, based on (17)-(19).

• Step 2: For $s = 1, \ldots, S$, PCR5-based canonical decomposition
of $m_s^{\Theta_X}(\cdot)$ to get the pro- and contra-evidences $m_{p,s}^{\Theta_X}(\cdot)$
and $m_{c,s}^{\Theta_X}(\cdot)$.

• Step 3: Conjunctive fusion of all the pro-evidences $m_{p,s}^{\Theta_X}(\cdot)$
to get $m_p^{\Theta_X}(\cdot)$.

• Step 4: Conjunctive fusion of all the contra-evidences
$m_{c,s}^{\Theta_X}(\cdot)$ to get $m_c^{\Theta_X}(\cdot)$.

• Step 5: PCR5 fusion of $m_p^{\Theta_X}(\cdot)$ with $m_c^{\Theta_X}(\cdot)$ to get
$m_{PCR5}^{\Theta_X}(\cdot)$ for $X \in D$.

• Step 6: Decision-making from the set of the combined
coarsened dichotomous BBAs $\{m_{PCR5}^{\Theta_X}(\cdot), X \in D\}$ to get the
final decision $\hat{X} \in D$.

• Output: the final decision $\hat{X} \in D$.


In steps 3 and 4 we use the conjunctive fusion because there
is no conflict between all pro-evidences $m_{p,s}^{\Theta_X}(\cdot)$, and there is
also no conflict between all contra-evidences $m_{c,s}^{\Theta_X}(\cdot)$, $s =
1, \ldots, S$. Steps 1 to 5 do not require a high computational
burden and they can be done very quickly, especially if the PCR5-based
decompositions have been done off-line (as they should
be) [10].
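Steps 2 to 5 can be sketched as follows for one fixed $X \in D$, starting from coarsened dichotomous BBAs given as pairs $(a_s, b_s) = (m_s^{\Theta_X}(X), m_s^{\Theta_X}(\bar{X}))$. This Python sketch rests on labeled assumptions: the decomposition is computed by bisection (not by the off-line tables mentioned above), all inputs are nondogmatic ($a_s + b_s < 1$), the function names are hypothetical, and the conjunctive fusion of simple BBAs sharing the same focal set is written in its closed form $1 - \prod_s (1 - x_s)$.

```python
from math import prod


def pcr5(x, y):
    """PCR5 fusion of m_p(X)=x with m_c(not X)=y.

    Returns (m(X), m(not X), m(X or not X)).
    """
    k = x * y                                # conflicting mass
    share = k / (x + y) if k > 0 else 0.0    # conflict per unit of involved mass
    return x * (1 - y) + x * share, y * (1 - x) + y * share, (1 - x) * (1 - y)


def pcr5_decomposition(a, b, tol=1e-13):
    """Step 2: bisection for (x, y) such that PCR5(x, y) = (a, b, 1-a-b)."""
    s = a + b
    lo, hi = a, s
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pcr5(mid, (s - mid) / (1 - mid))[0] <= a:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    return x, (s - x) / (1 - x)


def fast_pcr5_fusion(bbas):
    """Steps 2-5 for a list of dichotomous BBAs given as (a_s, b_s) pairs."""
    pairs = [pcr5_decomposition(a, b) for a, b in bbas]   # step 2
    x = 1 - prod(1 - xs for xs, _ in pairs)               # step 3: pro-evidences
    y = 1 - prod(1 - ys for _, ys in pairs)               # step 4: contra-evidences
    return pcr5(x, y)                                     # step 5


sources = [(0.6, 0.2), (0.3, 0.4), (0.5, 0.1)]   # hypothetical inputs
fused = fast_pcr5_fusion(sources)
print("fused masses (X, not X, ignorance):", fused)
```

The cost grows linearly with the number of sources because each source contributes one decomposition and one factor to each product.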

We must detail a bit more the principle of the decision-making
for step 6. Actually, the decision-making for
step 6 can be interpreted as a decision-making problem from
a set of coarsened BBAs $m_{PCR5}^{\Theta_X}(\cdot)$ defined over different
dichotomous FoDs $\Theta_X$ which are all different coarsenings
of the whole (refined original) FoD $\Theta$. In this paper we
propose two methods to make the decision from the set of
coarsened BBAs $\{m_{PCR5}^{\Theta_X}(\cdot), X \in D\}$.


A. Method 1 for Step 6

This method is very simple. We take the decision $\hat{X}$
corresponding to the largest value of $m_{PCR5}^{\Theta_X}(X)$, that is

$$\hat{X} = \arg\max_{X \in D} \left( m_{PCR5}^{\Theta_X}(X) \right). \qquad (20)$$

If there exist several arguments having the largest value (i.e.
there is a tie), we select the one whose $m_{PCR5}^{\Theta_X}(\bar{X})$ is smaller.


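Example 1 (without tie): Suppose $\Theta = \{A, B, C, D, E\}$ and
we want to make a decision/choice only among the elements
of $D = \{A, B, C\}$. Suppose after applying steps 1-5 we get
the following 3 BBAs

$$m_{PCR5}^{\Theta_A}(A) = 0.3, \quad m_{PCR5}^{\Theta_A}(\bar{A}) = 0.2, \quad m_{PCR5}^{\Theta_A}(A \cup \bar{A}) = 0.5,$$
$$m_{PCR5}^{\Theta_B}(B) = 0.1, \quad m_{PCR5}^{\Theta_B}(\bar{B}) = 0.5, \quad m_{PCR5}^{\Theta_B}(B \cup \bar{B}) = 0.4,$$
$$m_{PCR5}^{\Theta_C}(C) = 0.4, \quad m_{PCR5}^{\Theta_C}(\bar{C}) = 0.3, \quad m_{PCR5}^{\Theta_C}(C \cup \bar{C}) = 0.3.$$

The decision will be $\hat{X} = C$ because

$$m_{PCR5}^{\Theta_C}(C) > m_{PCR5}^{\Theta_A}(A) > m_{PCR5}^{\Theta_B}(B).$$

Example 2 (with tie): We consider the same $m_{PCR5}^{\Theta_B}(\cdot)$ and
$m_{PCR5}^{\Theta_C}(\cdot)$ as in Example 1 but $m_{PCR5}^{\Theta_A}(\cdot)$ is given by

$$m_{PCR5}^{\Theta_A}(A) = 0.4, \quad m_{PCR5}^{\Theta_A}(\bar{A}) = 0.2, \quad m_{PCR5}^{\Theta_A}(A \cup \bar{A}) = 0.4.$$

In this case, there is a tie between $A$ and $C$ because
$m_{PCR5}^{\Theta_A}(A) = m_{PCR5}^{\Theta_C}(C) = 0.4$. But because $m_{PCR5}^{\Theta_A}(\bar{A}) <
m_{PCR5}^{\Theta_C}(\bar{C})$ we will take $\hat{X} = A$ as the final decision.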
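Method 1, including its tie-breaking rule, fits in a single comparison key. In the Python sketch below the dictionary layout and the function name are hypothetical, and ties are detected by exact floating-point equality, which is an assumption that suits the hand-written values of Examples 1 and 2 but may need a tolerance with computed masses.

```python
def decide_method1(bbas):
    """Return the X with the largest m(X); ties go to the smaller m(not X).

    `bbas` maps each candidate X in D to (m(X), m(not X), m(X or not X)).
    """
    return min(bbas, key=lambda X: (-bbas[X][0], bbas[X][1]))


example1 = {"A": (0.3, 0.2, 0.5), "B": (0.1, 0.5, 0.4), "C": (0.4, 0.3, 0.3)}
example2 = dict(example1, A=(0.4, 0.2, 0.4))   # Example 2 only changes the BBA for A

print("Example 1 decision:", decide_method1(example1))
print("Example 2 decision:", decide_method1(example2))
```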


The interest of this method is above all its simplicity, but 
it does not allow to quantify the quality (trustfulness) of the 
decision which is often useful and required in decision-making 
support systems, and that is why we propose a second method 
for the decision-making of step 6. 


B. Method 2 for Step 6 


This second method is a bit more sophisticated but it circumvents
the exponential complexity of the direct PCR6 fusion
of $S > 2$ BBAs defined on a non dichotomous FoD $\Theta$. Once
step 5 is accomplished we propose to fuse altogether the
(coarsened) dichotomous BBAs $m_{PCR5}^{\Theta_X}(\cdot)$ and to apply the decision-making
method based on the distance between the belief
intervals [12]. Because the fusion must operate on the same
common frame, we just need to express each BBA $m_{PCR5}^{\Theta_X}(\cdot)$
as a dichotomous BBA on $\Theta$, which is denoted $m_{PCR5}^{\Theta_X \uparrow \Theta}(\cdot)$.
This is done very easily by just expressing each $\bar{X}$ as the
disjunction of all elements of $\Theta$ included in $\bar{X}$. The fusion
of the BBAs $m_{PCR5}^{\Theta_X \uparrow \Theta}(\cdot)$ is done by the weighted averaging rule
of combination, where each weighting factor depends on the
decision-making easiness of the BBA $m_{PCR5}^{\Theta_X \uparrow \Theta}(\cdot)$ to fuse.
The easier the decision-making, the higher the weighting
factor. We summarize this method 2:


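1) For each $X \in D$, establish $m_{PCR5}^{\Theta_X \uparrow \Theta}(\cdot)$ from $m_{PCR5}^{\Theta_X}(\cdot)$.

2) For each $X \in D$, compute the weighting factor $w(X)$ of
$m_{PCR5}^{\Theta_X \uparrow \Theta}(\cdot)$ by

$$w(X) = \frac{1}{C}\left(1 - h(m_{PCR5}^{\Theta_X \uparrow \Theta})\right), \qquad (21)$$

where $C$ is a normalization factor given
by $C = \sum_{X \in D} (1 - h(m_{PCR5}^{\Theta_X \uparrow \Theta}))$, and where
$h(m_{PCR5}^{\Theta_X \uparrow \Theta}) = H(m_{PCR5}^{\Theta_X \uparrow \Theta})/H_{max} \in [0, 1]$ is the
normalized pignistic entropy of the BBA $m_{PCR5}^{\Theta_X \uparrow \Theta}$ defined
by $H(m_{PCR5}^{\Theta_X \uparrow \Theta}) = -\sum_{X \in \Theta} BetP(X) \log_2(BetP(X))$,
where $BetP(X)$ is the pignistic probability of $X$ [13], and
$H_{max} = \log_2 |\Theta|$.

3) Make the weighted average of the BBAs $m_{PCR5}^{\Theta_X \uparrow \Theta}(\cdot)$ to get the BBA

$$m^{\Theta}(\cdot) = \sum_{X \in D} w(X)\, m_{PCR5}^{\Theta_X \uparrow \Theta}(\cdot). \qquad (22)$$

4) From $m^{\Theta}(\cdot)$ make the decision based on the minimum of the
belief-interval distance [12], that is

$$\hat{X} = \arg\min_{X \in D} d_{BI}(m^{\Theta}, m_X^{\Theta}), \qquad (23)$$

where $m_X^{\Theta}$ is the BBA focused on $X$, that is $m_X^{\Theta}(X) = 1$
and $m_X^{\Theta}(Y) = 0$ if $Y \neq X$, and where $d_{BI}(\cdot, \cdot)$ is the
belief-interval distance defined by (see [12] for details,
justification and examples)

$$d_{BI}(m_1, m_2) \triangleq \sqrt{N_c \cdot \sum_{X \in 2^{\Theta}} d_W^2(BI_1(X), BI_2(X))}, \qquad (24)$$

where $N_c = 1/2^{|\Theta|-1}$ is a normalization factor to have
$d_{BI}(m_1, m_2) \in [0, 1]$, and $d_W(BI_1(X), BI_2(X))$ is
Wasserstein's distance between the belief intervals

$$BI_1(X) \triangleq [Bel_1(X), Pl_1(X)] = [a_1, b_1]$$

and

$$BI_2(X) \triangleq [Bel_2(X), Pl_2(X)] = [a_2, b_2],$$

given by

$$d_W([a_1, b_1], [a_2, b_2]) \triangleq \sqrt{\left[\frac{a_1 + b_1}{2} - \frac{a_2 + b_2}{2}\right]^2 + \frac{1}{3}\left[\frac{b_1 - a_1}{2} - \frac{b_2 - a_2}{2}\right]^2}.$$

5) The quality (or trustfulness) of the decision is given by

$$q(\hat{X}) = 1 - \frac{d_{BI}(m^{\Theta}, m_{\hat{X}}^{\Theta})}{\sum_{X \in D} d_{BI}(m^{\Theta}, m_X^{\Theta})}. \qquad (25)$$

$q(\hat{X}) \in [0, 1]$ becomes maximum (equal to one) when
$d_{BI}(m^{\Theta}, m_{\hat{X}}^{\Theta})$ is zero, which means that $m^{\Theta}(\cdot)$ is focused
only on $\hat{X}$. The higher $q(\hat{X})$ is, the more confident
in the decision $\hat{X}$ we are. When there exists a tie between
multiple decisions $\{\hat{X}_j, j > 1\}$, then the prudent decision
corresponding to their disjunction $\hat{X} = \cup_j \hat{X}_j$ should be
preferred (if allowed), or we can apply method 1
to resolve the tie, or in desperation select randomly $\hat{X}$
among the elements $\hat{X}_j$ involved in the tie.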

Of course we could adopt a more complicated method
where the averaging fusion could operate on all the possible
dichotomous BBAs related to each element $X \in 2^{\Theta} \setminus \{\emptyset, \Theta\}$
instead of $X \in D$, but this would substantially increase
the computational burden. Because the decision $\hat{X}$ must be
constrained to belong to $D$, we restrict the fusion to be applied
only to the dichotomous BBAs related to these elements.
By doing this we can reduce substantially the computational
burden if $|D|$ is much smaller than $2^{|\Theta|}$.


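For convenience, we show how method 2 works in the
previous Example 1 using the same $\Theta$ and $D = \{A, B, C\}$.
We have to make the weighted average of the three following
BBAs $m_{PCR5}^{\Theta_X \uparrow \Theta}(\cdot)$ for all $X \in D$:

$$m_{PCR5}^{\Theta_A \uparrow \Theta}(A) = 0.3, \quad m_{PCR5}^{\Theta_A \uparrow \Theta}(B \cup C \cup D \cup E) = 0.2, \quad m_{PCR5}^{\Theta_A \uparrow \Theta}(\Theta) = 0.5,$$
$$m_{PCR5}^{\Theta_B \uparrow \Theta}(B) = 0.1, \quad m_{PCR5}^{\Theta_B \uparrow \Theta}(A \cup C \cup D \cup E) = 0.5, \quad m_{PCR5}^{\Theta_B \uparrow \Theta}(\Theta) = 0.4,$$
$$m_{PCR5}^{\Theta_C \uparrow \Theta}(C) = 0.4, \quad m_{PCR5}^{\Theta_C \uparrow \Theta}(A \cup B \cup D \cup E) = 0.3, \quad m_{PCR5}^{\Theta_C \uparrow \Theta}(\Theta) = 0.3,$$

with $B \cup C \cup D \cup E = \bar{A}$, $A \cup C \cup D \cup E = \bar{B}$ and
$A \cup B \cup D \cup E = \bar{C}$. The pignistic entropies are respectively
equal to $H(m_{PCR5}^{\Theta_A \uparrow \Theta}) \approx 2.1710$, $H(m_{PCR5}^{\Theta_B \uparrow \Theta}) \approx 2.3201$
and $H(m_{PCR5}^{\Theta_C \uparrow \Theta}) \approx 2.0754$, and their normalized values are
$h(A) \approx 2.1710/2.3219 \approx 0.9350$, $h(B) \approx 2.3201/2.3219 \approx
0.9992$ and $h(C) \approx 2.0754/2.3219 \approx 0.8938$. From Eq.
(21) we get the weighting factors $w(A) \approx 0.37803$, $w(B) \approx
0.00463$ and $w(C) \approx 0.61734$, and the weighted average BBA is

$$m^{\Theta}(A) = w(A)\, m_{PCR5}^{\Theta_A \uparrow \Theta}(A) + w(B) \cdot 0 + w(C) \cdot 0 \approx 0.1134,$$
$$m^{\Theta}(B) = w(A) \cdot 0 + w(B)\, m_{PCR5}^{\Theta_B \uparrow \Theta}(B) + w(C) \cdot 0 \approx 0.0005,$$
$$m^{\Theta}(C) = w(A) \cdot 0 + w(B) \cdot 0 + w(C)\, m_{PCR5}^{\Theta_C \uparrow \Theta}(C) \approx 0.2469,$$
$$m^{\Theta}(B \cup C \cup D \cup E) = w(A)\, m_{PCR5}^{\Theta_A \uparrow \Theta}(B \cup C \cup D \cup E) + w(B) \cdot 0 + w(C) \cdot 0 \approx 0.0756,$$
$$m^{\Theta}(A \cup C \cup D \cup E) = w(A) \cdot 0 + w(B)\, m_{PCR5}^{\Theta_B \uparrow \Theta}(A \cup C \cup D \cup E) + w(C) \cdot 0 \approx 0.0023,$$
$$m^{\Theta}(A \cup B \cup D \cup E) = w(A) \cdot 0 + w(B) \cdot 0 + w(C)\, m_{PCR5}^{\Theta_C \uparrow \Theta}(A \cup B \cup D \cup E) \approx 0.1852,$$
$$m^{\Theta}(\Theta) = w(A)\, m_{PCR5}^{\Theta_A \uparrow \Theta}(\Theta) + w(B)\, m_{PCR5}^{\Theta_B \uparrow \Theta}(\Theta) + w(C)\, m_{PCR5}^{\Theta_C \uparrow \Theta}(\Theta) \approx 0.3761.$$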


From Eq. (24) we get

$$d_{BI}(m^{\Theta}, m_A^{\Theta}) \approx 0.6818,$$
$$d_{BI}(m^{\Theta}, m_B^{\Theta}) \approx 0.7541,$$
$$d_{BI}(m^{\Theta}, m_C^{\Theta}) \approx 0.5874.$$

Because $d_{BI}(m^{\Theta}, m_C^{\Theta}) < d_{BI}(m^{\Theta}, m_A^{\Theta}) < d_{BI}(m^{\Theta}, m_B^{\Theta})$,
the final decision must be $\hat{X} = C$ because it corresponds
to the smallest $d_{BI}$ distance value. This decision is the
same as with method 1. Based on Eq. (25) one has
$q(\hat{X} = C) \approx 0.7096$, indicating a fairly trustworthy decision
because it is much greater than 0.5. If one had preferred
$\hat{X} = A$ (the second best choice) then $q(\hat{X} = A) \approx 0.6630$,
which is a bit worse, and for $\hat{X} = B$ one gets the least
trustworthy decision because $q(\hat{X} = B) \approx 0.6273$. Note that a
more optimistic attitude (if preferred) could be obtained by
replacing the BetP probability by the DSmP probability [5]
(Chap. 3 of Vol. 3) in the entropy derivation.
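The whole of method 2 can be sketched compactly and checked against the numbers of this example. The Python code below is an illustrative sketch, not the authors' implementation: BBAs are dictionaries from frozensets to masses, the function names are hypothetical, and the frame is hard-coded to $\Theta = \{A, B, C, D, E\}$. It assumes that at least one of the extended BBAs has a normalized entropy below one, otherwise the normalization factor $C$ of Eq. (21) vanishes.

```python
from itertools import combinations
from math import log2, sqrt

THETA = frozenset("ABCDE")


def bel(m, X):
    return sum(v for F, v in m.items() if F <= X)


def pl(m, X):
    return sum(v for F, v in m.items() if F & X)


def extend(X, masses):
    """Step 1: write (m(X), m(not X), m(X or not X)) as a BBA on THETA."""
    m_x, m_not_x, m_u = masses
    return {X: m_x, THETA - X: m_not_x, THETA: m_u}


def normalized_pignistic_entropy(m):
    """h(m) = H(m) / log2|THETA| with H the Shannon entropy of BetP."""
    betp = [sum(v / len(F) for F, v in m.items() if t in F) for t in THETA]
    return -sum(p * log2(p) for p in betp if p > 0) / log2(len(THETA))


def d_bi(m1, m2):
    """Belief-interval distance of Eq. (24)."""
    items = sorted(THETA)
    total = 0.0
    for r in range(1, len(items) + 1):       # the empty set contributes zero
        for c in combinations(items, r):
            X = frozenset(c)
            a1, b1, a2, b2 = bel(m1, X), pl(m1, X), bel(m2, X), pl(m2, X)
            total += ((a1 + b1) / 2 - (a2 + b2) / 2) ** 2 \
                + ((b1 - a1) / 2 - (b2 - a2) / 2) ** 2 / 3
    return sqrt(total / 2 ** (len(THETA) - 1))


def decide_method2(dichotomous):
    ext = {X: extend(X, v) for X, v in dichotomous.items()}
    gain = {X: 1 - normalized_pignistic_entropy(m) for X, m in ext.items()}
    w = {X: g / sum(gain.values()) for X, g in gain.items()}       # Eq. (21)
    avg = {}
    for X, m in ext.items():                                       # Eq. (22)
        for F, v in m.items():
            avg[F] = avg.get(F, 0.0) + w[X] * v
    dist = {X: d_bi(avg, {X: 1.0}) for X in dichotomous}           # Eq. (23)
    best = min(dist, key=dist.get)
    quality = 1 - dist[best] / sum(dist.values())                  # Eq. (25)
    return best, quality, w, dist


A, B, C = frozenset("A"), frozenset("B"), frozenset("C")
example1 = {A: (0.3, 0.2, 0.5), B: (0.1, 0.5, 0.4), C: (0.4, 0.3, 0.3)}
best, quality, w, dist = decide_method2(example1)
print("decision:", sorted(best), "quality:", quality)
print("weights:", [w[X] for X in (A, B, C)])
print("distances:", [dist[X] for X in (A, B, C)])
```

The loop over all non-empty subsets makes `d_bi` exponential in $|\Theta|$, which is acceptable for the five-element frame of the example but is a limitation of this sketch for large frames.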


V. CONCLUSIONS 


In this work we have presented a new methodology
for decision-making under uncertainty in the framework of
belief functions thanks to the unique PCR5-based canonical
decomposition of any (dogmatic or non-dogmatic) dichotomous
BBA. We have shown that this new canonical decomposition
provides less conflicting contra- and pro-evidences
with respect to the decomposition based on Dempster's rule
when the latter can be applied. Any BBA defined on a general
(non dichotomous) frame of discernment can be transformed
into a set of coarsened dichotomous BBAs that can always
be decomposed canonically and combined easily and quickly
in one PCR5 fusion step to get a suboptimal fusion result for
each element of the decision space under consideration. The
final decision can be made in two ways: either by a simple
comparative analysis of the masses of the elements of the decision
space, or by the minimization of the belief-interval distance, which
also offers the advantage of quantifying the quality of the
decision. The evaluation of this new methodology for real applications
is in progress and will be reported in forthcoming
publications.
Abstract: In this paper, we propose a new fusion approach
to combine basic belief assignments (BBAs) defined on a di- 
chotomous frame of discernment based on their canonical de- 
composition. In a companion paper, we have already proved 
that the canonical decomposition of this type of BBA (called 
dichotomous BBA) is always possible and unique thanks to 
the proportional conflict redistribution rule No 5 (PCR5). More 
precisely, any dichotomous BBA is always the PCR5 combination 
of two simpler basic belief assignments named respectively the 
pro-evidence, and the contra-evidence. From this interesting 
canonical decomposition, we present a new way of combining 
many dichotomous BBAs and we show that the computational 
time for fusing these dichotomous BBAs based on their canonical 
decomposition is quasi-linear with the number of sources to 
combine, contrary to the direct fusion of the dichotomous BBAs 
altogether. 


Keywords: Information fusion, canonical decomposition, be- 
lief functions, PCR5 rule, PCR6 rule. 


I. INTRODUCTION 


The belief functions (BF) introduced by Shafer in the
mid-1970s [1] from Dempster's works are well known
and used in the artificial intelligence community to model
epistemic uncertainty and to reason with it for information
fusion and decision-making support. Dempster's rule to
combine distinct sources of evidence characterized by their
basic belief assignments (BBAs) defined on the same frame
of discernment (FoD) is the historical and emblematic rule
of combination in Dempster-Shafer Theory (DST). Unfortunately,
Dempster's rule (denoted by DS rule for short) suffers
from serious drawbacks with highly conflicting evidence, as pointed
out by Zadeh [2], [3], but more importantly also in some
very low conflict situations [4]. That is why many
rules have been proposed in the literature [5] (Vol. 2), among
them the combination of two sources of evidence based on
the proportional conflict redistribution principle No. 5¹ (PCR5
rule) justified theoretically in [6], which has been shown
successful in applications. However its complexity remains
one of its limitations, which prevents its use in fusion problems
involving many sources of evidence to combine, and its
non-associativity property² makes it not so appealing because
the fusion order matters when sequential PCR5 fusion is
applied instead of global combination of the sources altogether.

¹Actually PCR6 rule is preferentially used for the combination of more than
two sources altogether. For two sources, PCR5 and PCR6 rules coincide, and
because the canonical decomposition involves only two sources, we only need to
work with PCR5 rule to combine the pro-evidence with its contra-evidence.

In this work, we show how the fusion of many sources
of evidence represented by BBAs defined on the same dichotomous
frame of discernment can be easily done based on
the PCR5-based canonical decomposition of the BBAs. Such a
decomposition of BBAs has been proposed recently in [7].

We recall that another canonical decomposition based on the
conjunctive rule (but involving improper³ BBAs) had been
proposed in 1995 by Smets [8], and extended later by Denœux
[11] to develop the so-called cautious rule of combination.
In this new approach we use our well justified canonical
decomposition based on PCR5, which is strictly based on
proper (i.e. normal) BBAs as defined by Shafer himself. We
have shown that any dichotomous BBA is always the result
of the PCR5 fusion of a simple proper pro-evidence BBA
$m_p$ with a simple proper contra-evidence BBA $m_c$, and that
this decomposition is unique. Based on this important result,
we address in this work the problem of the combination of many
dichotomous BBAs based on their canonical decomposition.

This paper is organized as follows. After a brief recall of the
basics of belief functions in Section II, we briefly present in
Section III the canonical decomposition of any dichotomous BBA based on the
PCR5 rule of combination, which is explained
in more detail with proofs and examples in [7]. The fusion
of dichotomous BBAs based on the principle of canonical
decompositions is detailed in Section IV. Concluding remarks
with perspectives are given in the last section.


II. BASICS OF BELIEF FUNCTIONS 


Belief functions (BF) have been introduced by Shafer in [1]
to model epistemic uncertainty. We assume that the answer⁴ of
the problem under concern belongs to a known (or given) finite
discrete frame of discernment (FoD) $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$,
with $n > 1$, and where all elements of $\Theta$ are mutually
exclusive and exhaustive⁵. The FoD is said to be dichotomous when
it involves only two elements (one subset and its complement),
that is $\Theta = \{A, \bar{A}\}$ where $\bar{A}$ is the complement of $A$ in
$\Theta$. The set of all subsets of $\Theta$ (including the empty set $\emptyset$ and
$\Theta$) is the power-set of $\Theta$ denoted by $2^{\Theta}$. A proper Basic
Belief Assignment (BBA) associated with a given source of
evidence is defined [1] as a mapping $m(\cdot) : 2^{\Theta} \to [0, 1]$
satisfying $m(\emptyset) = 0$ and $\sum_{A \in 2^{\Theta}} m(A) = 1$. In some BF
related frameworks, like in Smets' Transferable Belief Model
(TBM) [8], $m(\emptyset)$ is allowed to take a positive value. In this
case, $m(\cdot)$ is said to be improper because it does not satisfy Shafer's
definition [1]. The quantity $m(A)$ is called the mass of $A$
committed by the source of evidence. Belief and plausibility
functions are respectively defined from a proper BBA $m(\cdot)$ by

$$Bel(A) = \sum_{B \in 2^{\Theta} | B \subseteq A} m(B), \qquad (1)$$

and

$$Pl(A) = \sum_{B \in 2^{\Theta} | A \cap B \neq \emptyset} m(B) = 1 - Bel(\bar{A}), \qquad (2)$$

where $\bar{A}$ is the complement of $A$ in $\Theta$.

²PCR5 is only quasi-associative.

³We call a BBA improper when it does not satisfy Shafer's original
definition. Smets called it a generalized simple BBA (GSBBA).

⁴i.e. the solution, or the decision to take.


$Bel(A)$ and $Pl(A)$ are interpreted respectively as lower and
upper bounds of an unknown (subjective) probability measure
$P(A)$ in original Dempster's works [9], [10]. The quantities
$m(\cdot)$ and $Bel(\cdot)$ are one-to-one and the following Möbius
inverse formula holds (see [1], p. 39)

$$m(A) = \sum_{B \subseteq A \subseteq \Theta} (-1)^{|A - B|} Bel(B). \qquad (3)$$
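Formulas (1)-(3) can be exercised on a small frame. The Python sketch below is illustrative: a BBA is represented as a dictionary from frozensets to masses, and both this representation and the three-element BBA are hypothetical choices made for the example. It computes $Bel$ and $Pl$, then recovers the masses from $Bel$ with the Möbius inverse formula.

```python
from itertools import combinations


def powerset(theta):
    """All subsets of theta (including the empty set) as frozensets."""
    items = sorted(theta)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def bel(m, A):
    """Eq. (1): sum of the masses of the subsets of A."""
    return sum(v for B, v in m.items() if B <= A)


def pl(m, A):
    """Eq. (2): sum of the masses of the sets intersecting A."""
    return sum(v for B, v in m.items() if B & A)


def moebius_inverse(bel_of, theta):
    """Eq. (3): m(A) = sum over B included in A of (-1)^|A - B| Bel(B)."""
    m = {}
    for A in powerset(theta):
        value = sum((-1) ** len(A - B) * bel_of[B] for B in powerset(A))
        if abs(value) > 1e-12:          # drop non-focal sets (rounding noise)
            m[A] = value
    return m


theta = frozenset("ABC")
m = {frozenset("A"): 0.5, frozenset("AB"): 0.2, frozenset("C"): 0.1, theta: 0.2}
bel_of = {A: bel(m, A) for A in powerset(theta)}
recovered = moebius_inverse(bel_of, theta)
print("recovered focal elements:", {tuple(sorted(A)): v for A, v in recovered.items()})
```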


$A$ is called a Focal Element (FE) of $m(\cdot)$ if $m(A) > 0$.
When all focal elements are singletons, $m(\cdot)$ is called a
Bayesian BBA [1] and its corresponding $Bel(\cdot)$ function is
equal to $Pl(\cdot)$ and they are homogeneous to a (subjective)
probability measure $P(\cdot)$. The vacuous BBA, or VBBA for
short, representing a totally ignorant source is defined as⁶
$m_v(\Theta) = 1$. A dichotomous BBA is a BBA defined on
a dichotomous FoD. A dogmatic BBA is a BBA such that
$m(\Theta) = 0$. If $m(\Theta) > 0$ the BBA $m(\cdot)$ is nondogmatic. A
simple BBA is a BBA that has at most two focal sets and one
of them is $\Theta$. A dichotomous non dogmatic mass of belief is
a BBA having three focal elements $A$, $\bar{A}$ and $A \cup \bar{A}$ with $A$
and $\bar{A}$ subsets of $\Theta$.

In his Mathematical Theory of Evidence [1], Shafer proposed
to combine $S \geq 2$ distinct sources of evidence represented
by BBAs $m_1(\cdot), \ldots, m_S(\cdot)$ over the same FoD $\Theta$
with Dempster's rule (i.e. the normalized conjunctive rule).
For the combination of two BBAs, Dempster's rule formula
[1] is given by $m_{DS}(\emptyset) = 0$ and $\forall X \in 2^{\Theta} \setminus \{\emptyset\}$

$$m_{DS}(X) = \frac{1}{K_{12}} \sum_{X_1, X_2 \in 2^{\Theta} | X_1 \cap X_2 = X} m_1(X_1) m_2(X_2), \qquad (4)$$

with $K_{12} = 1 - \sum_{X_1, X_2 \in 2^{\Theta} | X_1 \cap X_2 = \emptyset} m_1(X_1) m_2(X_2)$.

⁵This is the so-called Shafer's model of FoD [5].
⁶The complete ignorance is denoted $\Theta$ in Shafer's book [1].
The justification and behavior of Dempster's rule have been
disputed over the years from many counter-examples involving
high and low conflicting sources (from both theoretical and
practical standpoints) as reported in [4], [12]-[14]. Many rules
of combination exist in the literature⁷, among them we recommend
the rule based on the proportional conflict redistribution
principle No. 5 (PCR5 rule) [6], which has been shown to
be successful in applications and well justified theoretically.
That is why we analyze it in detail for solving the BF
canonical decomposition problem (BF-CDP). PCR5 transfers
the conflicting mass only to the elements involved in the
conflict and proportionally to their individual masses, so that
the specificity of the information is entirely preserved in this
fusion process (see [5], Vol. 2 and Vol. 3 for full justification
and examples): $m_{PCR5}(\emptyset) = 0$ and $\forall X \in 2^{\Theta} \setminus \{\emptyset\}$

$$m_{PCR5}(X) = \sum_{X_1, X_2 \in 2^{\Theta} | X_1 \cap X_2 = X} m_1(X_1) m_2(X_2) + \sum_{X_2 \in 2^{\Theta} | X_2 \cap X = \emptyset} \left[ \frac{m_1(X)^2 m_2(X_2)}{m_1(X) + m_2(X_2)} + \frac{m_2(X)^2 m_1(X_2)}{m_2(X) + m_1(X_2)} \right], \qquad (5)$$

where all denominators in (5) are different from zero. If a
denominator is zero, that fraction is discarded. The properties
of PCR5 can be found in [15]. An extension of PCR5 for
combining qualitative BBAs can be found in [5], Vol. 2 & 3.
A variant of PCR5, called PCR6, has been proposed by Martin
and Osswald in [5], Vol. 2, for combining $s > 2$ sources. The
general formulas for the PCR5 and PCR6 rules are also given in
[5], Vol. 2. PCR6 coincides with PCR5 when one combines
two sources. The difference between PCR5 and PCR6 lies
in the way the proportional conflict redistribution is done as
soon as three (or more) sources are involved in the fusion.
From the implementation point of view, PCR6 is simpler
to implement than PCR5. For convenience, very basic (not
optimized) Matlab™ codes of the PCR5 and PCR6 fusion rules
can be found in [5], [16] and from the toolboxes repository
on the web [17]. The main drawback of the PCR5 and PCR6
rules is their combinatorial complexity when the number of
sources is big. Even for combining BBAs defined on a simple
dichotomous frame of discernment, the computational time for
combining more than 20 sources can take several hours⁸.
Our main motivation and contribution is to propose a faster
fusion method to combine many dichotomous BBAs in order
to overcome the combinatorial complexity problem by establishing
a new effective (approximating) fusion method based
on the new PCR5-based canonical decomposition principle.
It is worth noting that our new method is very different
from the method based on the clustering of non conflicting
BBAs followed by a discounting step and the conjunctive rule
presented in [18].

⁷See [5], Vol. 2 for a detailed list of fusion rules.

⁸Due to the exponential complexity of the PCR6 rule (as shown in Figure
4). For our simulations, we used a MacBook Pro 2.8 GHz Intel Core i7
with 16 GB 1600 MHz DDR3 memory running Matlab™ R2018a.
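For two sources, formulas (4) and (5) can be sketched generically. The Python code below is an illustration and is not one of the Matlab™ codes referenced above: BBAs are dictionaries from frozensets to masses, the function names are hypothetical, and `dempster` assumes that the two sources are not in total conflict (otherwise $K_{12} = 0$). The usage part builds a dichotomous frame with two hypothetical labels, "A" and "notA".

```python
def conjunctive(m1, m2):
    """Conjunctive products: masses of non-empty intersections, plus the
    list of conflicting pairs (X1, m1(X1), X2, m2(X2)) with X1 and X2 disjoint."""
    out, conflicts = {}, []
    for X1, v1 in m1.items():
        for X2, v2 in m2.items():
            inter = X1 & X2
            if inter:
                out[inter] = out.get(inter, 0.0) + v1 * v2
            elif v1 * v2 > 0:
                conflicts.append((X1, v1, X2, v2))
    return out, conflicts


def dempster(m1, m2):
    """Eq. (4): normalize the conjunctive masses by K12 = 1 - total conflict."""
    out, conflicts = conjunctive(m1, m2)
    k12 = 1.0 - sum(v1 * v2 for _, v1, _, v2 in conflicts)
    return {X: v / k12 for X, v in out.items()}


def pcr5(m1, m2):
    """Eq. (5): each partial conflict m1(X1) m2(X2) goes back to X1 and X2
    proportionally to m1(X1) and m2(X2)."""
    out, conflicts = conjunctive(m1, m2)
    for X1, v1, X2, v2 in conflicts:
        out[X1] = out.get(X1, 0.0) + v1 * v1 * v2 / (v1 + v2)
        out[X2] = out.get(X2, 0.0) + v2 * v2 * v1 / (v1 + v2)
    return out


A, NOT_A = frozenset({"A"}), frozenset({"notA"})
THETA = A | NOT_A

ds = dempster({A: 0.75, THETA: 0.25}, {NOT_A: 0.5, THETA: 0.5})
p5 = pcr5({A: 0.6861, THETA: 0.3139}, {NOT_A: 0.3628, THETA: 0.6372})
print("Dempster:", ds[A], ds[NOT_A], ds[THETA])
print("PCR5:", p5[A], p5[NOT_A], p5[THETA])
```

Both calls use pro- and contra-evidences for the dichotomous BBA $m(A) = 0.6$, $m(\bar{A}) = 0.2$, so both results should be close to $(0.6, 0.2, 0.2)$, up to the four-decimal rounding of the PCR5 inputs.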


III. CANONICAL DECOMPOSITION OF DICHOTOMOUS BBA 


A FoD $\Theta = \{A, \bar{A}\}$ is called dichotomous if it consists of
only two elements $A$ and $\bar{A}$ with $A \cup \bar{A} = \Theta$ and $A \cap \bar{A} = \emptyset$.
$A$ is different from $\Theta$ and from the empty set because we want
to work with an informative FoD. Indeed, the very special frame
$\{\emptyset, \Theta\}$ does not bring any useful information since the only
possible BBA with such a frame is the vacuous BBA. So, we
consider a given proper⁹ BBA $m(\cdot) : 2^{\Theta} \to [0, 1]$ of the form

$$m(A) = a, \quad m(\bar{A}) = b, \quad m(A \cup \bar{A}) = 1 - a - b, \qquad (6)$$

with $0 < a < 1$, $0 < b < 1$ and $a + b < 1$.

The conditions $0 < a < 1$ and $0 < b < 1$ mean that $A$ and
$\bar{A}$ are focal elements of the BBA. The restriction $a + b < 1$
means that the BBA is nondogmatic.

This assumption of nondogmaticity of the BBA $m(\cdot)$ is
necessary for Smets' canonical decomposition [8], but it is
not essential for our PCR5-based canonical decomposition (as
we will show in the sequel) because our decomposition also
works directly with a dogmatic BBA. Of course any dogmatic
BBA can always be modified into a non-dogmatic BBA by
using a very small discounting number ($\epsilon > 0$) so that, in
practice, Smets' decomposition can always be applied, but
this is not sufficient to prove that Smets' approach always
provides relevant results. Why? Just because we know (and we
have proved) that Dempster's rule (the normalized conjunctive rule)
and even the conjunctive rule in Smets' TBM suffer from
serious drawbacks (see the justifications in our aforementioned
references). That is why we explore in this work another way of
making a canonical decomposition, which is, for now, limited
to dichotomous BBAs.

Our canonical decomposition problem consists in finding
the two following simple proper BBAs $m_p$ and $m_c$ of the
form

$$m_p(A) = x, \quad m_p(A \cup \bar{A}) = 1 - x, \qquad (7)$$
$$m_c(\bar{A}) = y, \quad m_c(A \cup \bar{A}) = 1 - y, \qquad (8)$$

with $(x, y) \in [0, 1] \times [0, 1]$, such that $m = Fusion(m_p, m_c)$,
for a chosen rule of combination denoted by $Fusion(\cdot, \cdot)$. The
simple BBA $m_p(\cdot)$ is called the pro-BBA (or pro-evidence)
of $A$, and the simple BBA $m_c(\cdot)$ the contra-BBA (or contra-evidence)
of $A$. The BBA $m_p(\cdot)$ is interpreted as a source
of evidence providing an uncertain evidence in favor of $A$,
whereas $m_c(\cdot)$ is interpreted as a source of evidence providing
an uncertain contrary evidence about $A$.

This decomposition is possible with Dempster's rule only if
$0 < a < 1$, $0 < b < 1$ and $a + b < 1$, and in this case we have
$x = \frac{a}{1-b}$ and $y = \frac{b}{1-a}$. However, any dogmatic BBA
$m(A) = a$, $m(\bar{A}) = b$ with $a + b = 1$ is not decomposable from
Dempster's rule for the case when $(a,b) \neq (1,0)$ and
$(a,b) \neq (0,1)$ (see Theorem 4 in [7]), and the dogmatic BBAs
$m(A) = 1$, $m(\bar{A}) = 0$ (case $(a,b) = (1,0)$), or $m(A) = 0$,
$m(\bar{A}) = 1$ (case $(a,b) = (0,1)$) have infinitely many
decompositions based on Dempster's rule of combination (see Lemma in
[7]). In [7], we have shown that our canonical decomposition cannot be
achieved from the conjunctive, disjunctive, Yager's [19] or Dubois-Prade
[20] rules of combination, nor from the averaging rule. However, such a
decomposition is unique and is always possible in all cases of
dichotomous BBA $m(\cdot)$ using the PCR5 rule of combination. In [7],
we proved the following theorem.

⁹ which means that $m(\emptyset) = 0$.


Theorem 1: Consider a dichotomous FoD $\Theta = \{A, \bar{A}\}$ with
$A \neq \Theta$ and $A \neq \emptyset$, and a nondogmatic BBA
$m(\cdot) : 2^\Theta \to [0,1]$ defined on $\Theta$ by $m(A) = a$,
$m(\bar{A}) = b$, and $m(A \cup \bar{A}) = 1 - a - b$, where
$a, b \in [0,1]$ and $a + b < 1$. Then the BBA $m(\cdot)$ has a unique
canonical decomposition using the PCR5 rule of combination of the form
$m = PCR5(m_p, m_c)$ with pro-evidence $m_p(A) = x$,
$m_p(A \cup \bar{A}) = 1 - x$ and contra-evidence $m_c(\bar{A}) = y$,
$m_c(A \cup \bar{A}) = 1 - y$, where $x, y \in [0,1)$.

Moreover, we also proved in [7] that the canonical decomposition exists
even if the dichotomous BBA is dogmatic (i.e. Bayesian), so that the
following theorem also holds.


Theorem 2: Any dogmatic BBA defined by $m(A) = a$ and $m(\bar{A}) = b$,
where $a, b \in [0,1]$ and $a + b = 1$, has a canonical decomposition
using the PCR5 rule of combination of the form $m = PCR5(m_p, m_c)$ with
$m_p(A) = x$, $m_p(A \cup \bar{A}) = 1 - x$ and $m_c(\bar{A}) = y$,
$m_c(A \cup \bar{A}) = 1 - y$, where $x, y \in [0,1]$.


Theorems 1 & 2 prove that the decomposition based on PCR5 always exists
and is unique for any dichotomous (nondogmatic or dogmatic) BBA.


For the case of a dichotomous dogmatic BBA considered in Theorem 2, the
expressions of the solutions $x$ and $y$ can be established explicitly
as follows (see [7] for details):

• If $a < b$ then $x < y$, and we have $y = 1$ and $x = \frac{a + \sqrt{a^2 + 4a}}{2}$.
• If $a > b$ then $x > y$, and we have $x = 1$ and $y = \frac{b + \sqrt{b^2 + 4b}}{2}$.
• If $a = b$ and $a + b = 1$ then $a = b = 0.5$ and $x = y = 1$.

For the case of a dichotomous nondogmatic BBA considered in Theorem 1,
one has to find $x$ and $y$ solutions of the system

$$a = x(1-y) + \frac{x^2 y}{x+y} = \frac{x^2 + xy - xy^2}{x+y}, \tag{9}$$

$$b = (1-x)y + \frac{x y^2}{x+y} = \frac{y^2 + xy - x^2 y}{x+y}, \tag{10}$$

under the constraints $(a,b) \in [0,1]^2$ and $0 < a + b < 1$. In fact,
it has been proved in [7] that $x \in [a, a+b] \subset [0,1]$ and
$y \in [b, a+b] \subset [0,1]$, but the explicit expressions of $x$ and
$y$ are very complicated to obtain analytically (even with modern
symbolic computing systems like Mathematica™ or Maple™) because, after
algebraic calculation and for $x \neq 1$, one has to solve the following
quartic equation, which has at most four real solutions with only one
valid solution in $[a, a+b]$,

$$x^4 + (-a - 2)x^3 + (2a + b)x^2 + (a + b - ab - b^2)x + (-a^2 - ab) = 0, \tag{11}$$

and then compute $y$ as $y = \frac{a + b - x}{1 - x}$.


Fortunately, the solutions can easily be calculated numerically by
these computing systems, and even with the Matlab™ system¹⁰, as soon as
numerical values are assigned to $a$ and $b$, and this is what we do in
our simulations in the sequel.
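
A minimal Python sketch of this numerical step is given below, under stated assumptions. It applies `numpy.roots` to the quartic in $x$ rather than the Matlab fsolve command used by the authors, and the function name `canonical_decomposition` and the tolerance `tol` are illustrative choices that do not come from [7]. It keeps the real root lying in $[a, a+b]$ whose companion $y$ lies in $[b, a+b]$, and uses the closed-form expressions in the dogmatic case.

```python
import math
import numpy as np


def canonical_decomposition(a, b, tol=1e-9):
    """Return (x, y) such that PCR5(m_p, m_c) = m, with m(A)=a, m(not A)=b.

    Hypothetical helper: name, signature and tolerance are assumptions.
    """
    if a < 0 or b < 0 or a + b > 1 + tol:
        raise ValueError("need a >= 0, b >= 0 and a + b <= 1")

    if abs(a + b - 1.0) < tol:
        # Dogmatic case: closed-form solutions stated in the text.
        if a < b:
            return (a + math.sqrt(a * a + 4 * a)) / 2, 1.0
        if a > b:
            return 1.0, (b + math.sqrt(b * b + 4 * b)) / 2
        return 1.0, 1.0

    # Nondogmatic case: coefficients of the quartic in x, highest degree first.
    coeffs = [1.0, -a - 2.0, 2 * a + b, a + b - a * b - b * b, -a * a - a * b]
    for root in np.roots(coeffs):
        if abs(root.imag) > tol:
            continue  # discard complex roots
        x = float(root.real)
        if x < a - tol or x > a + b + tol:
            continue  # x must lie in [a, a+b]
        y = (a + b - x) / (1.0 - x)
        if b - tol <= y <= a + b + tol:  # y must lie in [b, a+b]
            return x, y
    raise RuntimeError("no admissible root found")


if __name__ == "__main__":
    print(canonical_decomposition(0.6, 0.3))
```

Filtering on the proven bounds for $x$ and $y$ is one simple way to pick the single valid root among the (at most) four real roots; a root-bracketing solver restricted to $[a, a+b]$ would be an alternative.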


Example 1: Let us consider $\Theta = \{A, \bar{A}\}$ and $m(A) = 0.6$,
$m(\bar{A}) = 0.3$ and $m(A \cup \bar{A}) = 1 - m(A) - m(\bar{A}) = 0.1$.
Hence, $a = 0.6$ and $b = 0.3$. The quartic equation (11) becomes

$$x^4 - 2.6x^3 + 1.5x^2 + 0.63x - 0.54 = 0. \tag{12}$$

The four solutions of this quartic equation provided by the computing
system¹¹ are approximately

$$x_1 \approx 0.7774780438, \quad x_2 \approx 0.9297589637,$$

$$x_3 \approx 1.4191515820, \quad x_4 \approx -0.5263885898.$$


Clearly $x_3$ and $x_4$ are not acceptable solutions because they do
not belong to $[0,1]$. If we take $x_1 \approx 0.7774780438$, then we get
$y_1 = (a + b - x_1)/(1 - x_1) = (0.9 - x_1)/(1 - x_1) \approx 0.5506061437$.
The pair $(x_1, y_1) \in [0,1]^2$ is a solution of the decomposition
problem of the BBA $m(\cdot)$. If we take $x_2 \approx 0.9297589637$,
then we get
$y_2 = (a + b - x_2)/(1 - x_2) = (0.9 - x_2)/(1 - x_2) \approx -0.4236692006$.
We see that $y_2 \notin [0,1]$, and therefore the pair $(x_2, y_2)$
cannot be a solution of the decomposition problem of the BBA $m(\cdot)$.
Therefore the canonical masses $m_p(\cdot)$ and $m_c(\cdot)$ are given by

$$m_p(A) \approx 0.7774780438, \quad m_p(A \cup \bar{A}) \approx 0.2225219562,$$

$$m_c(\bar{A}) \approx 0.5506061437, \quad m_c(A \cup \bar{A}) \approx 0.4493938563.$$

It can be verified that the PCR5 combination of the BBAs $m_p$ and
$m_c$, denoted by $PCR5(m_p, m_c)$, is equal to the BBA $m(\cdot)$.
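
As a worked check of this claim, the values of Example 1 can be substituted into the PCR5 expressions for a pro/contra pair (the same expressions that define the system solved by the decomposition). The figures below use $x \approx 0.7775$ and $y \approx 0.5506$ and are rounded to four decimals, so the equalities hold only up to rounding.

```latex
\begin{align*}
m(A) &= x(1-y) + \frac{x^2 y}{x+y} \approx 0.3494 + 0.2506 = 0.6000,\\
m(\bar{A}) &= (1-x)y + \frac{x y^2}{x+y} \approx 0.1225 + 0.1775 = 0.3000,\\
m(A \cup \bar{A}) &= (1-x)(1-y) \approx 0.2225 \times 0.4494 \approx 0.1000.
\end{align*}
```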

Of course, numerical approximations are necessarily involved in the
proposed decomposition because it is obtained by numerical solvers.
This may have a small impact on the PCR5 fusion result, but because
the PCR5 rule is numerically robust to small input changes (contrary to
Dempster's rule), the PCR5 result will not change substantially with
small changes (due to small numerical imprecisions) in the values of
the BBAs to combine.


A. Particular cases 

1) Case $(a,b) = (0,0)$ (i.e. $m$ is the vacuous BBA): This is the most
degenerate case, where the BBA $m(\cdot)$ corresponds to the vacuous
BBA. For the averaging rule, the conjunctive rule, Yager's,
Dubois-Prade's, Dempster's and PCR5 rules one has $x = 0$ and $y = 0$
(the conflict between canonical masses is zero). In fact, the vacuous
BBA $m(\cdot)$ can always be interpreted as the fusion of $m_p$ and
$m_c$, where $m_p$ and $m_c$ are also vacuous BBAs. This degenerate case
has no particular interest in practice other than to model the totally
ignorant state of knowledge.

¹⁰ thanks to the fsolve Matlab™ command.
¹¹ We obtained the same solutions with Maple™ and with Matlab™.

2) Case when $a = 0$, or $b = 0$: In the case $a = 0$ and $0 < b < 1$,
for the conjunctive rule, Yager's, Dubois-Prade's, Dempster's and PCR5
rules one has $x = 0$ and $y = b$ (the conflict between canonical masses
is zero) and $m(\cdot)$ corresponds to the fusion of the vacuous
pro-evidence $m_p = m_v$ with the contra-evidence $m_c = m$. In the case
$0 < a < 1$ and $b = 0$, for the conjunctive rule, Yager's,
Dubois-Prade's, Dempster's and PCR5 rules one has $x = a$ and $y = 0$
(the conflict between canonical masses is zero) and $m(\cdot)$
corresponds to the fusion of the pro-evidence $m_p = m$ with the
vacuous contra-evidence $m_c = m_v$. These cases have no particular
interest because they can be seen just as the combination of a pro (or
contra) BBA with the vacuous BBA.

3) Case when $a = b \in (0, 0.5)$: In this case, the BBA
$m(A) = m(\bar{A}) = a$ and $m(A \cup \bar{A}) = 1 - 2a$ can be
canonically decomposed from the PCR5 rule with the BBAs
$m_p(A) = 1 - \sqrt{1 - 2a}$, $m_p(A \cup \bar{A}) = \sqrt{1 - 2a}$ and
$m_c(\bar{A}) = 1 - \sqrt{1 - 2a}$, $m_c(A \cup \bar{A}) = \sqrt{1 - 2a}$;
see details and proof in [7].
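
A short derivation sketch of this closed form follows. It is not reproduced from [7], whose proof may proceed differently; it assumes, by symmetry of the case $a = b$, that $x = y$ (with $x > 0$ since $a > 0$), so that the PCR5 expression of $m(A)$ for a pro/contra pair reduces to a quadratic in $x$.

```latex
\begin{align*}
a = x(1-x) + \frac{x^2 \cdot x}{x + x} = x - \frac{x^2}{2}
\;\Longrightarrow\; x^2 - 2x + 2a = 0
\;\Longrightarrow\; x = 1 \pm \sqrt{1 - 2a}.
\end{align*}
```

Only the root $x = 1 - \sqrt{1 - 2a}$ lies in $[0,1]$. For instance, $a = 0.18$ gives $\sqrt{1 - 0.36} = 0.8$ and $x = y = 0.2$, and indeed $0.2 - 0.2^2/2 = 0.18$.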


B. Benefits of canonical decomposition 


The canonical decomposition based on PCR5 offers several advantages,
which are briefly listed below.

1) This canonical decomposition of $m(\cdot)$ into the pro-evidence
   $m_p(\cdot)$ and the contra-evidence $m_c(\cdot)$ allows to define
   the notion of internal conflict of a dichotomous source of evidence,
   denoted by $K_{int}(m)$, as

   $$K_{int}(m) \triangleq m_p(A)\, m_c(\bar{A}), \tag{13}$$

   where $m_p(A) = x$ and $m_c(\bar{A}) = y$ are the canonical factors
   of the BBA $m(\cdot)$ based on the PCR5 rule of combination.

2) The canonical decomposition also allows to adjust/revise easily a
   dichotomous source of evidence (if needed) according to the knowledge
   one has on it. For instance, if one knows that a source over- (or
   under-) estimates the hypothesis $A$, then one could apply an
   adjustment (based on some discounting or reinforcing factors) on the
   pro (or contra) evidence to de-bias this source of evidence.

3) This canonical decomposition can help to develop new fast rules of
   combination for the fusion of $S > 2$ (dichotomous) distinct¹² BBAs
   $m_s(\cdot) = (m_s(A), m_s(\bar{A}), m_s(A \cup \bar{A})) = (a_s, b_s, 1 - a_s - b_s)$,
   $s = 1, 2, \ldots, S$. This is presented next.
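
As a worked illustration of the internal conflict (13), the canonical factors obtained in Example 1 (rounded here to four decimals) give:

```latex
K_{int}(m) = m_p(A)\, m_c(\bar{A}) = x\, y \approx 0.7775 \times 0.5506 \approx 0.4281 .
```

By contrast, in the particular cases with $a = 0$ or $b = 0$ one of the two factors vanishes, so $K_{int}(m) = 0$, in line with the zero conflict between canonical masses noted for those cases.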


IV. FAST FUSION OF DICHOTOMOUS BBAS 


In this section, we show how to combine many dichotomous BBAs defined
on the same FoD $\Theta$ thanks to their canonical decompositions.


¹² i.e. cognitively independent.

A. Principle of the fast fusion of dichotomous BBAs


The main idea for making the fast fusion of dichotomous BBAs is, at
first, to decompose canonically each dichotomous BBA $m_s(\cdot)$, for
$s = 1, 2, \ldots, S$, into its pro and contra evidences
$m_{p,s} = (m_{p,s}(A), m_{p,s}(\bar{A}), m_{p,s}(A \cup \bar{A})) = (x_s, 0, 1 - x_s)$
and
$m_{c,s} = (m_{c,s}(A), m_{c,s}(\bar{A}), m_{c,s}(A \cup \bar{A})) = (0, y_s, 1 - y_s)$,
and then to combine the pro-evidences $m_{p,s}$ for
$s = 1, 2, \ldots, S$ altogether on one hand to get a global
pro-evidence $m_p$, and to combine the contra-evidences $m_{c,s}$ for
$s = 1, 2, \ldots, S$ altogether on the other hand to get a global
contra-evidence $m_c$. The fusion step of pro and contra evidences is
discussed in Section IV-D. Once $m_p$ and $m_c$ are calculated, one
combines them with the PCR5 fusion rule to get the final result. This
general principle of the new fusion method is represented by the
diagram of Figure 1 for convenience.


[Figure 1: block diagram. Each dichotomous BBA $m_s$, $s = 1, \ldots, S$, goes through a canonical decomposition giving a pro-evidence $m_{p,s}$ and a contra-evidence $m_{c,s}$; the pro-evidences $\{m_{p,s}, s = 1, \ldots, S\}$ are fused into $m_p$ and the contra-evidences $\{m_{c,s}, s = 1, \ldots, S\}$ into $m_c$; a final fusion of $m_p$ and $m_c$ gives the combined BBA $m$.]

Figure 1. General principle of the fusion of dichotomous BBAs from their
canonical decompositions.


This new fusion approach is interesting because the fusion of the
pro-evidences $m_{p,s}$ (resp. contra-evidences $m_{c,s}$) is quite
simple: there is no conflict between the $m_{p,s}$ (resp. between the
$m_{c,s}$), so that their fusion can be done quite easily and a large
number of sources can be combined without a high computational burden.
In fact, with this fusion approach, only one PCR5 fusion step of simple
(combined) canonical BBAs is needed at the very end of the fusion
process. It is worth noting that in this work there is no link with the
canonical decomposition proposed by Shafer and then extended by Smets
because here we use another fusion rule based on the proportional
conflict redistribution principle.


B. Analysis of the effectiveness of this new fusion approach 


Because the PCR5 rule¹³ of combination is not associative, the fusion¹⁴
of the canonical BBAs followed by their PCR5 fusion will not provide in
general the same result as the direct fusion of the dichotomous BBAs
altogether but only an approximate result, which is normal.

¹³ The same remark holds for the PCR6 rule with more than two BBAs.

¹⁴ We assume here that the fusion of all the pro-evidences (resp.
contra-evidences) is done with the PCR5 rule, which coincides in this
case with the conjunctive rule because there is no conflict between the
pro-evidences (resp. the contra-evidences).

The main question is to know how good the approximation obtained by
this new fusion method, based on the fusion of pro-evidences and
contra-evidences, is with respect to the direct fusion of the BBAs with
PCR5 (or PCR6 when considering more than two sources to combine). To
answer this important question we make a statistical analysis of the
quality of the combined result $m$ with respect to the direct PCR5 or
PCR6 fusion of all BBAs altogether.

The measure of the goodness is obtained by the normalized (Euclidean)
Belief Interval distance $d_{BI}(m_{PCR5}, m)$ (for the case of two BBAs
only), or by $d_{BI}(m_{PCR6}, m)$ if more than two sources are
considered in the fusion process, where $m$ is the result of the fusion
principle based on canonical decompositions, and $m_{PCR5}$ (resp.
$m_{PCR6}$) is the result of the combination of the original BBAs
altogether with the PCR5 (resp. PCR6) rule. The $d_{BI}$ distance
between two BBAs $m_1(\cdot)$ and $m_2(\cdot)$ defined on the powerset
of a given FoD $\Theta = \{\theta_1, \ldots, \theta_n\}$ has been
proposed and justified in [21], [22]. It is defined by

$$d_{BI}(m_1, m_2) \triangleq \sqrt{N_c \cdot \sum_{X \in 2^\Theta} d_W^2(BI_1(X), BI_2(X))}, \tag{14}$$

where $N_c = 1/2^{n-1}$ is a normalization factor to make
$d_{BI}(m_1, m_2) \in [0,1]$, and $d_W(BI_1(X), BI_2(X))$ is
Wasserstein's distance [23] between the belief intervals
$BI_1(X) \triangleq [Bel_1(X), Pl_1(X)] = [a_1, b_1]$ and
$BI_2(X) \triangleq [Bel_2(X), Pl_2(X)] = [a_2, b_2]$. Here,
$d_W(BI_1(X), BI_2(X))$ entering in (14) is given by

$$d_W([a_1, b_1], [a_2, b_2]) \triangleq \sqrt{\left[\frac{a_1 + b_1}{2} - \frac{a_2 + b_2}{2}\right]^2 + \frac{1}{3}\left[\frac{b_1 - a_1}{2} - \frac{b_2 - a_2}{2}\right]^2}. \tag{15}$$

Figure 2 shows the normalized histogram (i.e. the empirical probability
distribution) of the distance values $d_{BI}(m_{PCR5}, m)$ based on
20000 random¹⁵ generations of dichotomous BBAs $m_1$ and $m_2$. One
observes that the new fusion approach based on the canonical
decompositions of BBAs (with the conjunctive fusion of pro-evidences,
and the conjunctive fusion of contra-evidences) provides a solution
which is very close to what we obtain from the direct application of
the PCR5 rule, with a mean distance of 0.0287 and a standard deviation
of 0.0289. In 98.20% of cases, the final decisions (based on the min of
$d_{BI}$ decision-making strategy explained in [22]) based on
$m_{PCR5}$ and on $m$ are in agreement. This means that the decision
agreement (DA) rate is 98.20%.

¹⁵ For this, we generate three random numbers uniformly distributed in
[0, 1] and we normalize them to generate randomly a dichotomous BBA.

Figure 2. Normalized histogram of $d_{BI}(m_{PCR5}, m)$ for the case of
2 dichotomous BBAs (20000 runs).

Figure 3 shows the normalized histogram of the $d_{BI}(m_{PCR6}, m)$
values based also on 20000 random runs for the fusion of 6 dichotomous
BBAs. We use the PCR6 rule instead of the PCR5 rule to combine the 6
dichotomous BBAs altogether because the PCR6 rule has been recognized
to be more effective than PCR5 in applications [5] (Vol. 2, Chap. 2).
As we can observe, the shape of the histogram is a bit different from
the histogram of Fig. 2, but what matters is that the mean value and
the standard deviation of the $d_{BI}$ distance are still low (0.1119
and 0.0392 respectively), indicating that the approximation obtained by
this new fusion method is globally very good. Also, the decision based
on this new fusion approach is globally coherent with the decision
taken by the direct PCR6 fusion of the BBAs (95.84% of decision
coherence).


Figure 3. Normalized histogram of $d_{BI}(m_{PCR6}, m)$ for the case of
6 dichotomous BBAs (20000 runs).
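
The belief interval distance used in these comparisons can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: it assumes BBAs are stored as dictionaries mapping `frozenset` focal elements to masses, with no mass on the empty set, and the helper names (`powerset`, `belief_interval`, `d_w`, `d_bi`) are hypothetical.

```python
from itertools import combinations
from math import sqrt


def powerset(frame):
    """All non-empty subsets of the frame, as frozensets."""
    items = sorted(frame)
    return [frozenset(c)
            for r in range(1, len(items) + 1)
            for c in combinations(items, r)]


def belief_interval(m, X):
    """Return (Bel(X), Pl(X)) for a BBA m given as {frozenset: mass}."""
    bel = sum(v for B, v in m.items() if B <= X)   # focal elements included in X
    pl = sum(v for B, v in m.items() if B & X)     # focal elements intersecting X
    return bel, pl


def d_w(i1, i2):
    """Wasserstein distance between two intervals [a1, b1] and [a2, b2]."""
    (a1, b1), (a2, b2) = i1, i2
    mid = (a1 + b1) / 2 - (a2 + b2) / 2            # difference of midpoints
    rad = (b1 - a1) / 2 - (b2 - a2) / 2            # difference of half-widths
    return sqrt(mid ** 2 + rad ** 2 / 3)


def d_bi(m1, m2, frame):
    """Normalized Euclidean belief interval distance between m1 and m2."""
    n_c = 1.0 / 2 ** (len(frame) - 1)
    total = sum(d_w(belief_interval(m1, X), belief_interval(m2, X)) ** 2
                for X in powerset(frame))
    return sqrt(n_c * total)


if __name__ == "__main__":
    frame = {"A", "notA"}
    theta = frozenset(frame)
    m = {frozenset({"A"}): 0.6, frozenset({"notA"}): 0.3, theta: 0.1}
    vacuous = {theta: 1.0}
    print(d_bi(m, vacuous, frame))
```

The empty set is skipped in the sum because its belief interval is $[0,0]$ for every proper BBA and therefore contributes nothing.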


Several Monte Carlo simulations have been done with different numbers
of dichotomous BBAs to combine. The results obtained, based on
20000-run Monte Carlo simulations, are summarized in Table I.

Table I
COMPARATIVE EVALUATION OF CANONICAL DECOMPOSITION FUSION
METHOD W.R.T. THE DIRECT PCR-BASED FUSION METHOD.

| S (number of BBAs) | mean($d_{BI}$) | std($d_{BI}$) | DA (%) |
|---|---|---|---|
| 2 | 0.0287 | 0.0289 | 98.20 |
| 3 | 0.0578 | 0.0373 | … |
| 4 | 0.0838 | 0.0394 | … |
| 5 | 0.1008 | 0.0397 | … |
| 6 | 0.1119 | 0.0392 | 95.84 |
| 7 | 0.1169 | 0.0385 | … |
| 8 | 0.1200 | 0.0374 | … |
| 9 | 0.1211 | 0.0365 | … |
| 10 | 0.1204 | 0.0348 | … |

The second column of Table I indicates the mean value, denoted by
mean($d_{BI}$), of the normalized Euclidean belief interval distance
between the direct fusion of the BBAs by the PCR5 rule (when combining
2 BBAs only) or the PCR6 rule (when combining more than two BBAs) and
the new fusion rule based on their canonical decomposition. The third
column of Table I shows the corresponding standard deviation values,
denoted by std($d_{BI}$). The last column indicates the decision
agreement (DA) factor between the decision taken from the direct fusion
method and the indirect (canonical decomposition based) method. As we
can see, the DA factors are very high, which means that most of the
time the decisions taken from the direct fusion method and from the
indirect fusion method are the same.

After a deep analysis of our simulation results, one can attest that
decision-making disagreement occurs when the numerical values of the
mass of $A$ and the mass of $\bar{A}$ are very close. This indicates a
very high ambiguity in the decision to take in such a situation, which
can be easily tracked in practice by evaluating the quality indicator
of the decision-making; see [22] for details.

In this paper we did not investigate the quality of the approximation
of the fusion result based on this canonical decomposition when
replacing the PCR5 fusion step of $m_p$ and $m_c$ by other rules of
combination because the core of the canonical decomposition is based on
PCR5.


C. Computational time of the new fusion method 


Because of the very high combinatorial complexity (and thus high
computational time) required for applying the direct PCR6 fusion of
many BBAs, we made the performance evaluation only up to the fusion of
ten BBAs with PCR6. We conjecture that the performance of this new
fusion method based on canonical decomposition will degrade very slowly
with the increase of the number of BBAs involved in the fusion process.
Of course, the new fusion method based on this canonical decomposition
does not suffer from the combinatorial complexity limitation, which is
of great interest in some applications (like in multi-spectral imagery
for detection and classification) because many (hundreds or even
thousands of) dichotomous BBAs could be combined very quickly.
Actually, with this method what takes a bit of time is only the
canonical decomposition done by the numerical solver¹⁶.

¹⁶ We did use the Matlab™ fsolve function for this.

Figure 4 shows the average (based on 50 random runs) computational time
(in seconds) of the direct PCR6 fusion of the BBAs altogether (red
plot), and the average computational time of the new fusion method
based on canonical decomposition (blue plot). It is clear that the
computational time of the direct PCR6 fusion method (the red curve)
grows exponentially with the number of sources, whereas the
computational time grows only slowly and quasi-linearly with the new
method proposed in this work.


[Figure 4: plot titled "Comparison of computational times (based on 50 random runs)". Horizontal axis: number of dichotomous BBAs to combine (2 to 9); vertical axis: averaged time (in seconds). Two curves: the direct PCR6 fusion of all BBAs, and the fusion based on the canonical decompositions of BBAs.]

Figure 4. Computing time versus number of BBAs to combine.


Based on a set of 1000 random dichotomous BBAs, Figure 5 shows that the
computational time (in seconds) of the fusion based on the canonical
decomposition is a quasi-linear¹⁷ function of the number of BBAs to
combine.

Figure 5. Computing time versus number of BBAs to combine.

¹⁷ It is not strictly linear because the time for the numerical fsolve
search of pro-evidence and contra-evidence factors for making the
canonical decomposition is not constant.

Figures 4 and 5 show the computational times including the canonical
decomposition itself done on the fly. Of course, the canonical
decomposition could be done off-line once for all and stored in the
computer memory (if necessary); see for instance the $(x, y)$ values
given in [7] for convenience. If we have $n$ dichotomous BBAs to
combine, we have to make their canonical decomposition at first, and
because the $n$ pro-evidence BBAs to combine (resp. contra-evidence
BBAs) have a very simple structure, their conjunctive fusion is
obtained very quickly by the direct product of $n$ real numbers, that is
$m_p(A \cup \bar{A}) = \prod_{i=1}^{n} m_{p,i}(A \cup \bar{A})$, and we
need also a subtraction because $m_p(A) = 1 - m_p(A \cup \bar{A})$. The
complexity of this fast suboptimal PCR fusion approach (once the
canonical decomposition is available) is therefore $2(n - 1)$
multiplications and 2 subtractions for making the conjunctive fusion of
the $m_{p,i}$ and the conjunctive fusion of the $m_{c,i}$, and 7
additions and 5 multiplications for making the PCR5 fusion of $m_p$
with $m_c$. There is no need to use the commonality function or Smets'
canonical decomposition to make the fusion of these dichotomous BBAs.
These figures show clearly the real advantage of the fusion of
dichotomous BBAs based on their canonical decompositions in terms of
computational time, and that is why we can say that the new proposed
method is really a fast fusion method with respect to the direct PCR5
or PCR6 rule of combination when working with a dichotomous frame of
discernment.
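
The low cost of the method once the canonical factors are known can be illustrated by the following Python sketch. It is a hedged illustration rather than the authors' implementation: the function names `pcr5_pro_contra` and `fast_fusion` are hypothetical, the inputs are assumed to be the already computed factors $x_s$ and $y_s$, and the intermediate fusion is assumed to be conjunctive.

```python
from math import prod


def pcr5_pro_contra(x, y):
    """PCR5 fusion of m_p = (x, 0, 1-x) and m_c = (0, y, 1-y).

    Returns (m(A), m(not A), m(A u not A)).
    """
    if x + y == 0.0:
        return 0.0, 0.0, 1.0  # both inputs are vacuous
    conflict = x * y  # conflicting mass m_p(A) * m_c(not A)
    # Each side gets its conjunctive mass plus a share of the conflict
    # proportional to its own mass (x for A, y for not A).
    m_a = x * (1 - y) + x * conflict / (x + y)
    m_not_a = (1 - x) * y + y * conflict / (x + y)
    return m_a, m_not_a, (1 - x) * (1 - y)


def fast_fusion(xs, ys):
    """Fuse S dichotomous BBAs from their canonical factors xs[s], ys[s]."""
    # Conjunctive fusion of simple BBAs: the mass left on the whole frame
    # is the product of the individual ignorance masses.
    x = 1.0 - prod(1.0 - v for v in xs)
    y = 1.0 - prod(1.0 - v for v in ys)
    return pcr5_pro_contra(x, y)


if __name__ == "__main__":
    # Three sources with illustrative (made-up) canonical factors.
    print(fast_fusion([0.5, 0.2, 0.7], [0.1, 0.4, 0.3]))
```

The cost of `fast_fusion` is linear in the number of sources, which is consistent with the operation count given above; the expensive part in practice remains the computation of the factors themselves.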


D. On the fusion of pro and contra evidences 


In the previous analysis, we used the conjunctive rule for the
intermediary fusion step of pro-evidences on one hand, and the
intermediary fusion step of contra-evidences on the other hand. It is
worth noting that the intermediary step of fusion of pro-evidences and
the intermediary step of fusion of contra-evidences can be done in
parallel, which offers a computational advantage with respect to the
direct fusion method (if one has many sources to combine in a specific
application). This parallelization cannot be achieved in general with
the other existing rules of combination of evidences.

Because of the fusion principle depicted in Figure 1, this new fusion
method also offers the possibility (if one prefers, for one's own
specific reasons) of selecting other fusion rules for the intermediary
fusion steps for combining the pro-evidences and the contra-evidences.
Of course, the choice of the fusion rules used for the combination of
pro-evidences and the combination of contra-evidences impacts the final
result, but depending on the type of rules chosen we can obtain an
associative rule, an idempotent rule, and even a new cautious rule. For
example, let us consider the same type of fusion rule for combining the
pro-evidences $m_{p,s}$, $s = 1, \ldots, S$, and for combining the
contra-evidences $m_{c,s}$, $s = 1, \ldots, S$, and consider the
following cases:

1) If we use the conjunctive rule [5] (Vol. 1), denoted by
   $Conj(\cdot, \ldots, \cdot)$ (as we did previously in our Monte Carlo
   simulations for the histogram plots), then

   $$m_p = Conj(m_{p,1}, \ldots, m_{p,S}),$$

   and one has $m_p(A \cup \bar{A}) = \prod_{s=1}^{S} (1 - x_s)$ and
   $m_p(A) = 1 - m_p(A \cup \bar{A})$. Because the conjunctive rule is
   associative, the fusion of pro-evidences can be done sequentially.
   Similarly, for the fusion of contra-evidences using the conjunctive
   rule one has $m_c(A \cup \bar{A}) = \prod_{s=1}^{S} (1 - y_s)$ and
   $m_c(\bar{A}) = 1 - m_c(A \cup \bar{A})$. Because there is no
   conflict between the pro-evidences (resp. contra-evidences), the
   fusion result of the pro-evidences (resp. the contra-evidences) by
   the PCR5 (or PCR6) rule is equivalent to the conjunctive fusion
   result. The conjunctive rule, however, is not idempotent in general,
   except in very specific cases where only one focal element gets all
   the mass of belief.

2) If we prefer to use the averaging rule, then we will have
   $m_p(A) = \frac{1}{S} \sum_{s=1}^{S} x_s$ and
   $m_p(A \cup \bar{A}) = 1 - m_p(A) = \frac{1}{S} \sum_{s=1}^{S} (1 - x_s)$,
   and $m_c(\bar{A}) = \frac{1}{S} \sum_{s=1}^{S} y_s$ and
   $m_c(A \cup \bar{A}) = 1 - m_c(\bar{A}) = \frac{1}{S} \sum_{s=1}^{S} (1 - y_s)$.
   Because the averaging rule is not associative, the sequential fusion
   of pro-evidences (and contra-evidences) is not recommended; however,
   the averaging rule allows to get an idempotent fusion rule based on
   canonical decompositions if needed.

3) We could also prefer to use the min rule to build a new cautious
   rule of combination which will be associative and idempotent. For
   this, we just have to take $m_p(A) = \min_{s=1,\ldots,S}(x_s)$ and
   $m_p(A \cup \bar{A}) = 1 - m_p(A)$. Similarly,
   $m_c(\bar{A}) = \min_{s=1,\ldots,S}(y_s)$ and
   $m_c(A \cup \bar{A}) = 1 - m_c(\bar{A})$.
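
The three intermediary rules listed above differ only in how the canonical factors are aggregated, which the following Python sketch makes explicit. The function `combine_simple` and its `rule` argument are hypothetical names introduced for illustration; the input is the list of factors $x_s$ (or $y_s$) of simple BBAs sharing the same focal element, and the output is the combined mass on that focal element.

```python
from math import prod


def combine_simple(masses, rule="conj"):
    """Combine the masses x_s (or y_s) of simple BBAs with a common focal element.

    The mass left on the whole frame is 1 minus the returned value.
    """
    if not masses:
        raise ValueError("at least one mass is required")
    if rule == "conj":
        # Conjunctive rule: product of the ignorance masses (1 - x_s).
        return 1.0 - prod(1.0 - v for v in masses)
    if rule == "avg":
        # Averaging rule: idempotent, but not associative.
        return sum(masses) / len(masses)
    if rule == "min":
        # Min rule: associative and idempotent (cautious behaviour).
        return min(masses)
    raise ValueError(f"unknown rule: {rule}")


if __name__ == "__main__":
    xs = [0.4, 0.4]
    for rule in ("conj", "avg", "min"):
        print(rule, combine_simple(xs, rule))
```

Combining two identical factors shows the idempotence difference directly: the averaging and min rules return the common value, whereas the conjunctive rule reinforces it.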


V. CONCLUSIONS 


In this research paper, we have proposed a new fusion method to combine
very quickly many BBAs defined on a dichotomous frame of discernment
thanks to their unique canonical decompositions. This new method can be
parallelized and offers the advantage of a quasi-linear computational
time with respect to the number of sources. For now, this method is
limited to the fusion of many BBAs that are defined on a simple
(dichotomous) frame of discernment. After some unsuccessful attempts,
it appears that the development of a fast fusion method based on the
canonical decomposition principle for working with non-dichotomous
frames of discernment is actually a very difficult problem, which we
submit to the scientific community working with belief functions as a
future research challenge. This very new method already brings a
significant benefit for a real application involving inter-criteria
analysis for the evaluation of a multiple-objective ant colony
optimization algorithm for wireless sensor network deployment, which
should be reported in a forthcoming publication.

Abstract—In this paper, we present a fast Belief Function based
Inter-Criteria Analysis (BF-ICrA) method based on the canonical
decomposition of basic belief assignments defined on a dichotomous
frame of discernment. This new method is then applied for evaluating
the Multiple-Objective Ant Colony Optimization (MO-ACO) algorithm for
Wireless Sensor Networks (WSN) deployment.


Keywords: Inter-Criteria Analysis, belief functions, informa- 
tion fusion, canonical decomposition, PCR5 rule. 


I. INTRODUCTION 


In our previous work [1] we proposed a new and improved version of
Atanassov's classical InterCriteria Analysis (ICrA) [2]-[4] approach
based on Belief Functions (BF-ICrA). This method proposes a better
construction of the Inter-Criteria Matrix that fully exploits all the
information of the score matrix, and a closeness measure of agreement
between criteria based on the belief interval distance. In [5], we
showed how the fusion of many sources of evidence represented by Basic
Belief Assignments (BBAs) defined on the same dichotomous frame of
discernment can be done quickly and easily thanks to the Proportional
Conflict Redistribution rule no. 5 (PCR5) and the canonical
decomposition of the BBAs proposed recently in [6]. In the present
paper we consider BF-ICrA based on this promising technique. Then we
show how to apply it for the evaluation of the Multiple-Objective Ant
Colony Optimization (MO-ACO) algorithm for Wireless Sensor Networks
(WSN) deployment. After a condensed presentation of the basics of
belief functions in Section II, including a short description of the
canonical decomposition of dichotomous BBAs and the main steps of the
fast fusion method of dichotomous BBAs, the BF-ICrA method is described
and analyzed in Section III. Section IV is devoted to the
multi-objective ACO algorithm. In Section V the results of the fast
BF-ICrA method with the MO-ACO algorithm for WSN layout deployment are
presented and discussed. The conclusion is given in Section VI.


II. BASICS OF BELIEF FUNCTIONS 


A. Basic definitions 


Belief functions (BF) have been introduced by Shafer in [7] to model
epistemic uncertainty and to combine distinct sources of evidence
thanks to Dempster's rule of combination. In Shafer's framework, we
assume that the answer¹ of the problem under concern belongs to a known
finite discrete frame of discernment (FoD)
$\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$, with $n > 1$, and
where all elements of $\Theta$ are mutually exclusive and exhaustive.
The set of all subsets of $\Theta$ (including the empty set $\emptyset$
and $\Theta$) is the power-set of $\Theta$, denoted by $2^\Theta$. A
proper Basic Belief Assignment (BBA) associated with a given source of
evidence is defined [7] as a mapping $m(\cdot) : 2^\Theta \to [0,1]$
satisfying $m(\emptyset) = 0$ and $\sum_{A \in 2^\Theta} m(A) = 1$. The
quantity $m(A)$ is called the mass of $A$ committed by the source of
evidence. Belief and plausibility functions are respectively defined
from a proper BBA $m(\cdot)$ by

$$Bel(A) = \sum_{B \in 2^\Theta \,|\, B \subseteq A} m(B), \tag{1}$$

and

$$Pl(A) = \sum_{B \in 2^\Theta \,|\, A \cap B \neq \emptyset} m(B) = 1 - Bel(\bar{A}), \tag{2}$$

where $\bar{A}$ is the complement of $A$ in $\Theta$.

$Bel(A)$ and $Pl(A)$ are usually interpreted respectively as lower and
upper bounds of an unknown (subjective) probability measure $P(A)$. The
quantities $m(\cdot)$ and $Bel(\cdot)$ are one-to-one and linked by the
Möbius inverse formula (see [7], p. 39). $A$ is called a Focal Element
(FE) of $m(\cdot)$ if $m(A) > 0$. When all focal elements are
singletons, $m(\cdot)$ is called a Bayesian BBA [7] and its
corresponding $Bel(\cdot)$ function is equal to $Pl(\cdot)$, and they
are homogeneous to a (subjective) probability measure $P(\cdot)$. The
vacuous BBA, representing a totally ignorant source, is defined as
$m_v(\Theta) = 1$. A dichotomous BBA is a BBA defined on a FoD which has
only two proper subsets, for instance $\Theta = \{A, \bar{A}\}$ with
$A \neq \Theta$ and $A \neq \emptyset$. A dogmatic BBA is a BBA such
that $m(\Theta) = 0$. If $m(\Theta) > 0$ the BBA $m(\cdot)$ is
nondogmatic. A simple BBA is a BBA that has at most two focal sets, one
of them being $\Theta$. A dichotomous nondogmatic mass of belief is a
BBA having three focal elements $A$, $\bar{A}$ and $A \cup \bar{A}$,
with $A$ and $\bar{A}$ subsets of $\Theta$.
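
The definitions of belief and plausibility can be illustrated with a small Python sketch. The representation is an assumption made for the example: a BBA is stored as a dictionary mapping `frozenset` focal elements to masses, and the function names `bel` and `pl` are illustrative.

```python
def bel(m, X):
    """Belief of X: total mass of the non-empty focal elements included in X."""
    return sum(v for B, v in m.items() if B and B <= X)


def pl(m, X):
    """Plausibility of X: total mass of the focal elements intersecting X."""
    return sum(v for B, v in m.items() if B & X)


if __name__ == "__main__":
    A = frozenset({"A"})
    not_A = frozenset({"notA"})
    theta = A | not_A
    # Dichotomous nondogmatic BBA with three focal elements.
    m = {A: 0.6, not_A: 0.3, theta: 0.1}
    print(bel(m, A), pl(m, A))   # lower and upper bounds on P(A)
    print(1 - bel(m, not_A))     # equals Pl(A) by duality
```

On a dichotomous frame the belief interval of $A$ is simply $[m(A), m(A) + m(A \cup \bar{A})]$, so its width equals the mass left on the whole frame.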

In his Mathematical Theory of Evidence [7], Shafer proposed to combine
$s \geq 2$ distinct sources of evidence represented by BBAs with
Dempster's rule (i.e. the normalized conjunctive rule), which
unfortunately behaves counterintuitively both in high and low
conflicting situations, as reported in [8]-[11]. In our previous works
(see [12], Vol. 2 and Vol. 3 for full justification and examples) we
proposed new rules of combination based on different Proportional
Conflict Redistribution (PCR) principles, and we have shown the
interest of the PCR rule No 5 (PCR5) for combining two BBAs, and of the
PCR rule No 6 (PCR6) for combining more than two BBAs altogether [12],
Vol. 2. PCR6 coincides with PCR5 when one combines two sources. The
difference between PCR5 and PCR6 lies in the way the proportional
conflict redistribution is done as soon as three (or more) sources are
involved in the fusion. PCR5 transfers the conflicting mass only to the
elements involved in the conflict and proportionally to their
individual masses, so that the specificity of the information is
entirely preserved in this fusion process.

¹ i.e. the solution, or the decision to take.

The general (complicated) formulas for the PCR5 and PCR6 rules are
given in [12], Vol. 2. The fusion of two BBAs based on the PCR5 (or
PCR6) rule, which will be used for the canonical decomposition of a
dichotomous BBA, is obtained by the formula

$$m_{PCR5}(X) = \sum_{\substack{X_1, X_2 \in 2^\Theta \\ X_1 \cap X_2 = X}} m_1(X_1)\, m_2(X_2) + \sum_{\substack{X_2 \in 2^\Theta \\ X_2 \cap X = \emptyset}} \left[ \frac{m_1(X)^2\, m_2(X_2)}{m_1(X) + m_2(X_2)} + \frac{m_2(X)^2\, m_1(X_2)}{m_2(X) + m_1(X_2)} \right], \tag{3}$$

where all denominators in (3) are different from zero. If a
denominator is zero, that fraction is discarded.
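
Formula (3) can be sketched in Python for two sources as follows. This is an illustrative and unoptimized sketch, not the Matlab code referenced by the authors; it assumes BBAs stored as dictionaries mapping `frozenset` focal elements to masses, and the function name `pcr5` is a hypothetical choice.

```python
def pcr5(m1, m2):
    """PCR5 fusion of two BBAs given as {frozenset: mass} dictionaries."""
    out = {}
    for X1, v1 in m1.items():
        for X2, v2 in m2.items():
            inter = X1 & X2
            if inter:
                # Non-conflicting pair: conjunctive consensus on X1 n X2.
                out[inter] = out.get(inter, 0.0) + v1 * v2
            elif v1 + v2 > 0:
                # Conflicting pair: the product v1*v2 is given back to X1
                # and X2 proportionally to v1 and v2; a zero denominator
                # means the fraction is discarded.
                out[X1] = out.get(X1, 0.0) + v1 * v1 * v2 / (v1 + v2)
                out[X2] = out.get(X2, 0.0) + v2 * v2 * v1 / (v1 + v2)
    return out


if __name__ == "__main__":
    A = frozenset({"A"})
    not_A = frozenset({"notA"})
    theta = A | not_A
    m_p = {A: 0.7, theta: 0.3}       # illustrative pro-evidence
    m_c = {not_A: 0.4, theta: 0.6}   # illustrative contra-evidence
    print(pcr5(m_p, m_c))
```

Looping over pairs of focal elements, rather than over all subsets $X$, visits each conflicting product exactly once and credits both of its members, which is equivalent to the double sum in (3).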

From the implementation point of view, PCR6 is simpler to implement
than PCR5. For convenience, very basic (not optimized) Matlab™ codes of
the PCR5 and PCR6 fusion rules can be found in [12], [13] and in the
toolboxes repository on the web [14]. The main drawback of the PCR5 and
PCR6 rules is their very high combinatorial complexity when the number
of sources is large, as well as the cardinality of the FoD. In this
case, the PCR5 or PCR6 rules cannot be used directly because of memory
overflow. Even for combining BBAs defined on a simple dichotomous FoD
such as those involved in the Inter-Criteria Analysis (ICrA), the
computational time for combining more than 10 sources can take several
hours². That is why a fast fusion method to combine dichotomous BBAs is
necessary, and we present it in the next subsections.


B. Canonical decomposition of dichotomous BBA 


A FoD $\Theta = \{A, \bar{A}\}$ is called dichotomous if it consists of
only two proper subsets $A$ and $\bar{A}$ with $A \cup \bar{A} = \Theta$ and
$A \cap \bar{A} = \emptyset$, where $\bar{A}$ is the complement of $A$ in $\Theta$ and
$A$ is different from $\Theta$ and from the empty set. We consider a
given proper BBA $m(\cdot) : 2^\Theta \to [0,1]$ of the general form

$$m(A) = a, \quad m(\bar{A}) = b, \quad m(A \cup \bar{A}) = 1 - a - b. \qquad (4)$$

² With a MacBook Pro 2.8 GHz Intel Core i7 with 16 GB 1600 MHz DDR3
memory running Matlab™ R2018a.


The canonical decomposition problem consists in finding the
two following simple proper BBAs $m_p$ and $m_c$ of the form

$$m_p(A) = x, \quad m_p(A \cup \bar{A}) = 1 - x, \qquad (5)$$
$$m_c(\bar{A}) = y, \quad m_c(A \cup \bar{A}) = 1 - y, \qquad (6)$$

with $(x,y) \in [0,1] \times [0,1]$, such that $m = Fusion(m_p, m_c)$,
for a chosen rule of combination denoted by $Fusion(\cdot,\cdot)$. The
simple BBA $m_p(\cdot)$ is called the pro-BBA (or pro-evidence)
of $A$, and the simple BBA $m_c(\cdot)$ the contra-BBA (or
contra-evidence) of $A$. The BBA $m_p(\cdot)$ is interpreted as a source
of evidence providing uncertain evidence in favor of $A$,
whereas $m_c(\cdot)$ is interpreted as a source of evidence providing
uncertain contrary evidence about $A$.

In [6], we have shown that this decomposition is possible
with Dempster's rule only if $0 < a < 1$, $0 < b < 1$ and
$a + b < 1$, and we have $x = \frac{a}{1-b}$ and $y = \frac{b}{1-a}$. However,
any dogmatic BBA $m(A) = a$, $m(\bar{A}) = b$ with $a + b = 1$
is not decomposable with Dempster's rule when
$(a,b) \neq (1,0)$ and $(a,b) \neq (0,1)$, and the dogmatic BBAs
$m(A) = 1$, $m(\bar{A}) = 0$, or $m(A) = 0$, $m(\bar{A}) = 1$ have
infinitely many decompositions based on Dempster's rule of
combination. We have also proved that this canonical decomposition
cannot be done with the conjunctive, disjunctive, Yager's
[15] or Dubois-Prade [16] rules of combination, nor with
the averaging rule. The main result of [6] is that this canonical
decomposition is unique and is always possible in all cases
using the PCR5 rule of combination. This is very useful to
implement a fast and efficient approximate fusion method for
dichotomous BBAs, as presented in detail in [5]. We recall
the following two important theorems proved in [6].


Theorem 1: Consider a dichotomous FoD $\Theta = \{A, \bar{A}\}$ with
$A \neq \Theta$ and $A \neq \emptyset$ and a nondogmatic BBA $m(\cdot) : 2^\Theta \to [0,1]$
defined on $\Theta$ by $m(A) = a$, $m(\bar{A}) = b$, and $m(A \cup \bar{A}) =
1 - a - b$, where $a, b \in [0,1]$ and $a + b < 1$. Then the BBA
$m(\cdot)$ has a unique canonical decomposition using the PCR5 rule
of combination of the form $m = PCR5(m_p, m_c)$ with
pro-evidence $m_p(A) = x$, $m_p(A \cup \bar{A}) = 1 - x$ and contra-evidence
$m_c(\bar{A}) = y$, $m_c(A \cup \bar{A}) = 1 - y$ where $x, y \in [0,1]$.

Theorem 2: Any dogmatic BBA defined by $m(A) = a$ and
$m(\bar{A}) = b$, where $a, b \in [0,1]$ and $a + b = 1$, has a canonical
decomposition using the PCR5 rule of combination of the form
$m = PCR5(m_p, m_c)$ with $m_p(A) = x$, $m_p(A \cup \bar{A}) = 1 - x$
and $m_c(\bar{A}) = y$, $m_c(A \cup \bar{A}) = 1 - y$ where $x, y \in [0,1]$.

Theorems 1 & 2 prove that the decomposition based on
PCR5 always exists and is unique for any dichotomous
(nondogmatic or dogmatic) BBA.


For the case of a dichotomous nondogmatic BBA considered
in Theorem 1, one has to find $x$ and $y$, solutions of the system

$$a = x(1-y) + \frac{x^2 y}{x+y} = \frac{x^2 + xy - xy^2}{x+y}, \qquad (7)$$
$$b = (1-x)y + \frac{x y^2}{x+y} = \frac{y^2 + xy - x^2 y}{x+y}, \qquad (8)$$

under the constraints $(a,b) \in [0,1]^2$, and $0 < a + b < 1$.
The explicit expressions of $x$ and $y$ are difficult to obtain
analytically (even with modern symbolic computing systems
like Mathematica™ or Maple™) because one has a quartic
equation to solve, and the general analytical expression of its
solutions is very complicated. Fortunately, the solutions can be
easily calculated numerically by these computing systems, and
even with the Matlab™ system (thanks to the fsolve function) as
soon as numerical values are assigned to a and to b, and
this is what we use in our simulations.
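
For readers without access to fsolve, the system (7)-(8) can also be solved with elementary tools. The sketch below is an assumption-laden alternative, not the method used in our simulations: it relies on the observation that adding (7) and (8) gives a + b = x + y - xy, i.e. (1-x)(1-y) = 1 - a - b, so y can be expressed from x and a single unknown remains, which is found here by bisection on [0, a+b]. The function names and the tolerance-free fixed iteration count are illustrative choices, and only the nondogmatic case of Theorem 1 is handled.

```python
def pcr5_pro_contra(x, y):
    """PCR5 fusion of a pro-evidence (x on A) and a contra-evidence (y on not-A).

    Returns the triplet (m(A), m(not-A), m(A u not-A)) of equations (7)-(8).
    """
    if x + y == 0.0:
        return 0.0, 0.0, 1.0
    a = x * (1.0 - y) + x * x * y / (x + y)
    b = y * (1.0 - x) + x * y * y / (x + y)
    return a, b, (1.0 - x) * (1.0 - y)


def canonical_decomposition(a, b, iterations=200):
    """Find (x, y) such that PCR5(m_p, m_c) = (a, b, 1-a-b), nondogmatic case."""
    u = 1.0 - a - b  # mass of ignorance, must be positive here
    if not (0.0 <= a and 0.0 <= b and u > 0.0):
        raise ValueError("this sketch handles nondogmatic BBAs only")

    def y_from_x(x):
        # From (7)+(8): (1-x)(1-y) = u, clamped against rounding errors.
        return max(0.0, 1.0 - u / (1.0 - x))

    lo, hi = 0.0, a + b  # residual is -a <= 0 at lo and b >= 0 at hi
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if pcr5_pro_contra(mid, y_from_x(mid))[0] < a:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    return x, y_from_x(x)


x, y = canonical_decomposition(0.6, 0.3)
a_check, b_check, _ = pcr5_pro_contra(x, y)
```

Re-fusing the pro- and contra-evidences returned for (a, b) = (0.6, 0.3) with `pcr5_pro_contra` recovers the original masses up to numerical precision, which is a simple way to validate any solver used for this step.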


C. Fast Fusion of dichotomous BBAs 


The main idea for making the fast fusion of dichotomous
BBAs $m_s(\cdot)$, for $s = 1, 2, \ldots, S$, defined on the same FoD $\Theta$
is based on the three following main steps:

1) In the first step, one canonically decomposes each
   dichotomous BBA $m_s(\cdot)$ into its pro- and contra-evidences
   $m_{p,s} = (m_{p,s}(A), m_{p,s}(\bar{A}), m_{p,s}(A \cup \bar{A})) = (x_s, 0, 1 - x_s)$
   and $m_{c,s} = (m_{c,s}(A), m_{c,s}(\bar{A}), m_{c,s}(A \cup \bar{A})) = (0, y_s, 1 - y_s)$;

2) In the second step, one combines the pro-evidences
   $m_{p,s}$ for $s = 1, 2, \ldots, S$ altogether to get a global
   pro-evidence $m_p$, and in parallel one combines all the
   contra-evidences $m_{c,s}$ for $s = 1, 2, \ldots, S$ altogether to
   get a global contra-evidence $m_c$. The fusion step of pro-
   and contra-evidences is based on the conjunctive rule of
   combination;

3) Once $m_p$ and $m_c$ are calculated, one combines
   them with the PCR5 fusion rule to get the final result.


Because the PCR5 rule of combination is not associative, the
fusion of the canonical BBAs followed by their PCR5 fusion
will not provide in general the same result as the direct fusion
of the dichotomous BBAs altogether but only an approximate
result, which is normal. However, this new fusion approach
is interesting because the fusion of the pro-evidences $m_{p,s}$
(resp. contra-evidences $m_{c,s}$) is very simple: there is
no conflict between the $m_{p,s}$ (resp. between the $m_{c,s}$), so that their
fusion can be done quite easily and a large number of sources
can be combined without a high computational burden. In fact,
with this fusion approach, only one PCR5 fusion step of simple
(combined) canonical BBAs is needed at the very end of the
fusion process. In [5], we have shown with a Monte-Carlo
simulation analysis that the approximation obtained by this
new fusion method based on the fusion of pro-evidences and
contra-evidences, with respect to the direct fusion of the BBAs
with PCR5 (or PCR6 when considering more than two sources
to combine), is effective because the agreement between the
decision taken from the direct fusion method and the indirect
(canonical decomposition based) method is very good. This
new fusion method based on the canonical decomposition
does not suffer from the combinatorial complexity limitation, which is
of great interest in some applications because many (hundreds
or even thousands of) dichotomous BBAs can be
combined very quickly. Actually, with this method the only step that takes
a bit of time is the canonical decomposition done by the
numerical solver. Our analysis [5] has shown that the complexity
of this fast approach is quasi-linear in the number of sources
to combine.
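
Steps 2 and 3 of the fast fusion can be sketched in a few lines of Python, assuming step 1 has already produced the pairs (x_s, y_s). This is an illustrative sketch, not the implementation of [5]; the names `fast_fusion` and `pcr5_pro_contra` are hypothetical. Because all pro-evidences are simple BBAs focused on A, their conjunctive combination leaves the product of the (1 - x_s) on the whole frame and the rest on A, and similarly for the contra-evidences.

```python
from math import prod


def pcr5_pro_contra(x, y):
    """PCR5 fusion of a global pro-evidence x and a global contra-evidence y."""
    if x + y == 0.0:
        return 0.0, 0.0, 1.0
    a = x * (1.0 - y) + x * x * y / (x + y)
    b = y * (1.0 - x) + x * y * y / (x + y)
    return a, b, (1.0 - x) * (1.0 - y)


def fast_fusion(decomposed):
    """Fuse S dichotomous BBAs given as canonical pairs (x_s, y_s).

    Step 2: conjunctive fusion of pro-evidences and of contra-evidences.
    Step 3: a single PCR5 fusion of the two global evidences.
    """
    x = 1.0 - prod(1.0 - xs for xs, _ in decomposed)
    y = 1.0 - prod(1.0 - ys for _, ys in decomposed)
    return pcr5_pro_contra(x, y)


# Two sources that both support A with mass 0.5 and carry no contra-evidence.
fused = fast_fusion([(0.5, 0.0), (0.5, 0.0)])
# Two sources with mixed pro- and contra-evidence.
mixed = fast_fusion([(0.6, 0.2), (0.5, 0.5)])
```

The cost of steps 2 and 3 grows linearly with the number of pairs, so the overall cost is dominated by the S canonical decompositions of step 1.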


III. THE BF-ICRA METHOD 


In [1], we presented an improved version of Atanassov's
Inter-Criteria Analysis (ICrA) method [2]-[4] based on belief
functions. This new method has been named BF-ICrA (Belief
Function based Inter-Criteria Analysis) for short. It has already
been applied to GPS surveying problems in [17]. We briefly
present in this section the principles of BF-ICrA.

BF-ICrA starts with the construction of an $M \times N$ BBA
matrix $\mathbf{M} = [m_{ij}(\cdot)]$ from the score matrix $\mathbf{S} = [S_{ij}]$. The
BBA matrix $\mathbf{M}$ is obtained as follows (see [18] for details
and justification):


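$$m_{ij}(A_i) = Bel_{ij}(A_i), \qquad (9)$$
$$m_{ij}(\bar{A}_i) = Bel_{ij}(\bar{A}_i) = 1 - Pl_{ij}(A_i), \qquad (10)$$
$$m_{ij}(A_i \cup \bar{A}_i) = Pl_{ij}(A_i) - Bel_{ij}(A_i), \qquad (11)$$

where³

$$Bel_{ij}(A_i) \triangleq Sup_j(A_i) / A^j_{\max}, \qquad (12)$$
$$Bel_{ij}(\bar{A}_i) \triangleq Inf_j(A_i) / A^j_{\min}, \qquad (13)$$

with

$$Sup_j(A_i) \triangleq \sum_{k \in \{1,\ldots,M\} | S_{kj} \leq S_{ij}} |S_{ij} - S_{kj}|, \qquad (14)$$
$$Inf_j(A_i) \triangleq - \sum_{k \in \{1,\ldots,M\} | S_{kj} \geq S_{ij}} |S_{ij} - S_{kj}|, \qquad (15)$$

and

$$A^j_{\max} \triangleq \max_i Sup_j(A_i), \qquad (16)$$
$$A^j_{\min} \triangleq \min_i Inf_j(A_i). \qquad (17)$$

For another criterion $C_{j'}$ and the $j'$-th column of the score
matrix we will obtain another set of BBA values $m_{ij'}(\cdot)$.
Applying this method to each column of the score matrix, we
are able to compute the BBA matrix $\mathbf{M} = [m_{ij}(\cdot)]$, each
component of which is in fact a triplet $(m_{ij}(A_i), m_{ij}(\bar{A}_i), m_{ij}(A_i \cup \bar{A}_i))$
of BBA values in [0,1] such that $m_{ij}(A_i) + m_{ij}(\bar{A}_i) +
m_{ij}(A_i \cup \bar{A}_i) = 1$ for all $i = 1, \ldots, M$ and $j = 1, \ldots, N$.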
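
The construction of one column of the BBA matrix can be sketched as follows. This is a minimal illustration under stated assumptions rather than the reference implementation of [18]: the function name `bba_column` is hypothetical, the support and inferiority sums are stored as non-negative magnitudes (the two minus signs cancel in the ratio defining the belief of the complement), and a zero normalization constant yields a zero belief, as stated in the footnote attached to these definitions.

```python
def bba_column(scores):
    """Build the BBA triplets (m(A_i), m(not A_i), m(A_i u not A_i)) for one
    criterion, from the column of scores S_1j, ..., S_Mj."""
    # Support of A_i: distances to the scores it dominates.
    sup = [sum(abs(si - sk) for sk in scores if sk <= si) for si in scores]
    # Magnitude of the inferiority of A_i: distances to the scores above it.
    inf_mag = [sum(abs(si - sk) for sk in scores if sk >= si) for si in scores]
    sup_max = max(sup)
    inf_max = max(inf_mag)

    triplets = []
    for s_val, i_val in zip(sup, inf_mag):
        bel_a = s_val / sup_max if sup_max != 0 else 0.0
        bel_not_a = i_val / inf_max if inf_max != 0 else 0.0
        triplets.append((bel_a, bel_not_a, 1.0 - bel_a - bel_not_a))
    return triplets


# Toy column with three alternatives (illustrative values only).
column = bba_column([1, 2, 3])
```

With the toy scores 1, 2, 3, the worst alternative gets all its mass on the complement, the best one gets all its mass on A_i, and the middle one spreads its mass equally over the three elements.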

The next step of the BF-ICrA approach is the construction of
the $N \times N$ Inter-Criteria Matrix $\mathbf{K} = [K_{jj'}]$ from the $M \times N$
BBA matrix $\mathbf{M} = [m_{ij}(\cdot)]$, where each element $K_{jj'}$ corresponds
to the BBA $(m_{jj'}(\theta), m_{jj'}(\bar{\theta}), m_{jj'}(\theta \cup \bar{\theta}))$ about positive
consonance $\theta$, negative consonance $\bar{\theta}$ and uncertainty between
criteria $C_j$ and $C_{j'}$, respectively. The construction of the triplet
$K_{jj'} = (m_{jj'}(\theta), m_{jj'}(\bar{\theta}), m_{jj'}(\theta \cup \bar{\theta}))$ is based on two steps:


• Step 1 (BBA construction): Getting $m^i_{jj'}(\cdot)$.

For each alternative $A_i$, $i = 1, \ldots, M$, we first
compute the BBA $(m^i_{jj'}(\theta), m^i_{jj'}(\bar{\theta}), m^i_{jj'}(\theta \cup \bar{\theta}))$ for
any two criteria $j, j' \in \{1, 2, \ldots, N\}$. For this, we
consider two sources of evidence (SoE) indexed
by $j$ and $j'$ providing the BBAs $m_{ij}$ and $m_{ij'}$
defined on the simple FoD $\{A_i, \bar{A}_i\}$ and denoted
$m_{ij} = [m_{ij}(A_i), m_{ij}(\bar{A}_i), m_{ij}(A_i \cup \bar{A}_i)]$ and
$m_{ij'} = [m_{ij'}(A_i), m_{ij'}(\bar{A}_i), m_{ij'}(A_i \cup \bar{A}_i)]$. We also denote by
$\Theta = \{\theta, \bar{\theta}\}$ the FoD about the relative state of the two SoE,
where $\theta$ means that the two SoE agree, $\bar{\theta}$ means that they
disagree and $\theta \cup \bar{\theta}$ means that we don't know. Hence,
two SoE are in total agreement if both commit their
maximum belief mass to the same element $A_i$ or to
the same element $\bar{A}_i$. Similarly, two SoE are in total
disagreement if each one commits its maximum mass
of belief to one element and the other to its opposite,
that is if one has $m_{ij}(A_i) = 1$ and $m_{ij'}(\bar{A}_i) = 1$, or
if $m_{ij}(\bar{A}_i) = 1$ and $m_{ij'}(A_i) = 1$. Based on this very
simple and natural principle, one can now compute the
belief masses as follows:

$$m^i_{jj'}(\theta) = m_{ij}(A_i) m_{ij'}(A_i) + m_{ij}(\bar{A}_i) m_{ij'}(\bar{A}_i), \qquad (18)$$
$$m^i_{jj'}(\bar{\theta}) = m_{ij}(A_i) m_{ij'}(\bar{A}_i) + m_{ij}(\bar{A}_i) m_{ij'}(A_i), \qquad (19)$$
$$m^i_{jj'}(\theta \cup \bar{\theta}) = 1 - m^i_{jj'}(\theta) - m^i_{jj'}(\bar{\theta}). \qquad (20)$$

$m^i_{jj'}(\theta)$ represents the degree of agreement between the
BBAs $m_{ij}(\cdot)$ and $m_{ij'}(\cdot)$ for the alternative $A_i$, $m^i_{jj'}(\bar{\theta})$
represents the degree of disagreement of the two BBAs
and $m^i_{jj'}(\theta \cup \bar{\theta})$ the level of uncertainty (i.e. how much
we don't know if they agree or disagree). By construction
$m^i_{jj'}(\cdot) = m^i_{j'j}(\cdot)$, $m^i_{jj'}(\theta), m^i_{jj'}(\bar{\theta}), m^i_{jj'}(\theta \cup \bar{\theta}) \in [0,1]$
and $m^i_{jj'}(\theta) + m^i_{jj'}(\bar{\theta}) + m^i_{jj'}(\theta \cup \bar{\theta}) = 1$. This BBA
modeling makes it possible to build a set of $M$ symmetrical
Inter-Criteria Belief Matrices (ICBM) $\mathbf{K}^i = [K^i_{jj'}]$ of
dimension $N \times N$ relative to each alternative $A_i$, whose
components $K^i_{jj'}$ correspond to the triplet of BBA values
$m^i_{jj'} = (m^i_{jj'}(\theta), m^i_{jj'}(\bar{\theta}), m^i_{jj'}(\theta \cup \bar{\theta}))$ modeling the
belief of agreement and of disagreement between $C_j$ and
$C_{j'}$ based on $A_i$.

³ Assuming that $A^j_{\max} \neq 0$ and $A^j_{\min} \neq 0$. If $A^j_{\max} = 0$ then
$Bel_{ij}(A_i) = 0$, and if $A^j_{\min} = 0$ then $Pl_{ij}(A_i) = 1$.


• Step 2 (fusion): Getting $m_{jj'}(\cdot)$.

In this step, one needs to combine the BBAs $m^i_{jj'}(\cdot)$ for
$i = 1, \ldots, M$ altogether to get the component $K_{jj'} =
(m_{jj'}(\theta), m_{jj'}(\bar{\theta}), m_{jj'}(\theta \cup \bar{\theta}))$ of the Inter-Criteria Belief
Matrix⁴ (ICBM) $\mathbf{K} = [K_{jj'}]$. For this and from the
theoretical standpoint, we recommend using the PCR6
fusion rule [12] (Vol. 3) because of the known deficiencies
of Dempster's rule.
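
The agreement, disagreement and uncertainty masses of Step 1 translate directly into code. The sketch below is only an illustration of equations (18)-(20); the function name `agreement_bba` and the tuple layout (mass on A_i, mass on its complement, mass on their union) are assumptions of this example.

```python
def agreement_bba(m_j, m_jp):
    """Return (m(theta), m(not theta), m(theta u not theta)) for two criteria.

    m_j and m_jp are the triplets (m(A_i), m(not A_i), m(A_i u not A_i))
    given by criteria j and j' for the same alternative A_i.
    """
    a1, n1, _ = m_j
    a2, n2, _ = m_jp
    agree = a1 * a2 + n1 * n2      # equation (18)
    disagree = a1 * n2 + n1 * a2   # equation (19)
    return agree, disagree, 1.0 - agree - disagree  # equation (20)


# Total disagreement: one source is certain of A_i, the other of its complement.
opposed = agreement_bba((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
# Two partially uncertain sources (illustrative values).
partial = agreement_bba((0.5, 0.2, 0.3), (0.4, 0.4, 0.2))
```

Applying this function to every pair of columns of the BBA matrix, for a fixed row i, gives the local matrix of alternative A_i; the M resulting BBAs for a given pair of criteria are then the inputs of the fusion of Step 2.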

Once the global Inter-Criteria Belief Matrix (ICBM) $\mathbf{K} =
[K_{jj'} = (m_{jj'}(\theta), m_{jj'}(\bar{\theta}), m_{jj'}(\theta \cup \bar{\theta}))]$ is calculated, we
can identify the criteria that are in strong agreement, in
strong disagreement, and those on which we are uncertain.
For identifying the criteria that are in strong agreement, we
evaluate the distance of each component $K_{jj'}$ to the BBA
representing the best agreement state and characterized by the
specific BBA⁵ $m_T(\theta) = 1$. With a similar approach we can
also identify, if we want, the criteria that are in very strong
disagreement using the distance of $m_{jj'}(\cdot)$ with respect to the
BBA representing the best disagreement state characterized
by the specific BBA $m_F(\bar{\theta}) = 1$. We use the belief interval
distance $d_{BI}(m_1, m_2)$ presented in [19] for measuring the
distance between two BBAs.

⁴ For presentation convenience, the ICBM $\mathbf{K}$, with $\mathbf{K} =
[K_{jj'} = (m_{jj'}(\theta), m_{jj'}(\bar{\theta}), m_{jj'}(\theta \cup \bar{\theta}))]$, is decomposed into three
matrices $\mathbf{K}(\theta) = [K^{\theta}_{jj'} = m_{jj'}(\theta)]$, $\mathbf{K}(\bar{\theta}) = [K^{\bar{\theta}}_{jj'} = m_{jj'}(\bar{\theta})]$ and
$\mathbf{K}(\theta \cup \bar{\theta}) = [K^{\theta \cup \bar{\theta}}_{jj'} = 1 - m_{jj'}(\theta) - m_{jj'}(\bar{\theta})]$.


A. Fast BF-ICrA method 


The computational complexity of BF-ICrA is of course
higher than the complexity of ICrA because it makes a more
precise evaluation of local and global inter-criteria belief
matrices with respect to the inter-criteria matrices calculated by
Atanassov's ICrA. The overall reduction of the computational
burden of the original MCDM problem thanks to BF-ICrA
depends highly on the problem under concern, the complexity
and cost of evaluating each criterion involved in it, as well as the
number of redundant criteria identified by the BF-ICrA method.

The main drawback of the BF-ICrA method is the PCR6
combination required in its step 2 for combining altogether
the dichotomous BBAs $m^i_{jj'}(\cdot)$. Because of the combinatorial
complexity of the PCR6 rule, it cannot work in reasonable
computational time as soon as the number of sources to combine
altogether is greater than 10, which prevents its use for solving
ICrA problems involving more than 10 alternatives (as in
examples 2 and 3 presented in section V). That is why
it is necessary to adapt the original BF-ICrA method for
working with a large number of alternatives and criteria. For
this, we can exploit in step 2 of BF-ICrA the method for
the fast fusion of dichotomous BBAs presented in section
II-C. More precisely, each dichotomous BBA $m^i_{jj'}(\cdot)$ will be
canonically decomposed into its pro-evidence $m^i_{jj',p}(\cdot)$ and its
contra-evidence $m^i_{jj',c}(\cdot)$, which will be combined separately to
get the global pro-evidence $m_{jj',p}(\cdot)$ and the global
contra-evidence $m_{jj',c}(\cdot)$. Then, the BBAs $m_{jj',p}(\cdot)$ and $m_{jj',c}(\cdot)$
are combined with the PCR5 rule to get the BBAs $m_{jj'}(\cdot)$ and,
finally, the global Inter-Criteria Belief Matrix $\mathbf{K} = [K_{jj'} =
(m_{jj'}(\theta), m_{jj'}(\bar{\theta}), m_{jj'}(\theta \cup \bar{\theta}))]$. The principle of this
modified step 2 of BF-ICrA is summarized in Figure 1 for
convenience.

Another, simpler fusion method to combine the dichotomous
BBAs $m^i_{jj'}(\cdot)$ would just consist in averaging them. In section V,
we will show how these two methods behave in the examples
chosen for the evaluation of the MO-ACO algorithm for optimal
WSN deployment.


IV. MULTI-OBJECTIVE ACO ALGORITHM 


Recently, Wireless Sensor Networks (WSNs) have attracted
the attention of the research community because of
a set of theoretical and practical challenges. A WSN consists
of distributed sensor nodes, and its main purpose is to
monitor the real-time environmental status by gathering
the available sensor information, then processing and transmitting the
collected data to the specified remote base station. It is a
promising technology that is used in a wide range of applications
requiring minimum human contribution, ranging from civil
and military to healthcare and environmental monitoring. One
of the key missions of a WSN is the full surveillance of the
monitoring region with a minimal number of sensors and minimized
energy consumption of the network. The lifetime of the sensors
is strongly related to the amount of power loaded in the
battery, which is why the control of the energy consumption
of sensors is an important active research problem. The small
energy storage capacity of sensor nodes prevents them from
sending the information directly to the main base. Because
of this, they transfer their data to the so-called High Energy
Communication Node (HECN), which is able to collect the
information from across the network and to transmit it to the
base computer for processing. The sensors transmit their data
to the HECN, either directly or via hops, using the closest sensors
as communication relays. The WSN can have a large number
of nodes and the problem can be very complex.

⁵ We use the index T in the notation $m_T(\cdot)$ to indicate that the agreement is
true, and F in $m_F(\cdot)$ to specify that the agreement is false.

Figure 1. Principle of the fast fusion of the $m^i_{jj'}(\cdot)$ in Step 2 of BF-ICrA.

In order to successfully address the key mission of WSNs, in
[20] we applied multi-objective Ant Colony Optimization
(ACO) to solve this telecommunication problem, which is hard
from the computational point of view. The number of ants is
one of the key algorithm parameters in ACO, and it is
important to find the optimal number of ants needed to achieve
good solutions with minimal computational resources. In [20],
the optimal solution was obtained by applying the classical
Atanassov's ICrA method. In the next section we will present
the results obtained by the fast BF-ICrA approach and compare
them with those results.

The problem of designing a WSN is multi-objective, with
two objective functions: 1) one wants to minimize the energy
consumption of the nodes in the network, and 2) one wants
to minimize the number of nodes. The full coverage of the
network and connectivity are considered as constraints. For
solving this problem, we have proposed to use a Multi-Objective
Ant Colony Optimization (MO-ACO) algorithm in
[20], and we have studied the influence of the number of ants
on the algorithm performance and on the quality of the achieved
solutions. The computational resources that the algorithm needs
are not negligible. They depend on the
size of the solved problem and on the number of ants. The aim
is to find a minimal number of ants which allows the algorithm
to find a good solution for WSN deployment.

The ACO algorithm uses a colony of artificial ants that
behave as cooperating agents. With the help of the pheromone
and the heuristic information they try to construct better
solutions and to find the optimal ones. The pheromone corresponds
to the global memory of the ants and the heuristic information
is some preliminary knowledge of the problem. The problem
is represented by a graph and the solution is represented by
a path or by a tree in the graph. Ants start from
random nodes and construct feasible solutions. When all ants
have constructed their solutions, the pheromone is updated. The newly
added pheromone depends on the quality of the solution. The
elements of the graph which belong to better solutions will
receive more pheromone and will be more desirable in the
next iteration. In our implementation, we use the MAX-MIN
Ant System (MMAS), which is one of the most successful
ant approaches, originally presented in [21]. In our case, the
graph of the problem is represented by a square grid. The
nodes of the graph are enumerated. The ants deposit
their pheromone on the nodes of the grid. The sensors are
also placed on the nodes of the grid. The solution is
represented by a tree. An ant starts to create a solution
from a random node which communicates with the HECN.
Construction of the heuristic information is a crucial point
in ant algorithms. Our heuristic information, given
by (21), is a product of three values:

$$\eta_{ij}(t) = s_{ij} \, l_{ij} \, (1 - b_{ij}), \qquad (21)$$

where $s_{ij}$ is the number of new points (nodes of the
graph) which the new sensor will cover, and which are not
covered by other sensors, and

$$l_{ij} = \begin{cases} 1 & \text{if communication exists;} \\ 0 & \text{if there is no communication,} \end{cases} \qquad (22)$$

and where $b$ is the solution matrix. The matrix element
$b_{ij}$ equals 1 when there is a sensor on this position, otherwise
$b_{ij} = 0$. With $s_{ij}$, we try to increase the number of points
covered by one sensor and thus to decrease the number of
sensors we need. With $l_{ij}$, we guarantee that all sensors
will be connected. With $b_{ij}$ we guarantee that at most one
sensor will be mapped on the same point. The search stops
when the transition probability $p_{ij} = 0$ for all values of $i$ and
$j$. It means that there are no more free positions, or that
the whole area is fully covered. At the end of every iteration the
quantity of pheromone is updated according to the rule
$\tau_{ij} \leftarrow \rho \tau_{ij} + \Delta\tau_{ij}$, with the increment $\Delta\tau_{ij} = 1/F(k)$ if
$(i,j)$ belongs to the non-dominated solution constructed by
ant $k$, or $\Delta\tau_{ij} = 0$ otherwise. The parameter $\rho$ is a pheromone
decreasing parameter chosen in [0,1]. This parameter $\rho$ models
evaporation in nature and decreases the influence of old
information on the search process. After that, we add the new
pheromone, which depends on the value of the fitness
function constructed as $F(k) = \frac{f_1(k)}{\max_i f_1(i)} + \frac{f_2(k)}{\max_i f_2(i)}$,
where $f_1(k)$ is the number of sensors proposed by the $k$-th ant,
and $f_2(k)$ is the energy of the solution of the $k$-th ant. These
are also the objective functions of the WSN layout problem.
We normalize the values of the two objective functions with their
maximal achieved values from the first iteration.
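
The pheromone update rule can be illustrated by the short Python sketch below. It is not the C program used in the experiments; the names `fitness` and `update_pheromone`, the dictionary keyed by grid nodes, and the toy values are assumptions of this example. In particular, the fitness is taken here as the sum of the two objective values, each normalized by its maximum from the first iteration, which is our reading of the normalization described above. The pheromone bounds of MMAS are not modeled.

```python
def fitness(f1_k, f2_k, f1_max, f2_max):
    """Fitness of ant k, assumed to be the sum of its two normalized objectives:
    number of sensors (f1) and energy (f2)."""
    return f1_k / f1_max + f2_k / f2_max


def update_pheromone(tau, rho, solution_nodes, fitness_value):
    """Apply tau <- rho * tau + delta on every grid node.

    tau: dict mapping a node (i, j) to its pheromone level.
    solution_nodes: nodes of the non-dominated solution built by ant k.
    """
    updated = {}
    for node, level in tau.items():
        delta = 1.0 / fitness_value if node in solution_nodes else 0.0
        updated[node] = rho * level + delta
    return updated


# Toy grid with two nodes; only node (0, 0) belongs to the solution.
tau = {(0, 0): 1.0, (0, 1): 1.0}
f_k = fitness(10, 50, 20, 100)  # 0.5 + 0.5
new_tau = update_pheromone(tau, 0.5, {(0, 0)}, f_k)
```

Since both objectives are minimized, a smaller fitness value yields a larger deposit 1/F(k), so nodes of better solutions become more attractive in the next iteration.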


V. APPLICATION OF THE FAST BF-ICRA METHOD 


In this section we present the results of the fast BF-ICrA
method with the MO-ACO algorithm for WSN layout
deployment. Fidanova and Roeva have developed software
that implements the MO-ACO algorithm. This software can
solve the problem for any rectangular area; the communication
and the coverage radii can be different and can have any
positive value. We can have regions in the area. The program
was written in the C language, and the tests were run on a computer
with an Intel Pentium 2.8 GHz processor. In their tests, they
use an example where the area is square. The coverage and
communication radii cover 30 points. The HECN is fixed in
the centre of the area. In the sequel we consider three examples
of areas with three sizes: 350 x 350 points, 500 x 500 points,
and 700 x 700 points. The MO-ACO algorithm is based on 30
runs for each number of ants. We extract the Pareto front from
the solutions of these 30 runs, and we show the achieved
non-dominated solutions (approximate Pareto fronts) for each case
on which the BF-ICrA will be applied. The score matrices for
each case are given in Tables I, II and III [20].


ACO1 ACO2 ACO3 ACO4 ACO5
30 36 30 30 30


ACO6 ACO7 ACO8 ACO9 ACO10
30 30 30 30 30
30 
28 
26 
26 
25 


111 
112 
113 
114 
115 
116 


Table I
THE 6 x 10 SCORE MATRIX S FOR THE 350 x 350 CASE (EXAMPLE 1).


S =
       ACO1  ACO2  ACO3  ACO4  ACO5  ACO6  ACO7  ACO8  ACO9  ACO10
223     90    96    90    90    89    81    90    90    90    90
224     61    96    89    89    88    65    61    59    57    71
225     61    96    74    58    60    58    57    58    57    57
226     59    95    73    57    59    57    56    58    57    57
227     60    57    57    57    57    56    56    57    57    57
228     60    57    57    57    57    56    56    57    54    57
229     58    57    57    55    57    56    56    56    54    56
230     57    57    57    55    57    52    56    54    54    56
231     57    55    57    55    55    52    56    54    54    56
232     57    55    55    51    54    50    52    51    54    48
233     57    55    55    51    54    50    51    51    54    48
234     57    55    55    51    53    50    51    48    53    48
235     57    55    54    51    53    50    51    48    50    48
236     57    55    54    51    53    50    51    48    50    48
237     57    55    54    51    53    50    51    48    50    48
238     57    55    53    51    53    50    51    48    50    48
239     56    55    53    50    53    50    51    48    50    48
240     53    53    53    50    53    50    51    48    50    48
241     53    53    53    50    53    50    51    48    50    48
242     53    53    53    50    53    50    51    48    50    48
243     53    53    53    50    53    50    51    48    50    48
244     53    53    53    50    52    50    51    48    50    48

Table II
THE 22 x 10 SCORE MATRIX S FOR THE 500 x 500 CASE (EXAMPLE 2).


Each row of S corresponds to the number of sensors used in the
WSN to cover the area, as indicated in the first column at the
left side of the score matrix. Each column of S corresponds
to the ACOj algorithm used with j ants (j = 1, 2, ..., 10). Each
element $S_{ij}$ of S corresponds to the energy obtained with
this number of sensors and with the number of ants used for the
Multi-Objective ACO algorithm.

S =
       ACO1  ACO2  ACO3  ACO4  ACO5  ACO6  ACO7  ACO8  ACO9  ACO10
437     173   173   173   173   173   118   168   172   261   172
438     173   173   173   173   173   118   112   117   260   172
439     172   173   173   173   140    93   110   115   131   172
440     172   173   173   173   115    93   110   114   111   162
441     172   173   173   122   111    93   110   114   111   110
442     172   173   173   114   111    93   110   112   111   110
443     172   150   123   114   111    93   110   112   111   110
444     124   112   112   106   107    93   110   102   111   105
445     117   112   112   106   107    93   110   102   108   105
446     117   112   105   105   105    93   107   102   104   105
447     117   112   105   105   105    93   105   102   102   105
448     115   111   105   105   105    93   105   102   102   105
449     115   111   105   105   105    93   102    99   102   105
450     113   111   105   105   105    93   102    99   102   105
451     113   109   105   105   105    93   102    99    97   105
452     113   109   105   105   105    93    99    99    97   104
453     113   109   105   105   105    93    99    99    97   104
454     113   109   105   105    96    93    96    96    96   104
455     106   106   105   105    96    93    96    96    96    97

Table III
THE 19 x 10 SCORE MATRIX S FOR THE 700 x 700 CASE (EXAMPLE 3).


Application of BF-ICrA in example 1 (350 x 350 points) 


In this example, one sees from the score matrix of
Table I that the ACO1, ACO3 and ACO9 algorithms perform
equally for all alternatives (i.e. all rows) and they define
a first group/cluster of methods providing exactly the same
performances. Similarly, ACO4, ACO5 and ACO6 constitute a
second group of algorithms. The third group is made of the ACO7,
ACO8 and ACO10 algorithms. It is worth noting that these
three groups {ACO1, ACO3, ACO9}, {ACO4, ACO5, ACO6},
and {ACO7, ACO8, ACO10} differ only very slightly, whereas
the ACO2 algorithm (i.e. the 2nd column of the score matrix
S) differs a bit more from all three aforementioned groups.


Example 1 with fast PCR6: If we apply the fast BF-ICrA
method using the approximate PCR6 fusion rule based on the
canonical decomposition of the M = 6 dichotomous BBAs
$(m^i_{jj'}(\theta), m^i_{jj'}(\bar{\theta}), m^i_{jj'}(\theta \cup \bar{\theta}))$, we get the matrix of masses of
belief of agreement between criteria given in Table⁶ IV.

0.865 0.821 0.865 0.790 0.790 0.790 0.806 0.806 0.865 0.806
0.821 0.928 0.821 0.950 0.950 0.950 0.805 0.805 0.821 0.805
0.865 0.821 0.865 0.790 0.790 0.790 0.806 0.806 0.865 0.806
0.790 0.950 0.790 1.000 1.000 1.000 0.795 0.795 0.790 0.795
0.790 0.950 0.790 1.000 1.000 1.000 0.795 0.795 0.790 0.795
0.790 0.950 0.790 1.000 1.000 1.000 0.795 0.795 0.790 0.795
0.806 0.805 0.806 0.795 0.795 0.795 0.843 0.843 0.806 0.843
0.806 0.805 0.806 0.795 0.795 0.795 0.843 0.843 0.806 0.843
0.865 0.821 0.865 0.790 0.790 0.790 0.806 0.806 0.865 0.806
0.806 0.805 0.806 0.795 0.795 0.795 0.843 0.843 0.806 0.843

Table IV
MATRIX $K_{PCR6}(\theta)$ FOR EXAMPLE 1.


The matrix of distances to full agreement based on the fast BF-ICrA
method, denoted by $D_{PCR6}(\theta)$, is given in Table V.

0.134 0.178 0.134 0.209 0.209 0.209 0.193 0.193 0.134 0.193
0.178 0.071 0.178 0.049 0.049 0.049 0.194 0.194 0.178 0.194
0.134 0.178 0.134 0.209 0.209 0.209 0.193 0.193 0.134 0.193
0.209 0.049 0.209 0     0     0     0.204 0.204 0.209 0.204
0.209 0.049 0.209 0     0     0     0.204 0.204 0.209 0.204
0.209 0.049 0.209 0     0     0     0.204 0.204 0.209 0.204
0.193 0.194 0.193 0.204 0.204 0.204 0.156 0.156 0.193 0.156
0.193 0.194 0.193 0.204 0.204 0.204 0.156 0.156 0.193 0.156
0.134 0.178 0.134 0.209 0.209 0.209 0.193 0.193 0.134 0.193
0.193 0.194 0.193 0.204 0.204 0.204 0.156 0.156 0.193 0.156

Table V
MATRIX $D_{PCR6}(\theta)$ WITH FAST BF-ICRA FOR EXAMPLE 1.

In examining Table V, one sees that ACO1, ACO3
and ACO9 are at a small distance 0.134 from each other, compared
with their distances to the other algorithms, so that they belong
to the same group and behave similarly. The same remark holds
for the group {ACO4, ACO5, ACO6} because its inter-distance is zero,
and for the group {ACO7, ACO8, ACO10} because its
inter-distance is 0.156. In a relative manner, ACO2 appears closer
to {ACO4, ACO5, ACO6} than to {ACO1, ACO3, ACO9} or
{ACO7, ACO8, ACO10}, which intuitively makes sense when
comparing directly the columns of the matrix of Table I.

⁶ All the numerical values presented in the matrices have been truncated at
their 3rd digit for typesetting convenience.


Example 1 with averaging fusion: The matrix of distances
to full agreement based on the BF-ICrA method using the average
fusion rule, denoted by $D_{aver.}(\theta)$, is given in Table VI.

0.084 0.082 0.084 0.081 0.081 0.081 0.156 0.156 0.084 0.156
0.082 0.030 0.082 0.016 0.016 0.016 0.142 0.142 0.082 0.142
0.084 0.082 0.084 0.081 0.081 0.081 0.156 0.156 0.084 0.156
0.081 0.016 0.081 0     0     0     0.138 0.138 0.081 0.138
0.081 0.016 0.081 0     0     0     0.138 0.138 0.081 0.138
0.081 0.016 0.081 0     0     0     0.138 0.138 0.081 0.138
0.156 0.142 0.156 0.138 0.138 0.138 0.198 0.198 0.156 0.198
0.156 0.142 0.156 0.138 0.138 0.138 0.198 0.198 0.156 0.198
0.084 0.082 0.084 0.081 0.081 0.081 0.156 0.156 0.084 0.156
0.156 0.142 0.156 0.138 0.138 0.138 0.198 0.198 0.156 0.198

Table VI
MATRIX $D_{aver.}(\theta)$ WITH BF-ICRA USING AVERAGING RULE FOR
EXAMPLE 1.


One sees that only the group {ACO4, ACO5, ACO6}
can be clearly identified based on the averaging fusion
rule. ACO2 also appears close to
{ACO4, ACO5, ACO6}. But ACO1, ACO3 and ACO9 are
also closer to {ACO4, ACO5, ACO6} (distance 0.081) than to
each other (distance 0.084). The same
remark holds for ACO7, ACO8 and ACO10 (0.138 versus 0.198).
So one sees that the averaging fusion rule is not recommended
for making the BF-ICrA in this example.


Application of BF-ICrA in example 2 (500 x 500 points) 


Example 2 with fast PCR6: If we apply the fast BF-ICrA
method using the approximate PCR6 fusion rule based on the
canonical decomposition of the M = 22 dichotomous BBAs
$(m^i_{jj'}(\theta), m^i_{jj'}(\bar{\theta}), m^i_{jj'}(\theta \cup \bar{\theta}))$, we get the following matrix
of distances to full agreement, denoted by $D_{PCR6}(\theta)$, given
in Table VII.

0.158 0.376 0.338 0.300 0.286 0.279 0.247 0.251 0.225 0.280
0.376 0.324 0.426 0.456 0.437 0.453 0.457 0.433 0.435 0.449
0.338 0.426 0.407 0.411 0.382 0.423 0.418 0.402 0.393 0.414
0.300 0.456 0.411 0.349 0.323 0.381 0.368 0.370 0.362 0.363
0.286 0.437 0.382 0.323 0.284 0.348 0.334 0.334 0.328 0.333
0.279 0.453 0.423 0.381 0.348 0.316 0.298 0.317 0.308 0.308
0.247 0.457 0.418 0.368 0.334 0.298 0.235 0.276 0.255 0.283
0.251 0.433 0.402 0.370 0.334 0.317 0.276 0.265 0.260 0.303
0.225 0.435 0.393 0.362 0.328 0.308 0.255 0.260 0.211 0.304
0.280 0.449 0.414 0.363 0.333 0.308 0.283 0.303 0.304 0.277

Table VII
MATRIX $D_{PCR6}(\theta)$ WITH FAST BF-ICRA FOR EXAMPLE 2.


Based on these results, one sees that no clear group can
be identified, but we emphasize in boldface in Table VII the
minimal value for each row of the distance matrix $D_{PCR6}(\theta)$
(diagonal elements excluded). We see that ACO2 is at the
farthest distance from ACO1 because $D_{12}(\theta) = 0.376$, but at
the same time ACO2 is at the closest distance to ACO1 because
$D_{2j}(\theta) > 0.376$ (for $j > 2$), as shown in the second row of
Table VII. So we can conclude that ACO2 is in fact not close to
any other algorithm. If we choose an ad-hoc distance
threshold, say for instance 0.28, then we can identify the group
{ACO1, ACO7, ACO8, ACO9}.


Example 2 with averaging fusion: The matrix of distances
to full agreement based on the BF-ICrA method using the average
fusion rule, denoted by $D_{aver.}(\theta)$, is given in Table VIII.


0.361 0.316 0.310 0.311 0.336 0.300 0.306 0.316 0.320 0.309 
0.316 0.125 0.158 0.198 0.225 0.187 0.216 0.225 0.240 0.206 
0.310 0.158 0.165 0.185 0.215 0.178 0.200 0.215 0.227 0.193 
0.311 0.198 0.185 0.183 0.216 0.181 0.197 0.217 0.231 0.192 
0.336 0.225 0.215 0.216 0.243 0.214 0.231 0.249 0.261 0.226 
0.300 0.187 0.178 0.181 0.214 0.159 0.175 0.194 0.210 0.176 
0.306 0.216 0.200 0.197 0.231 0.175 0.181 0.202 0.216 0.186 
0.316 0.225 0.215 0.217 0.249 0.194 0.202 0.215 0.229 0.204 
0.320 0.240 0.227 0.231 0.261 0.210 0.216 0.229 0.233 0.222 
0.309 0.206 0.193 0.192 0.226 0.176 0.186 0.204 0.222 0.183 
Table VIII
MATRIX $D_{aver.}(\theta)$ WITH BF-ICRA USING AVERAGING RULE FOR
EXAMPLE 2.


Based on the average fusion rule there is no clear
clustering of algorithms. However, based on the shortest
inter-distance we could make the following distinct pairwise
groupings {ACO2, ACO3}, {ACO6, ACO7}, {ACO4, ACO10},
{ACO8, ACO9} and {ACO1, ACO5} if necessary, but remember
that the average fusion rule cannot provide the best result, as
shown in Example 1.


Application of BF-ICrA in example 3 (700 x 700 points) 


Example 3 with fast PCR6: If we apply the fast BF-ICrA
method using the approximate PCR6 fusion rule based on the
canonical decomposition of the $M = 19$ dichotomous BBAs
$(m^{j}_{ii'}(\theta), m^{j}_{ii'}(\bar{\theta}), m^{j}_{ii'}(\theta \cup \bar{\theta}))$, we get the matrix of distances
to full agreement, denoted by $D_{\mathrm{PCR6}}(\Theta)$, given in Table IX.


0.313 0.388 0.465 0.498 0.469 0.500 0.426 0.451 0.498 0.477 
0.388 0.339 0.403 0.496 0.461 0.500 0.421 0.440 0.497 0.464 
0.465 0.403 0.348 0.493 0.456 0.500 0.416 0.437 0.495 0.457 
0.498 0.496 0.493 0.362 0.385 0.500 0.376 0.391 0.470 0.303 
0.469 0.461 0.456 0.385 0.230 0.380 0.256 0.288 0.300 0.324 
0.500 0.500 0.500 0.500 0.380 0 0.312 0.356 0.308 0.500 
0.426 0.421 0.416 0.376 0.256 0.312 0.137 0.185 0.272 0.330 
0.451 0.440 0.437 0.391 0.288 0.356 0.185 0.205 0.314 0.351 
0.498 0.497 0.495 0.470 0.300 0.308 0.272 0.314 0.283 0.438 
0.477 0.464 0.457 0.303 0.324 0.500 0.330 0.351 0.438 0.228 
Table IX
MATRIX $D_{\mathrm{PCR6}}(\Theta)$ WITH FAST BF-ICRA FOR EXAMPLE 3.


We observe that the average distance between ACO algorithms
is much higher than in Tables V and VII of examples
1 and 2. This clearly shows the difficulty of precisely identifying
the clusters of similar algorithms, because only a few ACO
algorithms actually perform very well for this third example.
Based on the shortest inter-distance, we could make
the first pairwise group {ACO$_7$, ACO$_8$} because $D_{78}(\Theta) = 0.185$
is the minimal inter-distance between the ACO
algorithms. Once the rows and columns of Table IX corresponding
to ACO$_7$ and ACO$_8$ are eliminated, the second
best group is {ACO$_5$, ACO$_9$} because $D_{59}(\Theta) = 0.300$.
Similarly, we get the group {ACO$_4$, ACO$_{10}$} because
$D_{4,10}(\Theta) = 0.303$, and then the group {ACO$_1$, ACO$_2$} because
$D_{12}(\Theta) = 0.388$. Finally, we could also cluster ACO$_3$ with
ACO$_6$ because $D_{36}(\Theta) = 0.500$, although this distance is
too large for the cluster to be considered trustworthy.
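
The pairwise strategy described above (pick the smallest off-diagonal distance, remove the two corresponding rows and columns, repeat) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `greedy_pairs` is hypothetical and not part of the BF-ICrA method itself, and the matrix is simply a transcription of Table IX.

```python
from itertools import combinations

# Table IX: distances to full agreement for Example 3 (fast PCR6).
# Row/column k (0-based) corresponds to algorithm ACO_{k+1}.
D = [
    [0.313, 0.388, 0.465, 0.498, 0.469, 0.500, 0.426, 0.451, 0.498, 0.477],
    [0.388, 0.339, 0.403, 0.496, 0.461, 0.500, 0.421, 0.440, 0.497, 0.464],
    [0.465, 0.403, 0.348, 0.493, 0.456, 0.500, 0.416, 0.437, 0.495, 0.457],
    [0.498, 0.496, 0.493, 0.362, 0.385, 0.500, 0.376, 0.391, 0.470, 0.303],
    [0.469, 0.461, 0.456, 0.385, 0.230, 0.380, 0.256, 0.288, 0.300, 0.324],
    [0.500, 0.500, 0.500, 0.500, 0.380, 0.000, 0.312, 0.356, 0.308, 0.500],
    [0.426, 0.421, 0.416, 0.376, 0.256, 0.312, 0.137, 0.185, 0.272, 0.330],
    [0.451, 0.440, 0.437, 0.391, 0.288, 0.356, 0.185, 0.205, 0.314, 0.351],
    [0.498, 0.497, 0.495, 0.470, 0.300, 0.308, 0.272, 0.314, 0.283, 0.438],
    [0.477, 0.464, 0.457, 0.303, 0.324, 0.500, 0.330, 0.351, 0.438, 0.228],
]


def greedy_pairs(dist):
    """Repeatedly group the two closest remaining algorithms.

    Diagonal elements are ignored because only pairs (i, j) with i < j
    are examined. Returns (i, j, distance) triples with 1-based indices.
    """
    remaining = set(range(len(dist)))
    pairs = []
    while len(remaining) >= 2:
        i, j = min(combinations(sorted(remaining), 2),
                   key=lambda p: dist[p[0]][p[1]])
        pairs.append((i + 1, j + 1, dist[i][j]))
        remaining -= {i, j}  # eliminate the rows/columns of the new group
    return pairs


pairs = greedy_pairs(D)
for i, j, d in pairs:
    print(f"{{ACO{i}, ACO{j}}}  distance = {d:.3f}")
```

Applied to Table IX, the sketch returns the same sequence of groups as the discussion above. A greedy pairing of this kind never revisits earlier choices, so it should be read as a simple heuristic rather than an optimal clustering.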


Example 3 with averaging fusion: The matrix of distances
to full agreement based on the BF-ICrA method using the average
fusion rule, denoted by $D_{\mathrm{Aver.}}(\Theta)$, is given in Table X.


0.170 0.154 0.142 0.221 0.351 0.350 0.392 0.345 0.332 0.298 
0.154 0.120 0.092 0.167 0.321 0.295 0.369 0.313 0.290 0.261 
0.142 0.092 0.042 0.114 0.289 0.237 0.342 0.279 0.242 0.224 
0.221 0.167 0.114 0.054 0.255 0.139 0.327 0.260 0.184 0.177 
0.351 0.321 0.289 0.255 0.339 0.245 0.391 0.355 0.287 0.324 
0.350 0.295 0.237 0.139 0.245 0 0.304 0.242 0.115 0.247 
0.392 0.369 0.342 0.327 0.391 0.304 0.390 0.368 0.336 0.387 
0.345 0.313 0.279 0.260 0.355 0.242 0.368 0.328 0.288 0.341 
0.332 0.290 0.242 0.184 0.287 0.115 0.336 0.288 0.190 0.279 
0.298 0.261 0.224 0.177 0.324 0.247 0.387 0.341 0.279 0.261 
Table X
MATRIX $D_{\mathrm{Aver.}}(\Theta)$ WITH BF-ICRA USING AVERAGING RULE FOR
EXAMPLE 3.


Surprisingly, the averaging rule provides in this
example lower distance values on average than the
values given in Table IX. However, no clear clustering of
algorithms can be made because only a few ACO algorithms
actually perform very well for this third example. If we adopt
the pairwise strategy to cluster algorithms, we now obtain
as first group {ACO$_2$, ACO$_3$} because $D_{23}(\Theta) = 0.092$,
as second group {ACO$_6$, ACO$_9$} because $D_{69}(\Theta) = 0.115$,
as third group {ACO$_4$, ACO$_{10}$} because $D_{4,10}(\Theta) = 0.177$,
as fourth group {ACO$_1$, ACO$_8$} because $D_{18}(\Theta) = 0.345$,
and finally we could also cluster ACO$_5$ with ACO$_7$ because
$D_{57}(\Theta) = 0.391$. One sees that there is no strong correlation
between the results obtained from BF-ICrA based on fast PCR6
and those based on the averaging rule, which is not surprising
because the rules are totally different. Nevertheless, the group
{ACO$_4$, ACO$_{10}$} is found by both methods here.


VI. CONCLUSIONS 


The fast Belief Function based Inter-Criteria Analysis (BF-ICrA)
method, using the canonical decomposition of basic belief
assignments defined on a dichotomous frame of discernment,
was applied, tested and analysed in this paper. This new
method was applied for evaluating the Multiple-Objective
Ant Colony Optimization (MO-ACO) algorithm for Wireless
Sensor Networks (WSN) deployment. The BF-ICrA
outcomes show a very high correlation with fast
PCR6 rule for the ACO;, ACO3 and ACOg group, for the 
ACO,, ACO; and ACOg group, and for the ACO7, ACOg 
and ACOjo group of algorithms in example 1 (case of 
size 350 x 350) as intuitively expected. This is because the
considered ACO algorithms can solve the problem with good
solution quality in example 1. These high correlations were
not observed in the other two cases, example 2 (case of
size 500 x 500) and example 3 (case of size 700 x 700), because
only a few ACO algorithms actually perform very well for
these examples. So, when considering larger
problem sizes, the BF-ICrA results show that the number of
ants has a significant influence on the obtained results, as
already pointed out in [20].

Abstract: This paper discusses and analyzes the behaviors of
the proportional conflict redistribution rules no. 5 (PCR5) and
no. 6 (PCR6) to combine several distinct sources of evidence
characterized by their basic belief assignments defined over the
same frame of discernment. After a brief review of these rules,
the paper shows through simple examples why their behaviors
can sometimes increase the uncertainty more than necessary,
which is detrimental to decision-making support drawn from the
result of the combination. We present a theoretical improvement
of these rules, and establish new PCR5* and PCR6* rules of
combination. These new rules overcome the weakness of the PCR5
and PCR6 rules by computing binary keeping-indexes that allow
one to keep only the focal elements that play an effective role in the
partial conflict redistribution. The PCR5* and PCR6* rules are not
associative, but they preserve the neutrality of the vacuous belief
assignment, contrary to the PCR5 and PCR6 rules, and they make
a more precise redistribution which does not improperly increase
the mass of partial uncertainties.


Keywords: information fusion, belief functions, PCR5*,
PCR6*, PCR5, PCR6 fusion rules.


I. INTRODUCTION 


There exist different theories based on distinct representa- 
tions and modelings of uncertainty to deal with uncertain infor- 
mation to conduct information fusion [1]. The theory of prob- 
ability [2], [3], the theory of fuzzy sets [4], [5], the possibility 
theory [6], [7], and the theory of belief functions [8]-[10] are 
the most well-known ones. This paper addresses the problem 
of information fusion in the mathematical framework of the 
belief functions introduced by Shafer from Dempster’s works 
[11], [12]. The belief functions are often used in decision- 
making support applications because the experts are generally 
able to express only a belief in a hypothesis (or a set of 
hypotheses) from their partial knowledge, experience and from 
their own perception of the reality. To conduct information 
fusion, we need some efficient rules of combination that are 
able to manage the conflicting sources of evidence (if any), or 
expert opinions expressed in terms of belief functions. Readers
interested in belief functions can find classical related papers
in [13] and in the special issue [14], which also includes a list
of good selected papers. It is worth mentioning that the recent
book of Cuzzolin [15] includes 2137 references, with many
of them related to belief functions.
In this paper, we adopt the notion of conflict introduced
by Shafer in [8] (p. 65). This notion of conflict is often
adopted by researchers working with belief functions, as in
[16] p. 17 for instance, because this notion is quite simple to
understand. Different definitions and interpretations of conflict
can also be found in [17]-[27] for readers interested in this
topic. In this paper, two (or more) sources are said to be
conflicting if they support incompatible (disjoint, or contradictory)
hypotheses. We also work with distinct sources of evidence
that are considered as (cognitively) independent and reliable.
We neither consider nor apply the discounting techniques of belief
assessments listed in [14] before combining them, to keep the
presentation and notations as simple as possible¹.

While the conjunctive rule makes it possible to combine
information between different sources of information by
estimating the level of existing conflict, the Dempster-Shafer (DS)
rule [8], [16] proposes a distribution of this conflict on
the hypotheses characterized by the sources of information.
The normalization carried out by the DS rule may however
be considered counter-intuitive, especially when the level of
conflict between the sources of information is high [28],
[29], but also in some situations where the level of conflict
between sources is low, as shown in [30], which reveals a dictatorial
behavior of the DS rule. The Proportional Conflict Redistribution
rules (PCR5 [31] and PCR6 [32], [33]) have been proposed
to circumvent the problem of the DS rule and to manage
the conflict more judiciously.

In this paper, we put forward a flawed behavior of these 
combination rules in some cases attributed to the non- 
neutrality of the vacuous BBA (Basic Belief Assignment), 
and we propose an improvement of these two combination 
rules (denoted by PCR5* and PCR6*) in order to ensure the 
neutrality property of the vacuous BBA. This is achieved by 
discarding specific elements implied in the partial conflict and 
which are not useful for making the conflict redistribution. 

¹Of course discounted belief assignments can also be combined by the rules
presented in this paper.

In the PCR rules [32]-[34] one redistributes the product of
masses of belief of incompatible (i.e. conflicting) elements
whose intersection is empty only to elements involved in
this product and proportionally to their mass of belief. For
instance, let's consider two elements $A$, $B$ of a frame of
discernment (FoD) with $A \cap B = \emptyset$, and three basic belief
assignments $m_1(\cdot)$, $m_2(\cdot)$, $m_3(\cdot)$ defined on this FoD with
$m_1(A) > 0$, $m_2(B) > 0$, and $m_3(A \cup B) > 0$. The product
$m_1(A) m_2(B) m_3(A \cup B) > 0$ is called a conflicting product
hereafter because $A \cap B \cap (A \cup B) = \emptyset$. Based on the PCR5 (and
PCR6) rule, we will redistribute the value of this product back
to the focal² elements $A$, $B$ and $A \cup B$, and proportionally to
$m_1(A)$, $m_2(B)$ and $m_3(A \cup B)$. In the improved PCR rules
developed in this paper we will redistribute this conflicting
product only to the focal elements $A$ and $B$, since the focal
element $A \cup B$ is neither in conflict with $A$ nor with $B$. Such
an improvement in the proportional conflict redistribution is
made possible by defining a binary keeping-index for each
focal element involved in the conflicting product. This index
allows the identification of the elements of the conflicting
product that have an effective role in the proportional
redistribution of the conflicting product. All elements (if any)
having a binary keeping-index equal to zero are discarded from
the conflict redistribution process. This main idea is developed
in this paper and illustrated with several examples. It allows one to
preserve the neutrality of the totally ignorant source of evidence
in the improved versions of the PCR5 and PCR6 rules, which
is often considered as a desirable property for a rule of
combination of distinct and reliable sources of evidence.
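
To make the redistribution idea above concrete, the following minimal Python sketch splits a single conflicting product $m_1(A) m_2(B) m_3(A \cup B)$ proportionally among the elements that are kept. The numerical masses are hypothetical, the function name `redistribute` is ours, and the set of kept elements is given by hand here; the actual computation of the binary keeping-indexes is the subject of Section VI and is not reproduced.

```python
def redistribute(masses, keep=None):
    """Split the product of `masses` proportionally among the kept elements.

    masses: dict mapping each element of the conflicting product to its mass
            (one mass per source; the three elements are distinct here).
    keep:   elements allowed to receive a share (all of them if None).
    """
    product = 1.0
    for value in masses.values():
        product *= value
    kept = masses if keep is None else {k: masses[k] for k in keep}
    total = sum(kept.values())
    return {k: product * v / total for k, v in kept.items()}


# Hypothetical masses m1(A), m2(B), m3(A u B), chosen only for illustration.
masses = {"A": 0.6, "B": 0.5, "AuB": 0.4}

classic = redistribute(masses)                    # PCR5/PCR6 behavior
improved = redistribute(masses, keep=["A", "B"])  # A u B is discarded

print(classic)   # A, B and A u B all receive a share of the product 0.12
print(improved)  # only A and B receive a share of the product 0.12
```

In both cases the whole conflicting product is redistributed; the difference is that the second call transfers nothing to the partial uncertainty $A \cup B$.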

For the reader not immersed in the mathematics of belief
functions, the numerical results of Example 1 of
Section III-B, compared with those of Example 1 revisited in Section
VII, provide a quick verification of the improvements.

This paper is organized as follows. We give the basics of
belief functions in Section II. We present the PCR5 and PCR6
rules of combination in Section III with new general formulas
in subsection III-C, and associated examples in Section IV.
The flawed behavior of the PCR5 and PCR6 rules is highlighted
in Section V through specific examples. Then, Section VI
proposes the mathematical expression of the new improved
PCR5* and PCR6* rules of combination, as well as the
very detailed procedure to select the focal elements for these
new proportional redistributions. Finally, comparative results
for relevant examples are shown in Section VII in order to
compare the PCR5 and PCR6 results with the PCR5* and
PCR6* results. Concluding remarks are given in Section VIII.
For convenience, two Matlab™ routines are also given in
appendix 3 of this paper for the PCR5* and PCR6* rules of
combination.


II. BASICS OF BELIEF FUNCTIONS 


We consider a given finite set $\Theta$ of $n > 1$ distinct elements
$\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$ corresponding to the frame of
discernment (FoD) of the fusion problem, or the decision-making
problem, under concern. All elements of $\Theta$ are mutually
exclusive³ and each element is an elementary choice of the
potential decision to take. The power set of $\Theta$ is the set of all
subsets of $\Theta$ (including the empty set $\emptyset$ and $\Theta$) and it is usually
denoted $2^\Theta$ because its cardinality equals $2^{|\Theta|}$. We adopt
Shafer's formalism whereby propositions are represented by
subsets [8] (Chap. 2, pp. 35-37). Hence, the propositions under
concern are in one-to-one correspondence with subsets of $\Theta$.
We also use classical notations of set theory [35], i.e. $\emptyset$ for
the empty set, $A \cup B$ for the union⁴ of sets $A$ and $B$ (which
is the set of all objects that are a member of the set $A$, or
the set $B$, or both), $A \cap B$ for their intersection (which is
the set of all objects that are members of both $A$ and $B$),
etc. A Basic Belief Assignment (BBA) given by a source of
evidence is defined by Shafer [8] in his Mathematical Theory
of Evidence (known also as Dempster-Shafer Theory, or DST)
as $m(\cdot): 2^\Theta \to [0,1]$ satisfying

$$ m(\emptyset) = 0, \qquad \sum_{A \in 2^\Theta} m(A) = 1, \tag{1} $$

where $m(A)$ is the mass of belief exactly committed to $A$,
which we usually call the mass of $A$. A BBA is said to be proper
(or normal) if it satisfies Shafer's definition (1). The subset
$A \subseteq \Theta$ is called a Focal Element (FE) of the BBA $m(\cdot)$ if
and only if $m(A) > 0$. The empty set is not a focal element
of a BBA because $m(\emptyset) = 0$ according to definition (1). The
set of all focal elements of a BBA $m(\cdot)$ is denoted $\mathcal{F}(m)$. Its
mathematical definition is $\mathcal{F}(m) = \{X \in 2^\Theta \mid m(X) > 0\}$.
The cardinality $|\mathcal{F}(m)|$ of the set $\mathcal{F}(m)$ is denoted $\mathcal{F}_m$. The
order of the focal elements of $\mathcal{F}(m)$ does not matter and all the
focal elements are different. The set $\mathcal{F}(m)$ of focal elements
of $m(\cdot)$ has at least one focal element, and at most $2^{|\Theta|} - 1$
focal elements.

²A focal element is an element (i.e. a subset) having a strictly positive mass
of belief committed to it; see Section II.

Belief and plausibility functions are respectively defined
from $m(\cdot)$ by [8]

$$ \mathrm{Bel}(A) = \sum_{X \in 2^\Theta \mid X \subseteq A} m(X), \tag{2} $$

and

$$ \mathrm{Pl}(A) = \sum_{X \in 2^\Theta \mid A \cap X \neq \emptyset} m(X) = 1 - \mathrm{Bel}(\bar{A}), \tag{3} $$

where $\bar{A}$ represents the complement of $A$ in $\Theta$.


³This standard assumption is called Shafer's model of the FoD in the DSmT
(Dezert-Smarandache Theory) framework [34].

⁴We prefer the notation $A \cup B$ for denoting the union of sets $A$ and $B$,
which is a formal mathematical notation for the union of two sets, instead of
the notations $AB$ or $\{A, B\}$ used by some authors.

$\mathrm{Bel}(A)$ and $\mathrm{Pl}(A)$ are usually interpreted respectively as
lower and upper bounds of an unknown (subjective) probability
measure $P(A)$ [11], [12]. The functions $m(\cdot)$, $\mathrm{Bel}(\cdot)$ and
$\mathrm{Pl}(\cdot)$ are in one-to-one correspondence. A belief function $\mathrm{Bel}(\cdot)$ is Bayesian if
all its focal elements are singletons [8] (Theorem 2.8, p.
45). In this case, $m(X) = \mathrm{Bel}(X)$ for any (singleton) focal
element $X$, and $m(\cdot)$ is called a Bayesian BBA. The corresponding
$\mathrm{Bel}(\cdot)$ function is equal to $\mathrm{Pl}(\cdot)$ and these functions can be
interpreted as the same (possibly subjective) probability measure
$P(\cdot)$. The vacuous BBA (VBBA for short) representing a
totally ignorant source is defined as $m_v(\Theta) = 1$.
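
As an illustration of definitions (2) and (3), the short Python sketch below computes belief and plausibility from a BBA stored as a dictionary that maps focal elements (as `frozenset` objects) to masses. The representation and the function names `bel` and `pl` are assumptions made for this sketch; the BBA used is $m_1(\cdot)$ of Example 1 in Section III-B.

```python
def bel(m, a):
    """Belief of `a`: total mass of the focal elements included in `a` (eq. 2)."""
    return sum(mass for x, mass in m.items() if x <= a)


def pl(m, a):
    """Plausibility of `a`: total mass of the focal elements meeting `a` (eq. 3)."""
    return sum(mass for x, mass in m.items() if x & a)


A = frozenset({"A"})
B = frozenset({"B"})
THETA = A | B  # frame of discernment {A, B}

# BBA m1 of Example 1: m1(A) = 0.1, m1(B) = 0.2, m1(A u B) = 0.7
m1 = {A: 0.1, B: 0.2, THETA: 0.7}

bel_A = bel(m1, A)
pl_A = pl(m1, A)
print(bel_A, pl_A)            # lower and upper bounds for P(A)
print(1.0 - bel(m1, THETA - A))  # Pl(A) obtained as 1 - Bel(complement of A)
```

Only focal elements are stored, so the empty set never appears as a key and the subset test `x <= a` directly matches the summation index of (2).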


III. COMBINATION OF BBAS 


This section first presents the conjunctive rule of
combination, which is one of the main rules to combine reliable
sources of evidence and which allows one to identify the
conflicting information among the sources. Then we present the
proportional conflict redistribution rules no. 5 (PCR5) [31]
and no. 6 (PCR6) [32], [33] as alternatives to Dempster's rule
of combination [8]. The development of these rules has been
motivated by the counter-intuitive behavior of Dempster's rule
[8] when combining highly conflicting sources of evidence, but
also when combining low conflicting sources of
evidence⁵. The reader interested in this topic can refer to [13],
[28]-[30] for theoretical justifications and examples. In the
following, and for simplicity, we restrict our presentation to
the classical framework of belief functions, and we work with
BBAs defined only on the power set $2^\Theta$ of a FoD $\Theta$. PCR
rules were originally defined for working with Dedekind's
lattice as well, see Chapter 1 of [34] (Volume 2). In this paper,
we present simple general expressions of the PCR5 and PCR6
fusion rules because they are easier to understand than
the original general formulas, and they afford expressions of
the improved PCR5* and PCR6* rules in a direct and useful
manner.

After a brief presentation of the main notations used in
this paper, we will recall both the PCR5 and PCR6 rules for
historical and technical reasons. PCR5 was developed
first, and then PCR6 was proposed based on a modified
redistribution principle inspired by PCR5. In this paper, we
follow the logical and historical development of these PCR5
and PCR6 rules to present their improved
versions PCR5* and PCR6*. It seems easier to understand
the PCR6* fusion formula once the PCR5* formula has
been established. By presenting both rules, we offer
readers a deeper global view of how these new rules work and
of the fundamental and mathematical differences in their
conflict redistribution principles. In the sequel, all the introduced
examples assume the model of Shafer's frame of discernment
as in the classical DST framework.


A. Notations 


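When we make the combination of $S \geq 2$ BBAs by the
conjunctive rule, or by the PCR5 and PCR6 fusion rules,
we have to compute the product of the masses of the focal
elements composing any possible $S$-tuple of focal elements.
Each possible $S$-tuple is noted by⁶

$$ \mathbf{X}_j \triangleq (X_{j_1}, X_{j_2}, \ldots, X_{j_S}) \in \mathcal{F}(m_1) \times \mathcal{F}(m_2) \times \ldots \times \mathcal{F}(m_S), $$

where $j_1 \in \{1, 2, \ldots, \mathcal{F}_{m_1}\}$, $j_2 \in \{1, 2, \ldots, \mathcal{F}_{m_2}\}$, ...,
$j_S \in \{1, 2, \ldots, \mathcal{F}_{m_S}\}$. The element $X_{j_i}$ is the focal element
of $m_i(\cdot)$ that makes the $i$-th component of the $j$-th $S$-tuple
$\mathbf{X}_j$.

⁵Which is known as the dictatorial behavior of Dempster's rule [30].
⁶The symbol $\triangleq$ means "equals by definition".

For notation convenience also, the cartesian product
$\mathcal{F}(m_1) \times \mathcal{F}(m_2) \times \ldots \times \mathcal{F}(m_S)$ is denoted by $\mathcal{F}(m_1, \ldots, m_S)$
in the sequel.

We have $\mathcal{F} \triangleq |\mathcal{F}(m_1, \ldots, m_S)| = \prod_{i=1}^{S} |\mathcal{F}(m_i)| = \mathcal{F}_{m_1} \cdot \ldots \cdot \mathcal{F}_{m_S}$
products of masses of focal elements to consider
and to calculate because we have $\mathcal{F}_{m_1}$ focal elements in
$\mathcal{F}(m_1)$, $\mathcal{F}_{m_2}$ focal elements in $\mathcal{F}(m_2)$, ..., and $\mathcal{F}_{m_S}$ focal
elements in $\mathcal{F}(m_S)$. Each product for $j = 1$ to $\mathcal{F}$ is of the
form

$$ \pi_j(X_{j_1} \cap X_{j_2} \cap \ldots \cap X_{j_S}) \triangleq \prod_{i=1}^{S} m_i(X_{j_i}). \tag{4} $$

There are two types of products:
• $\pi_j(X_{j_1} \cap X_{j_2} \cap \ldots \cap X_{j_S})$ is called a non-conflicting
(mass) product if
$$ X_{j_1} \cap X_{j_2} \cap \ldots \cap X_{j_S} = X \neq \emptyset. $$
In this case, $\pi_j(X_{j_1} \cap X_{j_2} \cap \ldots \cap X_{j_S})$ is also noted by
$\pi_j(X)$ for short.
• $\pi_j(X_{j_1} \cap X_{j_2} \cap \ldots \cap X_{j_S})$ is called a conflicting (mass)
product if
$$ X_{j_1} \cap X_{j_2} \cap \ldots \cap X_{j_S} = \emptyset. $$
In this case, $\pi_j(X_{j_1} \cap X_{j_2} \cap \ldots \cap X_{j_S})$ is also noted by
$\pi_j(\emptyset)$ for short.

It is worth noting that an element $X \in 2^\Theta \setminus \{\emptyset\}$ may belong
to the sets of focal elements of the different BBAs to combine,
and therefore an $S$-tuple $\mathbf{X}_j$ can have duplicate components.
Because all the BBAs are normalized, we always have

$$ \sum_{j=1}^{\mathcal{F}} \pi_j(X_{j_1} \cap X_{j_2} \cap \ldots \cap X_{j_S}) = 1. \tag{5} $$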
As a simple example to illustrate our notations, let's
consider two BBAs $m_1(\cdot)$ and $m_2(\cdot)$ defined over the FoD $\Theta = \{A, B, C\}$
with respectively two and three focal elements,
say $\mathcal{F}(m_1) = \{A, B \cup C\}$ and $\mathcal{F}(m_2) = \{B, C, A \cup C\}$.
Here $\mathcal{F}_{m_1} = |\mathcal{F}(m_1)| = 2$ and $\mathcal{F}_{m_2} = |\mathcal{F}(m_2)| = 3$. For
$j_1 = 1$ (the first focal element of $m_1(\cdot)$) one has $X_{j_1} = A$,
and for $j_1 = 2$ (the second focal element of $m_1(\cdot)$) one has
$X_{j_1} = B \cup C$. Similarly, for $j_2 = 1$ (the first focal element of
$m_2(\cdot)$) one has $X_{j_2} = B$, for $j_2 = 2$ (the 2nd focal element
of $m_2(\cdot)$) one has $X_{j_2} = C$, and for $j_2 = 3$ (the 3rd focal
element of $m_2(\cdot)$) one has $X_{j_2} = A \cup C$. In this case we
have $\mathcal{F} = \mathcal{F}_{m_1} \cdot \mathcal{F}_{m_2} = 6$ products of masses to consider in
the conjunctive fusion rule (see next sub-section), which are

$$ \pi_1(A \cap B) = m_1(A) m_2(B), $$
$$ \pi_2(A \cap C) = m_1(A) m_2(C), $$
$$ \pi_3(A \cap (A \cup C)) = m_1(A) m_2(A \cup C), $$
$$ \pi_4((B \cup C) \cap B) = m_1(B \cup C) m_2(B), $$
$$ \pi_5((B \cup C) \cap C) = m_1(B \cup C) m_2(C), $$
$$ \pi_6((B \cup C) \cap (A \cup C)) = m_1(B \cup C) m_2(A \cup C). $$

The products $\pi_1$ and $\pi_2$ are called conflicting products because
• for $\pi_1$, the focal elements $A$ and $B$ involved in $\pi_1$ are
incompatible (i.e. disjoint) because $A \cap B = \emptyset$. $\pi_1(A \cap B)$
is of course equivalent to $\pi_j(X_{j_1} \cap X_{j_2})$ with $j = 1$ by
taking $X_{j_1} = A$ and $X_{j_2} = B$;
• for $\pi_2$, one has $A \cap C = \emptyset$. $\pi_2(A \cap C)$ is equivalent
to $\pi_j(X_{j_1} \cap X_{j_2})$ with $j = 2$ by taking $X_{j_1} = A$ and
$X_{j_2} = C$, etc.
The products $\pi_3$, ..., and $\pi_6$ are not conflicting products
because the focal elements involved in each product have
a non-empty intersection. Because $m_1(A) + m_1(B \cup C) = 1$
and $m_2(B) + m_2(C) + m_2(A \cup C) = 1$, one has
$(m_1(A) + m_1(B \cup C))(m_2(B) + m_2(C) + m_2(A \cup C)) = 1$,
and therefore $\sum_{j=1}^{6} \pi_j = 1$. This illustrates formula (5).


In this paper, $i \in \{1, \ldots, S\}$ represents the index of the $i$-th
source of evidence characterized by the BBA $m_i(\cdot)$, and $j \in \{1, \ldots, \mathcal{F}\}$
represents the index of the $j$-th product
$\pi_j(X_{j_1} \cap X_{j_2} \cap \ldots \cap X_{j_S})$.
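
The notations above map naturally onto a cartesian product of dictionaries. The following Python sketch enumerates the six mass products of the example and separates the conflicting from the non-conflicting ones. Since the example only specifies the focal sets, the numerical masses below are hypothetical, and the helper name `mass_products` is ours.

```python
from itertools import product
from math import prod, isclose

A, B, C = "A", "B", "C"

# Focal sets as in the example; the mass values are hypothetical.
m1 = {frozenset({A}): 0.6, frozenset({B, C}): 0.4}
m2 = {frozenset({B}): 0.2, frozenset({C}): 0.3, frozenset({A, C}): 0.5}


def mass_products(bbas):
    """List (S-tuple of focal elements, their intersection, mass product pi_j)."""
    out = []
    for tup in product(*(m.items() for m in bbas)):
        focal = [x for x, _ in tup]
        inter = frozenset.intersection(*focal)
        out.append((focal, inter, prod(mass for _, mass in tup)))
    return out


products = mass_products([m1, m2])
conflicting = [p for p in products if not p[1]]  # empty intersection

for j, (focal, inter, pi_j) in enumerate(products, start=1):
    kind = "conflicting" if not inter else "non-conflicting"
    print(j, [sorted(x) for x in focal], f"pi_{j} = {pi_j:.2f}", kind)
```

Because Python dictionaries preserve insertion order, the enumeration follows the same order $\pi_1, \ldots, \pi_6$ as in the text, and the sum of all products equals one, as stated by (5).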


B. The conjunctive rule of combination 


Let's consider $S \geq 2$ distinct reliable sources of evidence
characterized by their BBA $m_s(\cdot)$ ($s = 1, \ldots, S$) defined on
$2^\Theta$. Their conjunctive fusion⁷ is defined for all $A \in 2^\Theta$ by

$$ m^{\text{Conj}}_{1,2,\ldots,S}(A) = \sum_{\substack{\mathbf{X}_j \in \mathcal{F}(m_1, \ldots, m_S) \\ X_{j_1} \cap \ldots \cap X_{j_S} = A}} \pi_j(X_{j_1} \cap X_{j_2} \cap \ldots \cap X_{j_S}) = \sum_{\substack{\mathbf{X}_j \in \mathcal{F}(m_1, \ldots, m_S) \\ X_{j_1} \cap \ldots \cap X_{j_S} = A}} \prod_{i=1}^{S} m_i(X_{j_i}). \tag{6} $$

The circled-intersection symbol $\textcircled{$\scriptstyle\cap$}$ is also used in the literature, for instance in [36],
to note the conjunctive fusion operator, i.e.
$m^{\text{Conj}}_{1,2,\ldots,S}(A) = [m_1 \,\textcircled{$\scriptstyle\cap$}\, m_2 \,\textcircled{$\scriptstyle\cap$}\, \ldots \,\textcircled{$\scriptstyle\cap$}\, m_S](A)$.

The total conflicting mass between the $S$ sources of evidence,
denoted $m^{\text{Conj}}_{1,2,\ldots,S}(\emptyset)$, is nothing but the sum of all existing
conflicting mass products, that is

$$ m^{\text{Conj}}_{1,2,\ldots,S}(\emptyset) = \sum_{\substack{\mathbf{X}_j \in \mathcal{F}(m_1, \ldots, m_S) \\ X_{j_1} \cap \ldots \cap X_{j_S} = \emptyset}} \pi_j(X_{j_1} \cap \ldots \cap X_{j_S}) = 1 - \sum_{A \in 2^\Theta \setminus \{\emptyset\}} m^{\text{Conj}}_{1,2,\ldots,S}(A). \tag{7} $$

Note that the combined BBA $m^{\text{Conj}}_{1,2,\ldots,S}(\cdot)$ given in (6) is not
a proper BBA because it does not satisfy Shafer's definition
(1). In general the $S$ sources of evidence to combine do not
fully agree, and we have consequently $m^{\text{Conj}}_{1,2,\ldots,S}(\emptyset) > 0$.

⁷The conjunctive fusion rule is also called Smets' rule of combination by
some authors because it has been widely used by Philippe Smets in his works
related to belief functions. But Smets himself calls it the conjunctive rule, see his
last paper [20], p. 388.

Dempster's rule of combination (called also orthogonal sum
by Shafer [8], p. 6) coincides with the normalized version
of the conjunctive rule. It is defined by
$m^{\text{DS}}_{1,2,\ldots,S}(A) = m^{\text{Conj}}_{1,2,\ldots,S}(A) / (1 - m^{\text{Conj}}_{1,2,\ldots,S}(\emptyset))$,
assuming $m^{\text{Conj}}_{1,2,\ldots,S}(\emptyset) \neq 1$.
The DS upper notation refers to the initials of Dempster's and
Shafer's names because Dempster's rule has gained its
popularity through Shafer's works on belief functions. Shafer
uses the symbol $\oplus$ to note Dempster's fusion operator, i.e.
$m^{\text{DS}}_{1,2,\ldots,S}(A) = [m_1 \oplus m_2 \oplus \ldots \oplus m_S](A)$ for $A \neq \emptyset$, and
$m^{\text{DS}}_{1,2,\ldots,S}(\emptyset) = 0$. A probabilistic analysis of Dempster's rule
of combination can be found in [37], and the geometry of
Dempster's rule is analyzed in [38].


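Example 1: Consider $\Theta = \{A, B\}$ and the two following BBAs

$$ m_1(A) = 0.1, \quad m_1(B) = 0.2, \quad m_1(A \cup B) = 0.7, $$
$$ m_2(A) = 0.4, \quad m_2(B) = 0.3, \quad m_2(A \cup B) = 0.3. $$

We have $m^{\text{Conj}}_{1,2}(\emptyset) = 0.11$, and

$$ m^{\text{Conj}}_{1,2}(A) = 0.35, \quad m^{\text{Conj}}_{1,2}(B) = 0.33, \quad m^{\text{Conj}}_{1,2}(A \cup B) = 0.21. $$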


Symbolically we denote the conjunctive fusion of $S$ sources
as $m^{\text{Conj}}_{1,2,\ldots,S} = \text{Conj}(m_1, m_2, \ldots, m_S)$. This conjunctive rule
is commutative and associative. This means that the sources
can be combined altogether in one step, or sequentially in any
order, and it does not matter. Also, the totally ignorant source
represented by the vacuous (non-informative) BBA has no
impact on the fusion result; see Lemma 1 below.


Lemma 1: The vacuous BBA $m_v$ has a neutral impact in the
conjunctive rule of combination, that is

$$ \text{Conj}(m_1, m_2, \ldots, m_S, m_v) = \text{Conj}(m_1, m_2, \ldots, m_S). \tag{8} $$


Proof: see appendix 1. 
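
The conjunctive rule (6), its normalized version (Dempster's rule) and the neutrality property of Lemma 1 can be checked numerically on Example 1 with the short Python sketch below. It is only an illustration under our own conventions: BBAs are dictionaries from `frozenset` focal elements to masses, and the function names `conj` and `dempster` are hypothetical, not the routines of the appendix.

```python
from itertools import product
from math import prod, isclose


def conj(*bbas):
    """Conjunctive fusion (6); the key frozenset() holds the total conflict (7)."""
    out = {}
    for tup in product(*(m.items() for m in bbas)):
        inter = frozenset.intersection(*(x for x, _ in tup))
        out[inter] = out.get(inter, 0.0) + prod(mass for _, mass in tup)
    return out


def dempster(*bbas):
    """Dempster's rule: conjunctive result normalized by 1 - conflict."""
    c = conj(*bbas)
    conflict = c.pop(frozenset(), 0.0)
    if isclose(conflict, 1.0):
        raise ValueError("totally conflicting sources: rule undefined")
    return {x: mass / (1.0 - conflict) for x, mass in c.items()}


A, B = frozenset("A"), frozenset("B")
THETA = A | B

m1 = {A: 0.1, B: 0.2, THETA: 0.7}   # Example 1
m2 = {A: 0.4, B: 0.3, THETA: 0.3}
mv = {THETA: 1.0}                   # vacuous BBA

m12 = conj(m1, m2)
m12v = conj(m1, m2, mv)             # Lemma 1: same result as m12
mds = dempster(m1, m2)

print({tuple(sorted(x)): round(v, 4) for x, v in m12.items()})
print({tuple(sorted(x)): round(v, 4) for x, v in mds.items()})
```

The sketch enumerates all $S$-tuples explicitly, so its cost grows as the product of the numbers of focal elements; this is acceptable for small examples only.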


The main drawback of this fusion rule is that it does
not generate a proper BBA because $m^{\text{Conj}}_{1,2,\ldots,S}(\emptyset) > 0$ in
general, and also it can provide a fusion result $m^{\text{Conj}}_{1,2,\ldots,S}(\emptyset)$
that quickly tends to one after only a few steps of a sequential
fusion processing of the sources, which is not very useful
for decision-making support. This is because the empty set
$\emptyset$ is the absorbing element for the conjunctive operation, since
$\emptyset \cap A = \emptyset$ for all $A \in 2^\Theta$, so that the mass committed to the
empty set can only increase through the repeated conjunctive
fusion rule. The main interest of this rule is its ability to
identify the partial conflicts and to provide a measure of the
total level of conflict $m^{\text{Conj}}_{1,2,\ldots,S}(\emptyset)$ between the sources, which
can be used to manage (select or discard) the sources in the
fusion process if one prefers, see [39] for an application in
geophysics for instance.


C. PCR5 and PCR6 rules of combination


The Proportional Conflict Redistribution (PCR) rules were
originally developed in the framework of DSmT (Dezert-
Smarandache Theory) [31], [32], [34] but they also work
in the classical framework of Shafer's belief functions.
Six rules have been proposed, referred to as the PCR1,
..., PCR6 rules of combination, with different complexities,
PCR1 being the simplest (but least effective) one. All
these rules share the same general principle, which consists of
three steps:
• apply the conjunctive rule (6);
• calculate the conflicting mass products $\pi_j(\emptyset)$;
• redistribute the conflicting mass products $\pi_j(\emptyset)$
proportionally on all non-empty sets involved in the conflict.


The way the conflicting mass product $\pi_j(\emptyset)$ is redistributed
yields different versions of PCR combination rules that work
for any degree of conflict. The sophistication/complexity and
preciseness of the PCR rules increase from the first PCR1 rule up
to the last rule PCR6. The main disadvantage of these rules,
aside from their complexity, is their non-associativity,
which requires combining all the BBAs altogether with PCR
rules rather than sequentially to expect the best fusion result.

In this paper, we focus on the presentation of PCR5 and
PCR6 only because they are the most well-known advanced
fusion rules used so far in the belief functions community. A
detailed presentation of other rules of combination
encountered in the literature can be found in [40]. Symbolically,
the PCR5 fusion and the PCR6 fusion of $S \geq 2$ BBAs
are respectively denoted $m^{\text{PCR5}}_{1,2,\ldots,S} = \text{PCR5}(m_1, m_2, \ldots, m_S)$,
and $m^{\text{PCR6}}_{1,2,\ldots,S} = \text{PCR6}(m_1, m_2, \ldots, m_S)$.

Readers familiar with PCR rules could quickly read
Example 1 given in Section III-B, and the results obtained with
the classical and improved PCR5 and PCR6 rules in Section VII
to appreciate the discussion throughout the paper.


The PCR5 rule of combination [31]: This rule transfers
the conflicting mass $\pi_j(\emptyset)$ to all the elements involved in
this conflict and proportionally to their individual masses, so
that a more sophisticated and specific distribution is done with
the PCR5 fusion process with respect to other existing rules
(including Dempster's rule). The PCR5 rule is presented in
detail (with justification and examples) in [34], Vol. 2 and
Vol. 3.

• The PCR5 fusion of two BBAs is obtained by $m^{\text{PCR5}}_{1,2}(\emptyset) = 0$,
and for all $A \in 2^\Theta \setminus \{\emptyset\}$ by

$$ m^{\text{PCR5}}_{1,2}(A) = m^{\text{Conj}}_{1,2}(A) + \sum_{\substack{X \in 2^\Theta \\ X \cap A = \emptyset}} \left[ \frac{m_1(A)^2 m_2(X)}{m_1(A) + m_2(X)} + \frac{m_2(A)^2 m_1(X)}{m_2(A) + m_1(X)} \right], \tag{9} $$

where $m^{\text{Conj}}_{1,2}(A)$ is the conjunctive rule formula (6) with
$S = 2$, and where all denominators in (9) are different from
zero. If a denominator is zero, that fraction is discarded. All
propositions/sets are in a canonical form. We take the
disjunctive normal form, which is a disjunction of conjunctions, and
which is unique in Boolean algebra and the simplest. For example,
$X = A \cap B \cap (A \cup B \cup C)$ is not in a canonical form, but
once simplified, $X = A \cap B$ is in a canonical
form.
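
As a quick check of formula (9), it can be applied by hand to the two BBAs of Example 1 (Section III-B), for which $\Theta = \{A, B\}$. The arithmetic below is only an illustrative worked computation derived from (9) and from the conjunctive masses of Example 1; it is not a result quoted from the paper. For $A$ the only focal element $X$ with $X \cap A = \emptyset$ is $B$, and conversely, while $A \cup B$ is not involved in any conflict:

$$
\begin{aligned}
m^{\text{PCR5}}_{1,2}(A) &= 0.35 + \frac{0.1^2 \cdot 0.3}{0.1 + 0.3} + \frac{0.4^2 \cdot 0.2}{0.4 + 0.2} = 0.35 + 0.0075 + 0.0533 \approx 0.4108, \\
m^{\text{PCR5}}_{1,2}(B) &= 0.33 + \frac{0.2^2 \cdot 0.4}{0.2 + 0.4} + \frac{0.3^2 \cdot 0.1}{0.3 + 0.1} = 0.33 + 0.0267 + 0.0225 \approx 0.3792, \\
m^{\text{PCR5}}_{1,2}(A \cup B) &= 0.21.
\end{aligned}
$$

The three masses sum to one, and the redistributed amounts $0.0075 + 0.0533 + 0.0267 + 0.0225 = 0.11$ recover exactly the total conflicting mass $m^{\text{Conj}}_{1,2}(\emptyset)$ of Example 1.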


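The PCR5 formula (9) for two BBAs can also be expressed
by considering only the focal elements of $m_1(\cdot)$ and $m_2(\cdot)$ as
follows

$$ m^{\text{PCR5}}_{1,2}(A) = m^{\text{Conj}}_{1,2}(A) + \sum_{\substack{(X_{j_1}, X_{j_2}) \in \mathcal{F}(m_1) \times \mathcal{F}(m_2) \\ X_{j_1} \cap X_{j_2} = \emptyset \\ X_{j_1} = A}} m_1(X_{j_1}) \cdot \frac{m_1(X_{j_1}) m_2(X_{j_2})}{m_1(X_{j_1}) + m_2(X_{j_2})} + \sum_{\substack{(X_{j_1}, X_{j_2}) \in \mathcal{F}(m_1) \times \mathcal{F}(m_2) \\ X_{j_1} \cap X_{j_2} = \emptyset \\ X_{j_2} = A}} m_2(X_{j_2}) \cdot \frac{m_1(X_{j_1}) m_2(X_{j_2})}{m_1(X_{j_1}) + m_2(X_{j_2})}, \tag{10} $$

or equivalently, with shorthand $\pi_j$ notations, as

$$ m^{\text{PCR5}}_{1,2}(A) = m^{\text{Conj}}_{1,2}(A) + \sum_{\substack{j \in \{1, \ldots, \mathcal{F}\} \mid \mathbf{X}_j \in \mathcal{F}(m_1, m_2) \\ X_{j_1} \cap X_{j_2} = \emptyset \\ A \in \mathbf{X}_j}} \left[ \Big( \prod_{i \in \{1,2\} \mid X_{j_i} = A} m_i(X_{j_i}) \Big) \cdot \frac{\pi_j(X_{j_1} \cap X_{j_2})}{m_1(X_{j_1}) + m_2(X_{j_2})} \right], \tag{11} $$

where $\mathcal{F} = |\mathcal{F}(m_1)| \cdot |\mathcal{F}(m_2)|$ is the total number of products
$\pi_j(X_{j_1} \cap X_{j_2}) = m_1(X_{j_1}) m_2(X_{j_2})$, and $A \in \mathbf{X}_j$ means that
at least one component of $\mathbf{X}_j$ equals $A$.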

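• The explicit formula of the PCR5 fusion of three BBAs is
given in [41].

• A simple formulation of the general expression of
the PCR5 fusion of $S \geq 2$ basic belief assignments
is obtained by redistributing each conflicting product
$\pi_j(\emptyset) = \pi_j(X_{j_1} \cap \ldots \cap X_{j_S} = \emptyset) = \prod_{i=1}^{S} m_i(X_{j_i})$ to some
elements of the power set of the FoD that are involved in the
conflict. Each $\pi_j(\emptyset)$ is redistributed proportionally to the elements
involved in this conflict based on the PCR5 redistribution
principle. When an element $A \in 2^\Theta$ is not involved in a
conflicting product $\pi_j(\emptyset)$, i.e. $A \notin \mathbf{X}_j$, the conflicting product
$\pi_j(\emptyset)$ is not redistributed to $A$. If an element $A$ is involved in
the conflict $X_{j_1} \cap \ldots \cap X_{j_S} = \emptyset$, i.e. $A \in \mathbf{X}_j$ and $\pi_j(\emptyset)$ occur,
then the proportional redistribution of $\pi_j(\emptyset)$ to $A$ is given by

$$ x_j(A) \triangleq \Big( \prod_{i \in \{1, \ldots, S\} \mid X_{j_i} = A} m_i(X_{j_i}) \Big) \cdot \frac{\pi_j(\emptyset)}{\sum_{X \in \mathbf{X}_j} \Big( \prod_{i \in \{1, \ldots, S\} \mid X_{j_i} = X} m_i(X_{j_i}) \Big)}, \tag{12} $$

where $A \in \mathbf{X}_j$ means that at least one component of the
$S$-tuple $\mathbf{X}_j = (X_{j_1}, \ldots, X_{j_S}) \in \mathcal{F}(m_1, \ldots, m_S)$ equals $A$.

Finally the mass value of $A$ obtained by the PCR5 rule is
calculated by

$$ m^{\text{PCR5}}_{1,2,\ldots,S}(A) = m^{\text{Conj}}_{1,2,\ldots,S}(A) + \sum_{j \in \{1, \ldots, \mathcal{F}\} \mid A \in \mathbf{X}_j \wedge \pi_j(\emptyset)} x_j(A), \tag{13} $$

where $A \in \mathbf{X}_j \wedge \pi_j(\emptyset)$ is a shorthand notation meaning that
at least one component of the $S$-tuple $\mathbf{X}_j$ equals $A$ and the
components of $\mathbf{X}_j$ are conflicting, i.e. $X_{j_1} \cap \ldots \cap X_{j_S} = \emptyset$.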
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Therefore the general PCRS formula can be expressed as 
(0) =0, and for A € 2° \ {0} by 


™1 2078 
mie (A= myo) <A) 
+ LS ( mi(X3,)) 
JE{1,...,F }| AEX; Az; (0) ie{1 sess S}|X5,=A 
(0) |. (14) 


XEX; i€{1,...,5}]Xj,=X 


It is worth noting that the formula (14) is a generalization of the formula (11), i.e. (14) coincides with (11) when $S = 2$.

This general PCR5 formula is equivalent to the original PCR5 formula given in [31], but it involves only the focal elements of the BBAs to combine, which makes the derivation more efficient (less computationally demanding) than the original general PCR5 formula, especially when each BBA has only a few focal elements. We use this new general PCR5 formula because it is relatively simple and easy to improve into the PCR5$^+$ formula (see section VI-B). The extension of PCR5 for combining qualitative$^8$ BBAs can be found in [34], Vol. 2 & 3, and in [33]. The PCR5 rule is not associative, and the best fusion result is obtained by combining all the sources together at the same time when possible. A suboptimal fast fusion method using PCR5-based canonical decomposition [42] can be found in [43].
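The general PCR5 formula (14) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' Matlab code: it assumes each BBA is stored as a dictionary mapping a focal element (a `frozenset` of singleton labels) to its mass, and the function name `pcr5` is hypothetical. The usage lines apply it to the three BBAs over $\Theta = \{A, B\}$ used in Examples 2 and 4.

```python
from itertools import product
from math import prod

def pcr5(bbas):
    """Fuse BBAs (dict: frozenset -> mass) following formula (14)."""
    fused = {}
    # One S-tuple of focal elements per iteration, one element per source.
    for combo in product(*(m.items() for m in bbas)):
        pi = prod(v for _, v in combo)
        inter = frozenset.intersection(*(x for x, _ in combo))
        if inter:  # non-conflicting product: conjunctive part
            fused[inter] = fused.get(inter, 0.0) + pi
            continue
        # Conflicting product: the weight of each distinct element is the
        # PRODUCT of the masses of its occurrences in the tuple.
        weights = {}
        for x, v in combo:
            weights[x] = weights.get(x, 1.0) * v
        total = sum(weights.values())
        for x, w in weights.items():
            fused[x] = fused.get(x, 0.0) + w * pi / total
    return fused

A, B = frozenset("A"), frozenset("B")
AB = A | B
m1 = {A: 0.6, B: 0.1, AB: 0.3}
m2 = {A: 0.5, B: 0.3, AB: 0.2}
m3 = {A: 0.4, B: 0.1, AB: 0.5}
fused = pcr5([m1, m2, m3])
```

The loop visits all $F$ products of focal elements, so its cost grows with the product of the numbers of focal elements of the sources.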


The PCR6 rule of combination [32]: A variant of PCR5 
rule, called PCR6 rule, has been proposed by Martin and 
Osswald in [32], [33] for combining S > 2 sources. Because 
PCR6 coincides with PCR5 when one combines two sources, 
we do not provide the PCR6 formula for two sources which is 
the same as (9). The difference between PCR5 and PCR6 lies 
in the way the proportional conflict redistribution is done as 
soon as three (or more) sources are involved in the fusion as it 
will be shown in the example 2 introduced in the next section. 
The explicit formula of the PCR6 fusion of three BBAs is 
given in [41] for convenience. 


The PCR6 fusion of $S > 2$ BBAs is obtained by $m^{PCR6}_{1,2,\ldots,S}(\emptyset) = 0$, and for all $A \in 2^\Theta \setminus \{\emptyset\}$ by$^9$

$$m^{PCR6}_{1,2,\ldots,S}(A) = m^{Conj}_{1,2,\ldots,S}(A) + \sum_{j \in \{1,\ldots,F\} | A \in \mathbf{X}_j \wedge \pi_j(\emptyset)} \Bigg[\Big(\sum_{i \in \{1,\ldots,S\} | X_{j_i} = A} m_i(X_{j_i})\Big) \cdot \frac{\pi_j(\emptyset)}{\sum_{X \in \mathbf{X}_j} \Big(\sum_{i \in \{1,\ldots,S\} | X_{j_i} = X} m_i(X_{j_i})\Big)}\Bigg]. \quad (15)$$

The difference between the general PCR5 and PCR6 formulas is that the PCR5 proportional redistribution involves the products $\prod_{i \in \{1,\ldots,S\} | X_{j_i} = A} m_i(X_{j_i})$ of multiple same focal elements $A$ (if any) in the conflict, whereas the PCR6 conflict redistribution principle works with their sum $\sum_{i \in \{1,\ldots,S\} | X_{j_i} = A} m_i(X_{j_i})$ instead. The next section presents some examples for PCR5 and PCR6 rules of combination.

$^8$ A qualitative BBA is a BBA whose values are labels (e.g. low, medium, high, etc.) instead of real numbers.
$^9$ We wrote this PCR6 general formula in the style of PCR5 formula (14).

We use this general PCR6 formula instead of the original Martin-Osswald PCR6 formula [32] because it is easier to improve into the PCR6$^+$ formula (see section VI-B). From the implementation point of view, PCR6 is simpler to implement than PCR5. From the Decision-Making (DM) standpoint, PCR6 is better than PCR5 when $S > 2$, as reported by Martin and Osswald in [32] for their applications (see also Example 3 in the next section). For convenience, some Matlab™ codes of PCR5 and PCR6 fusion rules can be found in the appendix of [44], in Chap. 7 of [34] (Vol. 3), or on Arnaud Martin's web page [45]. PCR6 code (in the R programming language) can also be found in the iBelief package developed by Kuang Zhou and Arnaud Martin, available from the BFAS$^{10}$ repository [46] or directly from [47]. When we have only two BBAs to combine, PCR5 and PCR6 rules provide the same result because formulas (14) and (15) coincide for $S = 2$.
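As a companion illustration of formula (15), the sketch below fuses BBAs with PCR6. It is not the Matlab or R code cited above: the dictionary representation (focal element as a `frozenset`, mapped to its mass) and the name `pcr6` are assumptions made for this example. The only difference from a PCR5 implementation is that the weight of a focal element repeated in a conflicting tuple is the sum of its masses rather than their product.

```python
from itertools import product
from math import prod

def pcr6(bbas):
    """Fuse BBAs (dict: frozenset -> mass) following formula (15)."""
    fused = {}
    for combo in product(*(m.items() for m in bbas)):
        pi = prod(v for _, v in combo)
        inter = frozenset.intersection(*(x for x, _ in combo))
        if inter:  # non-conflicting product: conjunctive part
            fused[inter] = fused.get(inter, 0.0) + pi
            continue
        # Conflicting product: the weight of each distinct element is the
        # SUM of the masses of its occurrences in the tuple.
        weights = {}
        for x, v in combo:
            weights[x] = weights.get(x, 0.0) + v
        total = sum(weights.values())
        for x, w in weights.items():
            fused[x] = fused.get(x, 0.0) + w * pi / total
    return fused

A, B = frozenset("A"), frozenset("B")
AB = A | B
m1 = {A: 0.6, B: 0.1, AB: 0.3}
m2 = {A: 0.5, B: 0.3, AB: 0.2}
m3 = {A: 0.4, B: 0.1, AB: 0.5}
fused3 = pcr6([m1, m2, m3])   # three sources (Example 2 of the next section)
fused2 = pcr6([m1, m2])       # two sources: same result as PCR5
```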

In this paper, we have deliberately chosen to present the two rules PCR5 and PCR6 and their improved versions, mainly for historical reasons and because these two rules have strong theoretical links, as we have shown. By doing this, we offer readers (and potential users) the possibility to test each of these advanced fusion methods and evaluate their performances on their own applications. Even though PCR6 was proposed after PCR5, some researchers have implemented and are using the PCR5 fusion rule, so it appears important to introduce the improved version of this rule as well. Furthermore, PCR5 goes back exactly on the tracks of the conjunctive rule, while PCR6 does not.


IV. EXAMPLES FOR PCR5 AND PCR6 FUSION RULES 


Here we provide two simple examples showing the dif- 
ference of the results between PCR5 and PCR6 rules. For 
convenience, all numerical values given in the examples of this 
paper have been rounded to six decimal places when necessary. 


Example 2: We consider the simplest FoD $\Theta = \{A, B\}$, and the three following BBAs

$m_1(A) = 0.6$, $m_1(B) = 0.1$, $m_1(A \cup B) = 0.3$,
$m_2(A) = 0.5$, $m_2(B) = 0.3$, $m_2(A \cup B) = 0.2$,
$m_3(A) = 0.4$, $m_3(B) = 0.1$, $m_3(A \cup B) = 0.5$.

Because $\mathcal{F}_{m_1} = |\mathcal{F}(m_1)| = 3$, $\mathcal{F}_{m_2} = |\mathcal{F}(m_2)| = 3$ and $\mathcal{F}_{m_3} = |\mathcal{F}(m_3)| = 3$, we have $F = \mathcal{F}_{m_1} \cdot \mathcal{F}_{m_2} \cdot \mathcal{F}_{m_3} = 27$ products to consider. Fifteen products are non-conflicting and will enter in the calculation of $m^{Conj}_{1,2,3}(A)$, $m^{Conj}_{1,2,3}(B)$ and $m^{Conj}_{1,2,3}(A \cup B)$, and twelve products are conflicting products that will need to be proportionally redistributed. The conjunctive combination of these three BBAs is

$^{10}$ Belief Functions and Applications Society.


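$$\begin{aligned}
m^{Conj}_{1,2,3}(A) &= m_1(A)m_2(A)m_3(A) + m_1(A)m_2(A)m_3(A \cup B) \\
&\quad + m_1(A)m_2(A \cup B)m_3(A) + m_1(A \cup B)m_2(A)m_3(A) \\
&\quad + m_1(A)m_2(A \cup B)m_3(A \cup B) + m_1(A \cup B)m_2(A)m_3(A \cup B) \\
&\quad + m_1(A \cup B)m_2(A \cup B)m_3(A) = 0.5370, \\
m^{Conj}_{1,2,3}(B) &= m_1(B)m_2(B)m_3(B) + m_1(B)m_2(B)m_3(A \cup B) \\
&\quad + m_1(B)m_2(A \cup B)m_3(B) + m_1(A \cup B)m_2(B)m_3(B) \\
&\quad + m_1(B)m_2(A \cup B)m_3(A \cup B) + m_1(A \cup B)m_2(B)m_3(A \cup B) \\
&\quad + m_1(A \cup B)m_2(A \cup B)m_3(B) = 0.0900, \\
m^{Conj}_{1,2,3}(A \cup B) &= m_1(A \cup B)m_2(A \cup B)m_3(A \cup B) = 0.3 \cdot 0.2 \cdot 0.5 = 0.0300,
\end{aligned}$$

and

$$m^{Conj}_{1,2,3}(\emptyset) = 1 - m^{Conj}_{1,2,3}(A) - m^{Conj}_{1,2,3}(B) - m^{Conj}_{1,2,3}(A \cup B) = 0.3430.$$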
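The conjunctive masses above can be checked numerically. The following sketch assumes a dictionary representation of BBAs (focal element as a `frozenset` of singleton labels, mapped to its mass); the function name `conjunctive` is hypothetical. The total conflicting mass is accumulated under the empty set.

```python
from itertools import product
from math import prod

def conjunctive(bbas):
    """Conjunctive rule: sum the products of masses by intersection."""
    out = {}
    for combo in product(*(m.items() for m in bbas)):
        inter = frozenset.intersection(*(x for x, _ in combo))
        out[inter] = out.get(inter, 0.0) + prod(v for _, v in combo)
    return out

A, B = frozenset("A"), frozenset("B")
AB, EMPTY = A | B, frozenset()
m1 = {A: 0.6, B: 0.1, AB: 0.3}
m2 = {A: 0.5, B: 0.3, AB: 0.2}
m3 = {A: 0.4, B: 0.1, AB: 0.5}
conj = conjunctive([m1, m2, m3])
for name, key in (("A", A), ("B", B), ("A u B", AB), ("empty", EMPTY)):
    print(name, round(conj[key], 4))
```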


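In this example we have twelve partial conflicts, noted $\pi_j(\emptyset)$ ($j = 1, \ldots, 12$), which are given by the following products

$$\begin{aligned}
\pi_1(\emptyset) &= m_1(A)m_2(A)m_3(B) = 0.0300, \\
\pi_2(\emptyset) &= m_1(A)m_2(B)m_3(A) = 0.0720, \\
\pi_3(\emptyset) &= m_1(B)m_2(A)m_3(A) = 0.0200, \\
\pi_4(\emptyset) &= m_1(B)m_2(B)m_3(A) = 0.0120, \\
\pi_5(\emptyset) &= m_1(B)m_2(A)m_3(B) = 0.0050, \\
\pi_6(\emptyset) &= m_1(A)m_2(B)m_3(B) = 0.0180, \\
\pi_7(\emptyset) &= m_1(A \cup B)m_2(A)m_3(B) = 0.0150, \\
\pi_8(\emptyset) &= m_1(A \cup B)m_2(B)m_3(A) = 0.0360, \\
\pi_9(\emptyset) &= m_1(B)m_2(A)m_3(A \cup B) = 0.0250, \\
\pi_{10}(\emptyset) &= m_1(A)m_2(B)m_3(A \cup B) = 0.0900, \\
\pi_{11}(\emptyset) &= m_1(A)m_2(A \cup B)m_3(B) = 0.0120, \\
\pi_{12}(\emptyset) &= m_1(B)m_2(A \cup B)m_3(A) = 0.0080.
\end{aligned}$$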


Applying the PCR5 formula (14) and the PCR6 formula (15), we finally obtain $m^{PCR5}_{1,2,3}(\emptyset) = m^{PCR6}_{1,2,3}(\emptyset) = 0$, and$^{11}$

$m^{PCR5}_{1,2,3}(A) \approx 0.723281$,
$m^{PCR5}_{1,2,3}(B) \approx 0.182460$,
$m^{PCR5}_{1,2,3}(A \cup B) \approx 0.094259$,

and

$m^{PCR6}_{1,2,3}(A) \approx 0.743496$,
$m^{PCR6}_{1,2,3}(B) \approx 0.162245$,
$m^{PCR6}_{1,2,3}(A \cup B) \approx 0.094259$.

We see a difference between the BBAs $m^{PCR5}_{1,2,3}$ and $m^{PCR6}_{1,2,3}$, which is normal because the PCR principles are quite different.

$^{11}$ The symbol $\approx$ means "approximately equal to".
Using the PCR5 fusion rule, the first partial conflicting mass $\pi_1(\emptyset) = m_1(A)m_2(A)m_3(B) = 0.03$ will be redistributed back to $A$ and $B$ proportionally to $m_1(A)m_2(A)$ and to $m_3(B)$ as follows

$$\frac{x_1(A)}{m_1(A)m_2(A)} = \frac{x_1(B)}{m_3(B)} = \frac{\pi_1(\emptyset)}{m_1(A)m_2(A) + m_3(B)},$$

whence

$$x_1(A) = \frac{m_1(A)m_2(A)\,\pi_1(\emptyset)}{m_1(A)m_2(A) + m_3(B)} = 0.0225, \qquad x_1(B) = \frac{m_3(B)\,\pi_1(\emptyset)}{m_1(A)m_2(A) + m_3(B)} = 0.0075.$$

We can verify $\pi_1(\emptyset) = x_1(A) + x_1(B) = 0.03$.


Using the PCR6 fusion rule, the first partial conflicting mass $\pi_1(\emptyset) = 0.03$ will be redistributed back to $A$ and $B$ proportionally to $(m_1(A) + m_2(A))$ and to $m_3(B)$. So we will get the redistributions $x_1(A) = 0.0275$ for $A$ and $x_1(B) = 0.0025$ for $B$ because

$$\frac{x_1(A)}{m_1(A) + m_2(A)} = \frac{x_1(B)}{m_3(B)} = \frac{\pi_1(\emptyset)}{m_1(A) + m_2(A) + m_3(B)},$$

whence

$$x_1(A) = \frac{(m_1(A) + m_2(A))\,\pi_1(\emptyset)}{m_1(A) + m_2(A) + m_3(B)} = 0.0275, \qquad x_1(B) = \frac{m_3(B)\,\pi_1(\emptyset)}{m_1(A) + m_2(A) + m_3(B)} = 0.0025.$$

We can verify $\pi_1(\emptyset) = x_1(A) + x_1(B) = 0.03$.
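The two redistributions of $\pi_1(\emptyset)$ can be reproduced with plain arithmetic. The variable names below are illustrative only; the sketch simply contrasts the product-based weight of PCR5 with the sum-based weight of PCR6 for the duplicated focal element $A$.

```python
# Masses entering pi_1(empty) = m1(A) m2(A) m3(B) in Example 2.
m1_A, m2_A, m3_B = 0.6, 0.5, 0.1
pi_1 = m1_A * m2_A * m3_B

# PCR5: the duplicated element A is weighted by the PRODUCT of its masses.
w5_A = m1_A * m2_A
x5_A = w5_A * pi_1 / (w5_A + m3_B)
x5_B = m3_B * pi_1 / (w5_A + m3_B)

# PCR6: the duplicated element A is weighted by the SUM of its masses.
w6_A = m1_A + m2_A
x6_A = w6_A * pi_1 / (w6_A + m3_B)
x6_B = m3_B * pi_1 / (w6_A + m3_B)

print(round(x5_A, 4), round(x5_B, 4))
print(round(x6_A, 4), round(x6_B, 4))
```

In both cases the two shares add up to $\pi_1(\emptyset)$, so no mass is lost in the redistribution.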


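Note that for all the partial conflicts having no duplicate element involved in the conflicting product $\pi_j(\emptyset)$, we make the same redistribution with the PCR5 rule and with the PCR6 rule.

For instance, for $\pi_7(\emptyset) = m_1(A \cup B)m_2(A)m_3(B) = 0.0150$ we get

$$\frac{x_7(A \cup B)}{m_1(A \cup B)} = \frac{x_7(A)}{m_2(A)} = \frac{x_7(B)}{m_3(B)} = \frac{\pi_7(\emptyset)}{m_1(A \cup B) + m_2(A) + m_3(B)},$$

whence $\pi_7(\emptyset) = x_7(A \cup B) + x_7(A) + x_7(B) = 0.0150$ with

$$\begin{aligned}
x_7(A \cup B) &= \frac{m_1(A \cup B)\,\pi_7(\emptyset)}{m_1(A \cup B) + m_2(A) + m_3(B)} \approx 0.0050, \\
x_7(A) &= \frac{m_2(A)\,\pi_7(\emptyset)}{m_1(A \cup B) + m_2(A) + m_3(B)} \approx 0.0083, \\
x_7(B) &= \frac{m_3(B)\,\pi_7(\emptyset)}{m_1(A \cup B) + m_2(A) + m_3(B)} \approx 0.0017.
\end{aligned}$$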


The next example also shows the difference between PCR5 and PCR6 rules, and it justifies why the PCR6 rule is usually preferred to the PCR5 rule in applications.

Example 3: We consider the FoD $\Theta = \{A, B, C\}$, and the four very simple BBAs defined by

$m_1(A \cup B) = 1$, $m_2(B) = 1$, $m_3(A \cup B) = 1$, and $m_4(C) = 1$.

These BBAs are in conflict because the intersection of their focal elements is $(A \cup B) \cap B \cap (A \cup B) \cap C = \emptyset$. In this example, one has only one product of masses to calculate, which is $\pi_1((A \cup B) \cap B \cap (A \cup B) \cap C) = m_1(A \cup B)m_2(B)m_3(A \cup B)m_4(C) = 1$. In fact this product is a conflicting product denoted $\pi_1(\emptyset)$. We can also denote it $\pi(\emptyset)$ because the index $j = 1$ is useless in this case. Moreover, these BBAs are also in total conflict because $\pi(\emptyset) = m_1(A \cup B)m_2(B)m_3(A \cup B)m_4(C) = 1$.

If one applies the PCR5 rule principle we get

$$\frac{x(A \cup B)}{m_1(A \cup B)m_3(A \cup B)} = \frac{x(B)}{m_2(B)} = \frac{x(C)}{m_4(C)} = \frac{\pi(\emptyset)}{m_1(A \cup B)m_3(A \cup B) + m_2(B) + m_4(C)},$$

whence $x(A \cup B) = 1/3$, $x(B) = 1/3$ and $x(C) = 1/3$, so that

$m^{PCR5}_{1,2,3,4}(A \cup B) = x(A \cup B) = 1/3$,
$m^{PCR5}_{1,2,3,4}(B) = x(B) = 1/3$,
$m^{PCR5}_{1,2,3,4}(C) = x(C) = 1/3$.

This PCR5 result appears counter-intuitive because three of the four sources definitely exclude the hypothesis $C$, since one has $Pl_1(C) = Pl_2(C) = Pl_3(C) = 0$. It is therefore intuitively expected that after the combination of all four BBAs the mass committed to $C$ should not be greater than $1/4 = 0.25$.

If one applies the PCR6 rule principle we get

$$\frac{x(A \cup B)}{m_1(A \cup B) + m_3(A \cup B)} = \frac{x(B)}{m_2(B)} = \frac{x(C)}{m_4(C)} = \frac{\pi(\emptyset)}{m_1(A \cup B) + m_3(A \cup B) + m_2(B) + m_4(C)},$$

whence $x(A \cup B) = 2/4$, $x(B) = 1/4$ and $x(C) = 1/4$, so that

$m^{PCR6}_{1,2,3,4}(A \cup B) = x(A \cup B) = 0.5$,
$m^{PCR6}_{1,2,3,4}(B) = x(B) = 0.25$,
$m^{PCR6}_{1,2,3,4}(C) = x(C) = 0.25$,

which is in better agreement with what we intuitively expect because $m^{PCR6}_{1,2,3,4}(C)$ is not greater than $1/4$. Of course, in this example Dempster's rule of combination cannot be applied directly because the conflict is total, yielding a division by zero in Dempster's rule formula [8]; it could be applied only after modifying the BBAs to combine, for instance with some discounting method.
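The single conflicting product of Example 3 makes the difference between the two principles easy to reproduce. The helper below is a hypothetical sketch: `redistribute` takes the tuple of focal elements (here plain string labels), the corresponding masses, and the name of the rule, and returns the share of the conflicting product given to each distinct focal element.

```python
from math import prod

def redistribute(elements, masses, rule):
    """Share one conflicting product among the distinct focal elements."""
    pi = prod(masses)
    weights = {}
    for x, v in zip(elements, masses):
        if rule == "PCR5":      # product of the masses of a repeated element
            weights[x] = weights.get(x, 1.0) * v
        else:                   # "PCR6": sum of the masses of a repeated element
            weights[x] = weights.get(x, 0.0) + v
    total = sum(weights.values())
    return {x: w * pi / total for x, w in weights.items()}

elements = ["AuB", "B", "AuB", "C"]   # focal elements of m1, m2, m3, m4
masses = [1.0, 1.0, 1.0, 1.0]
share5 = redistribute(elements, masses, "PCR5")
share6 = redistribute(elements, masses, "PCR6")
print(share5)
print(share6)
```

With PCR5 the two occurrences of $A \cup B$ count once (weight $1 \cdot 1$), whereas with PCR6 they count twice (weight $1 + 1$), which is what moves the mass of $C$ from $1/3$ to $1/4$.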
V. FLAWED BEHAVIOR OF PCR5 AND PCR6 RULES


Formula (17) shows that in general PCR6 is not associative: combining the sources sequentially, two at a time, gives a result different from the global combination of all the sources using PCR6. Formula (18) says that in general PCR5 is different from PCR6, except of course when we combine only two sources. Formula (19) shows that in general the ignorant (vacuous) source is not a neutral element for PCR5.

The PCR5 and PCR6 rules of combination are not associative, which means that the fusion of the BBAs must be done using general formulas (14) or (15) if one has more than two BBAs to combine, which is not very convenient. Therefore, the sequential PCR5 or PCR6 combination of $S > 2$ BBAs is not in general equal to the global PCR5 or PCR6 fusion of the $S$ BBAs altogether, because the order of the combination of the sources does matter in the sequential combination. In general (i.e. when conflicts exist between the sources of evidence to combine) one has for $S > 2$

$$PCR5(m_1, m_2, \ldots, m_S) \neq PCR5(PCR5(PCR5(m_1, m_2), m_3), \ldots, m_S), \quad (16)$$

and

$$PCR6(m_1, m_2, \ldots, m_S) \neq PCR6(PCR6(PCR6(m_1, m_2), m_3), \ldots, m_S), \quad (17)$$

and also for $S > 2$ the PCR5 fusion result is generally different from the PCR6 fusion result, that is

$$PCR5(m_1, m_2, \ldots, m_S) \neq PCR6(m_1, m_2, \ldots, m_S). \quad (18)$$

PCR5 and PCR6 rules can become computationally intractable when combining a large number of sources or when working with a large FoD. This is a well-known limitation of these rules, but it is the price to pay to get better results than with classical rules.

Aside from the complexity of these rules, it is worth mentioning that the neutral impact property of the vacuous BBA $m_v$ is lost in general when considering the PCR5 or PCR6 combination of $S > 2$ BBAs altogether, that is

$$PCR5(m_1, \ldots, m_{S-1}, m_v) \neq PCR5(m_1, \ldots, m_{S-1}), \quad (19)$$

and

$$PCR6(m_1, \ldots, m_{S-1}, m_v) \neq PCR6(m_1, \ldots, m_{S-1}). \quad (20)$$


This is due to the redistribution principles used in PCR5 and in PCR6 rules. Example 4 shows the non-neutral impact of the vacuous BBA in PCR5 and PCR6 rules. Note that the vacuous BBA has a neutral impact in the fusion result if and only if one has only two BBAs to combine with PCR5, or PCR6, and one of them is the vacuous BBA, because in this case there is no possible (partial) conflict to redistribute between any BBA $m(\cdot)$ defined over the FoD $\Theta$ and the vacuous BBA $m_v(\cdot)$. That is, for any BBA $m_1(\cdot)$ one always has

$$PCR5(m_1, m_v) = PCR6(m_1, m_v) = m_1. \quad (21)$$


Example 4: We consider the FoD $\Theta = \{A, B\}$ having only two elements, and the following four BBAs:

$m_1(A) = 0.6$, $m_1(B) = 0.1$, $m_1(A \cup B) = 0.3$,
$m_2(A) = 0.5$, $m_2(B) = 0.3$, $m_2(A \cup B) = 0.2$,
$m_3(A) = 0.4$, $m_3(B) = 0.1$, $m_3(A \cup B) = 0.5$,
$m_4(A \cup B) = 1$.

The BBAs $m_1$, $m_2$ and $m_3$ are as in Example 2, and the BBA $m_4$ is nothing but the vacuous BBA $m_v$ defined over this FoD $\Theta$.


In Example 2 we obtained with $PCR5(m_1, m_2, m_3)$, and we obtain here with $PCR5(m_1, m_2, m_3, m_4)$, the following resulting BBAs

$m^{PCR5}_{1,2,3}(A) \approx 0.723281$,
$m^{PCR5}_{1,2,3}(B) \approx 0.182460$,
$m^{PCR5}_{1,2,3}(A \cup B) \approx 0.094259$,

and

$m^{PCR5}_{1,2,3,4}(A) \approx 0.654604$,
$m^{PCR5}_{1,2,3,4}(B) \approx 0.144825$,
$m^{PCR5}_{1,2,3,4}(A \cup B) \approx 0.200571$.

Clearly, $PCR5(m_1, m_2, m_3) \neq PCR5(m_1, m_2, m_3, m_4)$ even if $m_4$ is the vacuous BBA.


Analogously, we obtain with $PCR6(m_1, m_2, m_3)$ and with $PCR6(m_1, m_2, m_3, m_4)$

$m^{PCR6}_{1,2,3}(A) \approx 0.743496$,
$m^{PCR6}_{1,2,3}(B) \approx 0.162245$,
$m^{PCR6}_{1,2,3}(A \cup B) \approx 0.094259$,

and

$m^{PCR6}_{1,2,3,4}(A) \approx 0.647113$,
$m^{PCR6}_{1,2,3,4}(B) \approx 0.128342$,
$m^{PCR6}_{1,2,3,4}(A \cup B) \approx 0.224545$.

Therefore, $PCR6(m_1, m_2, m_3) \neq PCR6(m_1, m_2, m_3, m_4)$, even if $m_4$ is the vacuous BBA.


Example 4 shows clearly that the vacuous BBA does not have a neutral impact in the PCR5 and PCR6 rules of combination. In fact, adding more vacuous BBAs $m_v$ in the PCR5 or PCR6 fusion will increase more and more the mass of $A \cup B$ while decreasing more and more the masses of $A$ and of $B$ with PCR5 and PCR6. When the number of vacuous BBAs $m_v$ increases, we will have$^{12}$ $m^{PCR5/6}_{1,2,3,m_v,\ldots,m_v}(A \cup B) \to 1$, $m^{PCR5/6}_{1,2,3,m_v,\ldots,m_v}(A) \to 0$, and $m^{PCR5/6}_{1,2,3,m_v,\ldots,m_v}(B) \to 0$.

This is unsatisfactory because the vacuous BBA brings no useful information to exploit, and it is naturally expected that it must not impact the fusion result in the combination of BBAs. This can be seen as a flaw of the behavior of PCR5 and PCR6 rules of combination.

$^{12}$ The notation $m^{PCR5/6}$ indicates "$m^{PCR5}$ or $m^{PCR6}$" for convenience.
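The loss of neutrality can be observed numerically. The sketch below restricts itself to PCR6, following formula (15), and appends $k$ copies of the vacuous BBA to the three BBAs of Example 4. The dictionary representation of a BBA and the name `pcr6` are assumptions made for this illustration, not the authors' code.

```python
from itertools import product
from math import prod

def pcr6(bbas):
    """Fuse BBAs (dict: frozenset -> mass) following formula (15)."""
    fused = {}
    for combo in product(*(m.items() for m in bbas)):
        pi = prod(v for _, v in combo)
        inter = frozenset.intersection(*(x for x, _ in combo))
        if inter:
            fused[inter] = fused.get(inter, 0.0) + pi
            continue
        weights = {}
        for x, v in combo:  # sum of the masses of each distinct element
            weights[x] = weights.get(x, 0.0) + v
        total = sum(weights.values())
        for x, w in weights.items():
            fused[x] = fused.get(x, 0.0) + w * pi / total
    return fused

A, B = frozenset("A"), frozenset("B")
AB = A | B
sources = [{A: 0.6, B: 0.1, AB: 0.3},
           {A: 0.5, B: 0.3, AB: 0.2},
           {A: 0.4, B: 0.1, AB: 0.5}]
vacuous = {AB: 1.0}
# Mass of A u B after adding k = 0, 1, 2, 3 vacuous BBAs.
mass_AB = [pcr6(sources + [vacuous] * k)[AB] for k in range(4)]
print([round(v, 6) for v in mass_AB])
```

Each additional vacuous source adds one unit to the weight of $A \cup B$ in every conflicting product, so the PCR6 mass of $A \cup B$ keeps growing with $k$.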


To emphasize this flaw, we give in Example 5 a case where the mass committed to some partial uncertainties can increase more than necessary with PCR5 and with PCR6 rules of combination. This is detrimental for the quality of the fusion result and for decision-making because the result is more uncertain than it should be, and consequently the decision is more difficult to make.


Example 5: We consider the FoD $\Theta = \{A, B, C, D, E\}$, and the following three BBAs

$m_1(A \cup B) = 0.70$,
$m_1(C \cup D) = 0.06$,
$m_1(A \cup B \cup C \cup D) = 0.15$,
$m_1(E) = 0.09$,

and

$m_2(A \cup B) = 0.06$,
$m_2(C \cup D) = 0.50$,
$m_2(A \cup B \cup C \cup D) = 0.04$,
$m_2(E) = 0.40$,

and

$m_3(B) = 0.01$,
$m_3(A \cup B \cup C \cup D \cup E) = 0.99$.

Note that the BBA $m_3$ is not equal to the vacuous BBA but it is very close to the vacuous BBA because $m_3(\Theta)$ is close to one.


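If we make the $PCR6(m_1, m_2)$ fusion of only the two BBAs $m_1$ and $m_2$ altogether, which is also equal to $PCR5(m_1, m_2)$, we obtain

$m^{PCR6}_{1,2}(A \cup B) \approx 0.465309$,
$m^{PCR6}_{1,2}(C \cup D) \approx 0.296299$,
$m^{PCR6}_{1,2}(A \cup B \cup C \cup D) \approx 0.023471$,
$m^{PCR6}_{1,2}(E) \approx 0.214921$.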

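If we make the $PCR6(m_1, m_2, m_3)$ fusion of all these three BBAs altogether we obtain

$m^{PCR6}_{1,2,3}(B) \approx 0.000962$,
$m^{PCR6}_{1,2,3}(A \cup B) \approx 0.286107$,
$m^{PCR6}_{1,2,3}(C \cup D) \approx 0.203454$,
$m^{PCR6}_{1,2,3}(A \cup B \cup C \cup D) \approx 0.012203$,
$m^{PCR6}_{1,2,3}(E) \approx 0.116038$,
$m^{PCR6}_{1,2,3}(A \cup B \cup C \cup D \cup E) \approx 0.381236$.
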
One sees that combining the BBAs $m_1$, $m_2$ with the BBA $m_3$ (where $m_3$ is close to the vacuous BBA, and therefore $m_3$ is almost non-informative) generates a big increase of the belief of the uncertainty in the resulting BBA. This behaviour is clearly counter-intuitive because if the source is almost vacuous, only a small degradation of the uncertainty is expected, and in the limit case when $m_3$ is the vacuous BBA no impact of $m_3$ on the fusion result should occur. Note that this behavior also occurs with $PCR5(m_1, m_2, m_3)$ because one has for this example

$m^{PCR5}_{1,2,3}(B) \approx 0.001103$,
$m^{PCR5}_{1,2,3}(A \cup B) \approx 0.286107$,
$m^{PCR5}_{1,2,3}(C \cup D) \approx 0.203384$,
$m^{PCR5}_{1,2,3}(A \cup B \cup C \cup D) \approx 0.012203$,
$m^{PCR5}_{1,2,3}(E) \approx 0.115967$,
$m^{PCR5}_{1,2,3}(A \cup B \cup C \cup D \cup E) \approx 0.381236$.


The deep analysis of the partial conflict redistributions done in this interesting example reveals clearly the flaw of the principles of PCR5 and PCR6 rules of combination. Indeed, for this example one has $\mathcal{F}_{m_1} \cdot \mathcal{F}_{m_2} \cdot \mathcal{F}_{m_3} = 4 \cdot 4 \cdot 2 = 32$ products $\pi_j(X_{j_1} \cap X_{j_2} \cap X_{j_3}) = m_1(X_{j_1})m_2(X_{j_2})m_3(X_{j_3})$ to calculate, where $X_{j_1} \in \mathcal{F}(m_1) = \{A \cup B, C \cup D, A \cup B \cup C \cup D, E\}$, $X_{j_2} \in \mathcal{F}(m_2) = \{A \cup B, C \cup D, A \cup B \cup C \cup D, E\}$, and $X_{j_3} \in \mathcal{F}(m_3) = \{B, A \cup B \cup C \cup D \cup E\}$. Among these 32 possible conjunctions of focal elements, twenty products correspond to partial conflicts with $X_{j_1} \cap X_{j_2} \cap X_{j_3} = \emptyset$, which need to be redistributed properly to some elements of $2^\Theta \setminus \{\emptyset\}$ according to the PCR5, or the PCR6, redistribution principles.

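More precisely, we have to consider all the following products $\pi_j$ for calculating the result

$$\begin{aligned}
\pi_1(B) &= m_1(A \cup B)m_2(A \cup B)m_3(B) = 0.00042, \\
\pi_2(A \cup B) &= m_1(A \cup B)m_2(A \cup B)m_3(\Theta) = 0.04158, \\
\pi_3(\emptyset) &= m_1(A \cup B)m_2(C \cup D)m_3(B) = 0.0035, \\
\pi_4(\emptyset) &= m_1(A \cup B)m_2(C \cup D)m_3(\Theta) = 0.3465, \\
\pi_5(B) &= m_1(A \cup B)m_2(A \cup B \cup C \cup D)m_3(B) = 0.00028, \\
\pi_6(A \cup B) &= m_1(A \cup B)m_2(A \cup B \cup C \cup D)m_3(\Theta) = 0.02772, \\
\pi_7(\emptyset) &= m_1(A \cup B)m_2(E)m_3(B) = 0.0028, \\
\pi_8(\emptyset) &= m_1(A \cup B)m_2(E)m_3(\Theta) = 0.2772, \\
\pi_9(\emptyset) &= m_1(C \cup D)m_2(A \cup B)m_3(B) = 0.000036, \\
\pi_{10}(\emptyset) &= m_1(C \cup D)m_2(A \cup B)m_3(\Theta) = 0.003564, \\
\pi_{11}(\emptyset) &= m_1(C \cup D)m_2(C \cup D)m_3(B) = 0.0003, \\
\pi_{12}(C \cup D) &= m_1(C \cup D)m_2(C \cup D)m_3(\Theta) = 0.0297, \\
\pi_{13}(\emptyset) &= m_1(C \cup D)m_2(A \cup B \cup C \cup D)m_3(B) = 0.000024, \\
\pi_{14}(C \cup D) &= m_1(C \cup D)m_2(A \cup B \cup C \cup D)m_3(\Theta) = 0.002376, \\
\pi_{15}(\emptyset) &= m_1(C \cup D)m_2(E)m_3(B) = 0.00024, \\
\pi_{16}(\emptyset) &= m_1(C \cup D)m_2(E)m_3(\Theta) = 0.02376, \\
\pi_{17}(B) &= m_1(A \cup B \cup C \cup D)m_2(A \cup B)m_3(B) = 0.00009, \\
\pi_{18}(A \cup B) &= m_1(A \cup B \cup C \cup D)m_2(A \cup B)m_3(\Theta) = 0.00891, \\
\pi_{19}(\emptyset) &= m_1(A \cup B \cup C \cup D)m_2(C \cup D)m_3(B) = 0.00075, \\
\pi_{20}(C \cup D) &= m_1(A \cup B \cup C \cup D)m_2(C \cup D)m_3(\Theta) = 0.07425, \\
\pi_{21}(B) &= m_1(A \cup B \cup C \cup D)m_2(A \cup B \cup C \cup D)m_3(B) = 0.00006, \\
\pi_{22}(A \cup B \cup C \cup D) &= m_1(A \cup B \cup C \cup D)m_2(A \cup B \cup C \cup D)m_3(\Theta) = 0.00594, \\
\pi_{23}(\emptyset) &= m_1(A \cup B \cup C \cup D)m_2(E)m_3(B) = 0.0006, \\
\pi_{24}(\emptyset) &= m_1(A \cup B \cup C \cup D)m_2(E)m_3(\Theta) = 0.0594, \\
\pi_{25}(\emptyset) &= m_1(E)m_2(A \cup B)m_3(B) = 0.000054, \\
\pi_{26}(\emptyset) &= m_1(E)m_2(A \cup B)m_3(\Theta) = 0.005346, \\
\pi_{27}(\emptyset) &= m_1(E)m_2(C \cup D)m_3(B) = 0.00045, \\
\pi_{28}(\emptyset) &= m_1(E)m_2(C \cup D)m_3(\Theta) = 0.04455, \\
\pi_{29}(\emptyset) &= m_1(E)m_2(A \cup B \cup C \cup D)m_3(B) = 0.000036, \\
\pi_{30}(\emptyset) &= m_1(E)m_2(A \cup B \cup C \cup D)m_3(\Theta) = 0.003564, \\
\pi_{31}(\emptyset) &= m_1(E)m_2(E)m_3(B) = 0.00036, \\
\pi_{32}(E) &= m_1(E)m_2(E)m_3(\Theta) = 0.03564.
\end{aligned}$$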


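The conjunctive rule gives

$$\begin{aligned}
m^{Conj}_{1,2,3}(B) &= \pi_1(B) + \pi_5(B) + \pi_{17}(B) + \pi_{21}(B) = 0.00085, \\
m^{Conj}_{1,2,3}(A \cup B) &= \pi_2(A \cup B) + \pi_6(A \cup B) + \pi_{18}(A \cup B) = 0.07821, \\
m^{Conj}_{1,2,3}(C \cup D) &= \pi_{12}(C \cup D) + \pi_{14}(C \cup D) + \pi_{20}(C \cup D) = 0.106326, \\
m^{Conj}_{1,2,3}(A \cup B \cup C \cup D) &= \pi_{22}(A \cup B \cup C \cup D) = 0.00594, \\
m^{Conj}_{1,2,3}(E) &= \pi_{32}(E) = 0.03564.
\end{aligned}$$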


The total conflicting mass between these three BBAs is

$$m^{Conj}_{1,2,3}(\emptyset) = 1 - m^{Conj}_{1,2,3}(B) - m^{Conj}_{1,2,3}(A \cup B) - m^{Conj}_{1,2,3}(C \cup D) - m^{Conj}_{1,2,3}(A \cup B \cup C \cup D) - m^{Conj}_{1,2,3}(E) = 0.773034.$$

Let's examine how the value $m^{PCR5}_{1,2,3}(\Theta) \approx 0.381236$ is obtained based on the PCR5 redistribution principle. Based on the structures of the $\pi_j(\emptyset)$ products, we have to consider only the products involving a proportional redistribution to $\Theta$. So we get a proportional redistribution to $\Theta$ only from the following products

$$\begin{aligned}
\pi_4(\emptyset) &= m_1(A \cup B)m_2(C \cup D)m_3(\Theta) = 0.3465, \\
\pi_8(\emptyset) &= m_1(A \cup B)m_2(E)m_3(\Theta) = 0.2772, \\
\pi_{10}(\emptyset) &= m_1(C \cup D)m_2(A \cup B)m_3(\Theta) = 0.003564, \\
\pi_{16}(\emptyset) &= m_1(C \cup D)m_2(E)m_3(\Theta) = 0.02376, \\
\pi_{24}(\emptyset) &= m_1(A \cup B \cup C \cup D)m_2(E)m_3(\Theta) = 0.0594, \\
\pi_{26}(\emptyset) &= m_1(E)m_2(A \cup B)m_3(\Theta) = 0.005346, \\
\pi_{28}(\emptyset) &= m_1(E)m_2(C \cup D)m_3(\Theta) = 0.04455, \\
\pi_{30}(\emptyset) &= m_1(E)m_2(A \cup B \cup C \cup D)m_3(\Theta) = 0.003564.
\end{aligned}$$

Because there are no duplicate focal elements in any of these products, the PCR5 and PCR6 redistributions to $\Theta$ will be the same in this example.


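The proportional redistribution of $\pi_4(\emptyset)$ to $\Theta$ is

$$x_4(\Theta) = \frac{m_3(\Theta)\,\pi_4(\emptyset)}{m_1(A \cup B) + m_2(C \cup D) + m_3(\Theta)} \approx 0.156637.$$

The proportional redistribution of $\pi_8(\emptyset)$ to $\Theta$ is

$$x_8(\Theta) = \frac{m_3(\Theta)\,\pi_8(\emptyset)}{m_1(A \cup B) + m_2(E) + m_3(\Theta)} \approx 0.131305.$$

The proportional redistribution of $\pi_{10}(\emptyset)$ to $\Theta$ is

$$x_{10}(\Theta) = \frac{m_3(\Theta)\,\pi_{10}(\emptyset)}{m_1(C \cup D) + m_2(A \cup B) + m_3(\Theta)} \approx 0.003179.$$

The proportional redistribution of $\pi_{16}(\emptyset)$ to $\Theta$ is

$$x_{16}(\Theta) = \frac{m_3(\Theta)\,\pi_{16}(\emptyset)}{m_1(C \cup D) + m_2(E) + m_3(\Theta)} \approx 0.016222.$$

The proportional redistribution of $\pi_{24}(\emptyset)$ to $\Theta$ is

$$x_{24}(\Theta) = \frac{m_3(\Theta)\,\pi_{24}(\emptyset)}{m_1(A \cup B \cup C \cup D) + m_2(E) + m_3(\Theta)} \approx 0.038186.$$

The proportional redistribution of $\pi_{26}(\emptyset)$ to $\Theta$ is

$$x_{26}(\Theta) = \frac{m_3(\Theta)\,\pi_{26}(\emptyset)}{m_1(E) + m_2(A \cup B) + m_3(\Theta)} \approx 0.004643.$$

The proportional redistribution of $\pi_{28}(\emptyset)$ to $\Theta$ is

$$x_{28}(\Theta) = \frac{m_3(\Theta)\,\pi_{28}(\emptyset)}{m_1(E) + m_2(C \cup D) + m_3(\Theta)} \approx 0.027914.$$

The proportional redistribution of $\pi_{30}(\emptyset)$ to $\Theta$ is

$$x_{30}(\Theta) = \frac{m_3(\Theta)\,\pi_{30}(\emptyset)}{m_1(E) + m_2(A \cup B \cup C \cup D) + m_3(\Theta)} \approx 0.003150.$$

Therefore we finally obtain the quite big value for the mass committed to $\Theta$

$$m^{PCR5}_{1,2,3}(\Theta) = x_4(\Theta) + x_8(\Theta) + x_{10}(\Theta) + x_{16}(\Theta) + x_{24}(\Theta) + x_{26}(\Theta) + x_{28}(\Theta) + x_{30}(\Theta) \approx 0.381236.$$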
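The eight redistributions to $\Theta$ can be summed programmatically as a cross-check. The sketch below is illustrative only: it assumes the dictionary representation of $m_1$ and $m_2$ shown in the code, and it relies on the fact noted above that no focal element is duplicated in these conflicting products, so the weight of each element is simply its mass (for PCR5 and PCR6 alike).

```python
AB, CD, E = frozenset("AB"), frozenset("CD"), frozenset("E")
ABCD = AB | CD
m1 = {AB: 0.70, CD: 0.06, ABCD: 0.15, E: 0.09}
m2 = {AB: 0.06, CD: 0.50, ABCD: 0.04, E: 0.40}
m3_theta = 0.99  # mass of Theta in m3

mass_theta = 0.0
for x1, v1 in m1.items():
    for x2, v2 in m2.items():
        if x1 & x2:
            continue  # intersecting with Theta gives no conflict here
        pi = v1 * v2 * m3_theta                      # conflicting product
        mass_theta += m3_theta * pi / (v1 + v2 + m3_theta)

print(round(mass_theta, 6))
```

Since $\Theta$ is never the result of a non-conflicting intersection in this example, this sum is the whole mass committed to $\Theta$.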


We see clearly why PCR5 (and PCR6) redistributes some mass to the uncertainty $\Theta$ although the focal element $\Theta$ is not in conflict with the other focal elements involved in each product $\pi_4(\emptyset)$, $\pi_8(\emptyset)$, $\pi_{10}(\emptyset)$, $\pi_{16}(\emptyset)$, $\pi_{24}(\emptyset)$, $\pi_{26}(\emptyset)$, $\pi_{28}(\emptyset)$ and $\pi_{30}(\emptyset)$, which is an undesirable behavior that we want to avoid. That is why we propose in the next section some improvement of PCR5 and PCR6 rules of combination.


VI. IMPROVEMENT OF PCR5 AND PCR6 RULES 


To circumvent the weakness of the original PCR5 and PCR6 redistribution principles, we propose an improvement of these rules that will be denoted as PCR5$^+$ and PCR6$^+$ in the sequel. These new rules are not redundant with PCR5 nor with PCR6 when combining more than two BBAs altogether.


The very simple and basic idea to improve PCR5 and PCR6 redistribution principles is to discard the elements that contain all the other elements involved in the calculation of the partial conflict $\pi_j(\emptyset)$. Indeed, the elements discarded are regarded as non-informative and not useful for making the conflict redistribution.

For instance, if we consider the previous Example 5, the conflicting mass with PCR5$^+$ and PCR6$^+$ for the conflicting product $\pi_4(\emptyset) = m_1(A \cup B)m_2(C \cup D)m_3(\Theta)$ will be proportionally redistributed back only to $A \cup B$ and to $C \cup D$ but not to $\Theta$, because $A \cup B \subset \Theta$ and $C \cup D \subset \Theta$. Thus with PCR5$^+$ and PCR6$^+$ rules we will make the following redistribution:

$$\frac{x_4(A \cup B)}{m_1(A \cup B)} = \frac{x_4(C \cup D)}{m_2(C \cup D)} = \frac{\pi_4(\emptyset)}{m_1(A \cup B) + m_2(C \cup D)}.$$

Here, $x_4(\Theta)$ is set to 0 with the PCR5$^+$ and PCR6$^+$ principles because no proportion of $\pi_4(\emptyset)$ must be redistributed to $\Theta$.

However, with PCR5 and PCR6 rules we make the redistributions according to

$$\frac{x_4(A \cup B)}{m_1(A \cup B)} = \frac{x_4(C \cup D)}{m_2(C \cup D)} = \frac{x_4(\Theta)}{m_3(\Theta)} = \frac{\pi_4(\emptyset)}{m_1(A \cup B) + m_2(C \cup D) + m_3(\Theta)}.$$
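The two redistribution schemes for $\pi_4(\emptyset)$ can be compared numerically with the masses of Example 5. The variable names are illustrative; the values follow directly from the two proportionality relations above.

```python
m1_AB, m2_CD, m3_theta = 0.70, 0.50, 0.99
pi_4 = m1_AB * m2_CD * m3_theta          # 0.3465

# Improved principle: Theta is discarded from the redistribution.
den_plus = m1_AB + m2_CD
x4_AB_plus = m1_AB * pi_4 / den_plus
x4_CD_plus = m2_CD * pi_4 / den_plus

# Classical PCR5/PCR6 principle: Theta takes part in the redistribution.
den = m1_AB + m2_CD + m3_theta
x4_AB = m1_AB * pi_4 / den
x4_CD = m2_CD * pi_4 / den
x4_theta = m3_theta * pi_4 / den

print(round(x4_AB_plus, 6), round(x4_CD_plus, 6))
print(round(x4_AB, 6), round(x4_CD, 6), round(x4_theta, 6))
```

With the improved principle the whole of $\pi_4(\emptyset)$ returns to $A \cup B$ and $C \cup D$, whereas the classical principle sends almost half of it to $\Theta$.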


A. Selection of focal elements for proportional redistribution 


The main issue in improving the PCR5 and PCR6 rules of combination is how to identify in each conflicting product $\pi_j(\emptyset)$ the set of elements to keep for making the improved proportional redistribution.

In this section we propose a solution of this problem that can be easily implemented. For convenience, we also give the basic Matlab™ codes of PCR5$^+$ and PCR6$^+$ in appendix 3.


Let's consider $\pi_j(\emptyset) = m_1(X_{j_1})m_2(X_{j_2}) \ldots m_S(X_{j_S})$ a conflicting product$^{13}$ where $X_{j_1} \cap X_{j_2} \cap \ldots \cap X_{j_S} = \emptyset$. We denote by $\mathcal{X}_j = \{X_1, \ldots, X_{s_j}, s_j \leq S\}$ the set of all distinct components of the S-tuple $\mathbf{X}_j$ related with the conflicting product $\pi_j(\emptyset)$. The order of the elements in $\mathcal{X}_j$ does not matter. The number $s_j$ of elements in $\mathcal{X}_j$ can be less than $S$ because it is possible to have duplicate focal elements in $\pi_j(\emptyset)$. We consider in $\mathcal{X}_j$ only the distinct focal elements involved in $\pi_j(\emptyset)$ (see the next example) and we will define their binary keeping-index indicator, which tells whether each element of $\mathcal{X}_j$ needs to be kept in the proportional conflict redistribution, or not, in the improved PCR5 and PCR6 rules of combination.

For each element $X_l \in \mathcal{X}_j$ we first define its binary containing indicator $\delta_j(X_{l'}, X_l)$ with respect to $X_{l'} \in \mathcal{X}_j$ to characterize if $X_l$ contains (includes) $X_{l'}$ in the wide sense, or not. Therefore, we take $\delta_j(X_{l'}, X_l) = 1$ if $X_{l'} \cap X_l = X_{l'}$, or equivalently if $X_{l'} \subseteq X_l$, and $\delta_j(X_{l'}, X_l) = 0$ otherwise. The definition of this binary containing indicator is summarized by the formula

$$\delta_j(X_{l'}, X_l) \triangleq \begin{cases} 1 & \text{if } X_{l'} \subseteq X_l, \\ 0 & \text{if } X_{l'} \nsubseteq X_l. \end{cases} \quad (22)$$

Of course $\delta_j(X_l, X_l) = 1$ because $X_l \cap X_l = X_l$, and we have $\delta_j(X_{l'}, X_l) = 0$ as soon as $|X_{l'}| > |X_l|$, where $|X_{l'}|$ and $|X_l|$ are the cardinalities of $X_{l'}$ and $X_l$ respectively. We have also $\delta_j(X_{l'}, X_l) = 0$ when $X_{l'} \cap X_l \neq X_{l'}$. For $X_l = \Theta$, we have $\delta_j(X_{l'}, X_l) = \delta_j(X_{l'}, \Theta) = 1$ for any $X_{l'} \in \mathcal{X}_j$.

$^{13}$ We consider $S > 2$ BBAs because for $S = 2$ BBAs no improper increasing of uncertainty occurs with PCR5 or PCR6.


To know if a focal element $X_{j_i} \in \mathcal{X}_j$ must be kept, or not, in the proportional redistribution of the j-th conflicting mass $\pi_j(\emptyset)$ with PCR5$^+$ and PCR6$^+$ rules, we have to determine its binary keeping-index $\kappa_j(X_{j_i})$. For this, we define $\kappa_j(X_{j_i}) \in \{0, 1\}$ as follows

$$\kappa_j(X_{j_i}) \triangleq 1 - \prod_{\substack{X_{l'}, X_l \in \mathcal{X}_j \\ X_{l'} \neq X_l \\ |X_{j_i}| \leq |X_l| \\ |X_{l'}| \leq |X_l|}} \delta_j(X_{l'}, X_l). \quad (23)$$

The value $\kappa_j(X_{j_i}) = 1$ stipulates that the focal element $X_{j_i} \in \mathcal{X}_j$ must receive some proportional redistribution from the conflicting mass $\pi_j(\emptyset)$. The value $\kappa_j(X_{j_i}) = 0$ indicates that the focal element $X_{j_i}$ will not be involved in the proportional redistribution of the conflicting mass $\pi_j(\emptyset)$.

The binary keeping-index can also be defined equivalently as

$$\kappa_j(X_{j_i}) = \begin{cases} 1 & \text{if } c(X_{j_i}) \text{ is true}, \\ 1 - \prod_{\substack{X_{l'} \in \mathcal{X}_j \\ X_{l'} \neq X_{j_i} \\ |X_{l'}| \leq |X_{j_i}|}} \delta_j(X_{l'}, X_{j_i}) & \text{if } c(X_{j_i}) \text{ is false}, \end{cases} \quad (24)$$

where the condition $c(X_{j_i})$ is defined as

$$c(X_{j_i}) \triangleq \exists X_l \in \mathcal{X}_j \text{ such that } |X_l| > |X_{j_i}| \text{ and } \kappa_j(X_l) = 1.$$

Because this second definition of $\kappa_j(X_{j_i})$ is self-referencing, we need to calculate the binary keeping-indexes iteratively, starting with the element of $\mathcal{X}_j$ of highest cardinality (say $X$), then for elements of $\mathcal{X}_j$ of cardinality $|X| - 1$ (if any), then for elements of $\mathcal{X}_j$ of cardinality $|X| - 2$ (if any), etc. From the implementation standpoint the definition (24) is more efficient than the direct definition (23).
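The direct definition (23) translates almost literally into code. The sketch below is an illustration under stated assumptions, not the Matlab code of the appendix: focal elements are `frozenset` objects, and the function names `delta` and `keeping_index` are hypothetical. It is applied to a set of five distinct focal elements $A$, $B \cup C$, $A \cup C$, $A \cup B \cup C$ and $\Theta = A \cup B \cup C \cup D$.

```python
from itertools import permutations

def delta(x_sub, x_sup):
    """Binary containing indicator (22): 1 if x_sub is included in x_sup."""
    return 1 if x_sub <= x_sup else 0

def keeping_index(x, distinct):
    """Binary keeping-index (23) of x among the distinct focal elements."""
    p = 1
    # Ordered pairs (X_l', X_l) of different elements of the set.
    for x_lp, x_l in permutations(distinct, 2):
        if len(x) <= len(x_l) and len(x_lp) <= len(x_l):
            p *= delta(x_lp, x_l)
    return 1 - p

A, BC, AC = frozenset("A"), frozenset("BC"), frozenset("AC")
ABC, THETA = frozenset("ABC"), frozenset("ABCD")
distinct = {A, BC, AC, ABC, THETA}
kappa = {x: keeping_index(x, distinct) for x in distinct}
kept = {x for x, k in kappa.items() if k == 1}
```

Because the index depends only on set inclusion and cardinalities, no mass value enters the computation.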


Remark 1: We always have $\kappa_j(\Theta) = 0$ if $\Theta \in \mathcal{X}_j$ because $\Theta$ always includes all other focal elements of $\mathcal{X}_j$ and $\Theta$ has the highest cardinality, so $\delta_j(X_{l'}, \Theta) = 1$ for all $X_{l'} \in \mathcal{X}_j$. Therefore the binary keeping-index formula (23) reduces to

$$\kappa_j(\Theta) = 1 - \prod_{\substack{X_{l'} \in \mathcal{X}_j \\ X_{l'} \neq \Theta}} \delta_j(X_{l'}, \Theta) = 1 - \underbrace{1 \cdot 1 \cdot \ldots \cdot 1}_{|\mathcal{X}_j| - 1 \text{ terms}} = 0.$$

Remark 2: For a given FoD and a given number of BBAs to combine, it is always possible to calculate off-line the values of the binary keeping-indexes of focal elements of all possible combinations of focal elements involved in conflicting products $\pi_j(\emptyset) > 0$, because the binary keeping-index depends only on the structure of the focal elements, and not on the numerical mass values of the focal elements. This remark is important, especially in applications where we have thousands or millions of fusion steps to make, because we will not have to recalculate in each fusion step the binary keeping-indexes for each $\pi_j(\emptyset)$ even if the input BBA values to combine change.
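One possible way to exploit Remark 2 in code is to memoize the keeping-indexes by the structure of the conflicting product. The sketch below is only an assumption about how this could be done: it uses Python's `functools.lru_cache`, keys the cache on the `frozenset` of distinct focal elements, and evaluates the direct definition (23); the function name `kept_elements` is hypothetical.

```python
from functools import lru_cache
from itertools import permutations

@lru_cache(maxsize=None)
def kept_elements(distinct):
    """Focal elements with keeping-index 1, cached per structure.

    `distinct` is a frozenset of frozensets (hashable, order-free).
    """
    kept = set()
    for x in distinct:
        p = 1
        for x_lp, x_l in permutations(distinct, 2):
            if len(x) <= len(x_l) and len(x_lp) <= len(x_l):
                p *= 1 if x_lp <= x_l else 0   # containing indicator (22)
        if p == 0:                             # keeping-index (23) equals 1
            kept.add(x)
    return frozenset(kept)

A, BC, AC = frozenset("A"), frozenset("BC"), frozenset("AC")
ABC, THETA = frozenset("ABC"), frozenset("ABCD")
structure = frozenset({A, BC, AC, ABC, THETA})
first = kept_elements(structure)    # computed
second = kept_elements(structure)   # served from the cache
```

The second call does not recompute anything, whatever the mass values attached to the focal elements in a later fusion step.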
Remark 3: It is worth recalling that PCR5$^+$ and PCR6$^+$ are of interest if and only if we have more than two ($S > 2$) BBAs to combine. If we have only two BBAs to combine ($S = 2$) we always get $m_{PCR5} = m_{PCR5^+} = m_{PCR6} = m_{PCR6^+}$ because in this case the PCR5, PCR5$^+$, PCR6 and PCR6$^+$ rules coincide.

For convenience, we illustrate the calculation of these binary keeping-indexes based on the direct calculation (23) for different examples.


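Example 6: We consider the FoD $\Theta = \{A, B, C, D\}$, six BBAs, and the j-th conflicting (assumed strictly positive) product whose structure is as follows

$$\pi_j(\emptyset) = m_1(A)m_2(B \cup C)m_3(A \cup C)m_4(B \cup C) \cdot m_5(A \cup B \cup C)m_6(A \cup B \cup C \cup D).$$

In this product $\pi_j(\emptyset)$ we have the duplicate focal element $B \cup C$ because it appears both in $m_2(B \cup C)$ and in $m_4(B \cup C)$. The focal elements entering in each BBA of $\pi_j(\emptyset)$ are respectively $X_{j_1} = A$, $X_{j_2} = B \cup C$, $X_{j_3} = A \cup C$, $X_{j_4} = B \cup C$, $X_{j_5} = A \cup B \cup C$, and $X_{j_6} = A \cup B \cup C \cup D = \Theta$. So we have to consider only the following set of distinct focal elements for this $\pi_j(\emptyset)$ product

$$\mathcal{X}_j = \{X_1 = A, X_2 = B \cup C, X_3 = A \cup C, X_4 = A \cup B \cup C, X_5 = A \cup B \cup C \cup D\}.$$

Therefore, considering only $X_{l'} \neq X_l$ and $|X_{l'}| \leq |X_l|$, which are conditions entering in formula (23), we have the following binary containing indicator $\delta_j(X_{l'}, X_l)$ values:

$\delta_j(X_1, X_2) = 0$ because $(X_1 = A) \nsubseteq (X_2 = B \cup C)$,
$\delta_j(X_1, X_3) = 1$ because $(X_1 = A) \subseteq (X_3 = A \cup C)$,
$\delta_j(X_1, X_4) = 1$ because $(X_1 = A) \subseteq (X_4 = A \cup B \cup C)$,
$\delta_j(X_1, X_5) = 1$ because $(X_1 = A) \subseteq (X_5 = \Theta)$,
$\delta_j(X_2, X_3) = 0$ because $(X_2 = B \cup C) \nsubseteq (X_3 = A \cup C)$,
$\delta_j(X_2, X_4) = 1$ because $(X_2 = B \cup C) \subseteq (X_4 = A \cup B \cup C)$,
$\delta_j(X_2, X_5) = 1$ because $(X_2 = B \cup C) \subseteq (X_5 = \Theta)$,
$\delta_j(X_3, X_2) = 0$ because $(X_3 = A \cup C) \nsubseteq (X_2 = B \cup C)$,
$\delta_j(X_3, X_4) = 1$ because $(X_3 = A \cup C) \subseteq (X_4 = A \cup B \cup C)$,
$\delta_j(X_3, X_5) = 1$ because $(X_3 = A \cup C) \subseteq (X_5 = \Theta)$,
$\delta_j(X_4, X_5) = 1$ because $(X_4 = A \cup B \cup C) \subseteq (X_5 = \Theta)$.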


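The binary keeping-indexes $\kappa_j(X_{j_i})$ for $i = 1, 2, \ldots, 6$ are calculated based on the formula (23) as follows:

• For the focal element $X_{j_1} = A = X_1$ of $\mathcal{X}_j$ having $|X_{j_1}| = 1$, we get

$$\begin{aligned}
\kappa_j(A) &= 1 - \prod_{\substack{X_{l'}, X_l \in \mathcal{X}_j \\ X_{l'} \neq X_l \\ |X_{j_1}| \leq |X_l| \\ |X_{l'}| \leq |X_l|}} \delta_j(X_{l'}, X_l) \\
&= 1 - [\delta_j(X_1, X_2)\delta_j(X_1, X_3)\delta_j(X_1, X_4)\delta_j(X_1, X_5) \\
&\qquad \cdot \delta_j(X_2, X_3)\delta_j(X_2, X_4)\delta_j(X_2, X_5)\delta_j(X_3, X_2) \\
&\qquad \cdot \delta_j(X_3, X_4)\delta_j(X_3, X_5)\delta_j(X_4, X_5)] \\
&= 1 - 0 \cdot 1 \cdot 1 \cdot 1 \cdot 0 \cdot 1 \cdot 1 \cdot 0 \cdot 1 \cdot 1 \cdot 1 = 1.
\end{aligned}$$

Hence the focal element $X_{j_1} = A$ will be kept in the proportional redistribution of the conflicting mass $\pi_j(\emptyset)$.

• For the focal element $X_{j_2} = B \cup C = X_2$ of $\mathcal{X}_j$ having $|X_{j_2}| = 2$, we get

$$\kappa_j(B \cup C) = 1 - \prod_{\substack{X_{l'}, X_l \in \mathcal{X}_j \\ X_{l'} \neq X_l \\ |X_{j_2}| \leq |X_l| \\ |X_{l'}| \leq |X_l|}} \delta_j(X_{l'}, X_l) = 1.$$

Hence the focal element $X_{j_2} = B \cup C$ will be kept in the proportional redistribution of the conflicting mass $\pi_j(\emptyset)$.

• For the focal element $X_{j_3} = A \cup C = X_3$ of $\mathcal{X}_j$ having $|X_{j_3}| = 2$, we get

$$\kappa_j(A \cup C) = 1 - \prod_{\substack{X_{l'}, X_l \in \mathcal{X}_j \\ X_{l'} \neq X_l \\ |X_{j_3}| \leq |X_l| \\ |X_{l'}| \leq |X_l|}} \delta_j(X_{l'}, X_l) = 1.$$

Hence the focal element $X_{j_3} = A \cup C$ will be kept in the proportional redistribution of the conflicting mass $\pi_j(\emptyset)$.

• For the duplicate focal element $X_{j_4} = B \cup C$ of $\mathcal{X}_j$ having $|X_{j_4}| = 2$, we have $\kappa_j(X_{j_4}) = 1$ because $X_{j_4} = X_{j_2}$ and $\kappa_j(X_{j_2}) = 1$.

• For the focal element $X_{j_5} = A \cup B \cup C = X_4$ of $\mathcal{X}_j$ having $|X_{j_5}| = 3$, we get

$$\begin{aligned}
\kappa_j(A \cup B \cup C) &= 1 - \prod_{\substack{X_{l'}, X_l \in \mathcal{X}_j \\ X_{l'} \neq X_l \\ |X_{j_5}| \leq |X_l| \\ |X_{l'}| \leq |X_l|}} \delta_j(X_{l'}, X_l) \\
&= 1 - [\delta_j(X_1, X_4)\delta_j(X_1, X_5) \cdot \delta_j(X_2, X_4)\delta_j(X_2, X_5)\delta_j(X_3, X_4) \cdot \delta_j(X_3, X_5)\delta_j(X_4, X_5)] \\
&= 1 - 1 \cdot 1 \cdot 1 \cdot 1 \cdot 1 \cdot 1 \cdot 1 = 0.
\end{aligned}$$

Hence the focal element $X_{j_5} = A \cup B \cup C$ will be discarded in the proportional redistribution of the conflicting mass $\pi_j(\emptyset)$.

• For the focal element $X_{j_6} = A \cup B \cup C \cup D = \Theta = X_5$ of $\mathcal{X}_j$ having $|X_{j_6}| = 4$, we get

$$\begin{aligned}
\kappa_j(\Theta) &= 1 - \prod_{\substack{X_{l'}, X_l \in \mathcal{X}_j \\ X_{l'} \neq X_l \\ |X_{j_6}| \leq |X_l| \\ |X_{l'}| \leq |X_l|}} \delta_j(X_{l'}, X_l) \\
&= 1 - \delta_j(X_1, X_5)\delta_j(X_2, X_5)\delta_j(X_3, X_5)\delta_j(X_4, X_5) \\
&= 1 - 1 \cdot 1 \cdot 1 \cdot 1 = 0.
\end{aligned}$$

This result illustrates the validity of the aforementioned Remark 1. Hence the focal element $X_{j_6} = A \cup B \cup C \cup D = \Theta$ will be discarded in the proportional redistribution of the conflicting mass $\pi_j(\emptyset)$.

In summary, the conflicting product $\pi_j(\emptyset) = m_1(A)m_2(B \cup C)m_3(A \cup C)m_4(B \cup C)m_5(A \cup B \cup C)m_6(\Theta)$ will be redistributed only to the three focal elements $A$, $B \cup C$ and $A \cup C$ with the improved rules PCR5$^+$ and PCR6$^+$, whereas it would have been redistributed to all five focal elements $A$, $B \cup C$, $A \cup C$, $A \cup B \cup C$ and $\Theta$ with the classical PCR5 and PCR6 rules. Thus, two focal elements were discarded.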


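Example 7: This example is an extension of Example 6 obtained by including a new element $E$ in the FoD. So, we consider the FoD $\Theta = \{A, B, C, D, E\}$, seven BBAs, and the $j$-th conflicting (assumed strictly positive) product whose structure is as follows

$$\pi_j(\emptyset) = m_1(A \cup E)\,m_2(B \cup C \cup E)\,m_3(A \cup C \cup E)\,m_4(B \cup C \cup E)\,m_5(A \cup B \cup C \cup E)\,m_6(A \cup B \cup C \cup D \cup E)\,m_7(A).$$

In this product $\pi_j(\emptyset)$ we have the duplicate focal element $B \cup C \cup E$ because it appears both in $m_2(B \cup C \cup E)$ and in $m_4(B \cup C \cup E)$. The focal elements entering in each BBA of $\pi_j(\emptyset)$ are respectively $X_{j_1} = A \cup E$, $X_{j_2} = B \cup C \cup E$, $X_{j_3} = A \cup C \cup E$, $X_{j_4} = B \cup C \cup E$, $X_{j_5} = A \cup B \cup C \cup E$, $X_{j_6} = A \cup B \cup C \cup D \cup E = \Theta$ and $X_{j_7} = A$. So we have to consider only the following set of distinct focal elements for this $\pi_j(\emptyset)$ product

$$\mathcal{X}_j = \{X_1 = A \cup E,\; X_2 = B \cup C \cup E,\; X_3 = A \cup C \cup E,\; X_4 = A \cup B \cup C \cup E,\; X_5 = A \cup B \cup C \cup D \cup E,\; X_6 = A\}.$$

Therefore, considering only $X_{l'} \neq X_l$ and $|X_{l'}| \le |X_l|$, which are conditions entering in formula (23), we have the following binary containing indicator $\delta_j(X_{l'}, X_l)$ values:

$\delta_j(X_6, X_1) = 1$ because $(X_6 = A) \subseteq (X_1 = A \cup E)$,
$\delta_j(X_6, X_2) = 0$ because $(X_6 = A) \not\subseteq (X_2 = B \cup C \cup E)$,
$\delta_j(X_6, X_3) = 1$ because $(X_6 = A) \subseteq (X_3 = A \cup C \cup E)$,
$\delta_j(X_6, X_4) = 1$ because $(X_6 = A) \subseteq (X_4 = A \cup B \cup C \cup E)$,
$\delta_j(X_6, X_5) = 1$ because $(X_6 = A) \subseteq (X_5 = \Theta)$,

$\delta_j(X_1, X_2) = 0$ because $(X_1 = A \cup E) \not\subseteq (X_2 = B \cup C \cup E)$,
$\delta_j(X_1, X_3) = 1$ because $(X_1 = A \cup E) \subseteq (X_3 = A \cup C \cup E)$,
$\delta_j(X_1, X_4) = 1$ because $(X_1 = A \cup E) \subseteq (X_4 = A \cup B \cup C \cup E)$,
$\delta_j(X_1, X_5) = 1$ because $(X_1 = A \cup E) \subseteq (X_5 = \Theta)$,

$\delta_j(X_2, X_3) = 0$ because $(X_2 = B \cup C \cup E) \not\subseteq (X_3 = A \cup C \cup E)$,
$\delta_j(X_2, X_4) = 1$ because $(X_2 = B \cup C \cup E) \subseteq (X_4 = A \cup B \cup C \cup E)$,
$\delta_j(X_2, X_5) = 1$ because $(X_2 = B \cup C \cup E) \subseteq (X_5 = \Theta)$,

$\delta_j(X_3, X_2) = 0$ because $(X_3 = A \cup C \cup E) \not\subseteq (X_2 = B \cup C \cup E)$,
$\delta_j(X_3, X_4) = 1$ because $(X_3 = A \cup C \cup E) \subseteq (X_4 = A \cup B \cup C \cup E)$,
$\delta_j(X_3, X_5) = 1$ because $(X_3 = A \cup C \cup E) \subseteq (X_5 = \Theta)$,

$\delta_j(X_4, X_5) = 1$ because $(X_4 = A \cup B \cup C \cup E) \subseteq (X_5 = \Theta)$.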
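The binary keeping-indexes $\kappa_j(X_{j_i})$ for $i = 1, 2, \ldots, 7$ are calculated based on the formula (23) as follows

• For the focal element $X_{j_1} = A \cup E = X_1$ of $\mathbf{X}_j$ having $|X_{j_1}| = 2$, we get

$$\kappa_j(X_{j_1}) = 1 - \prod_{\substack{X_{l'}, X_l \in \mathcal{X}_j \\ X_{l'} \neq X_l \\ |X_{j_1}| \le |X_l| \\ |X_{l'}| \le |X_l|}} \delta_j(X_{l'}, X_l)$$
$$= 1 - [\delta_j(X_1, X_2)\,\delta_j(X_1, X_3)\,\delta_j(X_1, X_4)\,\delta_j(X_1, X_5)\,\delta_j(X_2, X_3)\,\delta_j(X_2, X_4)\,\delta_j(X_2, X_5)\,\delta_j(X_3, X_2)$$
$$\cdot\,\delta_j(X_3, X_4)\,\delta_j(X_3, X_5)\,\delta_j(X_4, X_5)\,\delta_j(X_6, X_1)\,\delta_j(X_6, X_2)\,\delta_j(X_6, X_3)\,\delta_j(X_6, X_4)\,\delta_j(X_6, X_5)]$$
$$= 1 - 0 \cdot 1 \cdot 1 \cdot 1 \cdot 0 \cdot 1 \cdot 1 \cdot 0 \cdot 1 \cdot 1 \cdot 1 \cdot 1 \cdot 0 \cdot 1 \cdot 1 \cdot 1 = 1.$$

Hence the focal element $X_{j_1} = A \cup E$ will be kept in the proportional redistribution of the conflicting mass $\pi_j(\emptyset)$.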

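• For the focal element $X_{j_2} = B \cup C \cup E = X_2$ of $\mathbf{X}_j$ having $|X_{j_2}| = 3$, we get

$$\kappa_j(X_{j_2}) = 1 - \prod_{\substack{X_{l'}, X_l \in \mathcal{X}_j \\ X_{l'} \neq X_l \\ |X_{j_2}| \le |X_l| \\ |X_{l'}| \le |X_l|}} \delta_j(X_{l'}, X_l)$$
$$= 1 - [\delta_j(X_1, X_2)\,\delta_j(X_1, X_3)\,\delta_j(X_1, X_4)\,\delta_j(X_1, X_5)\,\delta_j(X_2, X_3)\,\delta_j(X_2, X_4)\,\delta_j(X_2, X_5)\,\delta_j(X_3, X_2)$$
$$\cdot\,\delta_j(X_3, X_4)\,\delta_j(X_3, X_5)\,\delta_j(X_4, X_5)\,\delta_j(X_6, X_2)\,\delta_j(X_6, X_3)\,\delta_j(X_6, X_4)\,\delta_j(X_6, X_5)]$$
$$= 1 - 0 \cdot 1 \cdot 1 \cdot 1 \cdot 0 \cdot 1 \cdot 1 \cdot 0 \cdot 1 \cdot 1 \cdot 1 \cdot 0 \cdot 1 \cdot 1 \cdot 1 = 1.$$

Hence the focal element $X_{j_2} = B \cup C \cup E$ will also be kept in the proportional redistribution of the conflicting mass $\pi_j(\emptyset)$.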


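• For the focal element $X_{j_3} = A \cup C \cup E = X_3$ of $\mathbf{X}_j$ having $|X_{j_3}| = 3$, we get

$$\kappa_j(X_{j_3}) = 1 - \prod_{\substack{X_{l'}, X_l \in \mathcal{X}_j \\ X_{l'} \neq X_l \\ |X_{j_3}| \le |X_l| \\ |X_{l'}| \le |X_l|}} \delta_j(X_{l'}, X_l)$$
$$= 1 - [\delta_j(X_1, X_2)\,\delta_j(X_1, X_3)\,\delta_j(X_1, X_4)\,\delta_j(X_1, X_5)\,\delta_j(X_2, X_3)\,\delta_j(X_2, X_4)\,\delta_j(X_2, X_5)\,\delta_j(X_3, X_2)$$
$$\cdot\,\delta_j(X_3, X_4)\,\delta_j(X_3, X_5)\,\delta_j(X_4, X_5)\,\delta_j(X_6, X_2)\,\delta_j(X_6, X_3)\,\delta_j(X_6, X_4)\,\delta_j(X_6, X_5)]$$
$$= 1 - 0 \cdot 1 \cdot 1 \cdot 1 \cdot 0 \cdot 1 \cdot 1 \cdot 0 \cdot 1 \cdot 1 \cdot 1 \cdot 0 \cdot 1 \cdot 1 \cdot 1 = 1.$$

Hence the focal element $X_{j_3} = A \cup C \cup E$ is also kept in the redistribution.

• For the duplicate focal element $X_{j_4} = B \cup C \cup E$ having $|X_{j_4}| = 3$, we have $\kappa_j(X_{j_4}) = 1$ because $X_{j_4} = X_{j_2}$ and $\kappa_j(X_{j_2}) = 1$.

• For the focal element $X_{j_5} = A \cup B \cup C \cup E = X_4$ having $|X_{j_5}| = 4$, we get

$$\kappa_j(X_{j_5}) = 1 - \prod_{\substack{X_{l'}, X_l \in \mathcal{X}_j \\ X_{l'} \neq X_l \\ |X_{j_5}| \le |X_l| \\ |X_{l'}| \le |X_l|}} \delta_j(X_{l'}, X_l)$$
$$= 1 - [\delta_j(X_1, X_4)\,\delta_j(X_1, X_5)\,\delta_j(X_2, X_4)\,\delta_j(X_2, X_5)\,\delta_j(X_3, X_4)\,\delta_j(X_3, X_5)\,\delta_j(X_4, X_5)\,\delta_j(X_6, X_4)\,\delta_j(X_6, X_5)]$$
$$= 1 - 1 \cdot 1 \cdot 1 \cdot 1 \cdot 1 \cdot 1 \cdot 1 \cdot 1 \cdot 1 = 0.$$

Hence the focal element $X_{j_5} = A \cup B \cup C \cup E$ must be ignored in the proportional redistribution.

• For $X_{j_6} = A \cup B \cup C \cup D \cup E = \Theta = X_5$ having $|X_{j_6}| = 5$, we get

$$\kappa_j(X_{j_6}) = 1 - \prod_{\substack{X_{l'}, X_l \in \mathcal{X}_j \\ X_{l'} \neq X_l \\ |X_{j_6}| \le |X_l| \\ |X_{l'}| \le |X_l|}} \delta_j(X_{l'}, X_l)$$
$$= 1 - [\delta_j(X_1, X_5)\,\delta_j(X_2, X_5)\,\delta_j(X_3, X_5)\,\delta_j(X_4, X_5)\,\delta_j(X_6, X_5)]$$
$$= 1 - 1 \cdot 1 \cdot 1 \cdot 1 \cdot 1 = 0.$$

This result illustrates the validity of the aforementioned remark 1. Hence the focal element $X_{j_6} = A \cup B \cup C \cup D \cup E$ must be ignored in the proportional redistribution.

• For the focal element $X_{j_7} = A = X_6$ having $|X_{j_7}| = 1$, we get naturally (see our previous remark 1)

$$\kappa_j(X_{j_7}) = 1 - \prod_{\substack{X_{l'}, X_l \in \mathcal{X}_j \\ X_{l'} \neq X_l \\ |X_{j_7}| \le |X_l| \\ |X_{l'}| \le |X_l|}} \delta_j(X_{l'}, X_l)$$
$$= 1 - [\delta_j(X_1, X_2)\,\delta_j(X_1, X_3)\,\delta_j(X_1, X_4)\,\delta_j(X_1, X_5)\,\delta_j(X_2, X_3)\,\delta_j(X_2, X_4)\,\delta_j(X_2, X_5)\,\delta_j(X_3, X_2)$$
$$\cdot\,\delta_j(X_3, X_4)\,\delta_j(X_3, X_5)\,\delta_j(X_4, X_5)\,\delta_j(X_6, X_1)\,\delta_j(X_6, X_2)\,\delta_j(X_6, X_3)\,\delta_j(X_6, X_4)\,\delta_j(X_6, X_5)]$$
$$= 1 - 0 \cdot 1 \cdot 1 \cdot 1 \cdot 0 \cdot 1 \cdot 1 \cdot 0 \cdot 1 \cdot 1 \cdot 1 \cdot 1 \cdot 0 \cdot 1 \cdot 1 \cdot 1 = 1.$$

Hence the focal element $X_{j_7} = A$ must be kept in the proportional redistribution.

In summary, the conflicting product $\pi_j(\emptyset) = m_1(A \cup E)\,m_2(B \cup C \cup E)\,m_3(A \cup C \cup E)\,m_4(B \cup C \cup E)\,m_5(A \cup B \cup C \cup E)\,m_6(\Theta)\,m_7(A)$ will be redistributed only to focal elements $A \cup E$, $B \cup C \cup E$, $A \cup C \cup E$ and $A$ with the improved rules PCR5* and PCR6*, whereas it would have been redistributed to all focal elements $A \cup E$, $B \cup C \cup E$, $A \cup C \cup E$, $A \cup B \cup C \cup E$, $\Theta$ and $A$ with the classical PCR5 and PCR6 rules.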

Example 8: This is a somewhat simplified version of Example 6. We consider the FoD $\Theta = \{A, B, C, D\}$, only five BBAs, and suppose that the $j$-th conflicting (assumed strictly positive) product is as follows

$$\pi_j(\emptyset) = m_1(A)\,m_2(B \cup C)\,m_3(A \cup C)\,m_4(B \cup C)\,m_5(A \cup B \cup C \cup D).$$

Based on (23), it can be verified$^{14}$ that the binary keeping-indexes of focal elements involved in conflicting products are

$\kappa_j(A) = 1$,
$\kappa_j(B \cup C) = 1$,
$\kappa_j(A \cup C) = 1$,
$\kappa_j(A \cup B \cup C \cup D) = 0$.


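Example 9: We consider the FoD $\Theta = \{A, B, C, D\}$, seven BBAs, and suppose that the $j$-th conflicting (assumed strictly positive) product is as follows

$$\pi_j(\emptyset) = m_1(A)\,m_2(B \cup C)\,m_3(A \cup C)\,m_4(B \cup C)\,m_5(A \cup B \cup C \cup D)\,m_6(A \cup B \cup C)\,m_7(A \cup B \cup C).$$

Based on (23), it can be verified that the binary keeping-indexes of focal elements involved in conflicting products are

$\kappa_j(A) = 1$,
$\kappa_j(B \cup C) = 1$,
$\kappa_j(A \cup C) = 1$,
$\kappa_j(A \cup B \cup C) = 0$,
$\kappa_j(A \cup B \cup C \cup D) = 0$.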


Example 10: We consider the FoD $\Theta = \{A, B, C\}$, three BBAs, and suppose that the $j$-th conflicting (assumed strictly positive) product is as follows

$$\pi_j(\emptyset) = m_1(A)\,m_2(B \cup C)\,m_3(A \cup C).$$

Based on (23), it can be verified that the binary keeping-indexes of focal elements involved in conflicting products are

$\kappa_j(A) = 1$,
$\kappa_j(B \cup C) = 1$,
$\kappa_j(A \cup C) = 1$.


Example 11: We consider the FoD $\Theta = \{A, B, C\}$, four BBAs, and suppose that the $j$-th conflicting (assumed strictly positive) product is as follows

$$\pi_j(\emptyset) = m_1(A)\,m_2(B \cup C)\,m_3(A \cup C)\,m_4(A \cup B).$$

Based on (23), it can be verified that the binary keeping-indexes of focal elements involved in conflicting products are

$\kappa_j(A) = 1$,
$\kappa_j(B \cup C) = 1$,
$\kappa_j(A \cup C) = 1$,
$\kappa_j(A \cup B) = 1$.

$^{14}$ The verification is left to the reader.

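Example 12: We consider the FoD $\Theta = \{A, B, C\}$, three BBAs, and suppose that the $j$-th conflicting (assumed strictly positive) product is as follows

$$\pi_j(\emptyset) = m_1(A \cup B \cup C)\,m_2(A)\,m_3(B \cup C).$$

Based on (23), it can be verified that the binary keeping-indexes of focal elements involved in conflicting products are

$\kappa_j(A \cup B \cup C) = 0$,
$\kappa_j(A) = 1$,
$\kappa_j(B \cup C) = 1$.

Example 13: We consider the FoD $\Theta = \{A, B, C, D\}$, and the three following BBAs

$m_1(A \cup B) = 0.8$, $m_1(C \cup D) = 0.2$,
$m_2(A \cup B) = 0.4$, $m_2(C \cup D) = 0.6$,
$m_3(B) = 0.1$, $m_3(A \cup B \cup C \cup D) = 0.9$.

We have $\mathcal{F} = |\mathcal{F}(m_1)| \cdot |\mathcal{F}(m_2)| \cdot |\mathcal{F}(m_3)| = 2 \cdot 2 \cdot 2 = 8$ products $\pi_j$ ($j = 1, \ldots, \mathcal{F}$) entering in the fusion process as follows

$\pi_1(B) = m_1(A \cup B)\,m_2(A \cup B)\,m_3(B) = 0.032$,
$\pi_2(A \cup B) = m_1(A \cup B)\,m_2(A \cup B)\,m_3(\Theta) = 0.288$,
$\pi_3(\emptyset) = m_1(A \cup B)\,m_2(C \cup D)\,m_3(B) = 0.048$,
$\pi_4(\emptyset) = m_1(A \cup B)\,m_2(C \cup D)\,m_3(\Theta) = 0.432$,
$\pi_5(\emptyset) = m_1(C \cup D)\,m_2(A \cup B)\,m_3(B) = 0.008$,
$\pi_6(\emptyset) = m_1(C \cup D)\,m_2(A \cup B)\,m_3(\Theta) = 0.072$,
$\pi_7(\emptyset) = m_1(C \cup D)\,m_2(C \cup D)\,m_3(B) = 0.012$,
$\pi_8(C \cup D) = m_1(C \cup D)\,m_2(C \cup D)\,m_3(\Theta) = 0.108$.

Based on (23), it can be verified$^{15}$ that the binary keeping-indexes of focal elements involved in conflicting products $\pi_3(\emptyset)$ to $\pi_7(\emptyset)$ are

$\kappa_3(A \cup B) = 1$, $\kappa_3(C \cup D) = 1$, $\kappa_3(B) = 1$,
$\kappa_4(A \cup B) = 1$, $\kappa_4(C \cup D) = 1$, $\kappa_4(\Theta) = 0$,
$\kappa_5(C \cup D) = 1$, $\kappa_5(A \cup B) = 1$, $\kappa_5(B) = 1$,
$\kappa_6(C \cup D) = 1$, $\kappa_6(A \cup B) = 1$, $\kappa_6(\Theta) = 0$,
$\kappa_7(C \cup D) = 1$, $\kappa_7(B) = 1$.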
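
The eight products of Example 13 can be enumerated mechanically. The following Python sketch is given only as an illustration: the dictionary encoding of the BBAs and the variable names are assumptions made here, and the loop simply takes the Cartesian product of the focal elements of the three BBAs, in the same order as the list above.

```python
from functools import reduce
from itertools import product
from math import prod

THETA = frozenset("ABCD")
AB, CD, B = frozenset("AB"), frozenset("CD"), frozenset("B")

# BBAs of Example 13, encoded as {focal element: mass}.
m1 = {AB: 0.8, CD: 0.2}
m2 = {AB: 0.4, CD: 0.6}
m3 = {B: 0.1, THETA: 0.9}

products = []
for combo in product(m1.items(), m2.items(), m3.items()):
    focal = [s for s, _ in combo]                  # the tuple (X_j1, X_j2, X_j3)
    mass = prod(v for _, v in combo)               # value of the product pi_j
    inter = reduce(frozenset.intersection, focal)  # empty set means conflict
    products.append((focal, inter, mass))

conflicts = [p for p in products if not p[1]]
for j, (focal, inter, mass) in enumerate(products, start=1):
    label = "".join(sorted(inter)) or "conflict"
    print(j, label, round(mass, 3))
```

The five products whose intersection is empty are exactly $\pi_3(\emptyset)$ to $\pi_7(\emptyset)$, which are the ones that require keeping-indexes.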


In summary, once the binary keeping-indexes $\kappa_j(X_{j_i})$ of all focal elements $X_{j_i}$ involved in a conflicting product $\pi_j(\emptyset)$ are calculated, we can apply the PCR5 or PCR6 redistribution principle only with the focal elements for which $\kappa_j(X_{j_i}) = 1$. With this improved method of proportional redistribution, the PCR5* and PCR6* rules will never increase the mass of non-conflicting elements involved in each $\pi_j(\emptyset)$ (if any), and in doing so we preserve the neutrality of the vacuous belief assignment in the PCR5* and PCR6* fusion rules, which is a very desirable behavior.

$^{15}$ The verification is left to the reader.


B. Expressions of PCR5* and PCR6* fusion rules 


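The expressions of PCR5* and PCR6* fusion rules are proper modifications of PCR5 and PCR6 formulas (14) and (15) taking into account the selection of focal elements on which the proportional redistribution must apply thanks to the value of their binary keeping-index.

The PCR5* fusion of $S > 2$ BBAs is obtained by $m^{\text{PCR5*}}_{1,2,\ldots,S}(\emptyset) = 0$, and for all $A \in 2^{\Theta} \setminus \{\emptyset\}$ by

$$m^{\text{PCR5*}}_{1,2,\ldots,S}(A) = m^{\text{Conj}}_{1,2,\ldots,S}(A) + \sum_{j \in \{1,\ldots,\mathcal{F}\} \,|\, A \in \mathcal{X}_j \wedge \pi_j(\emptyset)} \left[ \Big( \kappa_j(A) \prod_{i \in \{1,\ldots,S\} \,|\, X_{j_i} = A} m_i(X_{j_i}) \Big) \cdot \frac{\pi_j(\emptyset)}{\sum\limits_{X \in \mathcal{X}_j} \Big( \kappa_j(X) \prod\limits_{i \in \{1,\ldots,S\} \,|\, X_{j_i} = X} m_i(X_{j_i}) \Big)} \right] \tag{25}$$

The PCR6* fusion of $S > 2$ BBAs is obtained by $m^{\text{PCR6*}}_{1,2,\ldots,S}(\emptyset) = 0$, and for all $A \in 2^{\Theta} \setminus \{\emptyset\}$ by

$$m^{\text{PCR6*}}_{1,2,\ldots,S}(A) = m^{\text{Conj}}_{1,2,\ldots,S}(A) + \sum_{j \in \{1,\ldots,\mathcal{F}\} \,|\, A \in \mathcal{X}_j \wedge \pi_j(\emptyset)} \left[ \Big( \kappa_j(A) \sum_{i \in \{1,\ldots,S\} \,|\, X_{j_i} = A} m_i(X_{j_i}) \Big) \cdot \frac{\pi_j(\emptyset)}{\sum\limits_{X \in \mathcal{X}_j} \Big( \kappa_j(X) \sum\limits_{i \in \{1,\ldots,S\} \,|\, X_{j_i} = X} m_i(X_{j_i}) \Big)} \right] \tag{26}$$

where $\kappa_j(A)$ and $\kappa_j(X)$ are respectively the binary keeping-indexes of elements $A$ and $X$ involved in the conflicting product $\pi_j(\emptyset)$, that are calculated by the formula (23) or (24).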


Remark 4: It is worth mentioning that PCR5* formula (25) 
is totally consistent with PCR5 formula (14) when all binary 
keeping-indexes are equal to one. Similarly, the PCR6* 
formula (26) reduces to PCR6 formula (15) if all binary 
keeping-indexes equal one. 


Theorem: The vacuous BBA $m_v$ has a neutral impact in the PCR5* and PCR6* rules of combination.


Proof: see appendix 2. 
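
As an illustration of formulas (25) and (26), the following Python sketch implements both rules by brute-force enumeration of all products. It is not the Matlab code of Appendix 3: the function names `keeping_indexes` and `pcr_fusion`, the `rule` and `improved` parameters, and the encoding of BBAs as dictionaries of `frozenset` keys are assumptions made for this example. The BBAs used are those of Example 2.

```python
from functools import reduce
from itertools import permutations, product
from math import prod

def keeping_indexes(distinct):
    """Binary keeping-index of each distinct focal element, following (23)."""
    kappa = {}
    for x in distinct:
        p = 1
        for x_lp, x_l in permutations(distinct, 2):
            if len(x) <= len(x_l) and len(x_lp) <= len(x_l):
                p *= int(x_lp <= x_l)  # delta_j: 1 if x_lp is included in x_l
        kappa[x] = 1 - p
    return kappa

def pcr_fusion(bbas, rule="PCR5", improved=True):
    """Fuse BBAs given as {frozenset: mass}; improved=True gives PCR5*/PCR6*."""
    aggregate = prod if rule == "PCR5" else sum  # product in (25), sum in (26)
    fused = {}
    for combo in product(*(m.items() for m in bbas)):
        focal = [s for s, _ in combo]
        pi = prod(v for _, v in combo)
        inter = reduce(frozenset.intersection, focal)
        if inter:  # non-conflicting product: conjunctive part
            fused[inter] = fused.get(inter, 0.0) + pi
            continue
        distinct = list(dict.fromkeys(focal))
        kappa = keeping_indexes(distinct) if improved else dict.fromkeys(distinct, 1)
        weight = {x: kappa[x] * aggregate(v for s, v in combo if s == x)
                  for x in distinct}
        total = sum(weight.values())
        for x, w in weight.items():
            if w > 0:  # discarded elements receive nothing
                fused[x] = fused.get(x, 0.0) + pi * w / total
    return fused

A, B, AB = frozenset("A"), frozenset("B"), frozenset("AB")
m1 = {A: 0.6, B: 0.1, AB: 0.3}
m2 = {A: 0.5, B: 0.3, AB: 0.2}
m3 = {A: 0.4, B: 0.1, AB: 0.5}
pcr5_star = pcr_fusion([m1, m2, m3], rule="PCR5")
pcr6_star = pcr_fusion([m1, m2, m3], rule="PCR6")
print({"".join(sorted(k)): round(v, 6) for k, v in pcr5_star.items()})
```

Calling `pcr_fusion` with `improved=False` sets every keeping-index to one, which corresponds to the reduction to formulas (14) and (15) noted in Remark 4. Appending the vacuous BBA `{AB: 1.0}` to the list of inputs leaves the improved results unchanged, in line with the theorem. The enumeration grows as the product of the numbers of focal elements, so this sketch is only suitable for small problems.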


C. On the complexity of PCR5* and PCR6* fusion rules 


The complexity of the PCR5 and PCR6 rules is difficult to establish precisely because the number of computations highly depends on the structure of the focal elements of the BBAs to combine, but it is definitely higher than that of Dempster's rule of combination. What about the complexity of the PCR5* and PCR6* fusion rules? On the one hand, PCR5* and PCR6* seem more complex than the PCR5 and PCR6 rules because they require an extra computational burden to calculate the binary keeping-indexes. In fact, however, the calculation of the binary keeping-indexes does not depend on the mass values of the focal elements but only on their structure. Hence, the binary keeping-indexes can be calculated off-line once and for all for many possible structures of focal elements of the BBAs to combine. On the other hand, if the binary keeping-index calculation is done off-line, then PCR5* and PCR6* become less complex than the PCR5 and PCR6 rules because some elements are discarded with PCR5* and PCR6*, making the redistribution simpler and more effective than with the PCR5 and PCR6 rules. It is not possible to say for sure whether PCR5* and PCR6* are globally more (or less) complex than PCR5 and PCR6 because this really depends on the fusion problem under consideration and on the structure of the focal elements of the BBAs to combine. If the sources of evidence to combine generate many partial conflicts to redistribute, including many elements to discard, then PCR5* and PCR6* are more advantageous than PCR5 and PCR6 in terms of reduction of complexity.
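
Because the keeping-indexes depend only on the structure of the focal elements, their computation can be cached per structure. The Python sketch below illustrates this idea with `functools.lru_cache`; the function name `kept_elements` and the representation of a structure as a `frozenset` of `frozenset` objects are assumptions made for this example, and an actual off-line table could be organized differently.

```python
from functools import lru_cache
from itertools import permutations

@lru_cache(maxsize=None)
def kept_elements(structure):
    """Return the focal elements with keeping-index 1 for a given structure.

    structure: frozenset of the distinct focal elements (frozensets) of one
    conflicting product; mass values play no role.
    """
    kept = set()
    for x in structure:
        p = 1
        for x_lp, x_l in permutations(structure, 2):
            if len(x) <= len(x_l) and len(x_lp) <= len(x_l):
                p *= int(x_lp <= x_l)
        if p == 0:  # keeping-index 1 - p equals 1
            kept.add(x)
    return frozenset(kept)

A, B, AB = frozenset("A"), frozenset("B"), frozenset("AB")
# Six conflicting products sharing the structure {A, B, A u B}.
tuples = [(AB, A, B), (AB, B, A), (B, A, AB), (A, B, AB), (A, AB, B), (B, AB, A)]
results = [kept_elements(frozenset(t)) for t in tuples]
info = kept_elements.cache_info()
print(info.hits, info.misses)
```

In this toy run the keeping-indexes are computed once and reused five times, since the six products differ only in which source supports which element.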


VII. EXAMPLES FOR PCR5* AND PCR6* FUSION RULES 


Here we compare the results obtained with PCR5* and PCR6* with those drawn from the PCR5 and PCR6 rules on Examples 1 to 13 of the previous sections. Since the following examples for the PCR5* and PCR6* fusion rules consider the same FoD and BBAs as those already presented, they are denoted as "revisited examples".


Example 1 (revisited): Consider $\Theta = \{A, B\}$ and the two following BBAs

$m_1(A) = 0.1$, $m_1(B) = 0.2$, $m_1(A \cup B) = 0.7$,
$m_2(A) = 0.4$, $m_2(B) = 0.3$, $m_2(A \cup B) = 0.3$.

Because there are only two BBAs to combine, we have

PCR5$(m_1, m_2)$ = PCR6$(m_1, m_2)$,
PCR5*$(m_1, m_2)$ = PCR6*$(m_1, m_2)$.

We have $m^{\text{Conj}}_{1,2}(A) = 0.35$, $m^{\text{Conj}}_{1,2}(B) = 0.33$, and $m^{\text{Conj}}_{1,2}(\Theta) = 0.21$, and we have the two conflicting products $\pi_1(\emptyset) = m_1(A)\,m_2(B) = 0.03$ and $\pi_2(\emptyset) = m_2(A)\,m_1(B) = 0.08$ to redistribute.

Applying the PCR5 principle for $\pi_1(\emptyset) = 0.03$ we get

$$\frac{x_1(A)}{m_1(A)} = \frac{x_1(B)}{m_2(B)} = \frac{\pi_1(\emptyset)}{m_1(A) + m_2(B)},$$

whence $x_1(A) = 0.1 \cdot \frac{0.03}{0.1 + 0.3} = 0.0075$ and $x_1(B) = 0.3 \cdot \frac{0.03}{0.1 + 0.3} = 0.0225$.

Applying the PCR5 principle for $\pi_2(\emptyset) = 0.08$ we get

$$\frac{x_2(A)}{m_2(A)} = \frac{x_2(B)}{m_1(B)} = \frac{\pi_2(\emptyset)}{m_2(A) + m_1(B)},$$

whence $x_2(A) = 0.4 \cdot \frac{0.08}{0.4 + 0.2} \approx 0.0533$ and $x_2(B) = 0.2 \cdot \frac{0.08}{0.4 + 0.2} \approx 0.0267$.

Therefore we get

$m^{\text{PCR5}}_{1,2}(A) = m^{\text{PCR6}}_{1,2}(A) = m^{\text{Conj}}_{1,2}(A) + x_1(A) + x_2(A) = 0.35 + 0.0075 + 0.0533 = 0.4108$,
$m^{\text{PCR5}}_{1,2}(B) = m^{\text{PCR6}}_{1,2}(B) = m^{\text{Conj}}_{1,2}(B) + x_1(B) + x_2(B) = 0.33 + 0.0225 + 0.0267 = 0.3792$,
$m^{\text{PCR5}}_{1,2}(A \cup B) = m^{\text{PCR6}}_{1,2}(A \cup B) = m^{\text{Conj}}_{1,2}(A \cup B) = 0.21$.

If we want to apply the PCR5*, or PCR6*, rule we need to compute the binary keeping-indexes of each focal element entering in the conflicting products $\pi_1(\emptyset)$ and $\pi_2(\emptyset)$. In this example for $\pi_1(\emptyset) = m_1(A)\,m_2(B)$ we have $\mathcal{X}_1 = \{A, B\}$, and for $\pi_2(\emptyset) = m_2(A)\,m_1(B)$ we have $\mathcal{X}_2 = \{A, B\}$. Applying formula (22), we get $\delta_1(A, B) = 0$ because $A \not\subseteq B$, and $\delta_1(B, A) = 0$ because $B \not\subseteq A$ (and also $\delta_2(A, B) = 0$ and $\delta_2(B, A) = 0$). Applying formula (23) we get the binary keeping-indexes $\kappa_1(A) = 1$, $\kappa_1(B) = 1$, $\kappa_2(A) = 1$, and $\kappa_2(B) = 1$, indicating that the redistribution of $\pi_1(\emptyset)$ must operate on all elements of $\mathcal{X}_1 = \{A, B\}$, and the redistribution of $\pi_2(\emptyset)$ must also operate on all elements of $\mathcal{X}_2 = \{A, B\}$, so there is no element that must be discarded for making the improved redistribution in this example. Therefore the PCR5*, or PCR6*, results coincide with the PCR5 and PCR6 results, that is $m^{\text{PCR5}}(\cdot) = m^{\text{PCR6}}(\cdot) = m^{\text{PCR5*}}(\cdot) = m^{\text{PCR6*}}(\cdot)$, which is normal.
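
The arithmetic of this two-source example can be checked with a few lines of Python. This is a minimal sketch: the variable names are chosen here for illustration, and the masses $m_1(B) = 0.2$, $m_1(A \cup B) = 0.7$, $m_2(B) = 0.3$ and $m_2(A \cup B) = 0.3$ are assumed from the intermediate values quoted in the example and from the normalization of each BBA.

```python
# BBAs of Example 1 (keys: "A", "B", and "AB" for A u B).
m1 = {"A": 0.1, "B": 0.2, "AB": 0.7}
m2 = {"A": 0.4, "B": 0.3, "AB": 0.3}

# Conjunctive rule for two sources on the frame {A, B}.
conj_A = m1["A"] * m2["A"] + m1["A"] * m2["AB"] + m1["AB"] * m2["A"]
conj_B = m1["B"] * m2["B"] + m1["B"] * m2["AB"] + m1["AB"] * m2["B"]
conj_AB = m1["AB"] * m2["AB"]

# The two conflicting products.
pi1 = m1["A"] * m2["B"]
pi2 = m2["A"] * m1["B"]

# Proportional redistribution of each product back to A and B.
x1_A = m1["A"] * pi1 / (m1["A"] + m2["B"])
x1_B = m2["B"] * pi1 / (m1["A"] + m2["B"])
x2_A = m2["A"] * pi2 / (m2["A"] + m1["B"])
x2_B = m1["B"] * pi2 / (m2["A"] + m1["B"])

pcr5_A = conj_A + x1_A + x2_A
pcr5_B = conj_B + x1_B + x2_B
print(round(pcr5_A, 4), round(pcr5_B, 4), round(conj_AB, 4))
```

The three resulting masses sum to one, as expected, since the whole conflicting mass $0.03 + 0.08$ is redistributed.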


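Example 2 (revisited): Consider $\Theta = \{A, B\}$ and the three following BBAs

$m_1(A) = 0.6$, $m_1(B) = 0.1$, $m_1(A \cup B) = 0.3$,
$m_2(A) = 0.5$, $m_2(B) = 0.3$, $m_2(A \cup B) = 0.2$,
$m_3(A) = 0.4$, $m_3(B) = 0.1$, $m_3(A \cup B) = 0.5$.

As shown in Section IV, for this example one has the following twelve conflicting products to redistribute when applying the PCR5 or PCR6 fusion formulas.

$\pi_1(\emptyset) = m_1(A)\,m_2(A)\,m_3(B) = 0.0300$,
$\pi_2(\emptyset) = m_1(A)\,m_2(B)\,m_3(A) = 0.0720$,
$\pi_3(\emptyset) = m_1(B)\,m_2(A)\,m_3(A) = 0.0200$,
$\pi_4(\emptyset) = m_1(B)\,m_2(B)\,m_3(A) = 0.0120$,
$\pi_5(\emptyset) = m_1(B)\,m_2(A)\,m_3(B) = 0.0050$,
$\pi_6(\emptyset) = m_1(A)\,m_2(B)\,m_3(B) = 0.0180$,
$\pi_7(\emptyset) = m_1(A \cup B)\,m_2(A)\,m_3(B) = 0.0150$,
$\pi_8(\emptyset) = m_1(A \cup B)\,m_2(B)\,m_3(A) = 0.0360$,
$\pi_9(\emptyset) = m_1(B)\,m_2(A)\,m_3(A \cup B) = 0.0250$,
$\pi_{10}(\emptyset) = m_1(A)\,m_2(B)\,m_3(A \cup B) = 0.0900$,
$\pi_{11}(\emptyset) = m_1(A)\,m_2(A \cup B)\,m_3(B) = 0.0120$,
$\pi_{12}(\emptyset) = m_1(B)\,m_2(A \cup B)\,m_3(A) = 0.0080$.

With PCR5 and PCR6 the products $\pi_1(\emptyset)$ to $\pi_6(\emptyset)$ are redistributed to $A$ and $B$ only, whereas the products $\pi_7(\emptyset)$ to $\pi_{12}(\emptyset)$ are redistributed to $A$, $B$ and $A \cup B$. Applying the PCR5 formula (14), and the PCR6 formula (15), we obtain $m^{\text{PCR5}}_{1,2,3}(\emptyset) = m^{\text{PCR6}}_{1,2,3}(\emptyset) = 0$ and

$m^{\text{PCR5}}_{1,2,3}(A) \approx 0.723281$,
$m^{\text{PCR5}}_{1,2,3}(B) \approx 0.182460$,
$m^{\text{PCR5}}_{1,2,3}(A \cup B) \approx 0.094259$,

and

$m^{\text{PCR6}}_{1,2,3}(A) \approx 0.743496$,
$m^{\text{PCR6}}_{1,2,3}(B) \approx 0.162245$,
$m^{\text{PCR6}}_{1,2,3}(A \cup B) \approx 0.094259$.

The calculation of the binary keeping-indexes by the formula (23) gives in this example

$\kappa_j(A) = 1$, $\kappa_j(B) = 1$, for $j = 1, \ldots, 6$,
$\kappa_j(A) = 1$, $\kappa_j(B) = 1$, $\kappa_j(A \cup B) = 0$, for $j = 7, \ldots, 12$.

Therefore, if we apply the PCR5* and PCR6* improved rules of combination, we redistribute the products $\pi_1(\emptyset)$ to $\pi_6(\emptyset)$ to $A$ and $B$ (as for the PCR5 and PCR6 rules), but the products $\pi_7(\emptyset)$ to $\pi_{12}(\emptyset)$ will be redistributed to $A$ and $B$ only, and not to $A \cup B$ because $\kappa_j(A \cup B) = 0$ for $j = 7, \ldots, 12$. So finally, we obtain $m^{\text{PCR5*}}_{1,2,3}(\emptyset) = m^{\text{PCR6*}}_{1,2,3}(\emptyset) = 0$ and

$m^{\text{PCR5*}}_{1,2,3}(A) \approx 0.768631$,
$m^{\text{PCR5*}}_{1,2,3}(B) \approx 0.201369$,
$m^{\text{PCR5*}}_{1,2,3}(A \cup B) = 0.03$,

and

$m^{\text{PCR6*}}_{1,2,3}(A) \approx 0.788847$,
$m^{\text{PCR6*}}_{1,2,3}(B) \approx 0.181153$,
$m^{\text{PCR6*}}_{1,2,3}(A \cup B) = 0.03$.

We can verify that we obtain a more precise redistribution with the PCR5* (resp. PCR6*) rule with respect to the PCR5 (resp. PCR6) rule because $m^{\text{PCR5*}}_{1,2,3}(A \cup B) < m^{\text{PCR5}}_{1,2,3}(A \cup B)$ and also $m^{\text{PCR6*}}_{1,2,3}(A \cup B) < m^{\text{PCR6}}_{1,2,3}(A \cup B)$.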
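
To make the difference concrete, the redistribution of the single product $\pi_7(\emptyset) = m_1(A \cup B)\,m_2(A)\,m_3(B) = 0.015$ of Example 2 can be worked out by hand under both PCR5 and PCR5*. The shares $x_7(\cdot)$ below are notation introduced here for illustration only; each focal element appears once in this product, so the products of masses in formulas (14) and (25) reduce to single masses.

```latex
% PCR5: all three focal elements A, B and A \cup B receive a share of \pi_7(\emptyset)
x_7(A) = m_2(A)\,\frac{\pi_7(\emptyset)}{m_1(A\cup B)+m_2(A)+m_3(B)}
       = 0.5\cdot\frac{0.015}{0.3+0.5+0.1} \approx 0.008333, \\
x_7(B) = 0.1\cdot\frac{0.015}{0.9} \approx 0.001667, \qquad
x_7(A\cup B) = 0.3\cdot\frac{0.015}{0.9} = 0.005.

% PCR5*: \kappa_7(A \cup B) = 0, so A \cup B is removed from the denominator
x_7^{*}(A) = 0.5\cdot\frac{0.015}{0.5+0.1} = 0.0125, \qquad
x_7^{*}(B) = 0.1\cdot\frac{0.015}{0.5+0.1} = 0.0025, \qquad
x_7^{*}(A\cup B) = 0.
```

In both cases the shares sum to $\pi_7(\emptyset) = 0.015$; the improved rule simply stops feeding the partial ignorance $A \cup B$.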


Example 3 (revisited): we consider $\Theta = \{A, B, C\}$, and the four very simple BBAs defined by

$m_1(A \cup B) = 1$, $m_2(B) = 1$, $m_3(A \cup B) = 1$, and $m_4(C) = 1$.

These four basic belief assignments are in total conflict because $(A \cup B) \cap B \cap (A \cup B) \cap C = \emptyset$, and one has only one product $\pi(\emptyset) = m_1(A \cup B)\,m_2(B)\,m_3(A \cup B)\,m_4(C) = 1$ to consider, so $j = 1$ in this case and it can be omitted in the notations of the binary keeping-indexes.

As shown previously, one has

$m^{\text{PCR5}}_{1,2,3,4}(A \cup B) = 1/3$,
$m^{\text{PCR5}}_{1,2,3,4}(B) = 1/3$,
$m^{\text{PCR5}}_{1,2,3,4}(C) = 1/3$,

and

$m^{\text{PCR6}}_{1,2,3,4}(A \cup B) = 0.5$,
$m^{\text{PCR6}}_{1,2,3,4}(B) = 0.25$,
$m^{\text{PCR6}}_{1,2,3,4}(C) = 0.25$.

Because all focal elements $A \cup B$, $B$ and $C$ entering in $\pi(\emptyset)$ are conflicting, one has the binary keeping-indexes $\kappa(A \cup B) = 1$, $\kappa(B) = 1$ and $\kappa(C) = 1$, i.e. all these elements will receive a redistribution of the conflicting mass $\pi(\emptyset)$. Therefore there is no restriction for making the redistribution. Consequently, the PCR5* result coincides with the PCR5 result, and the PCR6* result coincides with the PCR6 result.


Example 4 (revisited): we consider $\Theta = \{A, B\}$, and the following four BBAs

$m_1(A) = 0.6$, $m_1(B) = 0.1$, $m_1(A \cup B) = 0.3$,
$m_2(A) = 0.5$, $m_2(B) = 0.3$, $m_2(A \cup B) = 0.2$,
$m_3(A) = 0.4$, $m_3(B) = 0.1$, $m_3(A \cup B) = 0.5$,
$m_4(A \cup B) = 1$ ($m_4$ is the vacuous BBA).

The BBAs $m_1$, $m_2$ and $m_3$ are the same as in Example 2, and the BBA $m_4$ is the vacuous BBA. We have already shown that PCR5$(m_1, m_2, m_3) \neq$ PCR5$(m_1, m_2, m_3, m_4)$ even if $m_4$ is the vacuous BBA, and

$m^{\text{PCR5}}_{1,2,3,4}(A) \approx 0.654604$,
$m^{\text{PCR5}}_{1,2,3,4}(B) \approx 0.144825$,
$m^{\text{PCR5}}_{1,2,3,4}(A \cup B) \approx 0.200571$.

Similarly, PCR6$(m_1, m_2, m_3) \neq$ PCR6$(m_1, m_2, m_3, m_4)$, and

$m^{\text{PCR6}}_{1,2,3,4}(A) \approx 0.647113$,
$m^{\text{PCR6}}_{1,2,3,4}(B) \approx 0.128342$,
$m^{\text{PCR6}}_{1,2,3,4}(A \cup B) \approx 0.224545$.

Applying the PCR5* formula (25), and the PCR6* formula (26), we obtain $m^{\text{PCR5*}}_{1,2,3,4}(\emptyset) = m^{\text{PCR6*}}_{1,2,3,4}(\emptyset) = 0$ and

$m^{\text{PCR5*}}_{1,2,3,4}(A) \approx 0.768631$,
$m^{\text{PCR5*}}_{1,2,3,4}(B) \approx 0.201369$,
$m^{\text{PCR5*}}_{1,2,3,4}(A \cup B) = 0.03$,

and

$m^{\text{PCR6*}}_{1,2,3,4}(A) \approx 0.788847$,
$m^{\text{PCR6*}}_{1,2,3,4}(B) \approx 0.181153$,
$m^{\text{PCR6*}}_{1,2,3,4}(A \cup B) = 0.03$.

One has PCR5*$(m_1, m_2, m_3, m_4)$ = PCR5*$(m_1, m_2, m_3)$ and also PCR6*$(m_1, m_2, m_3, m_4)$ = PCR6*$(m_1, m_2, m_3)$ because with the improved proportional redistribution of the PCR5* and PCR6* rules the vacuous BBA always has a neutral impact on the fusion result, which is what we intuitively expect.


Example 5 (revisited): we consider $\Theta = \{A, B, C, D, E\}$, and the following three BBAs

$m_1(A \cup B) = 0.70$,
$m_1(C \cup D) = 0.06$,
$m_1(A \cup B \cup C \cup D) = 0.15$,
$m_1(E) = 0.09$,

$m_2(A \cup B) = 0.06$,
$m_2(C \cup D) = 0.50$,
$m_2(A \cup B \cup C \cup D) = 0.04$,
$m_2(E) = 0.40$,

and

$m_3(B) = 0.01$,
$m_3(A \cup B \cup C \cup D \cup E) = 0.99$.

Note that the BBA $m_3$ is not equal to the vacuous BBA but it is very close to the vacuous BBA because $m_3(\Theta)$ is close to one.

If we consider the fusion of only the two first BBAs $m_1$ and $m_2$, we have PCR6$(m_1, m_2)$ = PCR6*$(m_1, m_2)$ = PCR5$(m_1, m_2)$ = PCR5*$(m_1, m_2)$ because all these rules coincide when combining two BBAs:

$m^{\text{PCR6}}_{1,2}(A \cup B) \approx 0.465309$,
$m^{\text{PCR6}}_{1,2}(C \cup D) \approx 0.296299$,
$m^{\text{PCR6}}_{1,2}(A \cup B \cup C \cup D) \approx 0.023471$,
$m^{\text{PCR6}}_{1,2}(E) \approx 0.214921$.

If we make the PCR5, PCR5*, PCR6 and PCR6* fusion of these three BBAs altogether we now obtain different results, which is normal, because for $S > 2$ one has PCR5*$(m_1, \ldots, m_S) \neq$ PCR5$(m_1, \ldots, m_S)$ and PCR6*$(m_1, \ldots, m_S) \neq$ PCR6$(m_1, \ldots, m_S)$ in general. So, in this example 5 we get the results shown in Tables I and II.

Focal Elements        m^{PCR5}_{1,2,3}(.)    m^{PCR5*}_{1,2,3}(.)
B                     0.001103               0.001107
A∪B                   0.286107               0.464483
C∪D                   0.203385               0.296186
A∪B∪C∪D               0.012203               0.023408
E                     0.115966               0.214816
A∪B∪C∪D∪E             0.381236               0

Table I
EXAMPLE 5: RESULTS OF PCR5* VERSUS PCR5.

Focal Elements        m^{PCR6}_{1,2,3}(.)    m^{PCR6*}_{1,2,3}(.)
B                     0.000962               0.000967
A∪B                   0.286107               0.464483
C∪D                   0.203454               0.296255
A∪B∪C∪D               0.012203               0.023408
E                     0.116038               0.214887
A∪B∪C∪D∪E             0.381236               0

Table II
EXAMPLE 5: RESULTS OF PCR6* VERSUS PCR6.

These values highlight the great ignorance of the results proposed by PCR5 and PCR6 when the third (almost vacuous) source of information is taken into account. Indeed, $m^{\text{PCR5}}_{1,2,3}(\Theta) = m^{\text{PCR6}}_{1,2,3}(\Theta)$ is the greatest mass among the set of hypotheses, whereas the results proposed with the PCR5* and PCR6* combination rules discard the ignorant information and are closer to those obtained by merging the first two sources. Indeed, the largest mass is allocated to $A \cup B$.


The next examples 6 to 12 are very simple examples involving only categorical BBAs, so that only one conflicting product (equal to one) needs to be redistributed based on the PCR5, PCR6, PCR5* and PCR6* rules. These examples allow the reader to carry out the derivations manually in order to verify our results.


Example 6 (revisited): we consider $\Theta = \{A, B, C, D\}$, and the following categorical BBAs $m_1(A) = 1$, $m_2(B \cup C) = 1$, $m_3(A \cup C) = 1$, $m_4(B \cup C) = 1$, $m_5(A \cup B \cup C) = 1$ and $m_6(A \cup B \cup C \cup D) = 1$. If we make the PCR5, PCR5*, PCR6 and PCR6* fusion of these six BBAs altogether we obtain the results given in Tables III and IV.

In this example, we have only one conflicting product $\pi_1(\emptyset)$ to redistribute, which is given by

$$\pi_1(\emptyset) = m_1(A)\,m_2(B \cup C)\,m_3(A \cup C)\,m_4(B \cup C)\,m_5(A \cup B \cup C)\,m_6(A \cup B \cup C \cup D).$$

Because $\kappa_1(A \cup B \cup C) = 0$ and $\kappa_1(A \cup B \cup C \cup D) = 0$, these two disjunctions are discarded and more mass is committed to $A$, $A \cup C$ and $B \cup C$ with the PCR5* and PCR6* rules. There is more mass allocated to $B \cup C$ with PCR6* and PCR6 than with PCR5* and PCR5 because two sources of information support this hypothesis.

Focal Elements        m^{PCR5}_{1,2,3,4,5,6}(.)    m^{PCR5*}_{1,2,3,4,5,6}(.)
A                     1/5                          1/3
A∪C                   1/5                          1/3
B∪C                   1/5                          1/3
A∪B∪C                 1/5                          0
A∪B∪C∪D               1/5                          0

Table III
EXAMPLE 6: RESULTS OF PCR5* VERSUS PCR5.

Focal Elements        m^{PCR6}_{1,2,3,4,5,6}(.)    m^{PCR6*}_{1,2,3,4,5,6}(.)
A                     1/6                          1/4
A∪C                   1/6                          1/4
B∪C                   1/3                          1/2
A∪B∪C                 1/6                          0
A∪B∪C∪D               1/6                          0

Table IV
EXAMPLE 6: RESULTS OF PCR6* VERSUS PCR6.


Example 7 (revisited): we consider $\Theta = \{A, B, C, D, E\}$, and the following seven categorical BBAs $m_1(A \cup E) = 1$, $m_2(B \cup C \cup E) = 1$, $m_3(A \cup C \cup E) = 1$, $m_4(B \cup C \cup E) = 1$, $m_5(A \cup B \cup C \cup E) = 1$, $m_6(A \cup B \cup C \cup D \cup E) = 1$, and $m_7(A) = 1$. If we make the PCR5, PCR5*, PCR6 and PCR6* fusion of these seven BBAs altogether we obtain the results given in Tables V and VI.

Focal Elements        m^{PCR5}_{1,2,3,4,5,6,7}(.)    m^{PCR5*}_{1,2,3,4,5,6,7}(.)
A                     1/6                            1/4
A∪E                   1/6                            1/4
A∪C∪E                 1/6                            1/4
B∪C∪E                 1/6                            1/4
A∪B∪C∪E               1/6                            0
A∪B∪C∪D∪E             1/6                            0

Table V
EXAMPLE 7: RESULTS OF PCR5* VERSUS PCR5.

Focal Elements        m^{PCR6}_{1,2,3,4,5,6,7}(.)    m^{PCR6*}_{1,2,3,4,5,6,7}(.)
A                     1/7                            1/5
A∪E                   1/7                            1/5
A∪C∪E                 1/7                            1/5
B∪C∪E                 2/7                            2/5
A∪B∪C∪E               1/7                            0
A∪B∪C∪D∪E             1/7                            0

Table VI
EXAMPLE 7: RESULTS OF PCR6* VERSUS PCR6.

In this example 7, we have only one conflicting product $\pi_1(\emptyset)$ to redistribute, which is given by

$$\pi_1(\emptyset) = m_1(A \cup E)\,m_2(B \cup C \cup E)\,m_3(A \cup C \cup E)\,m_4(B \cup C \cup E)\,m_5(A \cup B \cup C \cup E)\,m_6(A \cup B \cup C \cup D \cup E)\,m_7(A).$$

Because $\kappa_1(A \cup B \cup C \cup E) = 0$ and $\kappa_1(A \cup B \cup C \cup D \cup E) = 0$, these two disjunctions are discarded and more mass is committed to $A$, $A \cup E$, $A \cup C \cup E$ and $B \cup C \cup E$ with the PCR5* and PCR6* rules. There is more mass allocated to $B \cup C \cup E$ with PCR6* and PCR6 than with PCR5* and PCR5 because two sources of information support this hypothesis.


Example 8 (revisited): we consider $\Theta = \{A, B, C, D\}$, and the following categorical BBAs $m_1(A) = 1$, $m_2(B \cup C) = 1$, $m_3(A \cup C) = 1$, $m_4(B \cup C) = 1$ and $m_5(A \cup B \cup C \cup D) = 1$. If we make the PCR5, PCR5*, PCR6 and PCR6* fusion of these five BBAs altogether we obtain the results given in Tables VII and VIII.

Focal Elements        m^{PCR5}_{1,2,3,4,5}(.)    m^{PCR5*}_{1,2,3,4,5}(.)
A                     1/4                        1/3
A∪C                   1/4                        1/3
B∪C                   1/4                        1/3
A∪B∪C∪D               1/4                        0

Table VII
EXAMPLE 8: RESULTS OF PCR5* VERSUS PCR5.

Focal Elements        m^{PCR6}_{1,2,3,4,5}(.)    m^{PCR6*}_{1,2,3,4,5}(.)
A                     1/5                        1/4
A∪C                   1/5                        1/4
B∪C                   2/5                        1/2
A∪B∪C∪D               1/5                        0

Table VIII
EXAMPLE 8: RESULTS OF PCR6* VERSUS PCR6.

Because $\kappa_1(A \cup B \cup C \cup D) = 0$, this disjunction is discarded and more mass is committed to $A$, $A \cup C$ and $B \cup C$ with the PCR5* and PCR6* rules. There is more mass allocated to $B \cup C$ with PCR6* and PCR6 than with PCR5* and PCR5 because two sources of information support this hypothesis.
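
For categorical BBAs in total conflict there is a single conflicting product equal to one, and each kept focal element is weighted by a product of unit masses (PCR5, PCR5*) or by the number of sources supporting it (PCR6, PCR6*). The Python sketch below reproduces the fractions of Tables VII and VIII under that assumption; the function name `categorical_pcr` and its parameters are illustrative choices, not part of the authors' code, and the function does not check that the inputs are actually in total conflict.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

def categorical_pcr(sources, rule="PCR5", improved=True):
    """sources: one frozenset per categorical BBA, assumed to be in total conflict."""
    counts = Counter(sources)
    distinct = list(counts)
    weights = {}
    for x in distinct:
        keep = 1
        if improved:  # binary keeping-index of formula (23)
            p = 1
            for x_lp, x_l in permutations(distinct, 2):
                if len(x) <= len(x_l) and len(x_lp) <= len(x_l):
                    p *= int(x_lp <= x_l)
            keep = 1 - p
        # PCR5: product of unit masses is 1; PCR6: sum of unit masses is the count.
        weights[x] = keep * (1 if rule == "PCR5" else counts[x])
    total = sum(weights.values())
    return {x: Fraction(w, total) for x, w in weights.items()}

A, AC, BC, THETA = frozenset("A"), frozenset("AC"), frozenset("BC"), frozenset("ABCD")
example8 = [A, BC, AC, BC, THETA]
pcr6 = categorical_pcr(example8, rule="PCR6", improved=False)
pcr6_star = categorical_pcr(example8, rule="PCR6", improved=True)
pcr5_star = categorical_pcr(example8, rule="PCR5", improved=True)
print(pcr6[BC], pcr6_star[BC], pcr5_star[BC])
```

Using `Fraction` keeps the results exact, which makes the comparison with the tabulated fractions straightforward.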


Example 9 (revisited): we consider $\Theta = \{A, B, C, D\}$, and the following seven categorical BBAs $m_1(A) = 1$, $m_2(B \cup C) = 1$, $m_3(A \cup C) = 1$, $m_4(B \cup C) = 1$, $m_5(A \cup B \cup C \cup D) = 1$, $m_6(A \cup B \cup C) = 1$, and $m_7(A \cup B \cup C) = 1$. If we make the PCR5, PCR5*, PCR6 and PCR6* fusion of these seven BBAs altogether we obtain the results given in Tables IX and X.

Focal Elements        m^{PCR5}_{1,2,3,4,5,6,7}(.)    m^{PCR5*}_{1,2,3,4,5,6,7}(.)
A                     1/5                            1/3
A∪C                   1/5                            1/3
B∪C                   1/5                            1/3
A∪B∪C                 1/5                            0
A∪B∪C∪D               1/5                            0

Table IX
EXAMPLE 9: RESULTS OF PCR5* VERSUS PCR5.

Focal Elements        m^{PCR6}_{1,2,3,4,5,6,7}(.)    m^{PCR6*}_{1,2,3,4,5,6,7}(.)
A                     1/7                            1/4
A∪C                   1/7                            1/4
B∪C                   2/7                            1/2
A∪B∪C                 2/7                            0
A∪B∪C∪D               1/7                            0

Table X
EXAMPLE 9: RESULTS OF PCR6* VERSUS PCR6.

Because $\kappa_1(A \cup B \cup C \cup D) = 0$ and $\kappa_1(A \cup B \cup C) = 0$, these disjunctions are discarded and more mass is committed to $A$, $A \cup C$ and $B \cup C$ with the PCR5* and PCR6* rules. There is more mass allocated to $B \cup C$ with PCR6* and PCR6 than with PCR5* and PCR5 because two sources of information support this hypothesis. Similarly, more mass is allocated to $A \cup B \cup C$ with PCR6 than with PCR5 since two sources of information support this hypothesis.


Example 10 (revisited): we consider $\Theta = \{A, B, C\}$, and the following three categorical BBAs $m_1(A) = 1$, $m_2(B \cup C) = 1$, and $m_3(A \cup C) = 1$. We have only one conflicting product $\pi_1(\emptyset) = m_1(A)\,m_2(B \cup C)\,m_3(A \cup C) = 1$ to redistribute, and for this example we have $\kappa_1(A) = 1$, $\kappa_1(A \cup C) = 1$ and $\kappa_1(B \cup C) = 1$, which means that all focal elements $A$, $A \cup C$ and $B \cup C$ must be kept, and they must receive a mass through the proportional redistribution principle. Hence in this example we have $m^{\text{PCR5}}_{1,2,3} = m^{\text{PCR6}}_{1,2,3} = m^{\text{PCR5*}}_{1,2,3} = m^{\text{PCR6*}}_{1,2,3}$ and the combined masses are evenly distributed as shown in Table XI.

Focal Elements    m^{PCR5}_{1,2,3}(.)    m^{PCR5*}_{1,2,3}(.)    m^{PCR6}_{1,2,3}(.)    m^{PCR6*}_{1,2,3}(.)
A                 1/3                    1/3                     1/3                    1/3
A∪C               1/3                    1/3                     1/3                    1/3
B∪C               1/3                    1/3                     1/3                    1/3

Table XI
EXAMPLE 10: RESULTS OF PCR5, PCR5*, PCR6, PCR6*.


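Example 11 (revisited): we consider $\Theta = \{A, B, C\}$, and the following four categorical BBAs $m_1(A) = 1$, $m_2(B \cup C) = 1$, $m_3(A \cup C) = 1$, and $m_4(A \cup B) = 1$. Because we have only one conflicting product $\pi_1(\emptyset) = m_1(A)\,m_2(B \cup C)\,m_3(A \cup C)\,m_4(A \cup B) = 1$ and $\kappa_1(A) = 1$, $\kappa_1(A \cup B) = 1$, $\kappa_1(A \cup C) = 1$ and $\kappa_1(B \cup C) = 1$, no hypothesis is discarded in the proportional conflict redistribution, and we get $m^{\text{PCR5}}_{1,2,3,4} = m^{\text{PCR6}}_{1,2,3,4} = m^{\text{PCR5*}}_{1,2,3,4} = m^{\text{PCR6*}}_{1,2,3,4}$ with the merged masses being evenly distributed, that is $m^{\text{PCR5}}_{1,2,3,4}(A) = 1/4$, $m^{\text{PCR5}}_{1,2,3,4}(A \cup B) = 1/4$, $m^{\text{PCR5}}_{1,2,3,4}(A \cup C) = 1/4$, and $m^{\text{PCR5}}_{1,2,3,4}(B \cup C) = 1/4$.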

Example 12 (revisited): we consider $\Theta = \{A, B, C\}$, and the following three categorical BBAs $m_1(A \cup B \cup C) = 1$, $m_2(A) = 1$, $m_3(B \cup C) = 1$. If we make the PCR5 fusion, and the PCR5* fusion, of these three BBAs altogether we obtain the results given in Table XII. Because $\pi_1(\emptyset) = m_1(A \cup B \cup C)\,m_2(A)\,m_3(B \cup C)$, we get $\kappa_1(A \cup B \cup C) = 0$, $\kappa_1(A) = 1$ and $\kappa_1(B \cup C) = 1$ based on (23). Therefore, using the PCR5* combination rule, we get a redistribution of the conflicting mass $\pi_1(\emptyset) = 1$ only between $A$ and $B \cup C$. In this example we have $m^{\text{PCR5}}_{1,2,3} = m^{\text{PCR6}}_{1,2,3}$ and $m^{\text{PCR5*}}_{1,2,3} = m^{\text{PCR6*}}_{1,2,3}$, because no mass is allocated to the same hypothesis by two different sources.

Focal Elements        m^{PCR5}_{1,2,3}(.)    m^{PCR5*}_{1,2,3}(.)
A                     1/3                    1/2
B∪C                   1/3                    1/2
A∪B∪C                 1/3                    0

Table XII
EXAMPLE 12: RESULTS OF PCR5, PCR5*.


Example 13 (revisited): we consider $\Theta = \{A, B, C, D\}$, and the three following BBAs

$m_1(A \cup B) = 0.8$, $m_1(C \cup D) = 0.2$,
$m_2(A \cup B) = 0.4$, $m_2(C \cup D) = 0.6$,
$m_3(B) = 0.1$, $m_3(A \cup B \cup C \cup D) = 0.9$.

If we make the PCR5, PCR5*, PCR6 and PCR6* fusion of these three BBAs altogether we obtain the results given in Tables XIII and XIV.

Focal Elements        m^{PCR5}_{1,2,3}(.)    m^{PCR5*}_{1,2,3}(.)
B                     0.041797               0.041797
A∪B                   0.487632               0.613029
C∪D                   0.258327               0.345174
A∪B∪C∪D               0.212244               0

Table XIII
EXAMPLE 13: RESULTS OF PCR5* VERSUS PCR5.

Focal Elements        m^{PCR6}_{1,2,3}(.)    m^{PCR6*}_{1,2,3}(.)
B                     0.037676               0.037676
A∪B                   0.487632               0.613029
C∪D                   0.262448               0.349295
A∪B∪C∪D               0.212244               0

Table XIV
EXAMPLE 13: RESULTS OF PCR6* VERSUS PCR6.

Because $\kappa_j(\Theta) = 0$ for any conflicting product $\pi_j(\emptyset)$ involving $\Theta$, this hypothesis is discarded in the redistribution of $\pi_4(\emptyset)$ and of $\pi_6(\emptyset)$ (see Example 13 in subsection VI-A for details), and therefore more mass is redistributed to $A \cup B$ and $C \cup D$ with the PCR5* and PCR6* rules. No more mass is committed to $B$ with PCR5* and PCR6* in comparison with PCR5 and PCR6, respectively. This is because $B$ is not involved in any partial conflict with $\Theta$ (cf. subsection VI-A for details).


VIII. CONCLUSION 


In this paper, after having demonstrated the flawed behavior of the PCR5 and PCR6 rules of combination for $S > 2$ BBAs (including possibly vacuous BBAs), we proposed improvements to correct this behavior. The computation of a binary keeping-index has been detailed, which makes it possible to discard ignorant information sources in the calculation of each partial conflict. This binary keeping-index has been integrated into the original formulations of PCR5 and PCR6 in order to ensure the neutrality property of the vacuous BBA and to propose two new combination rules for a number of sources greater than 2: the PCR5* and PCR6* rules. Such combination rules could prove to be particularly important in applications involving many ignorant sources of information. In such a scenario, the preponderant ignorance of a certain number of sources will no longer obscure a more precise characterization provided by other sources. These new rules of combination have already been applied to risk analysis issues for geophysical and geotechnical data fusion in order to reinforce the levee protection characterizations [48].


APPENDIX 1: PROOF OF THE LEMMA 1 


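We prove that $m^{\text{Conj}}_{1,2,\ldots,S,S+1}(A) = m^{\text{Conj}}_{1,2,\ldots,S}(A)$ for any $A \in 2^{\Theta} \setminus \{\emptyset\}$, where $m_{S+1}(\Theta) = 1$ is the vacuous BBA $m_v$.
The set of focal elements of $m_{S+1}(\cdot)$ is $\mathcal{F}(m_{S+1}) = \{\Theta\}$, therefore $|\mathcal{F}(m_{S+1})| = 1$ and $X_{j_{S+1}} = \Theta$. Based on the formula (6) written for $S + 1$ BBAs, we have

$$m^{\text{Conj}}_{1,2,\ldots,S,S+1}(A) = \sum_{\substack{\mathbf{X}_j \in \mathcal{F}(m_1,\ldots,m_S,m_{S+1}) \\ X_{j_1} \cap \ldots \cap X_{j_S} \cap X_{j_{S+1}} = A}} \; \prod_{i=1}^{S+1} m_i(X_{j_i}) = \sum_{\substack{\mathbf{X}_j \in \mathcal{F}(m_1,\ldots,m_S,m_{S+1}) \\ X_{j_1} \cap \ldots \cap X_{j_S} \cap \Theta = A}} \; \prod_{i=1}^{S+1} m_i(X_{j_i}) \tag{27}$$

Because $X_{j_{S+1}} = \Theta$ is constant and $m_{S+1}(X_{j_{S+1}}) = m_{S+1}(\Theta) = 1$, one has

$$\prod_{i=1}^{S+1} m_i(X_{j_i}) = \Big( \prod_{i=1}^{S} m_i(X_{j_i}) \Big) \cdot m_{S+1}(\Theta) = \prod_{i=1}^{S} m_i(X_{j_i})$$

and $X_{j_1} \cap \ldots \cap X_{j_S} \cap X_{j_{S+1}} = X_{j_1} \cap \ldots \cap X_{j_S} \cap \Theta = X_{j_1} \cap \ldots \cap X_{j_S}$. Therefore the formula (27) becomes

$$m^{\text{Conj}}_{1,2,\ldots,S,S+1}(A) = \sum_{\substack{\mathbf{X}_j \in \mathcal{F}(m_1,\ldots,m_S,m_{S+1}) \\ X_{j_1} \cap \ldots \cap X_{j_S} \cap \Theta = A}} \; \prod_{i=1}^{S+1} m_i(X_{j_i}) = \sum_{\substack{\mathbf{X}_j \in \mathcal{F}(m_1,\ldots,m_S) \\ X_{j_1} \cap \ldots \cap X_{j_S} = A}} \; \prod_{i=1}^{S} m_i(X_{j_i}) = m^{\text{Conj}}_{1,2,\ldots,S}(A),$$

which completes the proof of the Lemma 1.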
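
Lemma 1 can also be checked numerically on a small case. The Python sketch below is only a sanity check on the BBAs of Example 2, not a substitute for the proof; the function name `conjunctive` and the dictionary encoding of BBAs are assumptions made here, and the mass stored under the empty `frozenset` key is the total conflicting mass before any redistribution.

```python
from functools import reduce
from itertools import product
from math import prod

def conjunctive(bbas):
    """Conjunctive rule: sum the products of masses over each intersection."""
    out = {}
    for combo in product(*(m.items() for m in bbas)):
        inter = reduce(frozenset.intersection, (s for s, _ in combo))
        out[inter] = out.get(inter, 0.0) + prod(v for _, v in combo)
    return out

A, B, AB = frozenset("A"), frozenset("B"), frozenset("AB")
m1 = {A: 0.6, B: 0.1, AB: 0.3}
m2 = {A: 0.5, B: 0.3, AB: 0.2}
m3 = {A: 0.4, B: 0.1, AB: 0.5}
vacuous = {AB: 1.0}  # all the mass on the whole frame

without_vacuous = conjunctive([m1, m2, m3])
with_vacuous = conjunctive([m1, m2, m3, vacuous])
print({"".join(sorted(k)) or "conflict": round(v, 3) for k, v in with_vacuous.items()})
```

Both calls return the same masses, because every product is multiplied by $m_{S+1}(\Theta) = 1$ and every intersection with $\Theta$ is unchanged.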


APPENDIX 2: PROOF OF THE THEOREM 
We {prove that PCR5*(mj,...,ms,mg41) = 
PCR5*(mi,...,mg), or equivalently that mf*" 5.,(A) = 
ne (A) for any A € 22% \ {@}, where 
msi1(Xje.,) = ms+1(0) = 1 is the vacuous BBA. 
It is worth noting that m°° 3.....8,941(A) = mye) 5(A) for 
any A € 2° \ {} because the vacuous BBA mg+1(.) is the 


neutral element of the conjunctive rule (see Lemma 1). It is 
important to note that when considering A = 0, we have 
always rie 54(0)= ie s,s41(9) = mooi g(9) = 

mi cst (9) because the binary keeping-index of O is always 
Sul to zero (see remark 1), i.e. K;(Q) = 0. Therefore all the 
redistribution terms to © in PCR5* (and in PCR6*) formula 
are equal to zero when A = O. So, we just have to consider 
A # © to make the proof. 

Because mg41(-) is the vacuous BBA, its set of focal 
elements is F(mg4i) = {©} and it contains only one focal 
element, i.e. |F(mg41)| = 1. Therefore 


-|F(ms)| - |F(ms41)| 
-|F(ms)| 


(28) 
(29) 


This means that the number of conflicting products 7;(0) 


associated to the S + 1-tuple Kj; = (X;,,...,X;5,0) € 
F(m1,...,™mg,™mg41) is equal to the number of conflicting 
products 7;(@) associated to S-tuple Xj = (X;,,...,Xj5) € 
F(mj,...,™mgs). Moreover, we always have 

S+1 


S S 
II mi(Xj,) = I] mi(Xj,))-ms+41(O) = []mi(%.) 


Hence, we always have 
13 (X5,M SMe NO = 0) =a AA Tess 


because Xj,1...9 Xj, NO = Xj,N...N XG¢. 
Based on the formula (25) written for S'+1 BBAs, we have 


+ ‘on 
mis s.g41(A) = cee s,s41(A) 
+ Ss [(«s(A) mi(Xj.)) 
jE (1,...,F}|/AEK AT; (0) i€{1,...,S4+1}|Xj,=A 
fp Xo Xie VO: =) (30) 
(K(X) mi(Xj,)) 


X€EX; i€{1,...,S4+1}|Xj,=X 


where F is given by (28). 
Because X;,,, = © and because we consider A # O, we 
have always 


i€{1,...,5+1}|Xj,=A 


I 


i€{1,...,S}|Xj,=A 


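Whether $X \in \mathbf{X}_j = (X_{j_1}, \ldots, X_{j_S})$ or $X \in \mathbf{X}_j = (X_{j_1}, \ldots, X_{j_S}, \Theta)$,
the value of $\kappa_j(X)$ is the same because the
additional binary containing indicator $\delta_j(X, \Theta)$ entering in
the product of the computation of the binary keeping-index
is always equal to 1 and does not modify the value of $\kappa_j(X)$;
this holds in particular when $X = A$. Because the binary keeping-index
entering in the numerator and denominator of formula (30)
removes the factor $m_{S+1}(\Theta)$ from all products it belongs to
(since $\Theta$ includes all elements of the product it belongs to),
the formula (30) reduces to the following formula

$$m^{\text{PCR5}^+}_{1,\ldots,S,S+1}(A) = m^{\text{Conj}}_{1,2,\ldots,S}(A) + \sum_{j \in \{1,\ldots,\mathcal{F}\} \,|\, A \in \mathbf{X}_j \wedge \pi_j(\emptyset)} \left[ \Big(\kappa_j(A) \prod_{i \in \{1,\ldots,S\} | X_{j_i} = A} m_i(X_{j_i})\Big) \cdot \frac{\pi_j(X_{j_1} \cap \ldots \cap X_{j_S} = \emptyset)}{\sum_{X \in \mathbf{X}_j} \Big(\kappa_j(X) \prod_{i \in \{1,\ldots,S\} | X_{j_i} = X} m_i(X_{j_i})\Big)} \right] = m^{\text{PCR5}^+}_{1,\ldots,S}(A) \tag{31}$$

where $\mathbf{X}_j$ represents now the $S$-tuple $(X_{j_1}, \ldots, X_{j_S})$, and
$\pi_j(\emptyset) = \pi_j(X_{j_1} \cap \ldots \cap X_{j_S} = \emptyset)$.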

So, we have proved PCR5+$(m_1, \ldots, m_S, m_{S+1})$ =
PCR5+$(m_1, \ldots, m_S)$ when $m_{S+1}$ is the vacuous BBA. Similarly,
we can prove that PCR6+$(m_1, \ldots, m_S, m_{S+1})$ =
PCR6+$(m_1, \ldots, m_S)$ when $m_{S+1}$ is the vacuous BBA. This
completes the proof of the theorem.
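The counting argument of relations (28) and (29) can be checked numerically. The sketch below is only an illustration under stated assumptions: BBAs are stored as Python dictionaries mapping a bitmask code of a subset to its mass (a storage choice made here, not taken from the paper), and the helper name `conflicting_products` is hypothetical. It verifies that appending a vacuous BBA changes neither the number nor the values of the conflicting products; it does not implement the PCR5+ rule itself.

```python
from itertools import product
from math import prod, isclose

def conflicting_products(bbas):
    """Sorted list of the products pi_j(emptyset) over all tuples of focal
    elements whose intersection is empty (subsets are bitmask codes)."""
    out = []
    for combo in product(*(m.items() for m in bbas)):
        inter = combo[0][0]
        for code, _ in combo[1:]:
            inter &= code              # set intersection as bitwise AND
        if inter == 0:                 # empty intersection: conflicting tuple
            out.append(prod(mass for _, mass in combo))
    return sorted(out)

# Frame Theta = {A, B}: A -> 0b01, B -> 0b10, A u B = Theta -> 0b11
m1 = {0b01: 0.6, 0b10: 0.1, 0b11: 0.3}
m2 = {0b01: 0.5, 0b10: 0.3, 0b11: 0.2}
vacuous = {0b11: 1.0}                  # all the mass on Theta

without_v = conflicting_products([m1, m2])
with_v = conflicting_products([m1, m2, vacuous])
same = len(without_v) == len(with_v) and all(
    isclose(a, b) for a, b in zip(without_v, with_v))
print(len(without_v), same)
```

The design choice of bitmask codes mirrors the binary encoding used by the Matlab routines of Appendix 3, so that an intersection reduces to a bitwise AND.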


APPENDIX 3: CODES OF PCR5+ AND PCR6+ RULES


For convenience, we provide two basic Matlab™ codes for
PCR5+ and PCR6+ for the fusion of $S \geq 2$ BBAs for
working with $2^\Theta$, i.e. working with Shafer's model. No
verification of the input is done in the routines. It is assumed that
the input matrix BBA is correct, both in dimension and in
content. The derivation of all possible combinations is done
with the combvec instruction, which
is included in the Matlab™ neural networks toolbox. This
combvec call can be very time-consuming when the
size of the problem increases. A standalone version of these
codes is also available upon request to the authors. The j-th
column of the BBA input matrix corresponds to the (vertical)
BBA vector $m_j(\cdot)$ associated with the j-th source $s_j$. Each
element of the BBA matrix is in [0,1] and the sum of each
column must be one. If $N$ is the cardinality of the frame $\Theta$
and if $S$ is the number of sources, then the size of the BBA
input matrix is $(2^N - 1) \times S$. Each column of the BBA
matrix must use the classical binary encoding of elements.
For example, if $\Theta = \{A, B, C\}$, then we encode the elements
of $2^\Theta \setminus \{\emptyset\}$ by the binary sequence 001 ≡ A, 010 ≡ B,
011 ≡ A∪B, ..., 111 ≡ A∪B∪C. The mass of the empty set
is not included in the BBA vector because it is always set to
zero.
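The binary encoding described above can be sketched in a few lines. This is a minimal illustration, assuming that the first element of the frame is mapped to the least significant bit (which reproduces 001 ≡ A, 010 ≡ B, 011 ≡ A∪B); the helper names `encode` and `decode` are hypothetical and do not appear in the Matlab routines.

```python
def encode(subset, frame):
    """Integer code of `subset` for the ordered list `frame`
    (first element of the frame = least significant bit)."""
    return sum(1 << frame.index(x) for x in set(subset))

def decode(code, frame):
    """Inverse of encode: the set of elements whose bit is set in `code`."""
    return {x for k, x in enumerate(frame) if (code >> k) & 1}

frame = ["A", "B", "C"]
for subset in (["A"], ["B"], ["A", "B"], ["A", "B", "C"]):
    # Prints the code as a 3-bit string: 001, 010, 011, 111
    print(subset, format(encode(subset, frame), "03b"))

# Set intersection becomes a bitwise AND; code 0 is the empty set.
inter = encode(["A", "B"], frame) & encode(["B", "C"], frame)
print(decode(inter, frame))
```

With this convention the code of a subset is also its (1-based) row index in the BBA matrix; in a 0-based language such as Python the row would be `code - 1`.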

These codes can be used and shared for free for research
purposes only. Commercial uses of these codes, or adaptations
of them in any programming language, are not allowed without
written agreement of the authors. These codes are provided
by the copyright holders "as is" and any express or implied
warranties are disclaimed. The copyright holders will not be
liable for any direct or indirect damages arising from the use of these
codes. The authors would appreciate any feedback on the use
of these codes, and publications using these codes should cite
this paper in agreement for their use.

function [mPCR5plus]=PCR5plus(BBA)
% (Declaration line missing in the source; the function name PCR5plus is assumed.)
% Authors and copyrights: Theo Dezert & Jean Dezert
% Input: BBA=[m1 m2 ... mS]= Matrix of BBAs to combine with PCR5+
% Output: mPCR5plus is PCR5+(m1,m2,...,mS) fusion result
NbrSources=size(BBA,2); CardTheta=log2(size(BBA,1)+1);
if (NbrSources==1), mPCR5plus=BBA(:,1); return, end
mPCR5plus=zeros(size(BBA,1),1); FocalElem=cell(NbrSources,1);
for i=1:NbrSources, FocalElem{i}=find(BBA(:,i)>0)'; end
% All tuples of focal elements (one focal element per source)
Combinations=combvec(FocalElem{1:NbrSources})';
for c=1:size(Combinations,1)
  PC=Combinations(c,:); masseConj=diag(BBA(PC,:))';
  massConj=prod(diag(BBA(PC,:))',2); Intersections=PC(1);
  for s=2:NbrSources, Intersections=bitand(Intersections,PC(s)); end
  if (Intersections~=0)
    % Non-conflicting product: conjunctive part
    mPCR5plus(Intersections)=mPCR5plus(Intersections)+massConj;
  else
    % Conflicting product: compute the binary keeping-indexes
    Binary=[]; CardPC=[]; KeepIndex=[];
    for i=1:NbrSources
      Binary(i,:)=bitget(PC(i),CardTheta:-1:1,'int8');
      CardPC(i,:)=sum(Binary(i,:)==1);
    end
    for j=1:NbrSources
      delta=[];
      for js=1:NbrSources
        if CardPC(js)>=CardPC(j)
          for jp=1:NbrSources
            if PC(jp)~=PC(js) && CardPC(jp)<=CardPC(js)
              if sum(Binary(jp,:)<=Binary(js,:))==CardTheta
                delta=[delta 1]; else, delta=[delta 0];
              end
            end
          end
        end
      end
      if isempty(delta)
        KeepIndex(j,1)=1;
      else
        KeepIndex(j,1)=1-prod(delta);
      end
    end
    KeepIndex=KeepIndex';
    for i=1:NbrSources
      if KeepIndex(i)==1, KeepIndex(i)=masseConj(i); end
    end
    % Proportional redistribution to the kept elements
    UQ=unique(PC); Proportions=0*UQ; DenPCR5=0;
    for u=1:size(UQ,2)
      SamePropositions=find(PC==UQ(u));
      MassProd=prod(KeepIndex(SamePropositions));
      Proportions(u)=MassProd*massConj; DenPCR5=DenPCR5+MassProd;
    end
    Proportions=Proportions/DenPCR5;
    for u=1:size(UQ,2), mPCR5plus(UQ(u))=mPCR5plus(UQ(u))+Proportions(u); end
  end
end


function [mPCR6plus]=PCR6plus(BBA)
% (Declaration line missing in the source; the function name PCR6plus is assumed.)
% Authors and copyrights: Theo Dezert & Jean Dezert
% Input: BBA=[m1 m2 ... mS]= Matrix of BBAs to combine with PCR6+
% Output: mPCR6plus is PCR6+(m1,m2,...,mS) fusion result
NbrSources=size(BBA,2); CardTheta=log2(size(BBA,1)+1);
if (NbrSources==1), mPCR6plus=BBA(:,1); return, end
mPCR6plus=zeros(size(BBA,1),1); FocalElem=cell(NbrSources,1);
for i=1:NbrSources, FocalElem{i}=find(BBA(:,i)>0)'; end
% All tuples of focal elements (one focal element per source)
Combinations=combvec(FocalElem{1:NbrSources})';
for c=1:size(Combinations,1)
  PC=Combinations(c,:); masseConj=diag(BBA(PC,:))';
  massConj=prod(diag(BBA(PC,:))',2); Intersections=PC(1);
  for s=2:NbrSources, Intersections=bitand(Intersections,PC(s)); end
  if (Intersections~=0)
    % Non-conflicting product: conjunctive part
    mPCR6plus(Intersections)=mPCR6plus(Intersections)+massConj;
  else
    % Conflicting product: compute the binary keeping-indexes
    Binary=[]; CardPC=[]; KeepIndex=[];
    for i=1:NbrSources
      Binary(i,:)=bitget(PC(i),CardTheta:-1:1,'int8');
      CardPC(i,:)=sum(Binary(i,:)==1);
    end
    for j=1:NbrSources
      delta=[];
      for js=1:NbrSources
        if CardPC(js)>=CardPC(j)
          for jp=1:NbrSources
            if PC(jp)~=PC(js) && CardPC(jp)<=CardPC(js)
              if sum(Binary(jp,:)<=Binary(js,:))==CardTheta
                delta=[delta 1]; else, delta=[delta 0];
              end
            end
          end
        end
      end
      if isempty(delta)==1
        KeepIndex(j,1)=1;
      else
        KeepIndex(j,1)=1-prod(delta);
      end
    end
    % Discard the elements whose keeping-index is zero
    KeepIndex=KeepIndex'; IgnoringSetOfFE=find(KeepIndex==0);
    masseConj(IgnoringSetOfFE)=[]; PC(IgnoringSetOfFE)=[];
    for s=1:numel(masseConj)
      Proportion=masseConj(s)*(massConj/(sum(masseConj,2)));
      mPCR6plus(PC(s))=mPCR6plus(PC(s))+Proportion;
    end
  end
end

Abstract: This short paper presents the explicit formulas of
the PCR5 and PCR6 rules of combination for three Bayesian
basic belief assignments. We give a simple example to show how
to apply them.

I. INTRODUCTION


Among many existing rules of combination of Basic Belief
Assignments (BBAs), the conjunctive rule, Dempster-Shafer
(DS) rule [1], and the Proportional Conflict Redistribution
rules No. 5 (PCR5) and No. 6 (PCR6) are the most used rules of
combination. While the conjunctive rule makes it possible to
combine information between different sources of information
represented by belief functions by estimating the level of
existing conflict, DS rule [1], [2] proposes a distribution of
this conflict on the hypotheses characterized by the sources
of information. The normalization carried out by the DS rule
may however be considered counter-intuitive, especially when
the level of conflict between the sources of information is high
[3], [4], but also in some situations where the level of conflict
between sources is low, as shown in [5], which reveals a dictatorial
behavior of DS rule. The Proportional Conflict Redistribution
rules (PCR5 [6] and PCR6 [7], [8]) have been proposed
to circumvent this problem of the DS rule and to make a more
judicious management of the conflict. Moreover, improved
versions of PCR5 and PCR6 rules preserving the neutrality
of a vacuous (i.e. a totally ignorant) source of evidence in
the PCR process have been recently proposed in [9]. They
are denoted by PCR5+ and PCR6+ fusion rules. We will
not present these improved rules in detail here because we
address the problem of fusing only Bayesian BBAs, and for
this particular type of BBAs PCR5+ coincides with PCR5,
and PCR6+ coincides with PCR6, because there is no mass
committed to partial and to total ignorances (i.e. to all possible
disjunctions) involved in partial conflict to redistribute thanks
to PCR5 and PCR6 principles.

After a brief recall of basics of belief functions in section II,
we present the general formulas for PCR5 and PCR6 fusion
rules in section III based on [9], with a simple example in
section IV. In section V we present the direct formulas for
PCR5 and PCR6 rules for three general (i.e. non-Bayesian)
BBAs, and the direct formulas for PCR5 and PCR6 rules
for three Bayesian BBAs in section VI. A simple example of
application of these formulas for the fusion of three Bayesian
BBAs defined on the simple frame of discernment with two
elements is given in section VII with complete calculation for
convenience. Section VIII concludes this paper.


II. BASICS OF BELIEF FUNCTIONS 


We consider a given finite set $\Theta$ of $n > 1$ distinct elements
$\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$ corresponding to the frame of
discernment (FoD) of the fusion problem, or the decision-making
problem, under concern. All elements of $\Theta$ are mutually
exclusive¹ and each element is an elementary choice of the
potential decision to take. The power set of $\Theta$ is the set
of all subsets of $\Theta$ (including the empty set $\emptyset$ and $\Theta$) and it
is usually denoted $2^\Theta$ because its cardinality equals $2^{|\Theta|}$.
A Basic Belief Assignment (BBA) given by a source of
evidence is defined by Shafer [1] in his Mathematical Theory
of Evidence (known also as Dempster-Shafer Theory, or DST)
as $m(\cdot) : 2^\Theta \to [0,1]$ satisfying

$$m(\emptyset) = 0, \qquad \sum_{A \in 2^\Theta} m(A) = 1, \tag{1}$$

where $m(A)$ is the mass of belief exactly committed to $A$,
what we usually call the mass of $A$. A BBA is said to be proper
(or normal) if it satisfies Shafer's definition (1). The subset
$A \subseteq \Theta$ is called a Focal Element (FE) of the BBA $m(\cdot)$ if
and only if $m(A) > 0$. The empty set is not a focal element
of a BBA because $m(\emptyset) = 0$ according to definition (1). The
set of all focal elements of a BBA $m(\cdot)$ is denoted $\mathcal{F}(m)$. Its
mathematical definition is $\mathcal{F}(m) = \{X \in 2^\Theta \,|\, m(X) > 0\}$.
The cardinality $|\mathcal{F}(m)|$ of the set $\mathcal{F}(m)$ is denoted $F_m$. The
order of focal elements of $\mathcal{F}(m)$ does not matter and all the
focal elements are different. The set $\mathcal{F}(m)$ of focal elements
of $m(\cdot)$ has at least one focal element, and at most $2^{|\Theta|} - 1$
focal elements.


¹ This standard assumption is called Shafer's model of FoD in the DSmT
(Dezert-Smarandache Theory) framework [10].

Belief and plausibility functions are respectively defined
from $m(\cdot)$ by [1]

$$\text{Bel}(A) = \sum_{X \in 2^\Theta | X \subseteq A} m(X), \tag{2}$$

and

$$\text{Pl}(A) = \sum_{X \in 2^\Theta | A \cap X \neq \emptyset} m(X) = 1 - \text{Bel}(\bar{A}), \tag{3}$$

where $\bar{A}$ represents the complement of $A$ in $\Theta$.


Bel(A) and Pl(A) are usually interpreted respectively as
lower and upper bounds of an unknown (subjective) probability
measure P(A) [11], [12]. The functions m(.), Bel(.) and
Pl(.) are in one-to-one correspondence. A belief function Bel(.) is Bayesian if
all Bel's focal elements are singletons [1] (Theorem 2.8 p.
45). In this case, m(X) = Bel(X) for any (singleton) focal
element X, and m(.) is called a Bayesian BBA. The corresponding
Bel(.) function is equal to Pl(.) and these functions can be
interpreted as the same (possibly subjective) probability measure
P(.). The vacuous BBA (VBBA for short) representing a
totally ignorant source is defined as $m_v(\Theta) = 1$.
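Definitions (2) and (3) translate directly into code. The sketch below is an illustration only, assuming a BBA stored as a dictionary that maps the bitmask code of each focal element to its mass (a representation chosen here for convenience); the function names `bel` and `pl` and the numerical BBA are illustrative choices.

```python
def bel(m, a):
    """Bel(A): sum of the masses of the focal elements X included in A."""
    return sum(v for x, v in m.items() if x != 0 and (x & a) == x)

def pl(m, a):
    """Pl(A): sum of the masses of the focal elements X intersecting A."""
    return sum(v for x, v in m.items() if (x & a) != 0)

# Frame Theta = {A, B}: A -> 0b01, B -> 0b10, A u B -> 0b11
A, B, AB = 0b01, 0b10, 0b11
m = {A: 0.6, B: 0.1, AB: 0.3}

print(bel(m, A), pl(m, A))          # lower and upper bounds for A
print(1.0 - bel(m, B))              # equals Pl(A), as stated by (3)
```

For a Bayesian BBA (singleton focal elements only) the two functions return the same value for every subset, which is the probabilistic case mentioned in the text.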


III. PCR5 AND PCR6 RULES OF COMBINATION 
A. The PCR5 rule of combination


The PCR5 rule [6] transfers the conflicting mass to all the
elements involved in this conflict and proportionally to their
individual masses, so that a more sophisticated and specific
distribution is done with the PCR5 fusion process with respect
to other existing rules (including Dempster's rule). The PCR5
rule is presented in detail (with justification and examples)
in [10], Vol. 2 and Vol. 3.

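A simple formulation of the general expression of the
PCR5 fusion of $S \geq 2$ basic belief assignments is obtained
by redistributing each conflicting product defined by

$$\pi_j(\emptyset) = \pi_j(X_{j_1} \cap \ldots \cap X_{j_S} = \emptyset) = \prod_{i=1}^{S} m_i(X_{j_i}), \tag{4}$$

to some elements of the power set of the FoD that are involved
in the conflict $X_{j_1} \cap \ldots \cap X_{j_S} = \emptyset$. Each $\pi_j(\emptyset)$ is redistributed
proportionally to elements involved in this conflict based on
the PCR5 redistribution principle. When an element $A \in 2^\Theta$
is not involved in a conflicting product $\pi_j(\emptyset)$, i.e. $A \notin \mathbf{X}_j$,
the conflicting product $\pi_j(\emptyset)$ is not redistributed to $A$. If an
element $A$ is involved in the conflict $X_{j_1} \cap \ldots \cap X_{j_S} = \emptyset$, i.e.
$A \in \mathbf{X}_j$ and $\pi_j(\emptyset)$ occur, then the proportional redistribution
of $\pi_j(\emptyset)$ to $A$ is given by

$$x_j(A) \triangleq \Big(\prod_{i \in \{1,\ldots,S\} | X_{j_i} = A} m_i(X_{j_i})\Big) \cdot \frac{\pi_j(\emptyset)}{\sum_{X \in \mathbf{X}_j} \Big(\prod_{i \in \{1,\ldots,S\} | X_{j_i} = X} m_i(X_{j_i})\Big)}, \tag{5}$$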

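where $A \in \mathbf{X}_j$ means that at least one component of the $S$-tuple
$\mathbf{X}_j$ equals $A$, with

$$\mathbf{X}_j \triangleq (X_{j_1}, X_{j_2}, \ldots, X_{j_S}) \in \mathcal{F}(m_1, \ldots, m_S),$$

where
• $j_1 \in \{1, 2, \ldots, F_{m_1}\}$,
• $j_2 \in \{1, 2, \ldots, F_{m_2}\}$,
• ...
• $j_S \in \{1, 2, \ldots, F_{m_S}\}$,
and

$$\mathcal{F}(m_1, \ldots, m_S) \triangleq \mathcal{F}(m_1) \times \mathcal{F}(m_2) \times \ldots \times \mathcal{F}(m_S),$$

$$\mathcal{F} \triangleq |\mathcal{F}(m_1, \ldots, m_S)| = \prod_{i=1}^{S} |\mathcal{F}(m_i)| = \prod_{i=1}^{S} F_{m_i}.$$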

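The element $X_{j_i}$ is the focal element of $m_i(\cdot)$ that makes the
$i$-th component of the $j$-th $S$-tuple $\mathbf{X}_j$.
The mass of $A$ obtained by the PCR5 rule is

$$m^{\text{PCR5}}_{1,2,\ldots,S}(A) = m^{\text{Conj}}_{1,2,\ldots,S}(A) + \sum_{j \in \{1,\ldots,\mathcal{F}\} \,|\, A \in \mathbf{X}_j \wedge \pi_j(\emptyset)} x_j(A), \tag{6}$$

where $A \in \mathbf{X}_j \wedge \pi_j(\emptyset)$ is a shorthand notation meaning that
at least one component of the $S$-tuple $\mathbf{X}_j$ equals $A$ and the
components of $\mathbf{X}_j$ are conflicting, i.e. $X_{j_1} \cap \ldots \cap X_{j_S} = \emptyset$.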


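The general PCR5 formula can be expressed as
$m^{\text{PCR5}}_{1,2,\ldots,S}(\emptyset) = 0$, and for $A \in 2^\Theta \setminus \{\emptyset\}$ by

$$m^{\text{PCR5}}_{1,2,\ldots,S}(A) = m^{\text{Conj}}_{1,2,\ldots,S}(A) + \sum_{j \in \{1,\ldots,\mathcal{F}\} \,|\, A \in \mathbf{X}_j \wedge \pi_j(\emptyset)} \left[ \Big(\prod_{i \in \{1,\ldots,S\} | X_{j_i} = A} m_i(X_{j_i})\Big) \cdot \frac{\pi_j(\emptyset)}{\sum_{X \in \mathbf{X}_j} \Big(\prod_{i \in \{1,\ldots,S\} | X_{j_i} = X} m_i(X_{j_i})\Big)} \right], \tag{7}$$

where $m^{\text{Conj}}_{1,2,\ldots,S}(A)$ is the mass of $A$ obtained by the
conjunctive rule, that is

$$m^{\text{Conj}}_{1,2,\ldots,S}(A) = \sum_{\substack{\mathbf{X}_j \in \mathcal{F}(m_1,\ldots,m_S) \\ X_{j_1} \cap \ldots \cap X_{j_S} = A}} \pi_j(X_{j_1} \cap \ldots \cap X_{j_S}) = \sum_{\substack{\mathbf{X}_j \in \mathcal{F}(m_1,\ldots,m_S) \\ X_{j_1} \cap \ldots \cap X_{j_S} = A}} \prod_{i=1}^{S} m_i(X_{j_i}). \tag{8}$$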

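The total conflicting mass between the $S$ sources of evidence,
denoted $m^{\text{Conj}}_{1,2,\ldots,S}(\emptyset)$, is nothing but the sum of all existing
conflicting mass products, that is

$$m^{\text{Conj}}_{1,2,\ldots,S}(\emptyset) = \sum_{\substack{\mathbf{X}_j \in \mathcal{F}(m_1,\ldots,m_S) \\ X_{j_1} \cap \ldots \cap X_{j_S} = \emptyset}} \pi_j(X_{j_1} \cap \ldots \cap X_{j_S}) = 1 - \sum_{A \in 2^\Theta \setminus \{\emptyset\}} m^{\text{Conj}}_{1,2,\ldots,S}(A). \tag{9}$$

If $m^{\text{Conj}}_{1,2,\ldots,S}(\emptyset) > 0$, the conjunctive combination result is not
a proper BBA because it does not satisfy Shafer's definition
(1). In general the $S$ sources of evidence to combine do not [...]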
B. The PCR6 rule of combination 


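A variant of PCR5 rule, called PCR6 rule, has been proposed
by Martin and Osswald in [7], [8] for combining $S \geq 2$
sources. The difference between PCR5 and PCR6 lies in the
way the proportional conflict redistribution is done as soon as
three (or more) sources are involved in the fusion. The PCR6
fusion of $S \geq 2$ BBAs is obtained by $m^{\text{PCR6}}_{1,2,\ldots,S}(\emptyset) = 0$, and
for all $A \in 2^\Theta \setminus \{\emptyset\}$ by²

$$m^{\text{PCR6}}_{1,2,\ldots,S}(A) = m^{\text{Conj}}_{1,2,\ldots,S}(A) + \sum_{j \in \{1,\ldots,\mathcal{F}\} \,|\, A \in \mathbf{X}_j \wedge \pi_j(\emptyset)} \left[ \Big(\sum_{i \in \{1,\ldots,S\} | X_{j_i} = A} m_i(X_{j_i})\Big) \cdot \frac{\pi_j(\emptyset)}{\sum_{X \in \mathbf{X}_j} \Big(\sum_{i \in \{1,\ldots,S\} | X_{j_i} = X} m_i(X_{j_i})\Big)} \right]. \tag{10}$$

The difference between the general PCR5 and PCR6
formulas is that the PCR5 proportional redistribution involves
the products $\prod_{i \in \{1,\ldots,S\} | X_{j_i} = A} m_i(X_{j_i})$ of multiple same
focal elements $A$ (if any) in the conflict, whereas the
PCR6 conflict redistribution principle works with their sum
$\sum_{i \in \{1,\ldots,S\} | X_{j_i} = A} m_i(X_{j_i})$ instead.


PCR6 coincides with PCR5 when one combines two sources
of evidence.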
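The general formulas (7) and (10) can be sketched by enumerating all tuples of focal elements. The code below is an illustration under stated assumptions, not the authors' implementation: BBAs are dictionaries from bitmask subset codes to masses, the function name `pcr` and its `rule` argument are hypothetical, and the classical PCR5 and PCR6 rules are implemented (not the improved PCR5+ and PCR6+ variants). The three BBAs are those of Example 1 in section IV, with $m_3(A) = 0.4$, $m_3(B) = 0.1$, $m_3(A \cup B) = 0.5$ as implied by the products listed there.

```python
from itertools import product
from math import prod

A, B, AB = 0b01, 0b10, 0b11   # bitmask codes for Theta = {A, B}

def pcr(bbas, rule="PCR5"):
    """Fuse BBAs ({subset_code: mass}) with the classical PCR5 or PCR6 rule."""
    fused = {}
    focal = [[(x, v) for x, v in m.items() if v > 0] for m in bbas]
    for combo in product(*focal):               # one S-tuple X_j per iteration
        elems = [x for x, _ in combo]
        masses = [v for _, v in combo]
        pi = prod(masses)                       # product of the tuple masses
        inter = elems[0]
        for x in elems[1:]:
            inter &= x                          # intersection as bitwise AND
        if inter:                               # non-conflicting: conjunctive part
            fused[inter] = fused.get(inter, 0.0) + pi
            continue
        weight = {}                             # weight of each distinct element
        for x, v in zip(elems, masses):
            if rule == "PCR5":
                weight[x] = weight.get(x, 1.0) * v   # product over duplicates
            else:
                weight[x] = weight.get(x, 0.0) + v   # PCR6: sum over duplicates
        total = sum(weight.values())
        for x, w in weight.items():             # proportional redistribution
            fused[x] = fused.get(x, 0.0) + w * pi / total
    return fused

m1 = {A: 0.6, B: 0.1, AB: 0.3}
m2 = {A: 0.5, B: 0.3, AB: 0.2}
m3 = {A: 0.4, B: 0.1, AB: 0.5}
pcr5 = pcr([m1, m2, m3], "PCR5")
pcr6 = pcr([m1, m2, m3], "PCR6")
print({k: round(v, 6) for k, v in sorted(pcr5.items())})
print({k: round(v, 6) for k, v in sorted(pcr6.items())})
```

The only difference between the two branches is how the masses of repeated focal elements are aggregated, which is exactly the product-versus-sum distinction stated above. The enumeration has a cost that grows with the product of the numbers of focal elements, so this sketch is meant for small problems.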


IV. SIMPLE EXAMPLE OF PCR5 AND PCR6 FUSION RULES 


Here we provide a simple example showing the difference 
of the results between PCR5 and PCR6 rules. This example 
has been already presented in [9]. For convenience, all nu- 
merical values have been rounded to six decimal places when 
necessary. 


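² We wrote this PCR6 general formula in the style of PCR5 formula (7).

Example 1: We consider the simplest FoD $\Theta = \{A, B\}$, and
the three following BBAs

$m_1(A) = 0.6$, $m_1(B) = 0.1$, $m_1(A \cup B) = 0.3$,
$m_2(A) = 0.5$, $m_2(B) = 0.3$, $m_2(A \cup B) = 0.2$,
$m_3(A) = 0.4$, $m_3(B) = 0.1$, $m_3(A \cup B) = 0.5$.

Because $F_{m_1} = |\mathcal{F}(m_1)| = 3$, $F_{m_2} = |\mathcal{F}(m_2)| = 3$ and
$F_{m_3} = |\mathcal{F}(m_3)| = 3$, we have $\mathcal{F} = F_{m_1} \cdot F_{m_2} \cdot F_{m_3} = 27$
non-zero products to consider. Fifteen products are non-conflicting
and enter in the calculation of $m^{\text{Conj}}_{1,2,3}(A)$, $m^{\text{Conj}}_{1,2,3}(B)$ and
$m^{\text{Conj}}_{1,2,3}(A \cup B)$, and twelve products are conflicting products
that need to be proportionally redistributed. The conjunctive
combination of these three BBAs is

$$\begin{aligned}
m^{\text{Conj}}_{1,2,3}(A) &= m_1(A)m_2(A)m_3(A) + m_1(A)m_2(A)m_3(A \cup B) + m_1(A)m_2(A \cup B)m_3(A) \\
&\quad + m_1(A \cup B)m_2(A)m_3(A) + m_1(A)m_2(A \cup B)m_3(A \cup B) \\
&\quad + m_1(A \cup B)m_2(A)m_3(A \cup B) + m_1(A \cup B)m_2(A \cup B)m_3(A) = 0.5370, \\
m^{\text{Conj}}_{1,2,3}(B) &= m_1(B)m_2(B)m_3(B) + m_1(B)m_2(B)m_3(A \cup B) + m_1(B)m_2(A \cup B)m_3(B) \\
&\quad + m_1(A \cup B)m_2(B)m_3(B) + m_1(B)m_2(A \cup B)m_3(A \cup B) \\
&\quad + m_1(A \cup B)m_2(B)m_3(A \cup B) + m_1(A \cup B)m_2(A \cup B)m_3(B) = 0.0900, \\
m^{\text{Conj}}_{1,2,3}(A \cup B) &= m_1(A \cup B)m_2(A \cup B)m_3(A \cup B) = 0.3 \cdot 0.2 \cdot 0.5 = 0.0300,
\end{aligned}$$

and the total conflict between these three BBAs is given by

$$m^{\text{Conj}}_{1,2,3}(\emptyset) = 1 - m^{\text{Conj}}_{1,2,3}(A) - m^{\text{Conj}}_{1,2,3}(B) - m^{\text{Conj}}_{1,2,3}(A \cup B) = 0.3430.$$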


In this example we have twelve partial conflicts, noted $\pi_j(\emptyset)$
($j = 1, \ldots, 12$), which correspond to the following products

$\pi_1(\emptyset) = m_1(A)m_2(A)m_3(B) = 0.0300$,
$\pi_2(\emptyset) = m_1(A)m_2(B)m_3(A) = 0.0720$,
$\pi_3(\emptyset) = m_1(B)m_2(A)m_3(A) = 0.0200$,
$\pi_4(\emptyset) = m_1(B)m_2(B)m_3(A) = 0.0120$,
$\pi_5(\emptyset) = m_1(B)m_2(A)m_3(B) = 0.0050$,
$\pi_6(\emptyset) = m_1(A)m_2(B)m_3(B) = 0.0180$,
$\pi_7(\emptyset) = m_1(A \cup B)m_2(A)m_3(B) = 0.0150$,
$\pi_8(\emptyset) = m_1(A \cup B)m_2(B)m_3(A) = 0.0360$,
$\pi_9(\emptyset) = m_1(B)m_2(A)m_3(A \cup B) = 0.0250$,
$\pi_{10}(\emptyset) = m_1(A)m_2(B)m_3(A \cup B) = 0.0900$,
$\pi_{11}(\emptyset) = m_1(A)m_2(A \cup B)m_3(B) = 0.0120$,
$\pi_{12}(\emptyset) = m_1(B)m_2(A \cup B)m_3(A) = 0.0080$.

In applying the PCR5 formula (7), and the PCR6 formula
(10), we obtain finally $m^{\text{PCR5}}_{1,2,3}(\emptyset) = m^{\text{PCR6}}_{1,2,3}(\emptyset) = 0$, and³

$m^{\text{PCR5}}_{1,2,3}(A) \approx 0.723281$,
$m^{\text{PCR5}}_{1,2,3}(B) \approx 0.182460$,
$m^{\text{PCR5}}_{1,2,3}(A \cup B) \approx 0.094259$,

and

$m^{\text{PCR6}}_{1,2,3}(A) \approx 0.743496$,
$m^{\text{PCR6}}_{1,2,3}(B) \approx 0.162245$,
$m^{\text{PCR6}}_{1,2,3}(A \cup B) \approx 0.094259$.

We see a difference between the BBAs $m^{\text{PCR5}}_{1,2,3}$ and $m^{\text{PCR6}}_{1,2,3}$,
which is normal because the PCR principles are quite different.

³ The symbol ≈ means "approximately equal to".

Using the PCR5 fusion rule the first partial conflicting mass
$\pi_1(\emptyset) = m_1(A)m_2(A)m_3(B) = 0.03$ is redistributed back to
$A$ and $B$ proportionally to $m_1(A)m_2(A)$ and to $m_3(B)$ as
follows

$$\frac{x_1(A)}{m_1(A)m_2(A)} = \frac{x_1(B)}{m_3(B)} = \frac{\pi_1(\emptyset)}{m_1(A)m_2(A) + m_3(B)},$$

whence

$$x_1(A) = \frac{m_1(A)m_2(A)\,\pi_1(\emptyset)}{m_1(A)m_2(A) + m_3(B)} = 0.0225,$$
$$x_1(B) = \frac{m_3(B)\,\pi_1(\emptyset)}{m_1(A)m_2(A) + m_3(B)} = 0.0075.$$

We can verify $\pi_1(\emptyset) = x_1(A) + x_1(B) = 0.03$.


Using the PCR6 fusion rule the first partial conflicting mass
$\pi_1(\emptyset) = 0.03$ is redistributed back to $A$ and $B$ proportionally
to $(m_1(A) + m_2(A))$ and to $m_3(B)$. So we get the following
redistributions $x_1(A) = 0.0275$ for $A$ and $x_1(B) = 0.0025$
for $B$ because

$$\frac{x_1(A)}{m_1(A) + m_2(A)} = \frac{x_1(B)}{m_3(B)} = \frac{\pi_1(\emptyset)}{m_1(A) + m_2(A) + m_3(B)},$$

whence

$$x_1(A) = \frac{(m_1(A) + m_2(A))\,\pi_1(\emptyset)}{m_1(A) + m_2(A) + m_3(B)} = 0.0275,$$
$$x_1(B) = \frac{m_3(B)\,\pi_1(\emptyset)}{m_1(A) + m_2(A) + m_3(B)} = 0.0025.$$

We can verify $\pi_1(\emptyset) = x_1(A) + x_1(B) = 0.03$.


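Note that for all the partial conflicts having no duplicate
element involved in the conflicting product $\pi_j(\emptyset)$ we make
the same redistribution with PCR5 rule and with PCR6 rule.
For instance, for $\pi_7(\emptyset) = m_1(A \cup B)m_2(A)m_3(B) = 0.0150$
we get

$$\frac{x_7(A \cup B)}{m_1(A \cup B)} = \frac{x_7(A)}{m_2(A)} = \frac{x_7(B)}{m_3(B)} = \frac{\pi_7(\emptyset)}{m_1(A \cup B) + m_2(A) + m_3(B)},$$

whence $\pi_7(\emptyset) = x_7(A \cup B) + x_7(A) + x_7(B) = 0.0150$, with

$$x_7(A \cup B) = \frac{m_1(A \cup B)\,\pi_7(\emptyset)}{m_1(A \cup B) + m_2(A) + m_3(B)} = 0.005,$$
$$x_7(A) = \frac{m_2(A)\,\pi_7(\emptyset)}{m_1(A \cup B) + m_2(A) + m_3(B)} \approx 0.008333,$$
$$x_7(B) = \frac{m_3(B)\,\pi_7(\emptyset)}{m_1(A \cup B) + m_2(A) + m_3(B)} \approx 0.001667.$$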
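The redistribution of a single partial conflict can be written as one small helper. The sketch below is illustrative only; the function name `redistribute` and the dictionary-based interface are assumptions made here. The weights passed to it are the products of duplicated masses for PCR5 and their sums for PCR6, as in the two computations of $\pi_1(\emptyset)$ above.

```python
def redistribute(conflict, weights):
    """Split the conflicting mass `conflict` among the elements of `weights`
    proportionally to their weights."""
    total = sum(weights.values())
    return {elem: w * conflict / total for elem, w in weights.items()}

# Masses of Example 1 involved in pi_1 and pi_7
m1_A, m2_A, m3_B, m1_AuB = 0.6, 0.5, 0.1, 0.3

pi1 = m1_A * m2_A * m3_B                      # conflict (A, A, B)
x1_pcr5 = redistribute(pi1, {"A": m1_A * m2_A, "B": m3_B})   # product of duplicates
x1_pcr6 = redistribute(pi1, {"A": m1_A + m2_A, "B": m3_B})   # sum of duplicates

pi7 = m1_AuB * m2_A * m3_B                    # conflict (A u B, A, B), no duplicate
x7 = redistribute(pi7, {"AuB": m1_AuB, "A": m2_A, "B": m3_B})

print(x1_pcr5, x1_pcr6, x7)
```

When no focal element is repeated in the tuple, as for $\pi_7(\emptyset)$, the product and the sum of a single mass coincide, so both rules give the same split.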


V. PCR5 AND PCR6 RULES FOR THREE BBAS 


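The previous general formulas of PCR5 (7) and PCR6 (10)
can be written more explicitly for the fusion of three BBAs
as follows (see [13] for details⁴) when working with a FoD $\Theta$
with Shafer's model.

$$\begin{aligned}
m^{\text{PCR5}}_{1,2,3}(A) &= m^{\text{Conj}}_{1,2,3}(A) \\
&+ \sum_{\substack{X,Y \in 2^\Theta \\ X \neq A,\, Y \neq A,\, X \neq Y \\ A \cap X \cap Y = \emptyset}} \left[ \frac{m_1(A)^2 m_2(X) m_3(Y)}{m_1(A) + m_2(X) + m_3(Y)} + \frac{m_1(Y) m_2(A)^2 m_3(X)}{m_1(Y) + m_2(A) + m_3(X)} + \frac{m_1(X) m_2(Y) m_3(A)^2}{m_1(X) + m_2(Y) + m_3(A)} \right] \\
&+ \sum_{\substack{X \in 2^\Theta \\ A \cap X = \emptyset}} \left[ \frac{m_1(A)^2 m_2(X) m_3(X)}{m_1(A) + m_2(X) m_3(X)} + \frac{m_1(X) m_2(A)^2 m_3(X)}{m_1(X) m_3(X) + m_2(A)} + \frac{m_1(X) m_2(X) m_3(A)^2}{m_1(X) m_2(X) + m_3(A)} \right] \\
&+ \sum_{\substack{X \in 2^\Theta \\ A \cap X = \emptyset}} \left[ \frac{m_1(A)^2 m_2(A)^2 m_3(X)}{m_1(A) m_2(A) + m_3(X)} + \frac{m_1(X) m_2(A)^2 m_3(A)^2}{m_1(X) + m_2(A) m_3(A)} + \frac{m_1(A)^2 m_2(X) m_3(A)^2}{m_1(A) m_3(A) + m_2(X)} \right]
\end{aligned} \tag{11}$$

and

$$\begin{aligned}
m^{\text{PCR6}}_{1,2,3}(A) &= m^{\text{Conj}}_{1,2,3}(A) \\
&+ \sum_{\substack{X,Y \in 2^\Theta \\ X \neq A,\, Y \neq A,\, X \neq Y \\ A \cap X \cap Y = \emptyset}} \left[ \frac{m_1(A)^2 m_2(X) m_3(Y)}{m_1(A) + m_2(X) + m_3(Y)} + \frac{m_1(Y) m_2(A)^2 m_3(X)}{m_1(Y) + m_2(A) + m_3(X)} + \frac{m_1(X) m_2(Y) m_3(A)^2}{m_1(X) + m_2(Y) + m_3(A)} \right] \\
&+ \sum_{\substack{X \in 2^\Theta \\ A \cap X = \emptyset}} \left[ \frac{m_1(A)^2 m_2(X) m_3(X)}{m_1(A) + m_2(X) + m_3(X)} + \frac{m_1(X) m_2(A)^2 m_3(X)}{m_1(X) + m_2(A) + m_3(X)} + \frac{m_1(X) m_2(X) m_3(A)^2}{m_1(X) + m_2(X) + m_3(A)} \right] \\
&+ \sum_{\substack{X \in 2^\Theta \\ A \cap X = \emptyset}} \left[ \frac{(m_1(A) + m_2(A))\, m_1(A) m_2(A) m_3(X)}{m_1(A) + m_2(A) + m_3(X)} + \frac{(m_2(A) + m_3(A))\, m_1(X) m_2(A) m_3(A)}{m_1(X) + m_2(A) + m_3(A)} + \frac{(m_1(A) + m_3(A))\, m_1(A) m_2(X) m_3(A)}{m_1(A) + m_2(X) + m_3(A)} \right]
\end{aligned} \tag{12}$$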


It is worth mentioning that if some fractions involved in the
formulas (11) and (12) have their denominators equal to zero,
these fractions are just discarded. It can be easily verified on
example 1 that PCR5 formula (11) gives the same result as
with the formula (7), and that the PCR6 formula (12) gives
the same result as with the formula (10).

⁴ It is worth mentioning that PCR5 for three BBAs given in the section 2
of [13] is incorrect, and it must be replaced by formula (11) of this paper.


VI. PCR5 AND PCR6 RULES FOR THREE BAYESIAN BBAS 


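If we want to work with three Bayesian BBAs only, the
focal elements of BBAs to combine are only singletons of the
power set $2^\Theta$. In this particular case, the previous PCR5 and
PCR6 formulas (11) and (12) can be simplified as

$$\begin{aligned}
m^{\text{PCR5}}_{1,2,3}(A) &= m_1(A) m_2(A) m_3(A) \\
&+ \sum_{\substack{X,Y \in \Theta \setminus \{A\} \\ X \neq Y}} \left[ \frac{m_1(A)^2 m_2(X) m_3(Y)}{m_1(A) + m_2(X) + m_3(Y)} + \frac{m_1(Y) m_2(A)^2 m_3(X)}{m_1(Y) + m_2(A) + m_3(X)} + \frac{m_1(X) m_2(Y) m_3(A)^2}{m_1(X) + m_2(Y) + m_3(A)} \right] \\
&+ \sum_{X \in \Theta \setminus \{A\}} \left[ \frac{m_1(A)^2 m_2(X) m_3(X)}{m_1(A) + m_2(X) m_3(X)} + \frac{m_1(X) m_2(A)^2 m_3(X)}{m_1(X) m_3(X) + m_2(A)} + \frac{m_1(X) m_2(X) m_3(A)^2}{m_1(X) m_2(X) + m_3(A)} \right. \\
&\qquad \left. + \frac{m_1(A)^2 m_2(A)^2 m_3(X)}{m_1(A) m_2(A) + m_3(X)} + \frac{m_1(A)^2 m_2(X) m_3(A)^2}{m_1(A) m_3(A) + m_2(X)} + \frac{m_1(X) m_2(A)^2 m_3(A)^2}{m_1(X) + m_2(A) m_3(A)} \right]
\end{aligned} \tag{13}$$

$$\begin{aligned}
m^{\text{PCR6}}_{1,2,3}(A) &= m^{\text{Conj}}_{1,2,3}(A) \\
&+ \sum_{\substack{X,Y \in \Theta \setminus \{A\} \\ X \neq Y}} \left[ \frac{m_1(A)^2 m_2(X) m_3(Y)}{m_1(A) + m_2(X) + m_3(Y)} + \frac{m_1(Y) m_2(A)^2 m_3(X)}{m_1(Y) + m_2(A) + m_3(X)} + \frac{m_1(X) m_2(Y) m_3(A)^2}{m_1(X) + m_2(Y) + m_3(A)} \right] \\
&+ \sum_{X \in \Theta \setminus \{A\}} \left[ \frac{m_1(A)^2 m_2(X) m_3(X)}{m_1(A) + m_2(X) + m_3(X)} + \frac{m_1(X) m_2(A)^2 m_3(X)}{m_1(X) + m_3(X) + m_2(A)} + \frac{m_1(X) m_2(X) m_3(A)^2}{m_1(X) + m_2(X) + m_3(A)} \right. \\
&\qquad + \frac{(m_1(A) + m_2(A))\, m_1(A) m_2(A) m_3(X)}{m_1(A) + m_2(A) + m_3(X)} + \frac{(m_1(A) + m_3(A))\, m_1(A) m_2(X) m_3(A)}{m_1(A) + m_3(A) + m_2(X)} \\
&\qquad \left. + \frac{(m_2(A) + m_3(A))\, m_1(X) m_2(A) m_3(A)}{m_1(X) + m_2(A) + m_3(A)} \right]
\end{aligned} \tag{14}$$

In the formulas (13) and (14) the subset $A$ is any singleton
of $2^\Theta$ (i.e. any element of $\Theta$). For any non-singleton $A$ of
$2^\Theta$ we have $m^{\text{PCR5}}_{1,2,3}(A) = m^{\text{PCR6}}_{1,2,3}(A) = 0$ because the fusion of Bayesian
BBAs by PCR5 and PCR6 always produces a Bayesian BBA.
It is worth mentioning that if some fractions involved in the
formulas (13) and (14) have their denominators equal to zero,
these fractions are just discarded.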
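The direct formulas (13) and (14) are straightforward to code. The following sketch is an illustration under stated assumptions: each Bayesian BBA is a dictionary giving a mass (possibly zero) to every element of the frame, and the names `pcr_bayes3` and `_frac` are hypothetical. The numerical values are those of Example 2 of section VII, with $m_3(A) = 0.6$ and $m_3(B) = 0.4$ as implied by the products listed there.

```python
def _frac(num, den):
    """Fraction num/den, discarded (counted as 0) when the denominator is zero."""
    return num / den if den > 0 else 0.0

def pcr_bayes3(m1, m2, m3, rule="PCR5"):
    """Direct PCR5 (13) or PCR6 (14) fusion of three Bayesian BBAs,
    each given as {element: mass} over the same frame."""
    out = {}
    for a in m1:
        others = [x for x in m1 if x != a]
        t = m1[a] * m2[a] * m3[a]                    # conjunctive part
        for x in others:                             # first sum: X != Y
            for y in others:
                if x != y:
                    t += _frac(m1[a]**2 * m2[x] * m3[y], m1[a] + m2[x] + m3[y])
                    t += _frac(m1[y] * m2[a]**2 * m3[x], m1[y] + m2[a] + m3[x])
                    t += _frac(m1[x] * m2[y] * m3[a]**2, m1[x] + m2[y] + m3[a])
        for x in others:                             # second sum: single X
            if rule == "PCR5":
                t += _frac(m1[a]**2 * m2[x] * m3[x], m1[a] + m2[x] * m3[x])
                t += _frac(m1[x] * m2[a]**2 * m3[x], m1[x] * m3[x] + m2[a])
                t += _frac(m1[x] * m2[x] * m3[a]**2, m1[x] * m2[x] + m3[a])
                t += _frac(m1[a]**2 * m2[a]**2 * m3[x], m1[a] * m2[a] + m3[x])
                t += _frac(m1[a]**2 * m2[x] * m3[a]**2, m1[a] * m3[a] + m2[x])
                t += _frac(m1[x] * m2[a]**2 * m3[a]**2, m1[x] + m2[a] * m3[a])
            else:
                t += _frac(m1[a]**2 * m2[x] * m3[x], m1[a] + m2[x] + m3[x])
                t += _frac(m1[x] * m2[a]**2 * m3[x], m1[x] + m3[x] + m2[a])
                t += _frac(m1[x] * m2[x] * m3[a]**2, m1[x] + m2[x] + m3[a])
                t += _frac((m1[a] + m2[a]) * m1[a] * m2[a] * m3[x], m1[a] + m2[a] + m3[x])
                t += _frac((m1[a] + m3[a]) * m1[a] * m2[x] * m3[a], m1[a] + m3[a] + m2[x])
                t += _frac((m2[a] + m3[a]) * m1[x] * m2[a] * m3[a], m1[x] + m2[a] + m3[a])
        out[a] = t
    return out

m1 = {"A": 0.2, "B": 0.8}
m2 = {"A": 0.1, "B": 0.9}
m3 = {"A": 0.6, "B": 0.4}
r5 = pcr_bayes3(m1, m2, m3, "PCR5")
r6 = pcr_bayes3(m1, m2, m3, "PCR6")
print({k: round(v, 6) for k, v in r5.items()})
print({k: round(v, 6) for k, v in r6.items()})
```

Computed in double precision, the results agree with the values reported in Example 2 up to the rounding of the individual terms (differences of about $10^{-6}$). Unlike the general formulas, no enumeration of tuples is needed, which is the practical interest of the direct expressions.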


VII. EXAMPLE OF PCR5 AND PCR6 FUSION OF THREE 
BAYESIAN BBAS 


Here we provide a simple example showing the difference
of the results between PCR5 and PCR6 rules for three
Bayesian BBAs. For convenience, all numerical values have
been rounded to six decimal places when necessary.


Example 2: We consider the simplest FoD $\Theta = \{A, B\}$, and
the three following BBAs

$m_1(A) = 0.2$, $m_1(B) = 0.8$, $m_1(A \cup B) = 0$,
$m_2(A) = 0.1$, $m_2(B) = 0.9$, $m_2(A \cup B) = 0$,
$m_3(A) = 0.6$, $m_3(B) = 0.4$, $m_3(A \cup B) = 0$.

Because $F_{m_1} = |\mathcal{F}(m_1)| = 2$, $F_{m_2} = |\mathcal{F}(m_2)| = 2$ and
$F_{m_3} = |\mathcal{F}(m_3)| = 2$, we have $\mathcal{F} = F_{m_1} \cdot F_{m_2} \cdot F_{m_3} = 8$
non-zero products to consider. Two non-zero products are
non-conflicting and enter in the calculation of $m^{\text{Conj}}_{1,2,3}(A)$ and
$m^{\text{Conj}}_{1,2,3}(B)$, and six non-zero products are conflicting products
that need to be proportionally redistributed. The conjunctive
combination of these three Bayesian BBAs is

$m^{\text{Conj}}_{1,2,3}(A) = m_1(A)m_2(A)m_3(A) = 0.012$,
$m^{\text{Conj}}_{1,2,3}(B) = m_1(B)m_2(B)m_3(B) = 0.288$,
$m^{\text{Conj}}_{1,2,3}(A \cup B) = m_1(A \cup B)m_2(A \cup B)m_3(A \cup B) = 0$,

and the total conflict between these three BBAs is

$$m^{\text{Conj}}_{1,2,3}(\emptyset) = 1 - m^{\text{Conj}}_{1,2,3}(A) - m^{\text{Conj}}_{1,2,3}(B) - m^{\text{Conj}}_{1,2,3}(A \cup B) = 0.70.$$


In this example we have six partial conflicts, noted $\pi_j(\emptyset)$
($j = 1, \ldots, 6$), which are given by the following products

$\pi_1(\emptyset) = m_1(A)m_2(A)m_3(B) = 0.008$,
$\pi_2(\emptyset) = m_1(A)m_2(B)m_3(A) = 0.108$,
$\pi_3(\emptyset) = m_1(B)m_2(A)m_3(A) = 0.048$,
$\pi_4(\emptyset) = m_1(B)m_2(B)m_3(A) = 0.432$,
$\pi_5(\emptyset) = m_1(B)m_2(A)m_3(B) = 0.032$,
$\pi_6(\emptyset) = m_1(A)m_2(B)m_3(B) = 0.072$.


A. PCR5 fusion of the three Bayesian BBAs of example 2
Using the general PCR5 fusion rule (7) with $S = 3$ (i.e.
3 BBAs) we manage the conflicting mass products as
follows:
• Conflicting mass $\pi_1(\emptyset) = m_1(A)m_2(A)m_3(B) = 0.008$
is redistributed back to $A$ and $B$ proportionally to
$m_1(A)m_2(A)$ and to $m_3(B)$ as follows

$$\frac{x_1(A)}{m_1(A)m_2(A)} = \frac{x_1(B)}{m_3(B)} = \frac{\pi_1(\emptyset)}{m_1(A)m_2(A) + m_3(B)},$$

whence

$$x_1(A) = \frac{m_1(A)m_2(A)\,\pi_1(\emptyset)}{m_1(A)m_2(A) + m_3(B)} \approx 0.000381,$$
$$x_1(B) = \frac{m_3(B)\,\pi_1(\emptyset)}{m_1(A)m_2(A) + m_3(B)} \approx 0.007619.$$

We can verify $\pi_1(\emptyset) = x_1(A) + x_1(B) = 0.008$.


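• Conflicting mass $\pi_2(\emptyset) = m_1(A)m_2(B)m_3(A) = 0.108$
is redistributed back to $A$ and $B$ proportionally to
$m_1(A)m_3(A)$ and to $m_2(B)$ as follows

$$\frac{x_2(A)}{m_1(A)m_3(A)} = \frac{x_2(B)}{m_2(B)} = \frac{\pi_2(\emptyset)}{m_1(A)m_3(A) + m_2(B)},$$

whence

$$x_2(A) = \frac{m_1(A)m_3(A)\,\pi_2(\emptyset)}{m_1(A)m_3(A) + m_2(B)} \approx 0.012706,$$
$$x_2(B) = \frac{m_2(B)\,\pi_2(\emptyset)}{m_1(A)m_3(A) + m_2(B)} \approx 0.095294.$$


• Conflicting mass $\pi_3(\emptyset) = m_1(B)m_2(A)m_3(A) = 0.048$
is redistributed back to $A$ and $B$ proportionally to
$m_2(A)m_3(A)$ and to $m_1(B)$ as follows

$$\frac{x_3(A)}{m_2(A)m_3(A)} = \frac{x_3(B)}{m_1(B)} = \frac{\pi_3(\emptyset)}{m_2(A)m_3(A) + m_1(B)},$$

whence

$$x_3(A) = \frac{m_2(A)m_3(A)\,\pi_3(\emptyset)}{m_2(A)m_3(A) + m_1(B)} \approx 0.003349,$$
$$x_3(B) = \frac{m_1(B)\,\pi_3(\emptyset)}{m_2(A)m_3(A) + m_1(B)} \approx 0.044651.$$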

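• Conflicting mass $\pi_4(\emptyset) = m_1(B)m_2(B)m_3(A) = 0.432$
is redistributed back to $A$ and $B$ proportionally to $m_3(A)$ and
to $m_1(B)m_2(B)$ as follows

$$\frac{x_4(A)}{m_3(A)} = \frac{x_4(B)}{m_1(B)m_2(B)} = \frac{\pi_4(\emptyset)}{m_3(A) + m_1(B)m_2(B)},$$

whence

$$x_4(A) = \frac{m_3(A)\,\pi_4(\emptyset)}{m_3(A) + m_1(B)m_2(B)} \approx 0.196364,$$
$$x_4(B) = \frac{m_1(B)m_2(B)\,\pi_4(\emptyset)}{m_3(A) + m_1(B)m_2(B)} \approx 0.235636.$$


• Conflicting mass $\pi_5(\emptyset) = m_1(B)m_2(A)m_3(B) = 0.032$
is redistributed back to $A$ and $B$ proportionally to $m_2(A)$ and
to $m_1(B)m_3(B)$ as follows

$$\frac{x_5(A)}{m_2(A)} = \frac{x_5(B)}{m_1(B)m_3(B)} = \frac{\pi_5(\emptyset)}{m_2(A) + m_1(B)m_3(B)},$$

whence

$$x_5(A) = \frac{m_2(A)\,\pi_5(\emptyset)}{m_2(A) + m_1(B)m_3(B)} \approx 0.007620,$$
$$x_5(B) = \frac{m_1(B)m_3(B)\,\pi_5(\emptyset)}{m_2(A) + m_1(B)m_3(B)} \approx 0.024380.$$


• Conflicting mass $\pi_6(\emptyset) = m_1(A)m_2(B)m_3(B) = 0.072$
is redistributed back to $A$ and $B$ proportionally to $m_1(A)$ and
to $m_2(B)m_3(B)$ as follows

$$\frac{x_6(A)}{m_1(A)} = \frac{x_6(B)}{m_2(B)m_3(B)} = \frac{\pi_6(\emptyset)}{m_1(A) + m_2(B)m_3(B)},$$

whence

$$x_6(A) = \frac{m_1(A)\,\pi_6(\emptyset)}{m_1(A) + m_2(B)m_3(B)} \approx 0.025714,$$
$$x_6(B) = \frac{m_2(B)m_3(B)\,\pi_6(\emptyset)}{m_1(A) + m_2(B)m_3(B)} \approx 0.046286.$$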

Therefore in applying PCR5 formula we get

$$m^{\text{PCR5}}_{1,2,3}(A) = m^{\text{Conj}}_{1,2,3}(A) + x_1(A) + x_2(A) + x_3(A) + x_4(A) + x_5(A) + x_6(A) \approx 0.258134,$$
$$m^{\text{PCR5}}_{1,2,3}(B) = m^{\text{Conj}}_{1,2,3}(B) + x_1(B) + x_2(B) + x_3(B) + x_4(B) + x_5(B) + x_6(B) \approx 0.741866,$$

and because the result is a Bayesian BBA we have also
$m^{\text{PCR5}}_{1,2,3}(A \cup B) = 0$.

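Now if we apply the PCR5 combination of the three
Bayesian BBAs of example 2 using the direct formula (13),
we have to work with $\Theta = \{A, B\}$. So, for the focal element
$A$ we must consider all $X \in \Theta \setminus \{A\}$ in the second
summation, but $\Theta \setminus \{A\} = \{B\}$, hence $X = B$ only. In the
first summation there is no $X, Y \in \Theta \setminus \{A\}$ such that $X \neq Y$,
so the first summation does not exist for this example. For
the focal element $A$, the formula (13) reduces to the simple
expression

$$\begin{aligned}
m^{\text{PCR5}}_{1,2,3}(A) &= \underbrace{m_1(A)m_2(A)m_3(A)}_{m^{\text{Conj}}_{1,2,3}(A) = 0.012}
+ \underbrace{\frac{m_1(A)^2 m_2(B) m_3(B)}{m_1(A) + m_2(B)m_3(B)}}_{x_6(A) \approx 0.025714}
+ \underbrace{\frac{m_1(B) m_2(A)^2 m_3(B)}{m_1(B)m_3(B) + m_2(A)}}_{x_5(A) \approx 0.007620} \\
&+ \underbrace{\frac{m_1(B) m_2(B) m_3(A)^2}{m_1(B)m_2(B) + m_3(A)}}_{x_4(A) \approx 0.196364}
+ \underbrace{\frac{m_1(A)^2 m_2(A)^2 m_3(B)}{m_1(A)m_2(A) + m_3(B)}}_{x_1(A) \approx 0.000381}
+ \underbrace{\frac{m_1(A)^2 m_2(B) m_3(A)^2}{m_1(A)m_3(A) + m_2(B)}}_{x_2(A) \approx 0.012706} \\
&+ \underbrace{\frac{m_1(B) m_2(A)^2 m_3(A)^2}{m_1(B) + m_2(A)m_3(A)}}_{x_3(A) \approx 0.003349}
\approx 0.258134
\end{aligned} \tag{15}$$

Similarly, for the example 2 and using the direct PCR5
formula (13) for three Bayesian BBAs we have for the focal
element $B$

$$\begin{aligned}
m^{\text{PCR5}}_{1,2,3}(B) &= \underbrace{m_1(B)m_2(B)m_3(B)}_{m^{\text{Conj}}_{1,2,3}(B) = 0.288}
+ \underbrace{\frac{m_1(B)^2 m_2(A) m_3(A)}{m_1(B) + m_2(A)m_3(A)}}_{x_3(B) \approx 0.044651}
+ \underbrace{\frac{m_1(A) m_2(B)^2 m_3(A)}{m_1(A)m_3(A) + m_2(B)}}_{x_2(B) \approx 0.095294} \\
&+ \underbrace{\frac{m_1(A) m_2(A) m_3(B)^2}{m_1(A)m_2(A) + m_3(B)}}_{x_1(B) \approx 0.007619}
+ \underbrace{\frac{m_1(B)^2 m_2(B)^2 m_3(A)}{m_1(B)m_2(B) + m_3(A)}}_{x_4(B) \approx 0.235636}
+ \underbrace{\frac{m_1(B)^2 m_2(A) m_3(B)^2}{m_1(B)m_3(B) + m_2(A)}}_{x_5(B) \approx 0.024380} \\
&+ \underbrace{\frac{m_1(A) m_2(B)^2 m_3(B)^2}{m_1(A) + m_2(B)m_3(B)}}_{x_6(B) \approx 0.046286}
\approx 0.741866
\end{aligned} \tag{16}$$


It is clear that the results obtained with the direct formula
(13) are in agreement with those obtained by the general PCR5
formula (7) when $S = 3$.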


B. PCR6 fusion of the three Bayesian BBAs of example 2 


Using the general PCR6 fusion rule (10) with $S = 3$ (i.e.
3 BBAs) we manage the conflicting mass products as
follows:

• Conflicting mass $\pi_1(\emptyset) = m_1(A)m_2(A)m_3(B) = 0.008$
is redistributed back to $A$ and $B$ proportionally to $m_1(A) +
m_2(A)$ and to $m_3(B)$ as follows

$$\frac{x_1(A)}{m_1(A) + m_2(A)} = \frac{x_1(B)}{m_3(B)} = \frac{\pi_1(\emptyset)}{m_1(A) + m_2(A) + m_3(B)},$$

whence

$$x_1(A) = \frac{(m_1(A) + m_2(A))\,\pi_1(\emptyset)}{m_1(A) + m_2(A) + m_3(B)} \approx 0.003429,$$
$$x_1(B) = \frac{m_3(B)\,\pi_1(\emptyset)}{m_1(A) + m_2(A) + m_3(B)} \approx 0.004571.$$

We can verify $\pi_1(\emptyset) = x_1(A) + x_1(B) = 0.008$.

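• Conflicting mass $\pi_2(\emptyset) = m_1(A)m_2(B)m_3(A) = 0.108$
is redistributed back to $A$ and $B$ proportionally to $m_1(A) +
m_3(A)$ and to $m_2(B)$ as follows

$$\frac{x_2(A)}{m_1(A) + m_3(A)} = \frac{x_2(B)}{m_2(B)} = \frac{\pi_2(\emptyset)}{m_1(A) + m_3(A) + m_2(B)},$$

whence

$$x_2(A) = \frac{(m_1(A) + m_3(A))\,\pi_2(\emptyset)}{m_1(A) + m_3(A) + m_2(B)} \approx 0.050824,$$
$$x_2(B) = \frac{m_2(B)\,\pi_2(\emptyset)}{m_1(A) + m_3(A) + m_2(B)} \approx 0.057176.$$


• Conflicting mass $\pi_3(\emptyset) = m_1(B)m_2(A)m_3(A) = 0.048$
is redistributed back to $A$ and $B$ proportionally to $m_2(A) +
m_3(A)$ and to $m_1(B)$ as follows

$$\frac{x_3(A)}{m_2(A) + m_3(A)} = \frac{x_3(B)}{m_1(B)} = \frac{\pi_3(\emptyset)}{m_2(A) + m_3(A) + m_1(B)},$$

whence

$$x_3(A) = \frac{(m_2(A) + m_3(A))\,\pi_3(\emptyset)}{m_2(A) + m_3(A) + m_1(B)} = 0.022400,$$
$$x_3(B) = \frac{m_1(B)\,\pi_3(\emptyset)}{m_2(A) + m_3(A) + m_1(B)} = 0.025600.$$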

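• Conflicting mass $\pi_4(\emptyset) = m_1(B)m_2(B)m_3(A) = 0.432$
is redistributed back to $A$ and $B$ proportionally to $m_3(A)$ and
to $m_1(B) + m_2(B)$ as follows

$$\frac{x_4(A)}{m_3(A)} = \frac{x_4(B)}{m_1(B) + m_2(B)} = \frac{\pi_4(\emptyset)}{m_3(A) + m_1(B) + m_2(B)},$$

whence

$$x_4(A) = \frac{m_3(A)\,\pi_4(\emptyset)}{m_3(A) + m_1(B) + m_2(B)} \approx 0.112696,$$
$$x_4(B) = \frac{(m_1(B) + m_2(B))\,\pi_4(\emptyset)}{m_3(A) + m_1(B) + m_2(B)} \approx 0.319304.$$


• Conflicting mass $\pi_5(\emptyset) = m_1(B)m_2(A)m_3(B) = 0.032$
is redistributed back to $A$ and $B$ proportionally to $m_2(A)$ and
to $m_1(B) + m_3(B)$ as follows

$$\frac{x_5(A)}{m_2(A)} = \frac{x_5(B)}{m_1(B) + m_3(B)} = \frac{\pi_5(\emptyset)}{m_2(A) + m_1(B) + m_3(B)},$$

whence

$$x_5(A) = \frac{m_2(A)\,\pi_5(\emptyset)}{m_2(A) + m_1(B) + m_3(B)} \approx 0.002462,$$
$$x_5(B) = \frac{(m_1(B) + m_3(B))\,\pi_5(\emptyset)}{m_2(A) + m_1(B) + m_3(B)} \approx 0.029538.$$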

• Conflicting mass $\pi_6(\emptyset) = m_1(A)m_2(B)m_3(B) = 0.072$
is redistributed back to $A$ and $B$ proportionally to $m_1(A)$ and
to $m_2(B) + m_3(B)$ as follows

$$\frac{x_6(A)}{m_1(A)} = \frac{x_6(B)}{m_2(B) + m_3(B)} = \frac{\pi_6(\emptyset)}{m_1(A) + m_2(B) + m_3(B)},$$

whence

$$x_6(A) = \frac{m_1(A)\,\pi_6(\emptyset)}{m_1(A) + m_2(B) + m_3(B)} = 0.009600,$$
$$x_6(B) = \frac{(m_2(B) + m_3(B))\,\pi_6(\emptyset)}{m_1(A) + m_2(B) + m_3(B)} = 0.062400.$$

Therefore in applying PCR6 formula (10) with $S = 3$ we get
finally

$$m^{\text{PCR6}}_{1,2,3}(A) = m^{\text{Conj}}_{1,2,3}(A) + x_1(A) + x_2(A) + x_3(A) + x_4(A) + x_5(A) + x_6(A) \approx 0.213411,$$
$$m^{\text{PCR6}}_{1,2,3}(B) = m^{\text{Conj}}_{1,2,3}(B) + x_1(B) + x_2(B) + x_3(B) + x_4(B) + x_5(B) + x_6(B) \approx 0.786589,$$

and because the result is a Bayesian BBA we have also
$m^{\text{PCR6}}_{1,2,3}(A \cup B) = 0$.


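When using the direct formula (14) of PCR6 rule for the
three Bayesian BBAs of example 2 we obtain for the focal
element $A$

$$\begin{aligned}
m^{\text{PCR6}}_{1,2,3}(A) &= \underbrace{m_1(A)m_2(A)m_3(A)}_{m^{\text{Conj}}_{1,2,3}(A) = 0.012}
+ \underbrace{\frac{m_1(A)^2 m_2(B) m_3(B)}{m_1(A) + m_2(B) + m_3(B)}}_{x_6(A) = 0.009600}
+ \underbrace{\frac{m_1(B) m_2(A)^2 m_3(B)}{m_1(B) + m_3(B) + m_2(A)}}_{x_5(A) \approx 0.002462} \\
&+ \underbrace{\frac{m_1(B) m_2(B) m_3(A)^2}{m_1(B) + m_2(B) + m_3(A)}}_{x_4(A) \approx 0.112696}
+ \underbrace{\frac{(m_1(A) + m_2(A))\, m_1(A) m_2(A) m_3(B)}{m_1(A) + m_2(A) + m_3(B)}}_{x_1(A) \approx 0.003429} \\
&+ \underbrace{\frac{(m_1(A) + m_3(A))\, m_1(A) m_2(B) m_3(A)}{m_1(A) + m_3(A) + m_2(B)}}_{x_2(A) \approx 0.050824}
+ \underbrace{\frac{(m_2(A) + m_3(A))\, m_1(B) m_2(A) m_3(A)}{m_1(B) + m_2(A) + m_3(A)}}_{x_3(A) = 0.022400}
\approx 0.213411
\end{aligned} \tag{17}$$

Similarly, for the example 2 and using the direct formula
(14) we have for the focal element $B$

$$\begin{aligned}
m^{\text{PCR6}}_{1,2,3}(B) &= \underbrace{m_1(B)m_2(B)m_3(B)}_{m^{\text{Conj}}_{1,2,3}(B) = 0.288}
+ \underbrace{\frac{m_1(B)^2 m_2(A) m_3(A)}{m_1(B) + m_2(A) + m_3(A)}}_{x_3(B) = 0.025600}
+ \underbrace{\frac{m_1(A) m_2(B)^2 m_3(A)}{m_1(A) + m_3(A) + m_2(B)}}_{x_2(B) \approx 0.057176} \\
&+ \underbrace{\frac{m_1(A) m_2(A) m_3(B)^2}{m_1(A) + m_2(A) + m_3(B)}}_{x_1(B) \approx 0.004571}
+ \underbrace{\frac{(m_1(B) + m_2(B))\, m_1(B) m_2(B) m_3(A)}{m_1(B) + m_2(B) + m_3(A)}}_{x_4(B) \approx 0.319304} \\
&+ \underbrace{\frac{(m_1(B) + m_3(B))\, m_1(B) m_2(A) m_3(B)}{m_1(B) + m_3(B) + m_2(A)}}_{x_5(B) \approx 0.029538}
+ \underbrace{\frac{(m_2(B) + m_3(B))\, m_1(A) m_2(B) m_3(B)}{m_1(A) + m_2(B) + m_3(B)}}_{x_6(B) = 0.062400}
\approx 0.786589
\end{aligned} \tag{18}$$


It is clear that the results obtained with the direct formula
(14) are in agreement with those obtained by the general PCR6
formula (10) when $S = 3$.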


VIII. CONCLUSION 


In this paper we have developed explicit formulas for the
PCR5 and PCR6 fusion of three Bayesian BBAs which work
with any cardinality of the frame of discernment greater than or
equal to two. We have verified that our formulas are coherent
with the general PCR5 and PCR6 formulas. We have also provided
the correct PCR5 formula for three BBAs, which was erroneous
in our 2010 original paper [13]. We hope that these formulas
will be helpful for some users of belief functions working only
with Bayesian belief masses and with only three sources of
evidence to combine, because these direct formulas are much
easier to implement than the general PCR5 and PCR6 formulas.

Abstract: 3D building change detection has become a popular
research topic along with the improvement of image quality
and computer science. When only building changes are of
interest, both the multi-temporal images and Digital Surface
Models provide valuable but not comprehensive information in
the change detection procedure. Therefore, in this paper, belief
functions have been adopted for fusing information from these
two sources. In the first step, two change indicators are proposed
by focusing on building changes. Both indicators have been
projected to a sigmoid curve, in which both the concordance
and discordance indexes are considered. In order to fuse the
concordance and discordance indexes and further fuse the two
change indicators, two belief function frameworks are considered. One is
the original Dempster-Shafer Theory (DST), and the most recent
one is Dezert-Smarandache Theory (DSmT). This paper shows
how these belief-based frameworks can help in the building change
detection problem. Besides using different belief functions in
obtaining the global BBAs, four decision-making criteria are
tested to extract final building change masks. The results have
been validated by comparison with the manually extracted change
reference mask.


Keywords: belief functions, DSmT, satellite imaging, building 
change detection. 


I. INTRODUCTION 


Accurate and efficient detection of changes is of great
importance for urban monitoring, which is an important
research field in remote sensing. Change detection methods for
large-scale land cover monitoring have been intensively studied
and reviewed [1], [2]. As the spectral and temporal resolution of
images increases, the expectations for automatic
change detection have progressively increased, not only regarding
the accuracy of the results, but also regarding the efficiency and
robustness of the methods. Moreover, change detection for a specific target
of interest, such as buildings, is becoming an important research
topic. In small-scale 2D change detection, which is performed
using only 2D multi-temporal spectral images, problems
arise from misdetections caused by irrelevant changes. The
influence of these irrelevant changes grows as higher
resolution images show more details. Therefore, in this
paper, we work with satellite multispectral and
stereo images, which provide both spectral and height change
information.


Adopting satellite stereo imagery for 3D change detection
is an exciting and challenging task. Thanks to improved
data quality and advanced computer vision techniques, the
quality of the generated Digital Surface Models (DSMs) has
been largely improved, and it is possible to detect changes
even for small objects, such as single buildings. On the other
hand, the DSMs may still exhibit some outliers resulting from
occlusions within the stereo/multi views. Several approaches
have been proposed for DSM-assisted change detection [3], [4],
[5], [6]. According to our previous research results, the belief
functions introduced in DST make it possible to work more efficiently
and robustly in urban building change detection with very
high resolution satellite images [7]. So far, only a basic
DS fusion model has been proposed in [6], where the
Basic Belief Assignments (BBAs) are defined with a sigmoid curve
considering only the concordance index. An improvement of this
DS fusion model for BBA construction is proposed in this
paper to achieve better performance by considering both the
concordance and discordance indexes. Since DSmT [8] has
been developed in recent years as an interesting alternative to
DST to circumvent the problems of the Dempster-Shafer (DS) rule
of combination [9], we also investigate the possibility of using
the Proportional Conflict Redistribution Rule #6 (PCR6) of
DSmT in our application.


II. BASICS OF BELIEF FUNCTIONS 


Detailed presentations of DST and DSmT can be found
in [8], [9] and [10]. Let $\Theta$ be the frame of discernment of the
problem under consideration. $\Theta = \{\theta_1, \theta_2, \ldots, \theta_N\}$ consists
of a list of $N$ exhaustive and mutually exclusive elements $\theta_i$,
$i = 1, 2, \ldots, N$. Each $\theta_i$ represents a possible state related to
the problem we want to solve. The assumption of exhaustivity
and mutual exclusivity of the elements of $\Theta$ is classically referred
to as Shafer's model of the frame $\Theta$. A BBA, also called a belief
mass function (or just a mass for short), is a mapping
$m(.) : 2^{\Theta} \to [0,1]$ from the power set¹ of $\Theta$, denoted $2^{\Theta}$, to $[0,1]$
that verifies [10]:

$$\sum_{X \in 2^{\Theta}} m(X) = 1. \qquad (1)$$

$m(X)$ represents the mass of belief exactly committed to $X$.
An element $X \in 2^{\Theta}$ is called a focal element if and only if
$m(X) > 0$. In DST, the combination (fusion) of several independent
sources of evidence is done with the Dempster-Shafer²
(DS) rule of combination, assuming that the sources are not
in total conflict³. The DS combination of two independent BBAs
$m_1(.)$ and $m_2(.)$, denoted symbolically by $DS(m_1, m_2)$, is
defined by $m^{DS}(\emptyset) = 0$, and for all $X \in 2^{\Theta} \setminus \{\emptyset\}$ by:

$$m^{DS}(X) = \frac{1}{1 - K^{DS}} \sum_{\substack{X_1, X_2 \in 2^{\Theta} \\ X_1 \cap X_2 = X}} m_1(X_1)\, m_2(X_2), \qquad (2)$$

where the total degree of conflict $K^{DS}$ is given by

$$K^{DS} \triangleq \sum_{\substack{X_1, X_2 \in 2^{\Theta} \\ X_1 \cap X_2 = \emptyset}} m_1(X_1)\, m_2(X_2). \qquad (3)$$

¹ The power set is the set of all subsets of $\Theta$, empty set included.

² Although the rule was originally proposed by Dempster, we call it the
Dempster-Shafer rule because it has been widely promoted by Shafer in DST.

³ Otherwise the DS rule is mathematically not defined because of the 0/0
indeterminacy.


A discussion on the validity of the DS rule and its incompatibility
with the Bayes fusion rule for combining Bayesian BBAs can be
found in [9], [11], [12]. To circumvent the problems of the DS
rule, Smarandache and Dezert (see [8], Vol. 2, Chap. 1), then
Martin and Osswald (see [8], Vol. 2, Chap. 2), have developed
in DSmT [8] two fusion rules called PCR5 and PCR6, based on
the proportional conflict redistribution (PCR) principle, which
consists in:


1) applying the conjunctive rule;

2) calculating the total or partial conflicting masses;

3) redistributing the (total or partial) conflicting mass
proportionally on non-empty sets according to the
integrity constraints one has for the frame $\Theta$.


This PCR principle transfers the conflicting mass only to the
elements involved in the conflict and proportionally to their
individual masses, so that the specificity of the information
is not degraded. Because the proportional transfer can be
done in two different ways, this has led to two different
fusion rules. It has been proved in [13] that only the PCR6 rule
is compatible with frequentist probability estimation, which
is why we recommend its use in applications. The PCR5 and
PCR6 rules simplify greatly and coincide for the combination
of two sources. In this case, the PCR6 combination is obtained
by taking $m^{PCR6}(\emptyset) = 0$, and for all $X \neq \emptyset$ in $2^{\Theta}$ by

$$m^{PCR6}(X) = \sum_{\substack{X_1, X_2 \in 2^{\Theta} \\ X_1 \cap X_2 = X}} m_1(X_1)\, m_2(X_2) + \sum_{\substack{Y \in 2^{\Theta} \setminus \{X\} \\ X \cap Y = \emptyset}} \left[ \frac{m_1(X)^2\, m_2(Y)}{m_1(X) + m_2(Y)} + \frac{m_2(X)^2\, m_1(Y)}{m_2(X) + m_1(Y)} \right], \qquad (4)$$

where all denominators in Eq. (4) are different from zero. If
a denominator is zero, that fraction is discarded.
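
The two-source DS and PCR6 rules recalled above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used for the experiments: BBAs are assumed to be stored as dictionaries mapping `frozenset` focal elements to masses, and the function names `conjunctive`, `ds_rule` and `pcr6_rule` are hypothetical.

```python
from itertools import product


def conjunctive(m1, m2):
    """Conjunctive rule: masses on non-empty intersections plus the conflicting pairs."""
    fused, conflicts = {}, []
    for (x1, v1), (x2, v2) in product(m1.items(), m2.items()):
        inter = x1 & x2
        if inter:
            fused[inter] = fused.get(inter, 0.0) + v1 * v2
        elif v1 * v2 > 0.0:
            # X1 and X2 are disjoint: the product v1*v2 is a partial conflict
            conflicts.append((x1, v1, x2, v2))
    return fused, conflicts


def ds_rule(m1, m2):
    """DS rule: normalise the conjunctive masses by 1 - K (total conflict K)."""
    fused, conflicts = conjunctive(m1, m2)
    k = sum(v1 * v2 for _, v1, _, v2 in conflicts)
    if 1.0 - k <= 1e-12:
        raise ValueError("sources are in total conflict: DS rule is undefined")
    return {x: v / (1.0 - k) for x, v in fused.items()}


def pcr6_rule(m1, m2):
    """Two-source PCR6 (= PCR5): send each partial conflict back to the two
    elements involved, proportionally to their masses."""
    fused, conflicts = conjunctive(m1, m2)
    for x1, v1, x2, v2 in conflicts:
        # v1 + v2 > 0 here because v1 * v2 > 0, so no zero denominator occurs
        fused[x1] = fused.get(x1, 0.0) + v1 * v1 * v2 / (v1 + v2)
        fused[x2] = fused.get(x2, 0.0) + v2 * v2 * v1 / (v1 + v2)
    return fused


if __name__ == "__main__":
    A, B = frozenset({"A"}), frozenset({"B"})
    m1 = {A: 0.6, A | B: 0.4}
    m2 = {B: 0.7, A | B: 0.3}
    print(ds_rule(m1, m2))
    print(pcr6_rule(m1, m2))
```

Storing only conflicting pairs with a strictly positive product is one way of implementing the convention that fractions with a zero denominator are discarded.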


III. BUILDING CHANGE DETECTION MODELS 
A. Choice of the frame of discernment 


We now use two sources (indicators) of evidence to solve
our problem. As a preparation step, the indicators and focal
elements have to be introduced. Two data sources are used for
building change detection. One is the satellite images, which
contain 2D spectral information. Here we use the Iteratively
Reweighted Multivariate Alteration Detection (IRMAD) [14]
to highlight changes in the spectral images. The other is
the robust height difference, which can be calculated from
the two Digital Surface Models (DSMs) [6]. Details of the
DSM generation procedure and the characteristics of the DSM
quality have been described in [5]. As explained
in [6], we suppose that new, demolished or changed buildings
exhibit both height changes and spectral changes, whereas seasonal
changes only influence the spectral images. Therefore,
for building change detection, we consider the following three
classes (hypotheses) to define our frame of discernment satisfying
Shafer's model: $\Theta = \{\theta_1 = \text{Pixel} \in \text{BuildingChange},\ \theta_2 = \text{Pixel} \in \text{OtherChange},\ \theta_3 = \text{Pixel} \in \text{NoChange}\}$.


B. Sigmoidal model for BBA construction 


BBA construction is a prerequisite for the combination of
sources of evidence. In our previous work [6], the BBAs were
built from sigmoid curves related to the concordance
index only. In this paper, we improve our model by constructing
the BBAs with sigmoidal models for both the concordance
and discordance indexes, following the idea proposed in [15]. As
explained in [6], the original sigmoid curve is defined as

$$f_{(\tau,T)}(x) = \frac{0.99}{1 + e^{-\frac{x - T}{\tau}}}, \qquad (5)$$

where $x$ is the original value of each indicator. The two parameters
$T$ and $\tau$ control the symmetry point and the
slope of the sigmoid function, respectively. The symmetry point indicates
a certainty of 50%. The construction of BBAs is explained in
[15] and adopted in this paper. In [15], the two parameters
$T$ and $\tau$ are set manually. Here, the
multi-level Otsu thresholding method [16] is used to obtain the
symmetry points for both the concordance index and the discordance
index. Otsu's algorithm assumes that an image is composed of
objects and background, and performs a discriminant analysis
by minimizing the intra-class variance. When three classes are
of interest, two threshold values are expected, and Otsu's method
can be extended to

$$\sigma_w^2(T_1, T_2) = \omega_1 \sigma_1^2(T_1, T_2) + \omega_2 \sigma_2^2(T_1, T_2) + \omega_3 \sigma_3^2(T_1, T_2). \qquad (6)$$

The weights $\omega_i$ are the probabilities of the three classes separated
by the thresholds $T_1$ and $T_2$, obtained from the image histogram, and
$\sigma_i^2$ are the variances of the three classes. $T_1$ and $T_2$ can be used
as the symmetry points of the discordance and concordance indexes,
respectively. Thus, using the height change index as an example, the
concordance and discordance BBAs are built from the values
$a_{\Delta H}$ and $b_{\Delta H}$:

$$a_{\Delta H} = f_{\tau,T_2}(\Delta H), \quad \text{and} \quad b_{\Delta H} = f_{-\tau,T_1}(\Delta H). \qquad (7)$$

The factor $\tau$ is calculated from a sample value ($\Delta H = 1$,
$a_{\Delta H} = 0.1$), which means that a 1 meter height change indicates a
10% probability of building change. The BBAs for the discordance
and concordance image change indexes are built similarly.
Differences appearing in 2D images give a concordance indication
for all changes, which include the building changes and
other changes ($\theta_1 \cup \theta_2$). In this paper, the changes from images
are denoted $\Delta Img$.
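
The sigmoidal mapping and the calibration of $\tau$ from one sample point can be sketched as follows. This is a minimal sketch under explicit assumptions: the exponent of the sigmoid is taken to be $-(x - T)/\tau$, the concordance value uses $T_2$ and the discordance value uses $T_1$ with a negated slope, and the threshold values as well as the function names `sigmoid`, `calibrate_tau` and `height_indexes` are hypothetical.

```python
import math


def sigmoid(x, tau, t):
    """Assumed form f_(tau,T)(x) = 0.99 / (1 + exp(-(x - T) / tau)).
    A negative tau mirrors the curve around the symmetry point T."""
    exponent = -(x - t) / tau
    # clamp to avoid OverflowError for values far from the symmetry point
    return 0.99 / (1.0 + math.exp(min(exponent, 700.0)))


def calibrate_tau(x_sample, y_sample, t):
    """Solve sigmoid(x_sample, tau, t) == y_sample for tau.
    Requires 0 < y_sample < 0.99 and y_sample != 0.495."""
    return -(x_sample - t) / math.log(0.99 / y_sample - 1.0)


def height_indexes(delta_h, tau, t1, t2):
    """Concordance value a and discordance value b for one height change."""
    a = sigmoid(delta_h, tau, t2)    # grows with the height change
    b = sigmoid(delta_h, -tau, t1)   # shrinks with the height change
    return a, b


if __name__ == "__main__":
    T1, T2 = 2.0, 5.0                      # hypothetical Otsu thresholds (meters)
    tau = calibrate_tau(1.0, 0.1, T2)      # sample point (dH = 1 m, a = 0.1)
    for dh in (0.0, 1.0, 3.0, 5.0, 10.0):
        print(dh, height_indexes(dh, tau, T1, T2))
```

At the symmetry point the curve returns 0.99 / 2 = 0.495, which corresponds to the "certainty of 50%" mentioned in the text.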


C. BBAs construction using concordance and discordance 


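The BBAs related to the concordance and discordance
indexes are combined to get the global BBA related to each
source of evidence. These global BBAs are then used as
inputs for solving the change detection problem through their
combination. Tables I and II present the two ways
of constructing the BBAs of the sources of evidence, based
either on the DS rule or on the PCR6 rule of combination, for the height
change indicator (i.e. the first source of evidence) and the
image change indicator (i.e. the second source of evidence).
In Table I, $m_1(.)$ and $m'_1(.)$ represent the concordance and
discordance BBAs from $\Delta H$, whereas in Table II $m_2(.)$ and
$m'_2(.)$ represent the concordance and discordance BBAs from the
images. To compare the two frameworks, the concordance and
discordance BBAs of each indicator are fused with both the DS
and PCR6 fusion rules; the results are given in the last two
columns of each table.


TABLE I. BBA CONSTRUCTION FOR THE HEIGHT CHANGE INDICATOR $\Delta H$, WITH $K_{\Delta H} = a_{\Delta H} b_{\Delta H}$.

| Focal elem. | $m_1(.)$ | $m'_1(.)$ | $m_1^{DS}(.)$ | $m_1^{PCR6}(.)$ |
| $\theta_1$ | $a_{\Delta H}$ | 0 | $\frac{a_{\Delta H}(1 - b_{\Delta H})}{1 - K_{\Delta H}}$ | $a_{\Delta H}(1 - b_{\Delta H}) + \frac{a_{\Delta H} K_{\Delta H}}{a_{\Delta H} + b_{\Delta H}}$ |
| $\theta_2$ | 0 | 0 | 0 | 0 |
| $\theta_3$ | 0 | 0 | 0 | 0 |
| $\theta_1 \cup \theta_2$ | 0 | 0 | 0 | 0 |
| $\theta_2 \cup \theta_3$ | 0 | $b_{\Delta H}$ | $\frac{(1 - a_{\Delta H}) b_{\Delta H}}{1 - K_{\Delta H}}$ | $(1 - a_{\Delta H}) b_{\Delta H} + \frac{b_{\Delta H} K_{\Delta H}}{a_{\Delta H} + b_{\Delta H}}$ |
| $\theta_1 \cup \theta_2 \cup \theta_3$ | $1 - a_{\Delta H}$ | $1 - b_{\Delta H}$ | $\frac{(1 - a_{\Delta H})(1 - b_{\Delta H})}{1 - K_{\Delta H}}$ | $(1 - a_{\Delta H})(1 - b_{\Delta H})$ |


TABLE II. BBA CONSTRUCTION FOR THE IMAGE CHANGE INDICATOR $\Delta Img$, WITH $K_{\Delta Img} = a_{\Delta Img} b_{\Delta Img}$.

| Focal elem. | $m_2(.)$ | $m'_2(.)$ | $m_2^{DS}(.)$ | $m_2^{PCR6}(.)$ |
| $\theta_1$ | 0 | 0 | 0 | 0 |
| $\theta_2$ | 0 | 0 | 0 | 0 |
| $\theta_3$ | 0 | $b_{\Delta Img}$ | $\frac{(1 - a_{\Delta Img}) b_{\Delta Img}}{1 - K_{\Delta Img}}$ | $(1 - a_{\Delta Img}) b_{\Delta Img} + \frac{b_{\Delta Img} K_{\Delta Img}}{a_{\Delta Img} + b_{\Delta Img}}$ |
| $\theta_1 \cup \theta_2$ | $a_{\Delta Img}$ | 0 | $\frac{a_{\Delta Img}(1 - b_{\Delta Img})}{1 - K_{\Delta Img}}$ | $a_{\Delta Img}(1 - b_{\Delta Img}) + \frac{a_{\Delta Img} K_{\Delta Img}}{a_{\Delta Img} + b_{\Delta Img}}$ |
| $\theta_2 \cup \theta_3$ | 0 | 0 | 0 | 0 |
| $\theta_1 \cup \theta_2 \cup \theta_3$ | $1 - a_{\Delta Img}$ | $1 - b_{\Delta Img}$ | $\frac{(1 - a_{\Delta Img})(1 - b_{\Delta Img})}{1 - K_{\Delta Img}}$ | $(1 - a_{\Delta Img})(1 - b_{\Delta Img})$ |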
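
The closed-form entries of Tables I and II follow from applying the DS and PCR6 rules to a concordance BBA (mass $a$ on the supported set, $1 - a$ on $\Theta$) and a discordance BBA (mass $b$ on the opposing set, $1 - b$ on $\Theta$), whose only conflict is $K = ab$. The sketch below evaluates both columns for one pixel; the function name `fuse_concordance_discordance` and the dictionary keys are hypothetical, chosen only for illustration.

```python
def fuse_concordance_discordance(a, b):
    """Return (ds, pcr6) masses for one indicator.

    Keys: 'conc' = set supported by the concordance index
                   (theta1 for dH, theta1 U theta2 for dImg),
          'disc' = set supported by the discordance index
                   (theta2 U theta3 for dH, theta3 for dImg),
          'ign'  = full ignorance theta1 U theta2 U theta3.
    """
    k = a * b                                  # conflict K = a * b
    conc, disc, ign = a * (1 - b), (1 - a) * b, (1 - a) * (1 - b)
    if 1.0 - k <= 1e-12:
        raise ValueError("total conflict: DS column is undefined")
    ds = {"conc": conc / (1 - k), "disc": disc / (1 - k), "ign": ign / (1 - k)}
    # PCR6 sends the conflict K back to 'conc' and 'disc' proportionally to a and b
    share = k / (a + b) if a + b > 0 else 0.0
    pcr6 = {"conc": conc + a * share, "disc": disc + b * share, "ign": ign}
    return ds, pcr6


if __name__ == "__main__":
    ds, pcr6 = fuse_concordance_discordance(0.8, 0.3)
    print(ds)
    print(pcr6)
```

Both columns sum to one: the DS rule spreads the conflict over all focal elements through normalisation, whereas PCR6 leaves the ignorance mass untouched.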


D. BBAs combination for building change detection 


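From the previous BBA modeling step, each pixel
gets two BBAs to combine, resulting from Tables
I and II. More precisely, we have to combine either
$\{m_1^{DS}(.), m_2^{DS}(.)\}$ if the DS rule is preferred for the BBA modeling,
or $\{m_1^{PCR6}(.), m_2^{PCR6}(.)\}$ if the PCR6 rule is adopted. The masses of these
BBAs are denoted $a_1, a_2, a_3$ and $b_1, b_2, b_3$ in
Table III.


TABLE III. FUSION MODELS FOR BUILDING CHANGE DETECTION.

| Focal elem. | $m_1(.)$ | $m_2(.)$ | $m_{12}^{DS}(.)$ | $m_{12}^{PCR6}(.)$ |
| $\theta_1$ | $a_1$ | 0 | $\frac{a_1(b_1 + b_3)}{1 - a_1 b_2}$ | $a_1(b_1 + b_3) + \frac{a_1^2 b_2}{a_1 + b_2}$ |
| $\theta_2$ | 0 | 0 | $\frac{a_2 b_1}{1 - a_1 b_2}$ | $a_2 b_1$ |
| $\theta_3$ | 0 | $b_2$ | $\frac{(a_2 + a_3) b_2}{1 - a_1 b_2}$ | $(a_2 + a_3) b_2 + \frac{a_1 b_2^2}{a_1 + b_2}$ |
| $\theta_1 \cup \theta_2$ | 0 | $b_1$ | $\frac{a_3 b_1}{1 - a_1 b_2}$ | $a_3 b_1$ |
| $\theta_2 \cup \theta_3$ | $a_2$ | 0 | $\frac{a_2 b_3}{1 - a_1 b_2}$ | $a_2 b_3$ |
| $\Theta$ | $a_3$ | $b_3$ | $\frac{a_3 b_3}{1 - a_1 b_2}$ | $a_3 b_3$ |


Depending on the BBAs and fusion methods used, four sets of
global BBAs can be computed from Table III:

$$\begin{aligned}
G_1 &= DS\{m_1^{DS}(.), m_2^{DS}(.)\}, \\
G_2 &= PCR6\{m_1^{DS}(.), m_2^{DS}(.)\}, \\
G_3 &= DS\{m_1^{PCR6}(.), m_2^{PCR6}(.)\}, \\
G_4 &= PCR6\{m_1^{PCR6}(.), m_2^{PCR6}(.)\}.
\end{aligned} \qquad (8)$$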


After the fusion step, each pixel in the images gets a
certain degree of belief for every focal element. Based on
these BBAs, a final decision can be made. DST and DSmT
have different approaches to reach this final decision. In this
paper, four decision criteria are tested. More precisely, we have
evaluated the maximum of the global BBAs (Max_Bel), the maximum
of plausibility (Max_Pl), the maximum of betting probabilities
(Max_BetP) and the maximum of DSmP (Max_DSmP); see
[8] (Vol. 3, Chap. 3) and [10] for the mathematical definitions
of the Bel(.), Pl(.), BetP(.) and DSmP(.) functions.
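
Three of the four decision criteria can be illustrated with the standard definitions of belief, plausibility and pignistic (betting) probability. The sketch below is illustrative only: it assumes a global BBA stored as a dictionary of `frozenset` focal elements, it decides among the singletons of the frame, and it leaves out DSmP, whose definition (see [8], Vol. 3, Chap. 3) needs an extra tuning parameter. The function names are hypothetical.

```python
def belief(m, x):
    """Bel(X): total mass of the non-empty focal elements included in X."""
    return sum(v for y, v in m.items() if y and y.issubset(x))


def plausibility(m, x):
    """Pl(X): total mass of the focal elements intersecting X."""
    return sum(v for y, v in m.items() if y & x)


def betp(m, theta):
    """BetP(theta): each focal element shares its mass equally among its elements."""
    return sum(v / len(y) for y, v in m.items() if theta in y)


def decide(m, frame, criterion="Max_BetP"):
    """Return (winning singleton, scores of all singletons) for one pixel."""
    scores = {}
    for theta in sorted(frame):
        single = frozenset({theta})
        if criterion == "Max_Bel":
            scores[theta] = belief(m, single)    # equals m(theta) for a singleton
        elif criterion == "Max_Pl":
            scores[theta] = plausibility(m, single)
        elif criterion == "Max_BetP":
            scores[theta] = betp(m, theta)
        else:
            raise ValueError("unsupported criterion: " + criterion)
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    frame = frozenset({"theta1", "theta2", "theta3"})
    g = {
        frozenset({"theta1"}): 0.5,
        frozenset({"theta3"}): 0.2,
        frozenset({"theta1", "theta2"}): 0.1,
        frame: 0.2,
    }
    for crit in ("Max_Bel", "Max_Pl", "Max_BetP"):
        print(crit, decide(g, frame, crit))
```

For a singleton, Bel reduces to the mass of that singleton, which is consistent with describing Max_Bel as the maximum of the global BBAs.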


IV. EXPERIMENTS 


The two proposed BBA models and fusion methods
(based on the DS and PCR6 rules) have been tested on one real
dataset. The dataset and the results of each step are detailed
in this section.


A. Datasets 


The experimental dataset used in this work is displayed
in Fig. 1. It consists of two pairs of IKONOS stereo
images captured in February 2006 and May 2011, respectively.
As a pre-processing step, all data have been radiometrically
and geographically co-registered as described in [6].
As shown in Fig. 1, this is a typical building change example:
several buildings have been built on a flat surface. The generated
DSMs are displayed in Fig. 1c and 1d.


B. Results and evaluation 


As a first step, the BBAs from the image change and height
change indicators are extracted and refined with the DS and
PCR6 fusion rules. The four sets of global BBAs are then computed
according to Eq. (8). Among them, the BBAs for the
focal element $\theta_1$ (building change) are shown in Fig. 2. The
accuracy of these BBAs has been evaluated by the area under the
Receiver Operating Characteristic curve (AUC), which is
reported in the caption of each subfigure.
An advantage of PCR6 can be observed here. Note
that the AUCs obtained here are much higher than those obtained
using only height (AUC = 0.9299) [6] or only spectral information
(AUC = 0.8823), and generally better than the fusion result
described in [6] (AUC = 0.9621).


Fig. 1. Experimental dataset: (a) panchromatic image from date 1; (b) panchromatic image from date 2; (c) DSM from date 1; (d) DSM from date 2.


Fig. 2. Four global BBA sets: (a) G1; (b) G2; (c) G3 (AUC = 0.9763); (d) G4 (AUC = 0.9767).


Besides the AUC comparison, the building change masks
extracted from these four global BBA sets are compared and
evaluated. Each global BBA set generates four building
change masks, one for each of the four decision-making criteria.
These building change masks are evaluated with the Kappa
statistic (KA) and the true detected rate (TR). In this paper
TR = See tne x 100 (in %). The comparison results of 
TR and KA values are shown in Table IV. From Table IV,
one sees that $G_3$ and $G_4$ are more advantageous than $G_1$ and
$G_2$. However, the highest KA is obtained by $G_1$ with the
Max_Pl criterion. Note that, in this paper, only the reference data for
building changes are available. For a better understanding of these
four global BBAs and decision-making criteria, reference data
for all three focal elements $\theta_1$, $\theta_2$ and $\theta_3$ are required.
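
The Kappa statistic used to evaluate the change masks can be illustrated with the standard definition of Cohen's kappa for two binary masks. This is only a sketch: masks are assumed to be flat sequences of 0/1 values of equal length, the function names are hypothetical, and the observed agreement returned alongside kappa is not claimed to be the TR measure of this paper.

```python
def confusion_counts(pred, ref):
    """Count (tp, fp, fn, tn) between a predicted and a reference binary mask."""
    tp = sum(1 for p, r in zip(pred, ref) if p and r)
    fp = sum(1 for p, r in zip(pred, ref) if p and not r)
    fn = sum(1 for p, r in zip(pred, ref) if not p and r)
    tn = sum(1 for p, r in zip(pred, ref) if not p and not r)
    return tp, fp, fn, tn


def cohen_kappa(pred, ref):
    """Return (kappa, observed agreement) for two binary masks."""
    tp, fp, fn, tn = confusion_counts(pred, ref)
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # agreement expected by chance from the marginal frequencies of both masks
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    if expected == 1.0:
        return 1.0, observed   # both masks are constant and identical
    return (observed - expected) / (1.0 - expected), observed


if __name__ == "__main__":
    predicted = [1, 1, 0, 0, 1, 0, 0, 0]
    reference = [1, 0, 0, 0, 1, 0, 1, 0]
    print(cohen_kappa(predicted, reference))
```

Because unchanged pixels usually dominate a scene, the observed agreement can be high even for a poor mask, which is why a chance-corrected measure such as kappa is reported as well.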


TABLE IV. CHANGE MASKS EVALUATION FROM THE FOUR GLOBAL BBAS.

| Criterion | G1 TR [%] | G1 KA | G2 TR [%] | G2 KA | G3 TR [%] | G3 KA | G4 TR [%] | G4 KA |
| Max_Bel | 93.35 | 0.7729 | 93.35 | 0.7729 | 93.39 | 0.7725 | 93.39 | 0.7724 |
| Max_Pl | 93.23 | 0.7768 | 93.23 | 0.7762 | 93.23 | 0.7763 | 93.25 | 0.7756 |
| Max_BetP | 93.28 | 0.7747 | 93.32 | 0.7762 | 93.32 | 0.7745 | 93.32 | 0.7741 |
| Max_DSmP | 93.30 | 0.7739 | 93.30 | 0.7734 | 93.30 | 0.7737 | 93.34 | 0.7734 |

V. CONCLUSIONS 


Belief functions are a good choice for DSM-assisted change
detection. Firstly, once the BBA construction is well done, it
can be applied robustly and efficiently to other images in other
regions. Secondly, this fusion approach matches the
characteristics of our research topic well: height information
is important for separating high-level and low-level objects, whereas
satellite images directly highlight all changes on the land surface. Neither
of these two sources of information can easily and directly
lead to a reliable decision on building changes, which matches
the initial idea of belief functions. Generally speaking,
both the DST and DSmT frameworks make it possible to
reach highly accurate results, and PCR6 looks advantageous
when a larger conflict exists between the different sources
of evidence. More experiments are in progress to provide
a finer quantitative comparative analysis in a forthcoming
publication.


Abstract—Digital Surface Models (DSMs) generated from
satellite stereo imagery provide valuable but incomplete
information for building change detection. Therefore, belief
functions have been introduced to solve this problem by fusing DSM
information with changes extracted from images. However,
misdetections cannot be avoided if the DSMs contain large
regions of wrong height values. A refined workflow is therefore
proposed, which uses the initial disparity map to generate a
reliability map. This reliability map is then built into the fusion
model. The reliability map has been tested in both the Dempster-Shafer
Theory (DST) and the Dezert-Smarandache Theory (DSmT)
frameworks. The results have been validated by comparison with
a manually extracted change reference mask.


Keywords: belief functions, DSmT, satellite imaging, building 
change detection. 


I. INTRODUCTION 


In our previous research [1], [2], belief functions
performed very well for 3D building change detection. As
mentioned there, the accuracy of 2D change detection is limited
by the misdetections caused by irrelevant changes. These
irrelevant changes have a larger effect on very high resolution
(VHR) images, since many more details are visible in them.
The DSMs generated from satellite stereo imagery
can largely help to solve this problem. However, the DSMs
may still exhibit some outliers resulting from occlusions within
the stereo/multi views and from matching mistakes. In this
case, change information from the spectral information of the
original stereo imagery can and should be used together with
height changes to eventually highlight building changes. For
this purpose, proper fusion theories and approaches are needed.


In [2], the belief functions introduced in the
Dempster-Shafer Theory (DST) [3], [4], and extended in the
Dezert-Smarandache Theory (DSmT) [5], are used to deal
with the uncertain information delivered by the DSMs.
In [2], the possibility of using Dempster's fusion rule and the
Proportional Conflict Redistribution Rule #6 (PCR6) of DSmT
in our application has been tested. Although improvements
have been demonstrated by comparison with the method presented in [1],
false alarms cannot be avoided in the case of large regions of
wrong height change values. Therefore, in this paper, a
reliability map is adopted as an additional source of evidence
to correct the Basic Belief Assignments (BBAs) and thus refine
the fusion model.


This paper is organized as follows. First, the belief
functions and the building change detection fusion models are
briefly reviewed. Then, the reliability discounting technique
is presented and the reliability map is generated. Next, the
final four global BBAs are described, together with the four
decision criteria with which the final change detection mask
can be generated. Finally, these refined fusion models are
tested on two real satellite data sets.


II. BELIEF FUNCTION BASED BUILDING CHANGE 
DETECTION 


A. Basics of belief functions 


Detailed presentations of DST and DSmT can be found
in [5], [6] and [3]. Let $\Theta$ be the frame of discernment of the
problem under consideration. $\Theta = \{\theta_1, \theta_2, \ldots, \theta_N\}$ consists
of a list of $N$ exhaustive and mutually exclusive elements $\theta_i$,
$i = 1, 2, \ldots, N$. Each $\theta_i$ represents a possible state related to
the problem we want to solve. The assumption of exhaustivity
and mutual exclusivity of the elements of $\Theta$ is classically referred
to as Shafer's model of the frame $\Theta$. A BBA, also called a belief
mass function (or just a mass for short), is a mapping
$m(.) : 2^{\Theta} \to [0,1]$ from the power set¹ of $\Theta$, denoted $2^{\Theta}$, to $[0,1]$
that verifies [3]:

$$m(\emptyset) = 0, \quad \text{and} \quad \sum_{X \in 2^{\Theta}} m(X) = 1. \qquad (1)$$

$m(X)$ represents the mass of belief exactly committed to $X$.
An element $X \in 2^{\Theta}$ is called a focal element if and only if
$m(X) > 0$. In DST, the combination (fusion) of several independent
sources of evidence is done with the Dempster-Shafer²
(DS) rule of combination, assuming that the sources are not
in total conflict³. The DS combination of two independent BBAs
$m_1(.)$ and $m_2(.)$, denoted symbolically by $DS(m_1, m_2)$, is
defined by $m^{DS}(\emptyset) = 0$, and for all $X \in 2^{\Theta} \setminus \{\emptyset\}$ by:

$$m^{DS}(X) = \frac{1}{1 - K^{DS}} \sum_{\substack{X_1, X_2 \in 2^{\Theta} \\ X_1 \cap X_2 = X}} m_1(X_1)\, m_2(X_2), \qquad (2)$$

where the total degree of conflict $K^{DS}$ is given by

$$K^{DS} \triangleq \sum_{\substack{X_1, X_2 \in 2^{\Theta} \\ X_1 \cap X_2 = \emptyset}} m_1(X_1)\, m_2(X_2). \qquad (3)$$

¹ The power set is the set of all subsets of $\Theta$, empty set included.

² Although the rule was originally proposed by Dempster, we call it the
Dempster-Shafer rule because it has been widely promoted by Shafer in DST.

³ Otherwise the DS rule is mathematically not defined because of the 0/0
indeterminacy.


A discussion on the validity of the DS rule and its incompatibility
with the Bayes fusion rule for combining Bayesian BBAs
can be found in [6], [7], [8]. To circumvent the problems of the DS
rule, Smarandache and Dezert (see [5], Vol. 2, Chap. 1), then
Martin and Osswald (see [5], Vol. 2, Chap. 2), have developed
in DSmT [5] two fusion rules called PCR5 and PCR6, based on
the proportional conflict redistribution (PCR) principle, which
consists in:


1) applying the conjunctive rule;

2) calculating the total or partial conflicting masses;

3) redistributing the (total or partial) conflicting mass
proportionally on non-empty sets according to the
integrity constraints one has for the frame $\Theta$.


This PCR principle transfers the conflicting mass only to
the elements involved in the conflict and proportionally to their
individual masses, so that the specificity of the information
is not degraded. Because the proportional transfer can be
done in two different ways, this has led to two different
fusion rules. It has been proved in [9] that only the PCR6 rule
is compatible with frequentist probability estimation, which
is why we recommend its use in applications. The PCR5 and
PCR6 rules simplify greatly and coincide for the combination
of two sources. In this case, the PCR6 combination is obtained
by taking $m^{PCR6}(\emptyset) = 0$, and for all $X \neq \emptyset$ in $2^{\Theta}$ by

$$m^{PCR6}(X) = \sum_{\substack{X_1, X_2 \in 2^{\Theta} \\ X_1 \cap X_2 = X}} m_1(X_1)\, m_2(X_2) + \sum_{\substack{Y \in 2^{\Theta} \setminus \{X\} \\ X \cap Y = \emptyset}} \left[ \frac{m_1(X)^2\, m_2(Y)}{m_1(X) + m_2(Y)} + \frac{m_2(X)^2\, m_1(Y)}{m_2(X) + m_1(Y)} \right], \qquad (4)$$

where all denominators in Eq. (4) are different from zero.
If a denominator is zero, that fraction is discarded. If a
denominator, e.g. $m_1(X) + m_2(Y)$, tends towards 0, then
the transferable conflicting mass $m_1(X) m_2(Y)$ also tends
to zero, because $m_1(X)$ and $m_2(Y)$ both tend to zero (since they
are non-negative); therefore the redistributed masses also tend to
zero. This reflects the continuity of PCR6.
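
The continuity argument can be made explicit with a short bound. The following is a worked illustration rather than part of the original derivation; the symbol $\varepsilon$ is introduced here only for the example. Since $0 \le \frac{m_1(X)}{m_1(X) + m_2(Y)} \le 1$ whenever the denominator is positive,

$$0 \le \frac{m_1(X)^2\, m_2(Y)}{m_1(X) + m_2(Y)} = \frac{m_1(X)}{m_1(X) + m_2(Y)} \cdot m_1(X)\, m_2(Y) \le m_1(X)\, m_2(Y),$$

so each redistributed part is bounded by the conflicting product itself. For instance, with $m_1(X) = m_2(Y) = \varepsilon$, the part sent back to $X$ is

$$\frac{\varepsilon^2 \cdot \varepsilon}{\varepsilon + \varepsilon} = \frac{\varepsilon^2}{2} \longrightarrow 0 \quad \text{as } \varepsilon \to 0,$$

which agrees with the convention of discarding the fraction when the denominator is exactly zero.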


B. BBAs for Building change detection 


1) Choice of the frame of discernment: Focusing on building
change detection, two change indicators are used, one from the
images and one from the DSMs. Changes in the spectral images
are highlighted using the Iteratively Reweighted Multivariate
Alteration Detection (IRMAD) [10]. Height
changes from the DSMs are obtained by robust height differencing
[1]. Three classes are considered to define the frame of
discernment satisfying Shafer's model:

$$\Theta = \{\theta_1 = \text{Pixel} \in \text{BuildingChange},\ \theta_2 = \text{Pixel} \in \text{OtherChange},\ \theta_3 = \text{Pixel} \in \text{NoChange}\}, \qquad (5)$$

and

$$\theta_1 \cap \theta_2 \cap \theta_3 = \emptyset. \qquad (6)$$

Based on these three classes, the set of focal elements $FE$
that are of interest in our application is:

$$FE = \{\theta_1, \theta_2, \theta_3, \theta_1 \cup \theta_2, \theta_2 \cup \theta_3, \theta_1 \cup \theta_2 \cup \theta_3\}. \qquad (7)$$


2) BBAs construction: In [2], a sigmoidal
model was constructed for both the concordance and discordance
indexes. The details and advantages of this approach are described in [11].
The concordance index measures how strongly the change
indicator supports the assertion, while the discordance index
measures how strongly the change indicator opposes
the assertion. The original sigmoid curve is defined as

$$f_{(\tau,T)}(x) = \frac{0.99}{1 + e^{-\frac{x - T}{\tau}}}, \qquad (8)$$

where $x$ is the original value of each indicator. The two parameters
$T$ and $\tau$ control the symmetry point and the slope of
the sigmoid function, respectively. The symmetry point indicates a certainty
of 50%. In [11], the two parameters $T$ and $\tau$ are set
manually. Here, the multi-level Otsu thresholding method [12]
is used to obtain the symmetry points automatically for both the
concordance index and the discordance index. Otsu's algorithm
assumes that an image is composed of objects and background,
and performs a discriminant analysis by minimizing the intra-class
variance. When three classes are of interest, two threshold
values are expected, and Otsu's method can be extended to

$$\sigma_w^2(T_1, T_2) = \omega_1 \sigma_1^2(T_1, T_2) + \omega_2 \sigma_2^2(T_1, T_2) + \omega_3 \sigma_3^2(T_1, T_2). \qquad (9)$$

The weights $\omega_i$ are the probabilities of the three classes separated
by the thresholds $T_1$ and $T_2$, obtained from the image histogram, and $\sigma_i$
is the standard deviation of the $i$-th class, for $i = 1, 2, 3$. $T_1$
and $T_2$ can be used as the symmetry points of the discordance
and concordance indexes, respectively. Thus, using the height change
index as an example, the concordance and discordance BBAs of the
height change index are functions of the values $a_{\Delta H}$ and $b_{\Delta H}$
defined by

$$a_{\Delta H} = f_{\tau,T_2}(\Delta H), \quad \text{and} \quad b_{\Delta H} = f_{-\tau,T_1}(\Delta H). \qquad (10)$$

The factor $\tau$ is calculated from a sample value ($\Delta H = 1$,
$a_{\Delta H} = 0.1$), which means that a 1 meter height change indicates a
10% probability of building change. The BBAs for the discordance
and concordance image change indexes are built similarly.
Differences appearing in 2D images give a concordance indication
for all changes, which include the building changes and
other changes ($\theta_1 \cup \theta_2$). In this paper, the changes from images
are denoted $\Delta Img$.
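
The three-class extension of Otsu's method can be sketched as an exhaustive search over pairs of histogram thresholds that minimises the weighted intra-class variance $\omega_1\sigma_1^2 + \omega_2\sigma_2^2 + \omega_3\sigma_3^2$. The code below is a plain-Python illustration under stated assumptions: the function name `multi_otsu_3`, the number of bins and the use of bin centers to approximate the class variances are choices made for this sketch, not details taken from [12].

```python
def multi_otsu_3(values, bins=64):
    """Return thresholds (T1, T2), T1 < T2, minimising the intra-class variance."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0            # guard against constant input
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    n = float(len(values))
    centers = [lo + (i + 0.5) * width for i in range(bins)]

    # prefix sums of counts, first moments and second moments
    p0, p1, p2 = [0.0], [0.0], [0.0]
    for h, c in zip(hist, centers):
        p0.append(p0[-1] + h)
        p1.append(p1[-1] + h * c)
        p2.append(p2[-1] + h * c * c)

    def weighted_variance(i, j):
        """omega * sigma^2 of the class made of bins i .. j-1."""
        count = p0[j] - p0[i]
        if count == 0:
            return 0.0
        s1, s2 = p1[j] - p1[i], p2[j] - p2[i]
        return (s2 - s1 * s1 / count) / n

    best_cost, best_i, best_j = float("inf"), 1, 2
    for i in range(1, bins - 1):
        for j in range(i + 1, bins):
            cost = (weighted_variance(0, i) + weighted_variance(i, j)
                    + weighted_variance(j, bins))
            if cost < best_cost:
                best_cost, best_i, best_j = cost, i, j
    return lo + best_i * width, lo + best_j * width


if __name__ == "__main__":
    sample = [0.0, 0.5, 1.0] * 20 + [10.0, 10.5, 11.0] * 20 + [20.0, 20.5, 21.0] * 20
    print(multi_otsu_3(sample))
```

The two returned values could then play the role of $T_1$ and $T_2$ in the sigmoid curves for the discordance and concordance indexes.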


Tables I and II present the two ways of
constructing the BBAs from the sources of evidence, based
either on the DS rule or on the PCR6 rule of combination, for the height
change indicator (i.e. the first source of evidence) and the
image change indicator (i.e. the second source of evidence).
In Table I, $m_1(.)$ and $m'_1(.)$ represent the concordance and
discordance BBAs from $\Delta H$, whereas in Table II $m_2(.)$ and
$m'_2(.)$ represent the concordance and discordance BBAs from the
images.
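

TABLE I. BBA CONSTRUCTION FOR THE HEIGHT CHANGE INDICATOR $\Delta H$, WITH $K_{\Delta H} = a_{\Delta H} b_{\Delta H}$.

| Focal elem. | $m_1(.)$ | $m'_1(.)$ | $m_1^{DS}(.)$ | $m_1^{PCR6}(.)$ |
| $\theta_1$ | $a_{\Delta H}$ | 0 | $\frac{a_{\Delta H}(1 - b_{\Delta H})}{1 - K_{\Delta H}}$ | $a_{\Delta H}(1 - b_{\Delta H}) + \frac{a_{\Delta H} K_{\Delta H}}{a_{\Delta H} + b_{\Delta H}}$ |
| $\theta_2$ | 0 | 0 | 0 | 0 |
| $\theta_3$ | 0 | 0 | 0 | 0 |
| $\theta_1 \cup \theta_2$ | 0 | 0 | 0 | 0 |
| $\theta_2 \cup \theta_3$ | 0 | $b_{\Delta H}$ | $\frac{(1 - a_{\Delta H}) b_{\Delta H}}{1 - K_{\Delta H}}$ | $(1 - a_{\Delta H}) b_{\Delta H} + \frac{b_{\Delta H} K_{\Delta H}}{a_{\Delta H} + b_{\Delta H}}$ |
| $\theta_1 \cup \theta_2 \cup \theta_3$ | $1 - a_{\Delta H}$ | $1 - b_{\Delta H}$ | $\frac{(1 - a_{\Delta H})(1 - b_{\Delta H})}{1 - K_{\Delta H}}$ | $(1 - a_{\Delta H})(1 - b_{\Delta H})$ |


TABLE II. BBA CONSTRUCTION FOR THE IMAGE CHANGE INDICATOR $\Delta Img$, WITH $K_{\Delta Img} = a_{\Delta Img} b_{\Delta Img}$.

| Focal elem. | $m_2(.)$ | $m'_2(.)$ | $m_2^{DS}(.)$ | $m_2^{PCR6}(.)$ |
| $\theta_1$ | 0 | 0 | 0 | 0 |
| $\theta_2$ | 0 | 0 | 0 | 0 |
| $\theta_3$ | 0 | $b_{\Delta Img}$ | $\frac{(1 - a_{\Delta Img}) b_{\Delta Img}}{1 - K_{\Delta Img}}$ | $(1 - a_{\Delta Img}) b_{\Delta Img} + \frac{b_{\Delta Img} K_{\Delta Img}}{a_{\Delta Img} + b_{\Delta Img}}$ |
| $\theta_1 \cup \theta_2$ | $a_{\Delta Img}$ | 0 | $\frac{a_{\Delta Img}(1 - b_{\Delta Img})}{1 - K_{\Delta Img}}$ | $a_{\Delta Img}(1 - b_{\Delta Img}) + \frac{a_{\Delta Img} K_{\Delta Img}}{a_{\Delta Img} + b_{\Delta Img}}$ |
| $\theta_2 \cup \theta_3$ | 0 | 0 | 0 | 0 |
| $\theta_1 \cup \theta_2 \cup \theta_3$ | $1 - a_{\Delta Img}$ | $1 - b_{\Delta Img}$ | $\frac{(1 - a_{\Delta Img})(1 - b_{\Delta Img})}{1 - K_{\Delta Img}}$ | $(1 - a_{\Delta Img})(1 - b_{\Delta Img})$ |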


III. RELIABILITY DISCOUNTING


Reliability discounting has been described and discussed
in [13] and [14]. Briefly, if
additional knowledge about the reliability $\alpha$ of a certain
indicator is available, it can be used to refine the initial
BBAs. $\alpha$ is a value ranging from 0 to 1: $\alpha = 1$
means that the indicator is fully reliable, while $\alpha = 0$ means that
it is totally unreliable. Based on Shafer's discounting model [3],
the reliability discounting factor $\alpha$ is introduced to discount
any BBA $m(.)$ defined on the power set $2^{\Theta}$ as follows:

$$\begin{cases} m_{\alpha}(X) = \alpha \cdot m(X), & \text{for } X \neq \Theta, \\ m_{\alpha}(\Theta) = \alpha \cdot m(\Theta) + (1 - \alpha). \end{cases} \qquad (11)$$

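
Shafer's discounting is short enough to be written directly. The sketch below assumes a BBA stored as a dictionary of `frozenset` focal elements; the function name `discount` is hypothetical.

```python
def discount(m, alpha, frame):
    """Shafer's discounting: scale every mass by alpha and move the removed
    mass (1 - alpha) to the full ignorance `frame`."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    out = {x: alpha * v for x, v in m.items() if x != frame}
    out[frame] = alpha * m.get(frame, 0.0) + (1.0 - alpha)
    return out


if __name__ == "__main__":
    theta = frozenset({"theta1", "theta2", "theta3"})
    m = {
        frozenset({"theta1"}): 0.6,
        frozenset({"theta2", "theta3"}): 0.3,
        theta: 0.1,
    }
    print(discount(m, 0.5, theta))
```

With alpha = 1 the BBA is unchanged, and with alpha = 0 it becomes the vacuous BBA that commits all mass to the whole frame.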
In DSM-assisted building change detection, false alarms
are detected if wrong heights are present in the DSM over large
regions [1]. These wrong heights are mostly introduced
not in the stereo image matching procedure, but in
the gap filling step. In our DSM generation procedure, the
heights of unmatched pixels are interpolated from the height
values of neighboring pixels. Therefore, a reliable height
value can be obtained for small gaps. When large gaps appear
in the disparity map, for example a whole building roof, the
height of that building cannot be correctly interpolated. Thus,
the percentage of available correctly matched neighboring
pixels inside a predefined region can be used to generate the
height reliability. Fig. 1 shows an example of the generated
reliability map. Fig. 1a is the gap mask: the gap regions of
the disparity map are shown in black, and pixels with
proper elevation values are shown in white. It can
be observed that, with our approach, pixels in the center
of a gap get lower reliability values than pixels next to
the gap boundary (see Fig. 1b).


In the building change detection procedure, the reliability
maps of the two DSMs ($\alpha_{DSM1}$ and $\alpha_{DSM2}$) are calculated
separately. They are then fused to generate the final
reliability map $\alpha_{\Delta H}$ for the height change indicator:

$$\alpha_{\Delta H} = \alpha_{DSM1} \cdot \alpha_{DSM2}. \qquad (12)$$


Fig. 1. Reliability map (b) generated from the gaps mask (a).
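
The reliability map construction described above can be sketched as the fraction of matched pixels inside a square window around each pixel, followed by the pixel-wise product of Eq. (12). This is a sketch under assumptions: the window shape and size, the inclusion of the center pixel in the count, and the function names `reliability_map` and `fuse_reliability` are illustrative choices, not the settings used to produce Fig. 1.

```python
def reliability_map(valid, radius=1):
    """valid[r][c] is 1 for a matched pixel and 0 for a gap pixel.
    Returns, per pixel, the share of matched pixels in a (2*radius+1)^2 window
    (clipped at the image border)."""
    rows, cols = len(valid), len(valid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            r0, r1 = max(0, r - radius), min(rows, r + radius + 1)
            c0, c1 = max(0, c - radius), min(cols, c + radius + 1)
            window = [valid[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            out[r][c] = sum(window) / len(window)
    return out


def fuse_reliability(alpha1, alpha2):
    """Pixel-wise product of the reliability maps of the two DSMs (Eq. (12))."""
    return [[a * b for a, b in zip(row1, row2)]
            for row1, row2 in zip(alpha1, alpha2)]


if __name__ == "__main__":
    # 5 x 5 mask with a 3 x 3 gap in the middle
    mask = [[0 if 1 <= r <= 3 and 1 <= c <= 3 else 1 for c in range(5)]
            for r in range(5)]
    alpha_dsm1 = reliability_map(mask)
    alpha_dsm2 = [[1.0] * 5 for _ in range(5)]   # second DSM assumed gap-free
    for row in fuse_reliability(alpha_dsm1, alpha_dsm2):
        print([round(v, 2) for v in row])
```

In this toy mask the center of the gap receives a lower value than the gap pixels close to its boundary, which mirrors the behaviour described for Fig. 1b.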


IV. GLOBAL BBAS AND CHANGE DETECTION 
A. Global BBAs generation 


The BBAs related to the concordance and discordance
indexes are combined to get the global BBA of each
source of evidence. These global BBAs are then used
as inputs for solving the change detection problem through
their combination. From the previous BBA modeling step,
each pixel gets two BBAs to combine, resulting from
Tables I and II. More precisely, we have to combine either
$\{m_1^{DS}(.), m_2^{DS}(.)\}$ if the DS rule is preferred for the BBA modeling,
or $\{m_1^{PCR6}(.), m_2^{PCR6}(.)\}$ if the PCR6 rule is adopted.
The masses of these BBAs from Tables I and II are denoted
$a_1, a_2, a_3$ and $b_1, b_2, b_3$. In this paper, the mass values $a_1$,
$a_2$ and $a_3$ are further discounted by the generated reliability
map $\alpha_{\Delta H}$ and denoted $A_1$, $A_2$ and $A_3$, respectively. More
precisely, one computes

$$\begin{aligned}
A_1 &= \alpha_{\Delta H} \cdot a_1, \\
A_2 &= \alpha_{\Delta H} \cdot a_2, \\
A_3 &= \alpha_{\Delta H} \cdot a_3 + (1 - \alpha_{\Delta H}).
\end{aligned} \qquad (13)$$


In this application, only the reliability map for the height
change indicator is generated. A reliability map for the image
change indicator can also be constructed according to the
change objects of interest. For instance, a vegetation mask can
be used to discount the reliability of building changes. However,
this paper focuses on the reliability of the height information.
When a reliability map of image changes is available, it can
be used in the same way as the height change reliability map. Table
III and Table IV describe the final building change detection
models based on the DS and PCR6 rules, respectively. Here, the
discounted height change indicator is denoted $m_{1|\alpha_{\Delta H}}(.)$.


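TABLE III. DS FUSION MODEL FOR BUILDING CHANGE DETECTION.


Focal Elem.               | $m_{1\alpha_{\Delta H}}(.)$ | $m_2(.)$
$\theta_1$                | $A_1$ | 0
$\theta_2$                | 0     | 0
$\theta_3$                | 0     | $b_2$
$\theta_1 \cup \theta_2$  | 0     | $b_1$
$\theta_2 \cup \theta_3$  | $A_2$ | 0
$\Theta$                  | $A_3$ | $b_3$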


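TABLE IV. PCR6 FUSION MODEL FOR BUILDING CHANGE DETECTION.


Focal Elem.               | $m_{1\alpha_{\Delta H}}(.)$ | $m_2(.)$ | $m_{12}^{PCR6}(.)$
$\theta_1$                | $A_1$ | 0     | $A_1(b_1 + b_3) + \frac{A_1^2 b_2}{A_1 + b_2}$
$\theta_2$                | 0     | 0     | $A_2 b_1$
$\theta_3$                | 0     | $b_2$ | $(A_2 + A_3) b_2 + \frac{A_1 b_2^2}{A_1 + b_2}$
$\theta_1 \cup \theta_2$  | 0     | $b_1$ | $A_3 b_1$
$\theta_2 \cup \theta_3$  | $A_2$ | 0     | $A_2 b_3$
$\Theta$                  | $A_3$ | $b_3$ | $A_3 b_3$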


$m_{1\alpha_{\Delta H}}(.)$ can be obtained from the discounting of the
fusion results presented in Table I. Thus they are
denoted respectively by $m^{DS}_{1\alpha_{\Delta H}}(.)$ and $m^{PCR6}_{1\alpha_{\Delta H}}(.)$. These
discounted height change indicators are fused in the second step
with the image change indicator $m_2(.)$ to generate the final global
BBAs. From Tables III and IV, four sets of global BBAs
can be computed based on different BBAs and fusion methods
as follows:


$G_1 = DS\{m^{DS}_{1\alpha_{\Delta H}}(.), m^{DS}_2(.)\},$
$G_2 = PCR6\{m^{DS}_{1\alpha_{\Delta H}}(.), m^{DS}_2(.)\},$
$G_3 = DS\{m^{PCR6}_{1\alpha_{\Delta H}}(.), m^{PCR6}_2(.)\},$ (14)
$G_4 = PCR6\{m^{PCR6}_{1\alpha_{\Delta H}}(.), m^{PCR6}_2(.)\}.$


For example, if both the BBA modeling procedure and the
global BBAs are based on the DS fusion rule, the
generated global BBA is denoted by $G_1$.
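The two fusion models of Tables III and IV can be checked numerically with a small sketch such as the one below. It is not the authors' implementation: it assumes BBAs stored as dictionaries keyed by `frozenset` focal elements, and it uses the fact that, for two sources, PCR6 redistributes each partial conflict $m_1(X) m_2(Y)$ back to $X$ and $Y$ proportionally to the two masses involved. The names `combine` and `table_bbas` are hypothetical.

```python
from itertools import product

T1, T2, T3 = "theta1", "theta2", "theta3"
THETA = frozenset({T1, T2, T3})


def combine(m1, m2, rule="PCR6"):
    """Combine two BBAs given as {frozenset: mass} with the DS or PCR6 rule."""
    if rule not in ("DS", "PCR6"):
        raise ValueError("rule must be 'DS' or 'PCR6'")
    fused, conflict = {}, 0.0
    for (X, x), (Y, y) in product(m1.items(), m2.items()):
        if x == 0.0 or y == 0.0:
            continue
        inter = X & Y
        if inter:
            # Conjunctive consensus on the intersection.
            fused[inter] = fused.get(inter, 0.0) + x * y
        elif rule == "PCR6":
            # Partial conflict x*y goes back to X and Y proportionally.
            fused[X] = fused.get(X, 0.0) + x * x * y / (x + y)
            fused[Y] = fused.get(Y, 0.0) + y * y * x / (x + y)
        else:
            conflict += x * y
    if rule == "DS":
        if conflict >= 1.0:
            raise ValueError("total conflict: the DS rule is undefined")
        fused = {X: v / (1.0 - conflict) for X, v in fused.items()}
    return fused


def table_bbas(A1, A2, A3, b1, b2, b3):
    """BBAs laid out as in Tables III and IV."""
    m1 = {frozenset({T1}): A1, frozenset({T2, T3}): A2, THETA: A3}
    m2 = {frozenset({T3}): b2, frozenset({T1, T2}): b1, THETA: b3}
    return m1, m2
```

The only conflicting pair in this model is $\theta_1$ against $\theta_3$, so the DS and PCR6 results differ only in how the product $A_1 b_2$ is handled.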


B. Change mask generation 


After the fusion step, each pixel in the images gets a
certain degree of belief for all focal elements. The value of the
global BBAs on $\theta_1$ gives a direct building change probability
map. A change mask can be generated by thresholding this
map. However, the masses committed to the partial ignorance and
full ignorance sets should also be considered in the decision-making
procedure. DST and DSmT propose different approaches to
take the final decision. In this work, the same decision criteria
as used in [2] are tested. They are: 1) maximum of global
BBAs (Max_Bel), 2) maximum of plausibility (Max_Pl), 3)
maximum of betting probabilities (Max_BetP) and 4)
maximum of DSmP (Max_DSmP). The reader can refer to
[3] and [5] (Vol. 3, Chap. 3) for the mathematical definitions
of the Bel(.), Pl(.), BetP(.) and DSmP(.) functions.
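For readers who want to experiment with the first three criteria, the following sketch computes Bel, Pl and BetP with their standard definitions on a BBA stored as a dictionary of `frozenset` focal elements. DSmP is omitted because it needs an extra tuning parameter defined in the cited references. The function names and the tie-breaking rule in `decide` are assumptions made for illustration.

```python
def bel(m, A):
    """Belief of A: total mass of the focal elements included in A."""
    return sum(v for X, v in m.items() if X <= A)


def pl(m, A):
    """Plausibility of A: total mass of the focal elements intersecting A."""
    return sum(v for X, v in m.items() if X & A)


def betp(m, theta):
    """Pignistic probability of the singleton theta."""
    return sum(v / len(X) for X, v in m.items() if theta in X)


def decide(m, frame, criterion="BetP"):
    """Return the element of the frame maximizing the chosen criterion."""
    score = {
        "Bel": lambda t: bel(m, frozenset({t})),
        "Pl": lambda t: pl(m, frozenset({t})),
        "BetP": lambda t: betp(m, t),
    }[criterion]
    # Sorting makes ties resolve to the first element in alphabetical order.
    return max(sorted(frame), key=score)
```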




V. EXPERIMENTS 


The improved building change detection fusion models 
have been tested on satellite images. The datasets and the 
experiments are described in this section. 


A. Datasets 


The experimental datasets consist of two pairs of IKONOS
stereo images captured in February 2006 and May 2011,
respectively, and shown in Figs. 2 and 3. The first two images in each
figure are the panchromatic images of the two dates; (c) and (d)
are the generated DSMs. They have been generated with
the method explained in [15]. The colors represent the height
range in the test region.


Fig. 2. Experimental dataset: a) panchromatic image from date1; b)
panchromatic image from date2; c) DSM from date1; d) DSM from date2.
(Color bar: elevation in m.)


Fig. 3. Datasets of the 2nd test region: a) panchromatic image from date1;
b) panchromatic image from date2; c) DSM from date1; d) DSM from date2.
(Color bar: elevation in m.)

The spatial co-registration is achieved through camera
model parameter corrections before the DSM generation
procedure [15]. The radiometric co-registration method has been
described in [1]. Fig. 2 shows a normal building change
example: several buildings have been built on a flat surface. The
generated DSMs are displayed in Fig. 2c and 2d. In the second
example (shown in Fig. 3), a large percentage of the pixels on the
roof of the large building in the center appear as gaps in the
disparity map. In the filling procedure, the large size of the
gap in the date1 data leads to this building being missing from
the DSM (Fig. 3c).


B. Results and evaluation 


The refined DS fusion model and the refined PCR6 fusion model
have been applied to both datasets. To show the
improvement obtained by our method, we have compared
its results with the original results obtained with the
method in [2]. Firstly, the global BBAs of $\theta_1$ are compared
and displayed in Fig. 4 below.


Fig. 4. Global building change BBAs: (a) initial result; (b) refined result;
(c) ground truth.


Fig. 4(a) corresponds to the original* result, and Fig. 4(b)
shows the refined result based on $G_1(\theta_1)$. Compared with the
ground truth (Fig. 4(c)), the improvements can be clearly
observed in the building boundary regions, especially for the building
marked with a white circle. In the initial result, the pixels next
to this building are falsely detected as BuildingChange.


To evaluate quantitatively the performance of the different
fusion approaches, the extracted BBAs from both approaches
(original and refined) are compared to the manually extracted
change reference masks. The results are analyzed in terms of
the Receiver Operating Characteristic (ROC) curve [16]. A larger
area under the ROC curve (AUC) indicates a better accuracy
of the building change map. The numerical evaluation results
are given in Table V. The obtained AUC values show a
general improvement after reliability discounting is applied.
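As an illustration of the evaluation measure, the AUC of a change probability map against a binary reference mask can be computed from its rank interpretation: it is the probability that a randomly chosen changed pixel receives a higher score than a randomly chosen unchanged pixel. The sketch below is a plain quadratic-time version intended for small examples only. It is not the evaluation code used for Table V, and the function name is hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve from pairwise comparisons (ties count 1/2).

    scores: per-pixel belief values, e.g. the global BBA on theta_1
    labels: reference mask, 1 for changed and 0 for unchanged pixels
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present in the reference mask")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```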


In addition to the AUC comparison, the building change 
masks extracted from these four global BBAs sets are com- 
pared and evaluated. Each global BBA set can generate four 
building change masks based on these four decision criteria. 


* Obtained without reliability discounting, as presented in [2].


TABLE V. QUALITY COMPARISON OF GLOBAL BBA (BUILDING CHANGE).


      | Test Region 1       | Test Region 2
      | Original | Refined  | Original | Refined
$G_1$ | 0.9811   | 0.9833   | 0.9509   | 0.9950
$G_2$ | 0.9829   | 0.9839   | 0.9485   | 0.9931
$G_3$ | 0.9815   | 0.9837   | 0.9512   | 0.9955
$G_4$ | 0.9835   | 0.9844   | 0.9487   | 0.9939


These building change masks are compared with the masks
from [2] based on the Kappa statistic (KA). The comparison
results for test region 1 are shown in Table VI. Because of the
limited reference data available, only the building change class
is evaluated here. One sees that the reliability discounting map
helps to improve the accuracy of the results for all fusion and decision
approaches.
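For reference, the Kappa statistic of a binary change mask against a reference mask can be sketched as follows, using the standard definition of Cohen's kappa (observed agreement corrected for chance agreement). The masks are assumed to be flattened into lists of 0/1 values, and the function name is illustrative.

```python
def cohen_kappa(pred, ref):
    """Cohen's kappa between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("masks must be non-empty and of equal length")
    n = len(pred)
    # Observed agreement.
    po = sum(1 for p, r in zip(pred, ref) if p == r) / n
    # Agreement expected by chance from the marginal class frequencies.
    p1 = sum(pred) / n
    r1 = sum(ref) / n
    pe = p1 * r1 + (1.0 - p1) * (1.0 - r1)
    if pe == 1.0:
        return 1.0  # both masks are constant and identical
    return (po - pe) / (1.0 - pe)
```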


In the second test region, there are actually no building
changes. The purpose of showing this test region is to further
demonstrate the advantage of the extracted reliability map. Fig. 5
shows the extracted reliability discounting map of the height
changes. The window size selected for this test region is
9 x 9. Using this reliability map, the final fusion result
$G_1(\theta_1)$ is obtained and shown in Fig. 6(b). As a comparison,
the $G_1(\theta_1)$ of the initial fusion model is displayed in Fig. 6(a).
This is the same building that we discussed in
[1]. As can be seen in Fig. 3, this building exists in the
panchromatic images of both dates. However, only the DSM
from date2 contains the correct height of this building; in Fig.
3c, this building cannot be recognized. Therefore, a very high
BBA is obtained for the height change indicator. A high
value in $m_1(.)$ leads to high global BBAs for building changes
(as shown in Fig. 6(a)), so this building would be falsely
detected as a building change. However, after discounting, this
region has much lower global BBAs (see Fig. 6(b)) and can
be correctly detected as NoChange.


Fig. 5. Generated height change reliability map of test region 2.


Fig. 6. Global building change BBAs: (a) initial result; (b) refined result.


TABLE VI. CHANGE MASKS EVALUATION FROM FOUR GLOBAL BBAS.


         | $G_1$               | $G_2$               | $G_3$               | $G_4$
         | Original | Refined  | Original | Refined  | Original | Refined  | Original | Refined
Max_Bel  | 0.9271   | 0.9324   | 0.9271   | 0.9324   | 0.9266   | 0.9322   | 0.9265   | 0.9321
Max_Pl   | 0.9291   | 0.9342   | 0.9288   | 0.9339   | 0.9287   | 0.9339   | 0.9284   | 0.9336
Max_BetP | 0.9283   | 0.9335   | 0.9282   | 0.9334   | 0.9279   | 0.9333   | 0.9278   | 0.9333
Max_DSmP | 0.9281   | 0.9333   | 0.9280   | 0.9331   | 0.9278   | 0.9331   | 0.9276   | 0.9330
VI. CONCLUSIONS


Building change detection is a difficult topic, especially
when the building changes happen together with other
irrelevant changes. Our previous research has demonstrated the
performance of belief functions in DSM-assisted change
detection [2]. In this paper, the change detection accuracy is
further improved by adopting an additional reliability map.
Height has proved to be an important feature for building
change detection. However, the DSMs from satellite images
do not always provide reliable height information, due to
occlusion and matching errors. Wrong height information
thus brings false alarms to the change detection procedure.
Therefore, the original unfilled disparity maps are used to
generate a height change reliability map, which is then used
in the fusion models.


Our first experimental results have shown that this
reliability map can improve the quality of all four global BBAs,
and further improves the change detection accuracy for all
four decision criteria. However, the two test regions were
too small to draw a definitive conclusion, which is why more
experiments will be performed on a wider variety of regions
with different types of backgrounds. A detailed statistical
analysis and comparisons of the results with other techniques
are in progress and will be presented in a forthcoming
publication.


Generally speaking, both the DST and DSmT frameworks offer
the possibility to reach high-accuracy results. The workflow
proposed in this paper enables an automatic building change
detection procedure. Other reliability maps derived from images will
be adopted in future work. Furthermore, besides building
changes, more change objects will be considered in the
fusion model.


[7] J. Dezert, P. Wang, and A. Tchamova, "On the validity of Dempster-Shafer theory," in Proc. of FUSION 2012, 2012, pp. 655-660. [Online]. Available: http://fs.gallup.unm.edu//DSmT.htm
[8] A. Tchamova and J. Dezert, "On the behavior of Dempster's rule of combination and the foundations of Dempster-Shafer theory," in Proc. of IS 2012, 2012, pp. 108-113.
[9] F. Smarandache and J. Dezert, "On the consistency of PCR6 with the averaging rule and its application to probability estimation," in Proc. of FUSION 2013, 2013, pp. 1119-1126.
[10] A. A. Nielsen, "The regularized iteratively reweighted MAD method for change detection in multi- and hyperspectral data," IEEE Trans. Image Process., vol. 16, no. 2, pp. 463-478, 2007.
[11] J. Dezert and J.-M. Tacnet, "Sigmoidal model for belief function-based Electre Tri method," in Belief Functions: Theory and Applications, 2012.
[12] N. Otsu, "A threshold selection method from gray-level histograms," IEEE Trans. Syst., Man, Cybern., vol. 9, no. 1, pp. 62-66, 1979.
[13] D. Mercier, B. Quost, and T. Denoeux, "Contextual discounting of belief functions," in Symbolic and Quantitative Approaches to Reasoning with Uncertainty. Springer, 2005, pp. 552-562.
[14] F. Smarandache, J. Dezert, and J.-M. Tacnet, "Fusion of sources of evidence with different importances and reliabilities," in Proc. of FUSION 2010. IEEE, 2010, pp. 1-8.
[15] P. d'Angelo and P. Reinartz, "DSM based orientation of large stereo satellite image blocks," Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci., vol. 39, no. B1, pp. 209-214, 2012.
[16] M. H. Zweig and G. Campbell, "Receiver-operating characteristic
Abstract—This paper proposes a new generic object recognition
(GOR) method based on the multiple feature fusion of
2D and 3D SIFT (scale invariant feature transform) descriptors
drawn from 2D images and 3D point clouds. We also use
trained Support Vector Machine (SVM) classifiers to recognize
the objects from the result of the multiple feature fusion. We
analyze and evaluate different strategies for this multiple
feature fusion applied to real open datasets. Our results show
that this new GOR method has higher recognition rates than
classical methods, even in the presence of large intra-class variations
or high inter-class similarities of the objects to recognize, which
demonstrates the potential interest of this new approach.
Keywords: generic object recognition, point cloud, 2D SIFT,
3D SIFT, feature fusion, BoW, SVM, belief functions, PCR.


I. INTRODUCTION 


Generic object recognition (GOR) in real environments plays
a significant role in computer vision and artificial
intelligence. It has important applications in intelligent monitoring,
robotics, medical image processing, etc. [1]-[3]. In contrast
to specific object recognition¹ (SOR), GOR is much more difficult
to accomplish, mainly because one must find generic features
that express the common properties of the objects within a class and
help to discriminate between classes, instead of
defining the characteristics of a particular category as
done in SOR methods. The current
main techniques for GOR are based on local feature extraction
algorithms on 2D images, typically the 2D SIFT (scale
invariant feature transform) descriptors [4], [5]. However, 2D images
lose the 3D information of the objects and are sensitive
to changes in external illumination conditions. To
overcome this drawback, 3D SIFT descriptors based on volumes
[3], [6]-[10], and 3D descriptors based on point cloud models
[11]-[13] have been proposed recently by several researchers,
because the point cloud model of an object is obtained from depth
images, which depend only on the geometry of the object.
Such a point cloud model does not depend on the brightness
and reflection properties of the objects. That is the main reason
why we are also interested in these techniques in this paper.
3D SIFT descriptors have been applied successfully to motion
recognition in consecutive video frames by Scovanner et al.
[6]. They show good performance in medical image processing
[3], [7]-[9] as well. Object recognition has also been done with
3D SIFT in complex Computed Tomography (CT) imagery for airport
baggage inspection and security by Flitton et al. [10].

¹ Such as face recognition [1], where only certain objects or certain
categories need to be recognized, which can be accomplished by training on
a large number of samples.

Object recognition algorithms based on a single feature
often generate erroneous recognitions, especially
if there are large intra-class variations and high inter-class
similarities, or if there are important changes in the pose
and appearance of the objects. In these conditions, the use of a
single feature is insufficient to make a reliable recognition
and classification. To overcome this serious drawback, new
recognition algorithms based on multiple features and fusion
algorithms have been proposed recently in the literature [14]-
[17]. Compared with recognition algorithms using a single
feature only, feature fusion algorithms combine the information
of multiple features, which can substantially improve the
recognition rate.

In this paper, we propose a new method for GOR based on the
feature fusion of 2D and 3D SIFT descriptors, which consists
of two main phases: 1) a training phase, and 2) a testing phase.
In both phases, we consider two types of inputs:

1) The first type of input is a database with 3D point cloud
model representations of different objects from different
categories (classes). In this work, our database has
simply been obtained from the web². It is characterized by 3D
SIFT descriptors adapted (in this paper) to point clouds;
see the next section for details.

2) As second input, we use the same database with 2D
images including some objects that are characterized by
their 2D SIFT descriptors.

² http://rgbd-dataset.cs.washington.edu/dataset.html

From these two inputs, the 2D and 3D SIFT feature
descriptors are transformed into the corresponding Bag of
Words (BoW) feature vectors [18]. In the training phase,
these two BoW feature vectors (drawn from the 2D and 3D
SIFT) describing the object are used to train Support Vector
Machines (SVMs) [19] to get the prediction functions. After
this training phase, the system is used to recognize unknown
objects in the testing phase, where the two BoW feature vectors
describing the object are used to perform the object recognition.
In this paper, we test:

1) the feature-level fusion strategy, where we combine
(fuse) directly the two BoW-based feature vectors and
we feed the trained SVM with the fused vector to get
the final recognition result.

2) the decision-level fusion strategy, where each of the
two BoW-based feature vectors feeds its corresponding
trained SVM to get the corresponding recognition
result separately. Then we test different fusion rules to
combine these two recognition results to get the final
recognition result.

The paper is organized as follows. The recognition
algorithm is described in detail in Section II. Section III evaluates
the performance of this new method on real datasets.
Conclusions with perspectives are given in Section IV.


II. NEW GENERIC OBJECT RECOGNITION METHOD 


This new method of object recognition consists of three
main steps (feature extraction and representation, classifier
design, and feature fusion) that we present in detail in
this section. To achieve good recognition of objects,
we propose to combine the 2D scale-invariant feature transform
(2D SIFT) characterizing the object features with the 3D SIFT
(based on the point cloud model). We first recall the
principle of 2D SIFT [4], [5], and then explain the improved 3D
SIFT descriptors applied to point clouds.


Step 1: Features extraction and representation 


Feature extraction and representation are necessary for any
object recognition algorithm. In many situations the object
recognition task is very difficult because
(partial) similarities may exist between different classes of objects,
as well as (partial) dissimilarities within the same class of objects.
So the feature extraction process must be as efficient as
possible in order to help the recognition of objects by maximizing
the difference between object classes and minimizing
the difference within the same class. The objects also need
to be represented at a certain semantic level, using a limited number of
training objects to represent the class [2].


— 2D SIFT descriptor 


In 1999, David Lowe [4] presented for the first time
a new method to extract keypoints of objects in images
and to describe their local features, which allows
generic object recognition, for example in computer vision
applications. His method was then improved in [5], and
extended to 3D by other authors (see next paragraph). The
feature description of the object drawn from a training image
is then used to identify the presence (if any) of the object
in a real (usually cluttered) observed scene. To get good object
recognition performance, Lowe proposed a (2D) SIFT
(scale-invariant feature transform) which guarantees that the features
extracted (i.e. the keypoints) from the training image are
detectable under changes in image orientation, scale, noise
and illumination, and even if partial object occlusions occur
in the observed scene. Lowe's SIFT feature descriptor is
invariant to uniform scaling and orientation, partially invariant
to illumination changes, and robust to local geometric (affine)
distortion. The stable keypoint locations of SIFT are given
by the detection of scale-space extrema in the
Difference-of-Gaussian (DoG) function $D(x, y, \sigma)$ convolved with the image
$I(x, y)$. More precisely, one defines [5]


$D(x, y, \sigma) = L(x, y, k\sigma) - L(x, y, \sigma),$ (1)


where $L(x, y, k\sigma) = G(x, y, k\sigma) * I(x, y)$ and $L(x, y, \sigma) =
G(x, y, \sigma) * I(x, y)$ are Gaussian-blurred images at nearby
scales $\sigma$ separated by a constant multiplicative factor³
$k$, and where $*$ is the convolution operator and $G(x, y, \sigma)$ is
the centered Gaussian kernel defined by


$G(x, y, \sigma) \triangleq \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/(2\sigma^2)}.$ (2)


The local extreme points of the $D(x, y, \sigma)$ functions (DoG
images) define the set of keypoint candidates.
To detect the keypoints, each sample point (pixel)
is compared to its eight neighbors in the current image and
its nine neighbors in each of the scales below and above. The sample
point under test is considered as a keypoint (local extremum) if
its value is larger (or smaller) than all of its 26 neighbors. The
localization of a candidate keypoint is done by the 2nd-order
Taylor expansion of the DoG scale-space function $D(x, y, \sigma)$
with the candidate keypoint taken as the origin [5]. However,
in general there are too many candidate keypoints, and we need
to identify and remove the bad candidates that have too low
contrast⁴ or are poorly localized along an edge. For this,
a contrast threshold is applied on $D(x, y, \sigma)$ to eliminate
all the candidate keypoints below a chosen⁵ threshold value $\tau$.
To eliminate the candidate keypoints that are poorly localized
along an edge, Lowe [5] uses a thresholding method based on
the ratio of the eigenvalues of the Hessian matrix $H$ of the
DoG function, because for poorly defined extrema in the DoG
function the principal curvature across the edge is much
larger than the principal curvature along it. More precisely, if
the ratio $\mathrm{Tr}(H)^2/\mathrm{Det}(H) > (r_{th} + 1)^2/r_{th}$, then the candidate
keypoint is rejected. Here, $r_{th}$ is a chosen threshold value of
the ratio between the largest-magnitude eigenvalue of $H$ and
the smaller one⁶.
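The detection and rejection tests described above can be sketched on a small stack of DoG images as follows. This is a simplified illustration, not Lowe's reference implementation: the DoG stack is assumed to be a nested list `dog[s][y][x]`, the sub-pixel Taylor refinement is skipped, the contrast test is assumed to compare the absolute DoG value with the threshold, and the Hessian is estimated by finite differences. Candidates with a non-positive determinant are rejected, as in [5]. The function names and default values mirror the thresholds quoted in the text.

```python
def is_extremum(dog, s, y, x):
    """True if dog[s][y][x] is strictly above or below all 26 neighbors."""
    v = dog[s][y][x]
    neighbors = [
        dog[s + ds][y + dy][x + dx]
        for ds in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (ds, dy, dx) != (0, 0, 0)
    ]
    return all(v > n for n in neighbors) or all(v < n for n in neighbors)


def passes_contrast_and_edge_tests(dog, s, y, x, tau=0.02, r_th=10.0):
    """Contrast threshold followed by the Tr(H)^2 / Det(H) edge test."""
    d = dog[s]
    if abs(d[y][x]) < tau:
        return False  # low contrast: sensitive to noise
    # Finite-difference estimate of the 2 x 2 Hessian of the DoG image.
    dxx = d[y][x + 1] - 2.0 * d[y][x] + d[y][x - 1]
    dyy = d[y + 1][x] - 2.0 * d[y][x] + d[y - 1][x]
    dxy = (d[y + 1][x + 1] - d[y + 1][x - 1]
           - d[y - 1][x + 1] + d[y - 1][x - 1]) / 4.0
    tr = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0.0:
        return False  # curvatures of opposite signs, or a flat direction
    return tr * tr / det <= (r_th + 1.0) ** 2 / r_th
```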

Once all the keypoints are determined, one must assign
a consistent orientation based on local image properties,
relative to which the keypoint descriptor can be represented, hence
achieving invariance to image rotation. For this, the scale of
the keypoint is used to choose the Gaussian-blurred image
$L$ with the closest scale. The keypoint descriptor is created
by first computing the gradient magnitude $m(x, y)$ and its
orientation $\theta(x, y)$ at each pixel $(x, y)$ in the region around
the keypoint in this Gaussian-blurred image $L$ as follows [5]


$m(x, y) = \sqrt{L_x^2 + L_y^2},$
$\theta(x, y) = \tan^{-1}(L_y / L_x),$ (3)


with $L_x = L(x + 1, y) - L(x - 1, y)$ and $L_y = L(x, y + 1) -
L(x, y - 1)$. In [5], a set of orientation histograms is
created on 4 x 4 pixel neighborhoods with 8 directions (bins)
each. These histograms are computed from the magnitude and
orientation values of samples in a 16 x 16 region around
the keypoint, such that each histogram contains samples from
a 4 x 4 subregion of the original neighborhood region. The
magnitudes are weighted by a Gaussian function with $\sigma$ equal
to one half the width of the descriptor window. The descriptor
then becomes a 128-dimensional feature vector because there
are 4 x 4 = 16 histograms, each with 8 directions. This vector is
then normalized to unit length in order to enhance invariance
to affine changes in illumination. Also, a threshold of 0.2 is
applied to reduce the effects of non-linear illumination, and the
vector is again normalized. Figure 1 shows an example of a
4 x 4 keypoint descriptor, where the space delimited by the
purple ellipse is the neighborhood under consideration.

³ The choice $k = 2^{1/s}$ is justified by Lowe in [4], where $s$ is an integer
number of intervals.

⁴ Because they are sensitive to noise.

⁵ We have chosen $\tau = 0.02$ in our simulations.

⁶ In [5], Lowe takes $r_{th} = 10$.
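The gradient computation of Eq. (3) and the final normalization of the 128-dimensional vector can be illustrated with the short sketch below. It assumes the blurred image is stored as a nested list `L[y][x]`, and it uses `atan2` so that the orientation covers the full circle, which is a common implementation choice rather than something stated in the text. The function names are hypothetical.

```python
import math


def gradient(L, x, y):
    """Gradient magnitude and orientation (radians) at pixel (x, y), Eq. (3)."""
    Lx = L[y][x + 1] - L[y][x - 1]
    Ly = L[y + 1][x] - L[y - 1][x]
    return math.hypot(Lx, Ly), math.atan2(Ly, Lx)


def normalize_descriptor(vec, clip=0.2):
    """Normalize to unit length, clip large components, and normalize again."""
    def unit(v):
        norm = math.sqrt(sum(c * c for c in v))
        return [c / norm for c in v] if norm > 0.0 else list(v)

    clipped = [min(c, clip) for c in unit(vec)]
    return unit(clipped)
```

Clipping at 0.2 limits the influence of a few very large gradient magnitudes, which is what makes the descriptor less sensitive to non-linear illumination changes.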


[Figure panels: "Image gradient"; "4x4 Keypoint descriptor (128 features)".]


Fig. 1: A 4 x 4 keypoint descriptor (Credit: J. Hurrelmann).


The simplest method to find the best candidate match
for each keypoint would consist in identifying its nearest⁷
neighbor in the database of keypoints from training images.
Unfortunately, SIFT-based keypoint matching requires more
sophisticated methods, because many features from an image
will not have any correct match in the training database
due to background clutter in the observed scene and
possible missing features in the training images; see [5] for
details. The SIFT method is patented by the University of British
Columbia (US Patent 6,711,293, March 23, 2004) and a demo
is available in [20]. Open SIFT codes can be found on the web,
for example in [21].

⁷ Based on the Euclidean distance metric.


— 3D SIFT descriptor


The previous 2D SIFT descriptor working with pixels has
been extended to 3D using volumes in different manners by
different authors [3], [6]-[10]. In this paper, we adapt the
3D SIFT to point clouds, inspired by [6], [13]. All these
methods require the same functional steps as 2D SIFT, that
is 1) keypoint detection; 2) keypoint orientation; and 3)
descriptor representation. We present these steps in detail in
the next subsections.


1) Keypoint detection 


The scale space of a 3D input point cloud is defined as a 4D
function $L(x, y, z, \sigma) = G(x, y, z, \sigma) * P(x, y, z)$ obtained
by the convolution of a 3D variable-scale centered Gaussian
kernel $G(x, y, z, \sigma)$ with the input points $P(x, y, z)$, where


$G(x, y, z, \sigma) = \frac{1}{(\sqrt{2\pi}\sigma)^3} e^{-(x^2 + y^2 + z^2)/(2\sigma^2)}.$ (4)


Extending Lowe's approach [5], the scales $\sigma$ are separated by
a constant multiplicative factor $k$, and the candidate keypoints
in the 4D scale space are taken as the local extrema (maxima or
minima) of the multi-scale DoG defined for $i \in [0, s + 2]$ by


$D(x, y, z, k^i\sigma) = L(x, y, z, k^{i+1}\sigma) - L(x, y, z, k^i\sigma).$ (5)


To find the extrema of the multi-scale DoG function, each
sample point is compared to its 27 + 26 + 27 = 80 neighbors,
where 26 neighbors belong to the current scale, and
27 neighbors belong to each of the scales above and below. A keypoint is
chosen only if it is larger than all of its neighbors or smaller
than all of them. To eliminate the bad candidate keypoints
having low contrast, a contrast threshold is applied on
$D(x, y, z, k^i\sigma)$ to eliminate all the candidate keypoints below
a chosen⁸ threshold value $\tau$.


2) Keypoint orientations 


Similarly to 2D SIFT, once all the keypoints are determined
in 3D, one must assign a consistent orientation based on local
point properties, relative to which the keypoint descriptor can be
represented, hence achieving invariance to object rotation. For
this, a two-dimensional histogram is calculated by gathering
statistics of the angles between the neighboring points and
their center. The keypoint descriptor is created by first computing
the vector magnitude $m(x, y, z)$ and its orientations
$\theta(x, y, z)$ (azimuth angle) and $\phi(x, y, z)$ (elevation angle)
between each point $(x, y, z)$ in the region around the keypoint
and their center $(x_c, y_c, z_c)$ as follows⁹


$m(x, y, z) = \sqrt{(x - x_c)^2 + (y - y_c)^2 + (z - z_c)^2},$
$\theta(x, y, z) = \tan^{-1}((y - y_c)/(x - x_c)),$ (6)
$\phi(x, y, z) = \sin^{-1}((z - z_c)/m(x, y, z)).$

In a 3D point cloud, each point has two angle values which represent
the direction of the region, whereas in the 2D case each pixel has
only one gradient direction.

Extending Lowe's approach to the 3D case, in order to find the
keypoint orientations we construct a weighted histogram for
the 3D neighborhood around each candidate keypoint. There
are different ways of doing this. In this work, a 2D histogram
is produced by grouping the angles in bins which divide $\theta$ and
$\phi$ into 10 deg angular bins. A regional Gaussian weighting of
$e^{-(2d/R_{max})^2}$ for the points whose magnitude is $d$ is applied to
the histogram, where $R_{max}$ represents the maximum distance from
the center. The sample points at a distance greater than $R_{max}$
are ignored. The histogram is smoothed using a Gaussian filter
to limit the effect of noise. The dominant azimuth $\alpha$ and
elevation $\beta$ of the keypoint are determined by the peaks of
the 2D histogram. In order to enhance robustness, peaks in
the histogram within 80% of the largest peak are also retained
as possible secondary orientations.

⁸ We have chosen $\tau = 0.5$ in our simulations.

⁹ In Eq. (6), $\theta$ and $\phi$ refer to the original coordinate system. In the paragraph
"Descriptor representation", they refer to the rotated coordinate system.
$(x_c, y_c, z_c)$ is not the same as $(x_p, y_p, z_p)$: the former refers to the center of
the keypoint's r-points neighborhood, the latter to the keypoint itself.
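A rough sketch of this orientation assignment, covering Eq. (6), the 10 deg binning and the Gaussian weighting, is given below. It deliberately omits the Gaussian smoothing of the histogram and the secondary peaks, returns the centers of the winning bins in degrees, and uses `atan2` for the azimuth so that it spans the full circle. These simplifications and the function names are assumptions for illustration only.

```python
import math


def spherical(p, c):
    """Magnitude, azimuth and elevation (degrees) of point p about center c, Eq. (6)."""
    dx, dy, dz = p[0] - c[0], p[1] - c[1], p[2] - c[2]
    m = math.sqrt(dx * dx + dy * dy + dz * dz)
    theta = math.degrees(math.atan2(dy, dx)) % 360.0
    phi = math.degrees(math.asin(max(-1.0, min(1.0, dz / m))))
    return m, theta, phi


def dominant_orientation(points, center, r_max, bin_deg=10.0):
    """Peak of the Gaussian-weighted (azimuth, elevation) histogram."""
    n_theta = int(round(360.0 / bin_deg))
    n_phi = int(round(180.0 / bin_deg))
    hist = {}
    for p in points:
        d = math.dist(p, center)
        if d == 0.0 or d > r_max:
            continue  # the center itself and far points are ignored
        _, theta, phi = spherical(p, center)
        i = min(int(theta // bin_deg), n_theta - 1)
        j = min(int((phi + 90.0) // bin_deg), n_phi - 1)
        weight = math.exp(-(2.0 * d / r_max) ** 2)
        hist[(i, j)] = hist.get((i, j), 0.0) + weight
    if not hist:
        raise ValueError("no neighbor within r_max")
    (i, j), _ = max(hist.items(), key=lambda kv: kv[1])
    alpha = (i + 0.5) * bin_deg          # dominant azimuth (bin center)
    beta = (j + 0.5) * bin_deg - 90.0    # dominant elevation (bin center)
    return alpha, beta
```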


3) Descriptor representation 


Each keypoint $p$ is described by its location $p \triangleq
[x_p, y_p, z_p]^t$, scale $\sigma_p$, and orientation angles $\alpha_p$ and $\beta_p$. The
descriptor representation associated with a keypoint $p$ is based
on the local spatial characteristics around it. To ensure rotation
invariance of the descriptor, the r points $p_i$ ($i = 1, \ldots, r$) of
coordinates $p_i \triangleq [x_i, y_i, z_i]^t$ around
the keypoint of interest $p$ are first transformed (rotated) into
the dominant orientation of $p$ by the following transformation


$p'_i = \begin{bmatrix} \cos\alpha_p \cos\beta_p & -\sin\alpha_p & -\cos\alpha_p \sin\beta_p \\ \sin\alpha_p \cos\beta_p & \cos\alpha_p & -\sin\alpha_p \sin\beta_p \\ \sin\beta_p & 0 & \cos\beta_p \end{bmatrix} \cdot p_i.$ (7)


Then the vector $n$ at the keypoint which is normal to the
surface of the r-points neighborhood is calculated according
to the routine available in the open Point Cloud Library (PCL)
[22]. For each (rotated) point $p'_i$ ($i = 1, \ldots, r$) in the r-points
neighborhood of the (rotated) keypoint $p'$, we calculate the
vector $\overrightarrow{p'p'_i}$, and the magnitude $m$ and angles $\theta$ and $\phi$ according
to Eq. (6). The angle $\delta$ between $n$ and $\overrightarrow{p'p'_i}$ is given by


$\delta = \cos^{-1}\left(\frac{\overrightarrow{p'p'_i} \cdot n}{|\overrightarrow{p'p'_i}| \cdot |n|}\right).$ (8)


Therefore, a keypoint $p'$ with its neighbor $p'_i$ is represented
by the 4-tuple $(m, \theta, \phi, \delta)$. To reduce the computational time,
instead of dividing the neighborhood into n x n x n subregions
(with n = 4 as in Lowe's 2D SIFT descriptor), we take directly
the entire neighborhood, which means that we have n = 1. The
histogram used to generate the 3D descriptor at the keypoint
$p'$ is derived by splitting the $(\theta, \phi, \delta)$ space into 45 deg bins, and
adding up the number of points with the Gaussian weighting
$e^{-(2m/R_{max})^2}$. So the dimension of our 3D SIFT descriptor
is n x n x n x 4 x 4 x 8 = 128 (as for the 2D SIFT descriptor
described previously), because n = 1; the azimuth angle $\theta \in
[0, 360]$ deg is split into 8 bins of 45 deg; the elevation
angle $\phi \in [-90, 90]$ deg is split into 4 bins of 45 deg;
and $\delta \in [0, 180]$ deg is also split into 4 bins of 45 deg.
Each 3D SIFT descriptor is normalized to unity.

The 2D and 3D SIFT descriptors summarize efficiently the
useful information contained in 2D and 3D images. Instead
of working directly with whole images, it is usually more
interesting (in terms of computational burden reduction) to
work directly with 2D and 3D SIFT descriptors, especially if
real-time object recognition is necessary. Generally, the objects
characterized by 2D and 3D SIFT descriptors have different
numbers of keypoints, which makes the feature fusion (FF)
problem for object recognition very challenging. For example,
for a simple object like an apple, we can get 45 keypoints
using the 3D SIFT descriptor, and 38 keypoints using the 2D SIFT
descriptor. To overcome this problem, we adopt the Bag of
Words (BoW) model [18] to gather the statistics of the 2D
and 3D SIFT descriptors to describe the objects.
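Returning to the 3D descriptor construction of Eqs. (7) and (8), the sketch below shows the rotation into the dominant orientation, the angle to the surface normal, and one possible mapping of $(\theta, \phi, \delta)$ to one of the 8 x 4 x 4 = 128 bins. The bin ordering is an arbitrary choice made for this example, the normal vector is assumed to be given (in the paper it comes from PCL), and none of the function names come from the original work.

```python
import math


def rotation_matrix(alpha, beta):
    """Matrix of Eq. (7) for dominant azimuth alpha and elevation beta (radians)."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    return [
        [ca * cb, -sa, -ca * sb],
        [sa * cb, ca, -sa * sb],
        [sb, 0.0, cb],
    ]


def rotate(R, p):
    """Apply the 3 x 3 matrix R to the point p."""
    return [sum(R[i][k] * p[k] for k in range(3)) for i in range(3)]


def angle_between(u, v):
    """Angle delta of Eq. (8), in degrees, between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / (nu * nv)))))


def descriptor_bin(theta, phi, delta):
    """Index in 0..127 for angles in degrees, using 45 deg bins (8 x 4 x 4)."""
    i = min(int((theta % 360.0) // 45.0), 7)
    j = min(int((phi + 90.0) // 45.0), 3)
    k = min(int(delta // 45.0), 3)
    return (i * 4 + j) * 4 + k
```

Each neighbor would then add its weight $e^{-(2m/R_{max})^2}$ to the bin returned by `descriptor_bin`, and the resulting 128-dimensional vector would be normalized to unit length.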


— BoW model for features vector 


In the BoW feature model, the feature descriptors of all the interest points are quantized by clustering them into a pre-specified¹⁰ number of clusters. Instead of using the k-means algorithm as in [2], we use the k-means++ method [23], which selects the initial cluster centers more effectively, to complete this step. The resulting cluster centers are called visual words, while the collection of these cluster centers is referred to as the visual word vocabulary. Once our vocabulary is computed, the descriptors are matched to each visual word based on the Euclidean distance, and the frequency of the visual words in the image and in the point cloud is accumulated into a histogram, which is the BoW feature vector of the image and of the point cloud. So each object in the 2D image and in the 3D point cloud is described by a 1 × 300 BoW-based feature vector, denoted respectively $BoW_{2D}$ and $BoW_{3D}$. These two BoW-based feature vectors are fed to the trained SVM classifiers to get the final object recognition.
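
The quantization step can be sketched as follows, assuming that a vocabulary of visual words has already been obtained (the clustering itself, e.g. by k-means++, is not shown). The helper names `nearest_word` and `bow_vector` and the tiny 2-dimensional toy data are hypothetical; real descriptors would have 128 components and the vocabulary 300 words.

```python
import math


def nearest_word(descriptor, vocabulary):
    """Return the index of the visual word closest in Euclidean distance."""
    best_index, best_dist = 0, float("inf")
    for index, word in enumerate(vocabulary):
        dist = math.dist(descriptor, word)
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def bow_vector(descriptors, vocabulary):
    """Accumulate visual-word frequencies into a BoW histogram."""
    hist = [0] * len(vocabulary)
    for descriptor in descriptors:
        hist[nearest_word(descriptor, vocabulary)] += 1
    return hist


# Toy data: 3 visual words and 4 descriptors in a 2-D feature space.
vocabulary = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
descriptors = [(1.0, 1.0), (9.0, 9.0), (0.5, 0.0), (1.0, 9.0)]
print(bow_vector(descriptors, vocabulary))
```

Whatever the number of keypoints, the output length equals the vocabulary size, which is what makes objects with different numbers of keypoints comparable.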


Step 2: Classifier design 


Once the object description is completed, SVMs are trained to learn object categories and to perform the object classification. SVM is a supervised and discriminative machine learning method that usually provides good performance. Through offline training on a limited number of samples, we seek a compromise between model complexity and learning ability, to get a good discriminant function [19]. A linear SVM classifier is applied for its efficiency; it is a typical classifier for two-category problems. In many real-life applications, we are faced with multi-category classification problems, and we use trained 1V1 SVMs between classes to set up a multi-category classifier. The training process is done as follows: for training samples belonging to the i-th category, we make a pairwise SVM training with respect to all the other classes. So, we get $C_n^2 = n(n-1)/2$ 1V1 SVM classifiers for training samples of n categories.
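
The one-versus-one construction can be sketched as below. The function `pairwise_decide` is a hypothetical stand-in for a trained binary SVM (here a toy rule that always prefers the alphabetically smaller label); only the pair enumeration and the vote counting are the point of the example.

```python
from itertools import combinations


def one_vs_one_pairs(classes):
    """Enumerate the n(n-1)/2 class pairs, one binary classifier per pair."""
    return list(combinations(classes, 2))


def count_votes(classes, pairwise_decide):
    """Each pairwise classifier casts one vote for the class it prefers."""
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        votes[pairwise_decide(a, b)] += 1
    return votes


classes = ["apple", "tomato", "banana", "pitcher", "cereal_box", "kleenex"]
print(len(one_vs_one_pairs(classes)))  # 6 * 5 / 2 pairs

# Toy stand-in for the trained SVMs: prefer the alphabetically smaller label.
votes = count_votes(classes, lambda a, b: min(a, b))
print(votes)
```

For 6 categories this gives 15 pairwise classifiers, and for 10 categories 45, so the number of binary decisions grows quadratically with the number of classes.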


Step 3: Features fusion strategies 


¹⁰ In our simulations, we took K = 300.

When the two BoW-based feature vectors of the object to recognize have been computed from the 2D and 3D SIFT descriptors, we use them to achieve the object recognition thanks to the SVMs trained from the BoW-based feature vectors of the known objects of our database. In this paper, we briefly present the following strategies that we have tested:


1) The direct feature-level fusion strategy: this feature-level fusion is used for feeding SVM classifiers in the training phase and then for making object recognition. With this strategy we directly combine (fuse) the two BoW-based feature vectors $BoW_{2D}$ and $BoW_{3D}$, and we feed the trained (global) SVM classifiers with the fused vector to get the final recognition. The principle of our method based on this strategy is summarized in Fig. 2.


[Fig. 2 block diagram: the point cloud and the image of the object to be recognized go through 3D SIFT and 2D SIFT feature description, each producing a BoW feature vector; the two vectors are merged by feature-level fusion into the object description, which feeds the multiple trained 1V1 SVMs; voting yields P(i) and the recognition result.]

Fig. 2: Direct feature-level fusion strategy.


2) The decision-level fusion strategy: each BoW-based feature vector $BoW_{2D}$ and $BoW_{3D}$ feeds a specific trained SVM to get separately the corresponding recognition result. Then we test different fusion rules to combine these two recognition results to get the final fused recognition result. In this work we have evaluated the performances of the following rules:

• Average weighted fusion rule,
• PCR6 fusion rule of DSmT [24],
• Murphy's rule of combination [26].

The principle of our method based on this strategy is summarized in Fig. 3.


[Fig. 3 block diagram: the point cloud and the image of the object to be recognized go through 3D SIFT and 2D SIFT feature description, each producing a BoW object description; each description feeds its own set of multiple trained 1V1 SVMs, whose votes give P(i)_3D and P(i)_2D; decision-level fusion of these two results gives the recognition result.]

Fig. 3: Decision-level fusion strategy.


1) The direct feature-level fusion strategy

This strategy consists of the following steps:

1-a) For any object to classify, we extract its 2D and 3D SIFT descriptors associated with each keypoint. So we get $N_{2D}$ 2D SIFT descriptors of size 1 × 128 if one has extracted $N_{2D}$ keypoints from the 2D image under test, and we get $N_{3D}$ 3D SIFT descriptors of size 1 × 128 if one has extracted $N_{3D}$ keypoints from the 3D point cloud under test.

1-b) From the $N_{2D}$ 2D SIFT descriptors of size 1 × 128, we compute the 1 × 300 BoW feature vector $BoW_{2D}$, and from the $N_{3D}$ 3D SIFT descriptors of size 1 × 128, we compute the 1 × 300 BoW feature vector $BoW_{3D}$ thanks to the BoW model representation [18].

1-c) The direct feature-level fusion is done by stacking the BoW-based feature vectors $BoW_{2D}$ and $BoW_{3D}$ to get a 1 × 600 vector $BoW_{2D,3D} = [BoW_{2D}, BoW_{3D}]$.

1-d) The feature-level fused vector $BoW_{2D,3D}$ is fed to all trained 1V1 SVMs to get the corresponding discriminant results. The probability P(i) of the object to belong to the category $c_i$ (i = 1, 2, ..., n) is estimated by voting.

1-e) The object is associated with the category (or class) having the largest probability, that is:

$Class(Object) = \arg\max_i \{P(i)\}$. (9)
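
Steps 1-c) to 1-e) can be sketched in a few lines. This is only an illustration: the vote counts are made-up numbers standing in for the output of the trained 1V1 SVMs, P(i) is taken as the normalized vote count, and the function names are hypothetical.

```python
def fuse_features(bow_2d, bow_3d):
    """Step 1-c: stack the two 1 x 300 BoW vectors into a 1 x 600 vector."""
    return list(bow_2d) + list(bow_3d)


def vote_probabilities(votes):
    """Step 1-d: estimate P(i) as the share of votes received by class i."""
    total = sum(votes.values())
    return {c: v / total for c, v in votes.items()}


def classify(votes):
    """Step 1-e: Eq. (9), pick the class with the largest probability."""
    probs = vote_probabilities(votes)
    return max(probs, key=probs.get)


fused = fuse_features([1] * 300, [2] * 300)
print(len(fused))

# Made-up vote counts standing in for the 1V1 SVM outputs.
votes = {"apple": 3, "tomato": 5, "banana": 2}
print(classify(votes))
```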


2) The decision-level fusion strategy

As stated before, with this strategy each BoW-based feature vector $BoW_{2D}$ and $BoW_{3D}$ feeds a specific trained SVM to get separately the corresponding recognition result. Then different fusion rules can be used to combine these two recognition results to get the final fused recognition result.


2-a) The average weighted fusion rule: This very simple rule consists of a voting procedure. The $BoW_{2D}$ and $BoW_{3D}$ vectors feed separately all the corresponding trained 1V1 SVMs to get the discriminant results, and we compute the corresponding number of votes vote[i] for each class $c_i$, i = 1, 2, ..., n. We denote by $vote_{2D}[i]$ the distribution of votes drawn from 2D SIFT, and by $vote_{3D}[i]$ the distribution of votes drawn from 3D SIFT. The probability $P_{2D}(i)$ of the object to belong to the class $c_i$ based on 2D SIFT descriptors is estimated by $P_{2D}(i) = vote_{2D}[i] / \sum_{j=1}^{n} vote_{2D}[j]$; similarly we have $P_{3D}(i) = vote_{3D}[i] / \sum_{j=1}^{n} vote_{3D}[j]$. Then the voting results drawn from the SVMs fed with 2D and 3D SIFT are averaged to obtain the fusion result.

2-b) PCR6 combination rule: The BBAs (Basic Belief Assignments) $m_1(.)$ and $m_2(.)$ are built from the empirical probabilities obtained by the voting procedure described in 2-a). The elements of the frame of discernment $\Theta$ are the n different classes $c_1, c_2, \ldots, c_n$. To get the final result, the BBAs $m_1(.)$ and $m_2(.)$ are fused using the PCR6 combination rule¹¹ [24], defined by $m_{PCR6}(\emptyset) = 0$ and, for all $X \neq \emptyset$ in $2^\Theta$,

$$m_{PCR6}(X) = \sum_{\substack{X_1, X_2 \in 2^\Theta \\ X_1 \cap X_2 = X}} m_1(X_1) m_2(X_2) + \sum_{\substack{Y \in 2^\Theta \setminus \{X\} \\ X \cap Y = \emptyset}} \left[ \frac{m_1(X)^2 m_2(Y)}{m_1(X) + m_2(Y)} + \frac{m_2(X)^2 m_1(Y)}{m_2(X) + m_1(Y)} \right] \quad (10)$$

where all denominators in Eq. (10) are different from zero. If a denominator is zero, that fraction is discarded. All propositions/sets are in a canonical form.

2-c) Murphy's rule: Taking the feature-level fusion of 2D and 3D SIFT as a separate feature, together with the 2D and 3D SIFT, there are three features. Then the BBAs $m_1(.)$, $m_2(.)$ and $m_3(.)$ are built from the empirical probabilities obtained by the voting procedure. The vote results of the features are combined based on Murphy's rule¹² [26].

¹¹ The PCR6 formula coincides with the formula of the PCR5 fusion rule here because one considers only two BBAs to combine. If more than two BBAs have to be fused altogether, we advise using PCR6 rather than PCR5; see [25] for a theoretical justification.
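
The averaging rule of 2-a) and the two-source PCR6 formula of Eq. (10) can be sketched as follows. This is an illustrative reading, not the authors' Matlab code: BBAs are represented as dictionaries mapping `frozenset` focal elements to masses, the average uses equal weights (an assumption), and the input masses are made-up numbers.

```python
def average_rule(p1, p2):
    """Equal-weight average of two class-probability vectors (assumed weights)."""
    return {c: 0.5 * (p1[c] + p2[c]) for c in p1}


def pcr6_two_sources(m1, m2):
    """Two-source PCR6 (same as PCR5 for two sources), following Eq. (10)."""
    fused = {}
    for x1, a in m1.items():
        for x2, b in m2.items():
            inter = x1 & x2
            if inter:
                # Conjunctive part: the product goes to the intersection.
                fused[inter] = fused.get(inter, 0.0) + a * b
            elif a + b > 0.0:
                # Conflicting part: a*b is redistributed to x1 and x2
                # proportionally to a and b; zero denominators are discarded.
                fused[x1] = fused.get(x1, 0.0) + a * a * b / (a + b)
                fused[x2] = fused.get(x2, 0.0) + b * b * a / (a + b)
    return fused


A, B = frozenset({"apple"}), frozenset({"tomato"})
m_2d = {A: 0.6, B: 0.4}  # made-up BBA from the 2D SIFT votes
m_3d = {A: 0.2, B: 0.8}  # made-up BBA from the 3D SIFT votes

fused = pcr6_two_sources(m_2d, m_3d)
print({tuple(k): round(v, 4) for k, v in fused.items()})
print(average_rule({"apple": 0.6, "tomato": 0.4}, {"apple": 0.2, "tomato": 0.8}))
```

Because the BBAs built from vote frequencies only put mass on singletons, every pair of distinct classes is conflicting, and PCR6 sends each conflicting product back to the two classes involved in proportion to their masses.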


III. SIMULATION RESULTS 
A. The experimental setup 


We evaluate the recognition algorithm on a large-scale multi-view object dataset collected using an RGB-D camera [27]. This dataset contains color images, depth images and point clouds of 300 physically distinct everyday objects taken from different viewpoints. Each object belongs to one of 51 categories and is captured from three viewpoints. To test the recognition ability of our features, we test category recognition on objects that were not present in the training set. At each trial, we randomly choose one test object from each category and train classifiers on the remaining objects. We randomly choose 100 training samples and 60 test samples for each category. The object recognition rate (ORR) is calculated by

$ORR = n_r / N$, (11)

where $n_r$ is the number of objects correctly recognized, and N is the total number of test samples.
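
Eq. (11) amounts to a simple accuracy count, as in the following sketch (the function name and the label lists are hypothetical):

```python
def object_recognition_rate(predicted, actual):
    """ORR = (number of correctly recognized objects) / (number of test samples)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


predicted = ["apple", "tomato", "apple", "banana"]
actual = ["apple", "tomato", "tomato", "banana"]
print(100.0 * object_recognition_rate(predicted, actual))  # ORR in %
```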


B. Experiment results and analysis 


B.1 Accuracy of our 3D SIFT descriptor 

In this simulation, we choose six categories with significant intra-class variations and high inter-class similarities. The objects to recognize are apple, tomato, banana, pitcher, cereal_box, and kleenex. The Point Feature Histogram (PFH) [11] and PFHRGB methods in the open-source PCL [22] outperform the other existing 3D features based on point clouds [28]. In order to verify the advantages of the proposed 3D SIFT for GOR, we compare these three feature descriptors under the same conditions. Keypoints are detected using the SIFTKeypoint module of PCL [22] for each feature descriptor. Then the vectors of the different feature descriptors of the keypoints are calculated. The object recognition rates (ORR) that we get are shown in Table I.


Type of feature descriptor | ORR (in %)
PFH based on [11] | 81.39
PFHRGB based on [22] | 84.17
3D SIFT based on this paper | 91.11


TABLE I: Object recognition rates (ORR) of three descriptors.


The PFHRGB descriptor is an improved PFH feature descriptor enriched with color information, which improves the object recognition rate. As shown in Table I, compared with PFH and PFHRGB, the object recognition rate we get with our 3D SIFT descriptor adapted for point clouds gains 6.94% w.r.t. PFHRGB and 9.72% w.r.t. PFH.

¹² Because the results of the fusion with Dempster's rule are very close to the results with Murphy's rule in our applications, we do not report them in our analysis.


B.2 Performances of feature fusion strategies 

Here, we evaluate the performance (i.e. the ORR) of the different feature fusion strategies presented in Section II (Step 3). We have chosen 10 categories (apple, tomato, banana, pitcher, cereal_box, kleenex, camera, coffee_mug, calculator, cell_phone) having significant intra-class variations and high inter-class similarities. We compare our four fusion approaches: the direct feature-level fusion and the three decision-level fusions (by average weighted fusion, PCR6, and Murphy's rule). The results are shown in Fig. 4.


[Fig. 4 plot: exact recognition rate versus number of classes, with curves for PFHRGB, 2D SIFT, 3D SIFT, ave, DSmT, Murphy, and 2D+3D SIFT.]

Fig. 4: Performances of the four feature fusion strategies.


In Fig. 4, the legend of the curves must be read as follows: DSmT means the PCR6 rule, 2D+3D SIFT means the direct feature-level fusion of 2D and 3D SIFT, and ave means the average weighted fusion rule. The horizontal axis represents the total number of categories that we have tested. Due to the variability of the objects, the information provided by a single feature is too imprecise, uncertain and incomplete for getting a good ORR. As shown in Fig. 4, the ORR obtained with the different feature fusion strategies is better than the ORR obtained with the best single descriptor. The results of the average weighted fusion and PCR6 are close, but are lower than those of the other two fusion methods. The feature-level fusion of 2D and 3D SIFT is taken as the third feature for Murphy's rule. However, compared with the feature-level fusion, Murphy's rule does not improve the performances. So, the direct feature-level fusion performs best among these fusion strategies, and the following experiments are based on the direct feature-level fusion. One clearly sees that the 3D SIFT proposed in this work significantly outperforms the 2D SIFT and PFHRGB descriptors for GOR. As shown in Fig. 4, the ORR decreases as the number of categories increases because of the design of the multi-category classifier, which consists of many 1V1 SVM classifiers. Each classification error is accumulated in the final voting results, leading to an increase in recognition errors.

B.3 Robustness to intra-class variation and inter-class similarities

In this study, we compare the ORR performances for different classes having high similarity (e.g., apple and tomato), and within the same class having strong variation (e.g., the pitcher object), as in Figs. 5 and 6 below. We evaluate the accuracy of PFHRGB, 2D SIFT, 3D SIFT and the feature-level fusion of 2D and 3D SIFT under the same conditions. Training and testing samples are the same as in the first experiment. Our simulation results are shown in Table II.

Fig. 5: Apple and Tomato.

Fig. 6: Pitchers.


Feature descriptor | PFHRGB | 2D SIFT | 3D SIFT | 2D+3D SIFT
ORR (apple) | 61.67 | 53.33 | 71.67 | 65.00
ORR (tomato) | 100 | 98.33 | 91.67 | 100
ORR (banana) | 91.67 | 93.33 | 93.33 | 100
ORR (pitcher) | 70.00 | 95.00 | 96.67 | 98.33
ORR (cereal_box) | 91.67 | 98.33 | 95.00 | 95.00
ORR (kleenex) | 90.00 | 90.00 | 100 | 100
Averaged ORR | 84.17 | 88.06 | 91.11 | 93.06


TABLE II: ORR (in %) of different classes.


As we see from Table II, using 3D SIFT increases the averaged ORR by 3.05% w.r.t. 2D SIFT. This shows that the introduction of the depth information improves the quality of object recognition. Three different objects of the pitcher class are shown in Figure 6. As we see, there are great differences within such a class. 3D SIFT achieves an ORR of 96.67%, much superior to the 70% obtained with PFHRGB. The apple and the tomato displayed in Figure 5 look highly similar even if they belong to two distinct classes. 3D SIFT provides much better ORR than the other descriptors. As shown in Table II, our GOR method based on feature-level fusion of 2D and 3D SIFT offers better robustness to intra-class variations and inter-class similarities, and 3D SIFT gives higher accuracy than the other single descriptors.


B.4 Robustness to changes of the angle of view

In this experiment, we evaluate the performance of our GOR method when applied under different observation conditions, more precisely when the objects are observed under three very distinct angles of view (30 deg, 45 deg and 60 deg). Training samples are the same as in the first experiment. For each view, 60 test samples are randomly selected per category, giving 360 test samples from 6 categories. The experimental results are shown in Fig. 7.


Fig. 7: ORR performances under 3 angles of view.


From Fig. 7, one sees that the ORR with 3D SIFT is relatively accurate and stable compared with the PFHRGB descriptor. The direct feature-level fusion strategy (with ORR > 90%) offers much better ORR than using the best single descriptor, which indicates that the combination of 2D and 3D SIFT is effective and robust for category recognition even under very distinct angles of view.


B.5 Robustness to size scaling

The training samples are the same as in the first experiment. To evaluate the robustness of our method to size scaling (zooming), the test samples are zoomed out to 1/2, 1/3 and 1/4. The results are shown in Table III.


Feature descriptor | PFHRGB | 2D SIFT | 3D SIFT | 2D+3D SIFT
ORR (no Zoom) | 84.17 | 88.06 | 91.11 | 93.06
ORR (Zoom=1/2) | 74.44 | 77.50 | 76.67 | 82.78
ORR (Zoom=1/3) | 63.33 | 64.17 | 65.28 | 68.89
ORR (Zoom=1/4) | 61.39 | 46.94 | 61.67 | 63.05


TABLE III: Averaged ORR (in %) for different zoomings.


As one sees in Table III, our GOR method with fusion is superior to the algorithms based on a single descriptor. However, the ORR of each feature descriptor decreases with zooming. In particular, when zoomed to 1/4, the ORR with 2D SIFT is only 46.94%. The main reason is that, for some of the images, such as the apple (whose original size is only 84 × 82), scaling reduces the number of useful keypoints. The feature-level fusion algorithm still provides an averaged ORR of 63.05%.
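
As a quick arithmetic check of the degradation reported in Table III, the following sketch computes the loss in averaged ORR (in percentage points) between the unzoomed test samples and those zoomed to 1/4. The dictionary simply copies the table values; the names are illustrative.

```python
# Averaged ORR (in %) copied from Table III.
orr = {
    "PFHRGB": {"none": 84.17, "1/2": 74.44, "1/3": 63.33, "1/4": 61.39},
    "2D SIFT": {"none": 88.06, "1/2": 77.50, "1/3": 64.17, "1/4": 46.94},
    "3D SIFT": {"none": 91.11, "1/2": 76.67, "1/3": 65.28, "1/4": 61.67},
    "2D+3D SIFT": {"none": 93.06, "1/2": 82.78, "1/3": 68.89, "1/4": 63.05},
}


def orr_loss(descriptor, zoom="1/4"):
    """Loss of averaged ORR, in percentage points, w.r.t. no zoom."""
    return round(orr[descriptor]["none"] - orr[descriptor][zoom], 2)


for name in orr:
    print(name, orr_loss(name))
```

The 2D SIFT descriptor shows the largest loss at zoom 1/4, while the fused 2D+3D descriptor keeps the highest ORR at every zoom level of the table.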


B.6 Computational time evaluation 

The computational times (CT) of the different feature descriptors have been evaluated with an i7-3770@3.4GHz CPU under the x64 Win7 operating system, and are shown in Table IV. The training and test samples are the same as in the first experiment. Because the point cloud model contains a larger amount of data and richer information than the image, the CT using the point cloud is relatively long, which is normal. The largest proportion of CT in the whole recognition process is spent on feature extraction and description. 3D SIFT includes keypoint detection and description. If the number of points of the object is n, the time complexity of keypoint detection is O(octaves · scale · k · n). Because the number of pyramid layers octaves, the number of scales of each layer scale, and the neighborhood size of the keypoints k are constant, this time complexity is O(n). For the m detected keypoints, the time complexity of calculating the descriptors of the keypoints is O(mn). So the time complexity of 3D SIFT is O(mn + n); ignoring the lower-order term, the time complexity is O(mn). As seen in Table IV, the CT of 3D SIFT is reduced by 34.75% w.r.t. PFHRGB, the CT with fusion of 2D and 3D SIFT is 22.07% lower than that of PFHRGB, and the ORR performance is substantially improved.


Feature descriptors | CT of 360 test samples (in s) | CT of each test sample (in s)
PFHRGB | 3404.628 | 9.4573
3D SIFT | 2221.608 | 6.1711
2D+3D SIFT | 2653.272 | 7.3702


TABLE IV: Computational times for feature descriptors.
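
The per-sample times and the relative reductions quoted in the text can be recomputed from the totals of Table IV with a few lines (the variable and function names are illustrative):

```python
# Total computational times (in s) for 360 test samples, copied from Table IV.
ct_total = {"PFHRGB": 3404.628, "3D SIFT": 2221.608, "2D+3D SIFT": 2653.272}
N_TEST = 360


def ct_per_sample(name):
    """Average computational time per test sample, in seconds."""
    return ct_total[name] / N_TEST


def reduction_vs_pfhrgb(name):
    """Relative CT reduction (in %) with respect to PFHRGB."""
    ref = ct_total["PFHRGB"]
    return 100.0 * (ref - ct_total[name]) / ref


for name in ct_total:
    print(name, round(ct_per_sample(name), 4), round(reduction_vs_pfhrgb(name), 2))
```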


IV. CONCLUSIONS 


Because there are many complex objects in the real scenes we observe in nature, and because of possible large intra-class variations and high inter-class similarities, the generic object recognition (GOR) task is very hard to achieve in general. In this paper we have proposed a new GOR method based on 2D and 3D SIFT descriptors that computes multiple feature vectors, which are combined with different strategies and fed to SVM classifiers for object recognition. The evaluation of the performances on real open datasets has shown the superiority of our new 3D SIFT descriptor adapted for point clouds with respect to existing 3D features such as PFHRGB. Our GOR method based on feature fusion of 2D and 3D SIFT works better than the one using the best single feature. For now, if the environment substantially changes, we have to retrain the system. To overcome this problem we will also consider background segmentation within GOR in future works. Also, we would like to reduce the computational time needed for feature extraction and description while maintaining a good recognition rate, and we want to explore more feature fusion strategies to improve (if possible) the recognition performances.


ACKNOWLEDGMENT 


This work was supported by NNSF of China (No. 
61175091), Qing Lan Project of Jiangsu Province, Aeronau- 
tical Science Foundation of China (20140169002), and Six 
Major Top-talent Plan of Jiangsu Province. 


REFERENCES 


[1] Y. Lei, M. Bennamoun, M. Hayat, Y. Guo, An efficient 3D face recognition approach using local geometrical signatures, Pattern Recognition, Vol. 47(2), pp. 509-524, 2014.

[2] X.-D. Li, X. Zhang, B. Zhu, X.-Z. Dai, A Visual Navigation Method for Robot Based on a GOR and GPU Algorithm, Robot, Vol. 34(4), pp. 466-475, 2012 (in Chinese).

[3] S. Allaire, J.J. Kim, S.L. Breen, D.A. Jaffray, V. Pekar, Full orientation invariance and improved feature selectivity of 3D SIFT with application to medical image analysis, Proc. IEEE CVPR Workshops, Anchorage, AK, USA, 23-28 June 2008.

[4] D.G. Lowe, Object recognition from local scale-invariant features, Proc. of IEEE ICCV Conf., Vol. 2, pp. 1150-1157, Corfu, Greece, Sept. 1999.

[5] D.G. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, Int. J. of Computer Vision, Vol. 60(2), pp. 91-110, 2004.

[6] P. Scovanner, S. Ali, M. Shah, A 3-dimensional SIFT descriptor and its application to action recognition, Proc. of 15th ACM MM Conf., pp. 357-360, Augsburg, Germany, Sept. 23-29, 2007.

[7] W. Cheung, G. Hamarneh, N-SIFT: N-dimensional scale invariant feature transform for matching medical images, Proc. of 4th IEEE Int. Symp. on Biomedical Imaging, pp. 720-723, Arlington, VA, USA, 2007.

[8] R.N. Dalvi, I. Hacihaliloglu, R. Abugharbieh, 3D ultrasound volume stitching using phase symmetry and Harris corner detection for orthopaedic applications, Proc. of SPIE (Medical Imaging 2010), Vol. 7623, San Diego, CA, USA, 2010.

[9] M. Niemeijer, et al., Registration of 3D spectral OCT volumes using 3D SIFT feature point matching, Proc. SPIE Vol. 7259 (Medical Imaging 2009), Lake Buena Vista, FL, USA, 27 March 2009.

[10] G.T. Flitton, T.P. Breckon, N. Megherbi, Object Recognition using 3D SIFT in Complex CT Volumes, Proc. of BMVC Conf., pp. 1-12, Aberystwyth, UK, Aug. 31-Sept. 3, 2010.

[11] R.B. Rusu, N. Blodow, Z.C. Marton, M. Beetz, Aligning point cloud views using persistent feature histograms, Proc. of IEEE/RSJ Int. Conf. on Intelligent Robots and Syst., pp. 3384-3391, Nice, France, 2008.

[12] R.B. Rusu, N. Blodow, M. Beetz, Fast point feature histograms (FPFH) for 3D registration, Proc. of IEEE Int. Conf. on Robotics and Autom., pp. 3212-3217, Kobe, Japan, 2009.

[13] S. Lazebnik, C. Schmid, J. Ponce, A sparse texture representation using local affine regions, IEEE Trans. on PAMI, Vol. 27(8), pp. 1265-1278, 2005.

[14] X.-D. Li, J.-D. Pan, J. Dezert, Automatic Aircraft Recognition using DSmT and HMM, Proc. of Fusion 2014, Salamanca, Spain, July 2014.

[15] L. Bo, K. Lai, X. Ren, D. Fox, Object recognition with hierarchical kernel descriptors, Proc. of CVPR IEEE Conf., pp. 1729-1736, Colorado Springs, CO, USA, June 2011.

[16] L. Bo, X. Ren, D. Fox, Depth kernel descriptors for object recognition, Proc. of IEEE/RSJ IROS Conf., pp. 821-826, San Francisco, CA, USA, Sept. 2011.

[17] M. Mirdanies, A.S. Prihatmanto, E. Rijanto, Object Recognition System in Remote Controlled Weapon Station using SIFT and SURF Methods, J. of Mechatronics, Elect. Power, and Vehicular Techn., Vol. 4(2), pp. 99-108, 2013.

[18] J. Sivic, A. Zisserman, Video google: A text retrieval approach to objects matching in videos, Proc. of 9th ICCV Conf., pp. 1470-1477, 2003.

[19] B.E. Boser, I.M. Guyon, V.N. Vapnik, A training algorithm for optimal margin classifiers, Proc. of the 5th ACM Workshop on Comput. learning theory, pp. 144-152, Pittsburgh, PA, USA, 1992.

[20] SIFT demo program (Version 4, July 2005). http://www.cs.ubc.ca/~lowe/keypoints/

[21] R. Hess, An Open Source SIFT Library, ACM MM, 2010. http://robwhess.github.io/opensift/

[22] R.B. Rusu, S. Cousins, 3D is here: Point cloud library (PCL), Proc. of IEEE Int. Conf. on Robotics and Autom., pp. 1-4, Shanghai, China, 2011.

[23] D. Arthur, S. Vassilvitskii, K-means++: The advantages of careful seeding, Proc. of SODA '07, pp. 1027-1035, 2007.

[24] F. Smarandache, J. Dezert (Editors), Advances and applications of DSmT for information fusion, ARP, Rehoboth, NM, U.S.A., Vol. 1-4, 2004-2015. https://www.onera.fr/fr/staff/jean-dezert/references

[25] F. Smarandache, J. Dezert, On the consistency of PCR6 with the averaging rule and its application to probability estimation, Proc. of Fusion 2013, Istanbul, Turkey, July 2013.

[26] C.K. Murphy, Combining Belief Functions when Evidence Conflicts, Decision Support Systems, Vol. 29(1), pp. 1-9, 2000.

[27] K. Lai, L.-F. Bo, X.-F. Ren, D. Fox, A Large-Scale Hierarchical Multi-View RGB-D Object Dataset, Proc. of IEEE Int. Conf. on Robotics and Autom., pp. 1817-1824, Shanghai, China, 2011.

[28] L.A. Alexandre, 3D Descriptors for Object and Category Recognition: a Comparative Evaluation, Workshop on Color-Depth Camera Fusion in Robotics at the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, IROS 2012, October 7-12, Vilamoura, Portugal, 2012.




On the Quality Estimation of Optimal Multiple 
Criteria Data Association Solutions 


Jean Dezertᵃ, Kaouthar Benameurᵃ, Laurent Rattonᵇ, Jean-François Grandinᵇ


ᵃ The French Aerospace Lab, ONERA, 91120 Palaiseau, France.
ᵇ Thales Systèmes Aéroportés, 2 av. Gay Lussac, 78990 Elancourt, France.


Emails: jean.dezert@onera.fr, kaouthar.benameur@onera.fr, laurent.ratton@fr.thalesgroup.com, jean-francois.grandin@fr.thalesgroup.com


Originally published as: J. Dezert, K. Benameur, L. Ratton, J.-F. Grandin, On the Quality Estimation of Optimal Multiple Criteria Data Association Solutions, in Proc. of Fusion 2015, Washington D.C., USA, July 6-9, 2015, and reprinted with permission.


Abstract—In this paper, we present a method to estimate the 
quality (trustfulness) of the solutions of the classical optimal 
data association (DA) problem associated with a given source of 
information (also called a criterion). We also present a method to 
solve the multi-criteria DA problem and to estimate the quality of 
its solution. Our approach is new and mixes classical algorithms (typically Murty's approach coupled with Auction) for the search of the best and the second best DA solutions, and belief functions (BF) with the PCR6 (Proportional Conflict Redistribution rule #6) combination rule drawn from DSmT (Dezert-Smarandache Theory) to establish the quality matrix of the global optimal DA solution. In order to take into account the importance of the criteria in the fusion process, we use weighting factors which can be derived in different ways (ad-hoc choice, quality of each local DA solution, or inspired by Saaty's Analytic Hierarchy Process (AHP)). A simple complete example is provided to show how our method works and to help readers verify the validity of our results by themselves.


Keywords: Data association, Multi-criteria analysis, belief 
functions, PCR6, DSmT. 


I. INTRODUCTION 


Efficient algorithms for modern multisensor-multitarget tracking (MS-MTT) systems [1], [2] require estimating and predicting the states (position, velocity, etc.) of the targets evolving in the surveillance area covered by a set of sensors. These estimations and predictions are based on sensor measurements and dynamical model assumptions. In the monosensor context, MTT classically requires solving the data association (DA) problem to associate the available measurements at a given time with the predicted states of the targets, in order to update their tracks using filtering techniques (Kalman filter, particle filter, etc.). In the multisensor MTT context, we need to solve more difficult multi-dimensional assignment problems under constraints. Fortunately, efficient algorithms have been developed in the operational research and tracking communities for formalizing and solving these optimal assignment problems (see the related references detailed in the sequel).

Before going further, it is necessary to recall briefly the basis of the DA problem and the methods to solve it. This problem can be formulated as follows: we have m > 1 targets $T_i$ (i = 1, ..., m), and n > 1 measurements¹ $z_j$ (j = 1, ..., n) at a given time k, and an m × n rewards (gain/payoff) matrix $\Omega = [\omega(i,j)]$ whose elements $\omega(i,j) \geq 0$ represent the payoff (usually homogeneous to the likelihood) of the association of target $T_i$ with measurement $z_j$, denoted $(T_i, z_j)$. The data association problem consists in finding the global optimal assignment of the targets with some measurements by maximizing² the overall gain in such a way that no more than one target is assigned to a measurement, and reciprocally.

Without loss of generality, we can assume $\omega(i,j) \geq 0$ because if some elements $\omega(i,j)$ of $\Omega$ were negative, we can always add a constant value³ to all elements of $\Omega$ to work with a new payoff matrix $\Omega' = [\omega'(i,j)]$ having all elements $\omega'(i,j) \geq 0$, and we get the same optimal assignment solution with $\Omega$ and with $\Omega'$. Moreover, we can also assume without loss of generality $m \leq n$ because otherwise we can always swap the roles of targets and measurements in the mathematical problem definition by working directly with $\Omega^t$ instead, where the superscript t denotes the transposition of the matrix. The optimal assignment problem consists of finding the m × n binary association matrix $A = [a(i,j)]$ which maximizes the global rewards

$$R(\Omega, A) \triangleq \sum_{i=1}^{m} \sum_{j=1}^{n} \omega(i,j)\, a(i,j), \quad (1)$$

$$\text{subject to } \begin{cases} \sum_{j=1}^{n} a(i,j) = 1 & (i = 1, \ldots, m), \\ \sum_{i=1}^{m} a(i,j) \leq 1 & (j = 1, \ldots, n), \\ a(i,j) \in \{0, 1\}. \end{cases} \quad (2)$$

The association indicator value $a(i,j) = 1$ means that the corresponding target $T_i$ and measurement $z_j$ are associated, and $a(i,j) = 0$ means that they are not associated (i = 1, ..., m and j = 1, ..., n).
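
For very small problems, the optimization (1)-(2) can be illustrated by exhaustive enumeration, as in the following sketch. It is not one of the efficient algorithms used in practice; the function name `best_assignment` and the 2 × 3 reward matrix are made up for illustration. The example also checks numerically that adding a constant to all rewards leaves the optimal assignment unchanged.

```python
from itertools import permutations


def best_assignment(omega):
    """Brute-force solution of (1)-(2) for an m x n reward matrix with m <= n.

    Returns (reward, cols) where cols[i] is the measurement assigned to target i.
    """
    m, n = len(omega), len(omega[0])
    if m > n:
        raise ValueError("expects m <= n; transpose the matrix otherwise")
    best_reward, best_cols = float("-inf"), None
    # Each m-permutation of the n columns assigns one distinct measurement
    # to every target, which is exactly the constraint set (2).
    for cols in permutations(range(n), m):
        reward = sum(omega[i][cols[i]] for i in range(m))
        if reward > best_reward:
            best_reward, best_cols = reward, cols
    return best_reward, best_cols


omega = [[4, 1, 3],
         [2, 0, 5]]
print(best_assignment(omega))

# Shifting all rewards by a constant does not change the optimal pairing.
shifted = [[w - 5 for w in row] for row in omega]
print(best_assignment(shifted))
```

The enumeration visits n!/(n-m)! candidate assignments, so it is only usable on toy sizes.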


¹ In a multi-sensor context, targets can be replaced by tracks provided by a given tracker associated with a type of sensor, and measurements can be replaced by another set of tracks. In different contexts, possible equivalents are assigning personnel to jobs or assigning delivery trucks to locations.

² In some problems, the matrix $\Omega = [\omega(i,j)]$ represents a cost matrix whose elements are the negative log-likelihoods of the association hypotheses. In this case, the data association problem consists in finding the best assignment that minimizes the overall cost.

³ Equal to the absolute value of the minimum of $\Omega$.

The solution of the optimal assignment problem stated in (1)-(2) is well reported in the literature, and several efficient methods have been developed in the operational research and tracking communities to solve it. The most well-known algorithms are the Kuhn-Munkres (or Hungarian) algorithm [3], [4] and its extension to rectangular matrices proposed by Bourgeois and Lassalle in [5], the Jonker-Volgenant method [6], and Auction [7]. More sophisticated methods using Murty's method [8], and some variants [9], [10], [11], [12], [13], [14], [15], are also able to provide not only the best assignment, but also the m-best assignments. We will not present all these classical methods in detail because they have already been well reported in the literature [16], [17].
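
To make the notion of best and second-best assignments concrete, the sketch below ranks all feasible assignments of a tiny reward matrix by exhaustive enumeration. This is explicitly not Murty's algorithm, which avoids enumerating all candidates; the function name `ranked_assignments` and the toy matrix are hypothetical.

```python
from itertools import permutations


def ranked_assignments(omega, k=2):
    """Return the k best (reward, cols) assignments by exhaustive enumeration.

    Assumes an m x n reward matrix with m <= n; cols[i] is the measurement
    assigned to target i. Feasible only for very small m and n.
    """
    m, n = len(omega), len(omega[0])
    scored = [
        (sum(omega[i][cols[i]] for i in range(m)), cols)
        for cols in permutations(range(n), m)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


omega = [[4, 1, 3],
         [2, 0, 5]]
first_best, second_best = ranked_assignments(omega, k=2)
print(first_best, second_best)
```

Here the two best solutions share the pairing of the second target with the third measurement and differ in the pairing of the first target.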


The purpose of this paper is to propose a solution for two important problems related to the aforementioned data association issue:

• Problem 1 (mono-criterion): Suppose that the DA reward matrix $\Omega_1$ has been established based on a unique criterion $C_1$; then we want to evaluate the quality⁴ of each association (pairing) provided in the optimal solution by one of the aforementioned algorithms. The choice of the algorithm does not matter as long as it is able to provide the optimal DA solution represented by a binary matrix $A_1$ (assumed to be unique here for convenience). So based on $\Omega_1$ and $A_1$, we want to estimate the quality matrix $Q_1$ of the optimal pairing solutions given in $A_1$. This quality matrix will be useful to select the optimal association pairings that have sufficient quality to be used to update the tracking filters, and not to use the optimal data associations that have a poor quality, which will save computational time and avoid potentially degrading the tracking performances.

• Problem 2 (multi-criteria): We assume that we have different rewards matrices $\Omega_1, \ldots, \Omega_K$ (K > 1), established from different criteria, from which we can draw optimal DA solutions $A_1, \ldots, A_K$ with their corresponding quality matrices $Q_1, \ldots, Q_K$ (obtained by the method used for solving Problem 1). We assume that each criterion $C_k$, k = 1, ..., K, has its own importance with respect to the others, which is expressed either by a given relative importance K × K matrix M, or directly by a weighting K × 1 vector w. Problem 2 consists in finding the optimal (i.e. the one generating the best global quality) DA solution based on all information drawn from the independent multiple criteria we have, that is from $\Omega_1, \ldots, \Omega_K$ and M (or w), in a well-justified and comprehensive manner.


This paper is organized as follows. In Section II we present
a method for solving Problem 1 which uses both the 1st-best and
2nd-best DA solutions provided by Murty’s algorithm. Our
method is based on Belief Functions (BF), the Proportional
Conflict Redistribution fusion rule #6 (PCR6) developed in the
Dezert-Smarandache Theory (DSmT) framework [19], and the
pignistic probability transform. Section III proposes a solution
for Problem 2 exploiting Saaty’s AHP method, BF and also
Murty’s algorithm. Section IV presents a simple, fully detailed
example showing how the method works, for readers who want
to check our results by themselves. Section V concludes
this paper with perspectives.


⁴In this paper, the quality of a pairing of the optimal DA solution refers to
a confidence score which corresponds to the degree of trust one grants
to this pairing when deciding whether to use it or not.


II. SOLUTION OF PROBLEM 1 (MONO-CRITERION)


This solution has already been addressed in detail in [21]
and we only briefly present the main ideas here to make
this paper self-contained. In Problem 1, we want to establish
a confidence level (i.e. a quality indicator) of the pairings of
the optimal data association solution. More precisely, we are
searching for an answer to the question: how to measure the
quality of the pairings $a(i,j) = 1$ provided in the optimal
assignment solution $A$? The necessity to establish a quality
indicator is motivated by the following three main practical
reasons:


1) In some practical tracking environments with
clutter, some association decisions
($a(i,j) = 1$) are doubtful. For these unreliable
associations, it is better to wait for new information
(measurements) instead of applying the hard data
association decision and making potentially serious
association mistakes.


2) In some multisensor systems, it can be also important 
to save energy consumption for preserving a high 
autonomy of the system. For this goal, only the most 
trustful specific associations provided in the optimal 
assignment have to be used instead of all of them. 


3) The best optimal assignment solution is not necessarily 
unique. In such situation, the establishment of quality 
indicators may help in selecting one particular optimal 
assignment solution among multiple possible choices. 


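It is worth noting that the 1st-best, as well as the 2nd-best,
optimal assignment solutions are unfortunately not necessarily
unique. Therefore, we need to take into account
the possible multiplicity of assignments in the analysis of
the problem. The multiplicity index of the best optimal
assignment solution is denoted $\beta_1 \geq 1$, and the multiplicity
index of the 2nd-best optimal assignment solution is denoted
$\beta_2 \geq 1$. We denote the sets of corresponding
assignment matrices by $\mathcal{A}_1 = \{A_1^{k_1}, k_1 = 1, \ldots, \beta_1\}$ and by
$\mathcal{A}_2 = \{A_2^{k_2}, k_2 = 1, \ldots, \beta_2\}$. Here are three simple examples
with different multiplicities in solutions:


Example 1: If we take $\Omega = \begin{bmatrix} 8 & 1 & 2 \\ 5 & 3 & 3 \end{bmatrix}$, then $\beta_1 = 2$ and
$\beta_2 = 1$ because the 1st best and 2nd best DA solutions are

$$A_1^{k_1=1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}, \quad A_1^{k_1=2} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad A_2 = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}.$$


Example 2: If we take $\Omega = \begin{bmatrix} 6 & 3 & 9 \\ 1 & 4 & 1 \end{bmatrix}$, then $\beta_1 = 1$ and
$\beta_2 = 2$ because the 1st best and 2nd best DA solutions are

$$A_1 = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}, \quad A_2^{k_2=1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}, \quad A_2^{k_2=2} = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}.$$


Example 3: If we take $\Omega = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$, then $\beta_1 = 2$ and
$\beta_2 = 2$ because the 1st best and 2nd best DA solutions are

$$A_1^{k_1=1} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad A_1^{k_1=2} = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix},$$

$$A_2^{k_2=1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad A_2^{k_2=2} = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}.$$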
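The multiplicities quoted in these small examples can be checked by exhaustive enumeration. The sketch below is not Murty's algorithm: it is a brute-force stand-in that is only practical for tiny matrices, and the function name `ranked_assignments` is a hypothetical helper introduced here for illustration. It assumes each target (row) is paired with exactly one distinct measurement (column) and that rewards are integers, so that equal rewards can be grouped by exact equality.

```python
from itertools import permutations

def ranked_assignments(omega):
    """Group every feasible assignment of an m x n reward matrix (m <= n) by reward.

    Returns [(reward, [cols, ...]), ...] sorted by decreasing reward, where
    cols[i] is the measurement (column) index paired with target (row) i.
    Grouping uses exact equality, which is safe for integer rewards only.
    """
    m, n = len(omega), len(omega[0])
    groups = {}
    for cols in permutations(range(n), m):      # one distinct column per row
        reward = sum(omega[i][cols[i]] for i in range(m))
        groups.setdefault(reward, []).append(cols)
    return sorted(groups.items(), reverse=True)

# Example 3 of the text: both the 1st-best and 2nd-best solutions are non-unique.
ranking = ranked_assignments([[1, 2, 3],
                              [4, 5, 6]])
(r1, best), (r2, second) = ranking[0], ranking[1]
print("R1 =", r1, "beta1 =", len(best), best)
print("R2 =", r2, "beta2 =", len(second), second)
```

The length of each group plays the role of the multiplicity index. For realistic problem sizes a ranked-assignment method such as those cited in [8]-[15] would be needed instead of this enumeration.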


To establish the quality of the specific associations (pairings)
$(i,j)$ satisfying $a_1(i,j) = 1$ belonging to the optimal
assignment matrix $A_1$, we propose to use both $A_1$ and the
2nd-best assignment solution $A_2$. The basic idea is to compare the
values $a_1(i,j)$ in the best assignment and $a_2(i,j)$ in the 2nd-best
assignment to identify the change (if any) of the optimal
pairing $(i,j)$. In fact, we assume⁵ that a higher quality of an
entry in a quality matrix suggests that its association in an
optimal solution is more stable across those good solutions.
The connection between the stability of an association across
the good solutions and the stability with respect to measurement
errors is made through the components of the reward matrices
(the inputs of our method), which must take into account
the measurement uncertainties. Based on this assumption, our
quality indicator is defined using both the stability of
the pairing and its relative impact on the global reward. The
proposed method also works when the 2nd-best assignment
solution $A_2$ is not unique (as shown in Examples 2 and 3).
Our method helps to select the best (most trustworthy) optimal
assignment in case of multiplicity of $A_1$ matrices. We do
not claim that the definition of the quality matrix proposed
in this work is the best possible one. However, we propose a new
comprehensive way of solving this problem from a practical
standpoint.

To take into account efficiently the reward values of each
specific association given in the best assignment $A_1$ and in
the 2nd-best assignment $A_2^{k_2}$ for estimating the quality of
DA solutions, we propose the following construction
of quality indicators depending on the type of matching:

• When $a_1(i,j) = a_2^{k_2}(i,j) = 0$, one has full agreement
on the “non-association” $(T_i, z_j)$ in $A_1$ and in $A_2^{k_2}$. This
non-association has no impact on the global rewards
$R_1(\Omega, A_1)$ and $R_2(\Omega, A_2^{k_2})$, and it is useless. Therefore,
we can set its quality to any arbitrary value; typically
we take $q^{k_2}(i,j) = 0$ because these values are not useful
at all from the application (i.e. tracking) standpoint.

• When $a_1(i,j) = a_2^{k_2}(i,j) = 1$, one has full agreement on
the association $(T_i, z_j)$ in $A_1$ and in $A_2^{k_2}$. This association
has however different impacts on the global reward
values $R_1(\Omega, A_1)$ and $R_2(\Omega, A_2^{k_2})$. To qualify the quality
of this association $(T_i, z_j)$, we define the two basic belief
assignments (BBA’s) on $X \triangleq (T_i, z_j)$ and $X \cup \neg X$ (the
ignorance), for $s = 1, 2$, as follows:

$$m_s(X) = a_s(i,j) \cdot \omega(i,j) / R_s(\Omega, A_s), \qquad m_s(X \cup \neg X) = 1 - m_s(X). \tag{3}$$

Applying the conjunctive fusion rule (here one has no
conflicting mass), we get

$$m(X) = m_1(X) m_2(X) + m_1(X) m_2(X \cup \neg X) + m_1(X \cup \neg X) m_2(X),$$
$$m(X \cup \neg X) = m_1(X \cup \neg X) m_2(X \cup \neg X). \tag{4}$$

Applying the pignistic transformation⁶ [20], we finally get
$BetP(X) = m(X) + \frac{1}{2} \cdot m(X \cup \neg X)$ and $BetP(\neg X) =
\frac{1}{2} \cdot m(X \cup \neg X)$. Therefore, we choose as quality indicator
for the association $(T_i, z_j)$ the value $q^{k_2}(i,j) = BetP(X) =
m(X) + \frac{1}{2} \cdot m(X \cup \neg X)$.


⁵This assumption has however not been formally proven yet and its validity
is a challenging open question left for future research.

• When $a_1(i,j) = 1$ and $a_2^{k_2}(i,j) = 0$, one has a disagreement
(conflict) between the association $(T_i, z_j)$ in $A_1$ and the
association $(T_i, z_{j_2})$ in $A_2^{k_2}$, where $j_2$ is the measurement index such
that $a_2(i, j_2) = 1$. To qualify the quality of this non-matching
association $(T_i, z_j)$, we define the two following basic belief
assignments (BBA’s) of the propositions $X \triangleq (T_i, z_j)$ and
$Y \triangleq (T_i, z_{j_2})$:

$$m_1(X) = a_1(i,j) \cdot \frac{\omega(i,j)}{R_1(\Omega, A_1)}, \qquad m_1(X \cup Y) = 1 - m_1(X), \tag{5}$$

and

$$m_2(Y) = a_2(i,j_2) \cdot \frac{\omega(i,j_2)}{R_2(\Omega, A_2)}, \qquad m_2(X \cup Y) = 1 - m_2(Y). \tag{6}$$

Applying the conjunctive fusion rule, we get $m(X \cap Y = \emptyset) =
m_1(X) m_2(Y)$ and

$$m(X) = m_1(X) m_2(X \cup Y),$$
$$m(Y) = m_1(X \cup Y) m_2(Y), \tag{7}$$
$$m(X \cup Y) = m_1(X \cup Y) m_2(X \cup Y).$$

Because we need to work with a normalized combined BBA,
we can choose among different rules of combination
(Dempster-Shafer’s rule, Dubois-Prade’s rule, Yager’s rule [19],
etc.). In this work, we propose to use the Proportional Conflict
Redistribution rule no. 6 (PCR6), originally proposed in the DSmT
framework [19], because it has proved very efficient in
practice [28], [29]. Hence with PCR6, we get:

$$m(X) = m_1(X) m_2(X \cup Y) + m_1(X) \cdot \frac{m_1(X) m_2(Y)}{m_1(X) + m_2(Y)},$$
$$m(Y) = m_1(X \cup Y) m_2(Y) + m_2(Y) \cdot \frac{m_1(X) m_2(Y)}{m_1(X) + m_2(Y)}, \tag{8}$$
$$m(X \cup Y) = m_1(X \cup Y) m_2(X \cup Y).$$

Applying the pignistic probability transformation, we
finally get $BetP(X) = m(X) + \frac{1}{2} \cdot m(X \cup Y)$ and
$BetP(Y) = m(Y) + \frac{1}{2} \cdot m(X \cup Y)$. Therefore, we choose
the quality indicators as follows: $q^{k_2}(i,j) = BetP(X)$, and
$q^{k_2}(i,j_2) = BetP(Y)$.


⁶We have chosen BetP here for its simplicity and because it is widely
known, but DSmP could be used instead with the expectation of better
performance [19].
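The three matching cases above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `pairing_quality_matrix` and the encoding of an assignment as a list of column indices are choices made here, and only the entries with $a_1(i,j) = 1$ are filled in (the value $BetP(Y)$ attached to the 2nd-best pairing is not stored). The sample data are the reward matrix $\Omega(C_1)$ and the two assignments used for criterion $C_1$ in Section IV.

```python
def pairing_quality_matrix(omega, a1, a2):
    """Quality of each pairing of the best assignment a1, judged against one
    2nd-best assignment a2 (a1[i], a2[i] = column paired with target i)."""
    m, n = len(omega), len(omega[0])
    r1 = sum(omega[i][a1[i]] for i in range(m))   # global reward R1
    r2 = sum(omega[i][a2[i]] for i in range(m))   # global reward R2
    q = [[0.0] * n for _ in range(m)]             # non-associations keep quality 0
    for i in range(m):
        j, j2 = a1[i], a2[i]
        m1 = omega[i][j] / r1                     # m1(X), Eqs. (3)/(5)
        m2 = omega[i][j2] / r2                    # m2(X) if j == j2, else m2(Y)
        ignorance = (1.0 - m1) * (1.0 - m2)       # mass left on the full ignorance
        if j == j2:
            # Agreement: conjunctive rule, no conflict (Eq. (4)), then BetP.
            q[i][j] = (1.0 - ignorance) + 0.5 * ignorance
        else:
            # Disagreement: PCR6 sends part of the conflict m1*m2 back to X (Eq. (8)).
            conflict = m1 * m2
            back_to_x = m1 * conflict / (m1 + m2) if conflict > 0 else 0.0
            q[i][j] = m1 * (1.0 - m2) + back_to_x + 0.5 * ignorance
    return q

omega_c1 = [[100, 20, 33, 5, 27],
            [11, 80, 25, 37, 62],
            [38, 2, 24, 78, 46]]
q1 = pairing_quality_matrix(omega_c1, a1=[0, 1, 3], a2=[0, 4, 3])
for row in q1:
    print([round(v, 4) for v in row])
print("Q_abs =", round(sum(map(sum, q1)), 4))
```

Because every non-association is left at zero, summing all entries of the returned matrix gives the absolute quality factor of the optimal assignment.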


The absolute quality factor $Q_{abs}(A_1, A_2^{k_2})$ of the optimal
assignment given in $A_1$, conditioned by $A_2^{k_2}$ for any
$k_2 \in \{1, 2, \ldots, \beta_2\}$, is defined as

$$Q_{abs}(A_1, A_2^{k_2}) \triangleq \sum_{i=1}^{m} \sum_{j=1}^{n} a_1(i,j)\, q^{k_2}(i,j). \tag{9}$$

The absolute average quality factor $Q_{aver}(A_1, A_2^{k_2})$ per
association of the optimal assignment given in $A_1$, conditioned by
$A_2^{k_2}$ for any $k_2 \in \{1, 2, \ldots, \beta_2\}$, is defined by

$$Q_{aver}(A_1, A_2^{k_2}) \triangleq \frac{1}{m} \cdot Q_{abs}(A_1, A_2^{k_2}), \tag{10}$$

where $m$ is the number of “1” entries in the optimal DA matrix $A_1$
(i.e. the number of targets).


To take into account the possible multiplicities (when
$\beta_2 > 1$) of the 2nd-best assignment solutions $A_2^{k_2}$,
$k_2 = 1, 2, \ldots, \beta_2$, we need to combine the $Q(A_1, A_2^{k_2})$
values. Several methods can be used for this; in particular we
can use either:

- A weighted averaging approach: The quality indicator
components $q(i,j)$ of the quality matrix $Q$ are obtained
by averaging the qualities obtained from each comparison of
$A_1$ with $A_2^{k_2}$. More precisely, one takes

$$q(i,j) \triangleq \sum_{k_2=1}^{\beta_2} w(A_2^{k_2})\, q^{k_2}(i,j), \tag{11}$$

where $w(A_2^{k_2})$ is a weighting factor in $[0,1]$ such that
$\sum_{k_2=1}^{\beta_2} w(A_2^{k_2}) = 1$. Since all assignments $A_2^{k_2}$ have the
same global reward value $R_2$, we suggest taking
$w(A_2^{k_2}) = 1/\beta_2$. A more elaborate method would consist of
using the quality indicator of $A_2^{k_2}$ based on the 3rd-best
solution, which can itself be computed from the quality of
the 3rd assignment solution based on the 4th-best solution,
and so on by a similar mechanism.

- A belief-based approach (see [18] for basics on belief
functions): A second method expresses the quality
by a belief interval $[q^{min}(i,j), q^{max}(i,j)]$ in $[0,1]$ instead
of a single real number $q(i,j)$ in $[0,1]$. More precisely,
one can compute the belief and plausibility bounds of the
quality by taking $q^{min}(i,j) \equiv Bel(a_1(i,j)) = \min_{k_2} q^{k_2}(i,j)$
and $q^{max}(i,j) \equiv Pl(a_1(i,j)) = \max_{k_2} q^{k_2}(i,j)$. Hence for
each possible pair $(i,j)$, one can define a basic belief
assignment (BBA) $m_{ij}(\cdot)$ on the frame of discernment
$\Theta \triangleq \{T = \text{trustful}, \neg T = \text{not trustful}\}$, which characterizes
the quality of the pairing $(i,j)$ in the optimal assignment
solution $A_1$, as follows:

$$m_{ij}(T) = q^{min}(i,j),$$
$$m_{ij}(\neg T) = 1 - q^{max}(i,j), \tag{12}$$
$$m_{ij}(T \cup \neg T) = q^{max}(i,j) - q^{min}(i,j).$$


Because only the optimal associations⁷ $(i,j)$ such that
$a_1(i,j) = 1$ are useful in tracking algorithms to update the
tracks, we do not need to compute and store
the qualities of the components $(i,j)$ such that $a_1(i,j) = 0$. In
fact, all components $(i,j)$ such that $a_1(i,j) = 0$ should be set
to zero by default in the $Q$ matrix.


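Example 4: Let’s consider the rewards matrix

$$\Omega = \begin{bmatrix} 1 & 11 & 45 & 30 \\ 17 & 8 & 38 & 27 \\ 10 & 14 & 35 & 20 \end{bmatrix}.$$

We get one 1st best ($\beta_1 = 1$) and four 2nd best ($\beta_2 = 4$)
DA solutions with their respective qualities as follows:

$$A_1 = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \end{bmatrix} \Rightarrow R_1(\Omega, A_1) = 86,$$

$$A_2^{k_2=1} = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} \Rightarrow R_2(\Omega, A_2^{k_2=1}) = 82, \quad
Q(A_1, A_2^{k_2=1}) \approx \begin{bmatrix} 0 & 0 & 0.59 & 0 \\ 0 & 0 & 0 & 0.41 \\ 0 & 0.65 & 0 & 0 \end{bmatrix},$$

$$A_2^{k_2=2} = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \Rightarrow R_2(\Omega, A_2^{k_2=2}) = 82, \quad
Q(A_1, A_2^{k_2=2}) \approx \begin{bmatrix} 0 & 0 & 0.89 & 0 \\ 0 & 0 & 0 & 0.56 \\ 0 & 0.45 & 0 & 0 \end{bmatrix},$$

$$A_2^{k_2=3} = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix} \Rightarrow R_2(\Omega, A_2^{k_2=3}) = 82, \quad
Q(A_1, A_2^{k_2=3}) \approx \begin{bmatrix} 0 & 0 & 0.89 & 0 \\ 0 & 0 & 0 & 0.76 \\ 0 & 0.52 & 0 & 0 \end{bmatrix},$$

$$A_2^{k_2=4} = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \Rightarrow R_2(\Omega, A_2^{k_2=4}) = 82, \quad
Q(A_1, A_2^{k_2=4}) \approx \begin{bmatrix} 0 & 0 & 0.59 & 0 \\ 0 & 0 & 0 & 0.56 \\ 0 & 0.35 & 0 & 0 \end{bmatrix}.$$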


Note that the absolute quality factors are:

$$Q_{abs}(A_1, A_2^{k_2=1}) \approx 1.66, \quad Q_{abs}(A_1, A_2^{k_2=2}) \approx 1.91,$$
$$Q_{abs}(A_1, A_2^{k_2=3}) \approx 2.19, \quad Q_{abs}(A_1, A_2^{k_2=4}) \approx 1.51.$$

Therefore, we can see that

$$Q_{abs}(A_1, A_2^{k_2=3}) > Q_{abs}(A_1, A_2^{k_2=2}) > Q_{abs}(A_1, A_2^{k_2=1}) > Q_{abs}(A_1, A_2^{k_2=4}),$$

which makes perfect sense because $A_1$ has more matching
pairings with $A_2^{k_2=3}$ than with the other 2nd-best assignments
$A_2^{k_2}$ ($k_2 \neq 3$). These pairings also have the strongest impact
on the global reward value. Therefore, the quality matrix $Q$
differentiates the quality of each pairing in the optimal
assignment $A_1$, as expected. This method provides an effective and
comprehensive solution to estimate the quality of each specific
association provided in the optimal assignment solution $A_1$.
The averaged qualities per association are:

$$Q_{aver}(A_1, A_2^{k_2=1}) \approx 0.55, \quad Q_{aver}(A_1, A_2^{k_2=2}) \approx 0.64,$$
$$Q_{aver}(A_1, A_2^{k_2=3}) \approx 0.73, \quad Q_{aver}(A_1, A_2^{k_2=4}) \approx 0.50.$$


⁷Found using Murty’s algorithm.

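The global quality matrix is then given by (using the
averaging approach)

$$Q(A_1, \mathcal{A}_2) = \frac{1}{\beta_2} \sum_{k_2=1}^{\beta_2} Q(A_1, A_2^{k_2})
\approx \begin{bmatrix} 0 & 0 & 0.74 & 0 \\ 0 & 0 & 0 & 0.57 \\ 0 & 0.49 & 0 & 0 \end{bmatrix}.$$

The global quality indexes $Q_{abs}(A_1, \mathcal{A}_2)$ and
$Q_{aver}(A_1, \mathcal{A}_2)$ are then approximately equal to 1.8 and
0.6 respectively.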


One can also improve the estimation of the quality matrix by
using the absolute quality factor $Q_{abs}(A_1, A_2^{k_2})$ of each solution,
for $k_2 = 1, \ldots, \beta_2$, to define the normalized weighting factors
as follows:

$$w = [w_{k_2},\ k_2 = 1, \ldots, \beta_2]',$$

with $w_{k_2} \triangleq Q_{abs}(A_1, A_2^{k_2}) / K$, where the normalization factor
$K$ is given by $K = \sum_{k_2=1}^{\beta_2} Q_{abs}(A_1, A_2^{k_2})$. In this example,
we get the weights

$$w = [w_1\ w_2\ w_3\ w_4]' \approx \left[\frac{1.66}{7.27}\ \ \frac{1.91}{7.27}\ \ \frac{2.19}{7.27}\ \ \frac{1.51}{7.27}\right]'
= [0.2283\ \ 0.2627\ \ 0.3012\ \ 0.2077]'.$$

The global quality matrix is then given by (using the
weighted averaging approach)

$$Q(A_1, \mathcal{A}_2) = \sum_{k_2=1}^{\beta_2} w_{k_2} Q(A_1, A_2^{k_2})
\approx \begin{bmatrix} 0 & 0 & 0.76 & 0 \\ 0 & 0 & 0 & 0.58 \\ 0 & 0.49 & 0 & 0 \end{bmatrix}.$$

If we prefer to use the Belief Interval Measure (BIM)
instead of the previous averaging approach, we get in
this example the following imprecise quality values:

Optimal assignments    Quality interval
(1,3)                  ≈ [0.59, 0.89]
(2,4)                  ≈ [0.41, 0.76]
(3,2)                  ≈ [0.35, 0.65]

Based on the comparison of the (pessimistic) lower bounds,
or of the (optimistic) upper bounds of the BIM, we observe that we get
a consistent ordering of the qualities of the optimal solutions
(same ordering as with the averaging method).
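The two ways of merging the qualities obtained against several 2nd-best solutions, Eq. (11) and Eq. (12), can be sketched as follows. The function names and the dictionary layout are hypothetical conveniences introduced here; the input values are the rounded qualities listed in Example 4, so the results only match the text up to rounding.

```python
def average_quality(qualities, weights=None):
    """Eq. (11): weighted average of the qualities of one pairing.
    With weights=None every 2nd-best solution gets the weight 1/beta2."""
    if weights is None:
        weights = [1.0 / len(qualities)] * len(qualities)
    return sum(w * q for w, q in zip(weights, qualities))

def belief_interval_bba(qualities):
    """Eq. (12): BBA on {T, notT} built from the min and max quality of one pairing."""
    q_min, q_max = min(qualities), max(qualities)
    return {"T": q_min, "notT": 1.0 - q_max, "T_or_notT": q_max - q_min}

# Rounded qualities of the three optimal pairings of Example 4,
# one value per 2nd-best solution k2 = 1, ..., 4.
example4 = {(1, 3): [0.59, 0.89, 0.89, 0.59],
            (2, 4): [0.41, 0.56, 0.76, 0.56],
            (3, 2): [0.65, 0.45, 0.52, 0.35]}

# Optional weights proportional to the absolute quality factors quoted in the text.
q_abs = [1.66, 1.91, 2.19, 1.51]
weights = [v / sum(q_abs) for v in q_abs]

for pair, values in example4.items():
    print(pair,
          round(average_quality(values), 2),
          round(average_quality(values, weights), 2),
          belief_interval_bba(values))
```

The interval approach keeps the spread of the qualities visible, whereas the averaging approach compresses it into a single number per pairing.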


III. SOLUTION OF THE 2ND PROBLEM (MULTI-CRITERIA) 


In this section, we evaluate the global DA
solution, with an estimation of its quality, based on the
knowledge of the qualities of multiple optimal DA solutions
established separately from distinct association criteria
$C_k$, $k = 1, \ldots, K$. More precisely, given the set of quality
matrices $Q^k$ ($k = 1, \ldots, K$) defined by the components
$q^k(i,j)$ according to Eq. (11), how do we establish the global
optimal DA solution with its overall quality matrix $Q$?
Moreover, we want to take into account the importance
of each criterion (when defined) in the establishment of the
solution.


In fact this 2nd problem is linked to the previous one and 
the method developed for solving our first problem will also 
help to solve this second problem as it will be shown in the 
following. Our solution is based on four distinct steps: 

• Step 1: Estimation of the normalized weighting vector $w$ of
the criteria. Two simple approaches are proposed to establish
the normalized criteria ranking (weighting) vector.


1) Direct method: The weighting factors can be directly
established either by an external source of information,
or by the system designer. If these weighting factors
are not available, we propose to compute them from
the quality indicators derived by the method used to
solve the 1st problem (see the previous section). For
example, if we consider $K$ criteria providing quality
factors $Q^k_{abs}(A_1(C_k), A_2(C_k))$, $k = 1, 2, \ldots, K$, then
we compute the normalized $K \times 1$ weighting vector
$w = [w_1\ w_2 \ldots w_K]'$ with the $k$-th component given by

$$w_k \triangleq \frac{Q^k_{abs}(A_1(C_k), A_2(C_k))}{\sum_{j=1}^{K} Q^j_{abs}(A_1(C_j), A_2(C_j))}, \tag{13}$$

where $Q^k_{abs}(A_1(C_k), A_2(C_k))$ is the absolute quality
factor obtained from the quality matrix $Q^k(A_1, A_2)$ of
the optimal DA for the criterion $C_k$.

2) Saaty’s method: This method is part of Saaty’s AHP
method widely used for multi-criteria decision analysis
in operational research [22], [23], [24], and it has been
connected with information fusion and belief functions
in [25], [26], [27]. The relative importance of one
criterion over another must be expressed by the system
designer using a pairwise $K \times K$ comparison matrix
(also called knowledge matrix) $M = [m_{pq}]$, where
the element $m_{pq}$ of the matrix defines the importance
of criterion $C_p$ with respect to criterion $C_q$, with
$p, q \in \{1, 2, \ldots, K\}$. For example (see [25] for details),
let’s consider only $K = 3$ criteria. If the comparison
matrix is given by

$$M = \begin{bmatrix} (1/1) & (1/3) & (4/1) \\ (3/1) & (1/1) & (5/1) \\ (1/4) & (1/5) & (1/1) \end{bmatrix},$$

then the element $m_{13} = 4/1$ indicates that
criterion $C_1$ is four times as important as criterion $C_3$
for the system designer (or decision-maker), etc. From
this pairwise matrix, Saaty demonstrated that the ranking
of the priorities of the criteria can be obtained from the
normalized eigenvector, denoted $w$, associated with the
principal (maximal) eigenvalue of the matrix $M$, denoted $\lambda$.
In our example, one gets $\lambda = 3.0857$ and $w =
[0.2797\ 0.6267\ 0.0936]'$, which shows that $C_2$ is
the most important criterion with the weight 0.6267, then
$C_1$ is the second most important criterion
with weight 0.2797, and finally $C_3$ is the least
important criterion with weight 0.0936.
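The eigenvector computation of Saaty's method can be reproduced for the $3 \times 3$ comparison matrix above with a short power iteration. This is a minimal sketch: the function name `ahp_priorities` and the fixed iteration count are assumptions made here, and power iteration is simply one convenient way to obtain the principal eigenpair of a positive matrix without an external linear algebra library.

```python
def ahp_priorities(M, iterations=200):
    """Principal eigenvalue and sum-normalized eigenvector of a positive
    pairwise comparison matrix, obtained by power iteration."""
    n = len(M)
    w = [1.0 / n] * n                     # start from equal priorities
    lam = float(n)
    for _ in range(iterations):
        v = [sum(M[p][q] * w[q] for q in range(n)) for p in range(n)]
        lam = sum(v)                      # since sum(w) == 1, sum(M w) estimates lambda
        w = [x / lam for x in v]          # renormalize so that the weights sum to 1
    return w, lam

M = [[1.0, 1.0 / 3.0, 4.0],
     [3.0, 1.0, 5.0],
     [1.0 / 4.0, 1.0 / 5.0, 1.0]]
w, lam = ahp_priorities(M)
print([round(x, 4) for x in w], round(lam, 4))
```

The returned weights sum to one, so they can be used directly as the criteria weighting vector of Step 1.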


• Step 2: Combined estimation of the qualities of each target
association.

Once the normalized weighting vector $w$ of the criteria
has been obtained, we first need to compute the combined
(weighted) estimation of the qualities of each target
association with the $n$ available measurements. This is done
by building the following $n \times K$ matrix

$$Q_i \triangleq [q_i(C_1) \ldots q_i(C_K)], \tag{14}$$

where each column $q_i(C_k)$ of the matrix $Q_i$ corresponds to
the transpose of the $i$-th row of the quality matrix $Q^k(A_1, A_2)$.

Then, following the AHP approach, we multiply this $n \times K$
matrix $Q_i$ by the normalized criteria ranking $K \times 1$ vector
$w$ (obtained either from the direct method or from Saaty’s method) to
get the combined estimation of the qualities of each target
association. More precisely, for the $i$-th target, we obtain the
following $n \times 1$ vector

$$q_i = Q_i w. \tag{15}$$

• Step 3: Search for the optimal global assignment based on
the combined qualities derived from the criteria.

From the set of $m$ vectors $q_i$ ($i = 1, 2, \ldots, m$) we now need to
solve a new optimal DA problem with the (global)
$m \times n$ rewards matrix defined by

$$\Omega_G \triangleq [q_1\ q_2 \ldots q_m]'. \tag{16}$$

Murty’s algorithm is then used again to get the optimal
DA solution(s) providing the best global reward, and also to
generate all the 2nd-best solutions that are necessary to
estimate its quality in Step 4.

• Step 4: Estimation of the quality of the optimal DA solution.

We use the method described in Section II for solving
Problem 1 to estimate the quality of the optimal DA solution.
If several 1st-best DA solutions occur, we choose the solution
generating the highest $Q_{abs}$ quality index.
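Steps 1 to 3 can be sketched together as follows. Stacking the vectors $q_i = Q_i w$ row by row is the same as forming the entry-wise weighted sum $\sum_k w_k Q^k$, which is what the helper below computes. All function names are hypothetical, the exhaustive search is only a stand-in for Murty's algorithm on small problems, and the three input matrices are the rounded quality matrices of the example in Section IV.

```python
from itertools import permutations

def direct_weights(quality_matrices):
    """Eq. (13): weights proportional to the absolute quality factor of each criterion
    (entries outside the optimal pairings are zero, so summing all entries is enough)."""
    q_abs = [sum(map(sum, Q)) for Q in quality_matrices]
    total = sum(q_abs)
    return [q / total for q in q_abs]

def fuse_quality_matrices(quality_matrices, weights):
    """Eqs. (14)-(16): entry (i, j) of the global reward matrix is sum_k w_k q^k(i, j)."""
    m, n = len(quality_matrices[0]), len(quality_matrices[0][0])
    return [[sum(w * Q[i][j] for w, Q in zip(weights, quality_matrices))
             for j in range(n)] for i in range(m)]

def best_two_assignments(omega):
    """Exhaustive search returning the two top-ranked (reward, columns) pairs."""
    m, n = len(omega), len(omega[0])
    scored = sorted(((sum(omega[i][c[i]] for i in range(m)), c)
                     for c in permutations(range(n), m)), reverse=True)
    return scored[0], scored[1]

Q1 = [[0.82, 0, 0, 0, 0], [0, 0.52, 0, 0, 0], [0, 0, 0, 0.76, 0]]
Q2 = [[0, 0, 0, 0, 0.51], [0, 0.78, 0, 0, 0], [0, 0, 0.74, 0, 0]]
Q3 = [[0, 0.53, 0, 0, 0], [0, 0, 0.78, 0, 0], [0, 0, 0, 0, 0.79]]

for weights in ([1 / 3, 1 / 3, 1 / 3], direct_weights([Q1, Q2, Q3])):
    omega_g = fuse_quality_matrices([Q1, Q2, Q3], weights)
    best, second = best_two_assignments(omega_g)
    print([round(w, 4) for w in weights], best, second)
```

Column indices in the printed assignments are zero-based, so `(0, 1, 4)` corresponds to the pairings (1,1), (2,2) and (3,5).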


IV. A SIMPLE ILLUSTRATIVE EXAMPLE 


For the sake of simplicity, let’s consider the following
example with $m = 3$ targets, $n = 5$ measurements, and 3
criteria $C_1$, $C_2$ and $C_3$ associated with the (randomly chosen)
rewards matrices:

$$\Omega(C_1) = \begin{bmatrix} 100 & 20 & 33 & 5 & 27 \\ 11 & 80 & 25 & 37 & 62 \\ 38 & 2 & 24 & 78 & 46 \end{bmatrix},$$


87 35 43 20 95 
O(Cz) = |28 83 25 10 29], 
1 -7 72 i 20 


$$\Omega(C_3) = \begin{bmatrix} 25 & 78 & 49 & 60 & 9 \\ 30 & 26 & 79 & 20 & 49 \\ 20 & 20 & 3 & 47 & 81 \end{bmatrix}.$$


A. Qualities of optimal data associations 


Applying the method described in Section II, we easily
obtain the following quality matrices of optimal DA solutions:

• For criterion $C_1$, one gets $\beta_1 = 1$ and $\beta_2 = 1$, and the
following 1st best and 2nd best DA solutions

$$\Omega(C_1) \Rightarrow
A_1 = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix}, \quad
A_2 = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix},$$

providing the 1st and 2nd best global rewards
$R(\Omega(C_1), A_1) = 258$ and $R(\Omega(C_1), A_2) = 240$.
Applying the method described in Section II, we obtain the
following quality matrix related to the optimal DA
based on criterion $C_1$:

$$Q^1 \approx \begin{bmatrix} 0.82 & 0 & 0 & 0 & 0 \\ 0 & 0.52 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0.76 & 0 \end{bmatrix}.$$


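• For criterion $C_2$, one gets $\beta_1 = 1$ and $\beta_2 = 1$, and the
following 1st best and 2nd best DA solutions

$$\Omega(C_2) \Rightarrow
A_1 = \begin{bmatrix} 0 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \end{bmatrix}, \quad
A_2 = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \end{bmatrix},$$

providing the 1st and 2nd best global rewards
$R(\Omega(C_2), A_1) = 250$ and $R(\Omega(C_2), A_2) = 242$.
Applying the method described in Section II, we obtain the
following quality matrix related to the optimal DA
based on criterion $C_2$:

$$Q^2 \approx \begin{bmatrix} 0 & 0 & 0 & 0 & 0.51 \\ 0 & 0.78 & 0 & 0 & 0 \\ 0 & 0 & 0.74 & 0 & 0 \end{bmatrix}.$$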


• For criterion $C_3$, one gets $\beta_1 = 1$ and $\beta_2 = 1$, and the
following 1st best and 2nd best DA solutions

$$\Omega(C_3) \Rightarrow
A_1 = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}, \quad
A_2 = \begin{bmatrix} 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix},$$

providing the 1st and 2nd best global rewards
$R(\Omega(C_3), A_1) = 238$ and $R(\Omega(C_3), A_2) = 220$.
Applying the method described in Section II, we obtain the
following quality matrix related to the optimal DA
based on criterion $C_3$:

$$Q^3 \approx \begin{bmatrix} 0 & 0.53 & 0 & 0 & 0 \\ 0 & 0 & 0.78 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0.79 \end{bmatrix}.$$


B. Multicriteria-based DA solution with its quality 


• Case 1: If we assume that all criteria have the same
weight in the search for the optimal DA solution, then
we take the normalized weighting vector $w =
[1/3\ 1/3\ 1/3]'$. Therefore, the weighted average $\Omega_G =
\sum_{k=1}^{3} w_k Q^k$ of the quality matrices $Q^1$, $Q^2$ and $Q^3$
gives us the following rewards matrix

$$\Omega_G \approx \begin{bmatrix} 0.27 & 0.17 & 0 & 0 & 0.17 \\ 0 & 0.43 & 0.26 & 0 & 0 \\ 0 & 0 & 0.25 & 0.25 & 0.26 \end{bmatrix}.$$

Now we solve the DA problem to maximize
the global quality reward using Murty’s algorithm and we
get the following 1st best and 2nd best DA solutions:

$$\Omega_G \Rightarrow
A_1 = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}, \quad
A_2 = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix},$$

with the 1st and 2nd best global rewards $R(\Omega_G, A_1) \approx
0.97$ and $R(\Omega_G, A_2) \approx 0.96$. Applying the method
described in Section II to estimate the quality of this
optimal DA solution, we obtain the following quality
matrix:

$$Q_G \approx \begin{bmatrix} 0.74 & 0 & 0 & 0 & 0 \\ 0 & 0.84 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0.50 \end{bmatrix}.$$

• Case 2: If we use the prior information given by the
absolute quality indicators to build the normalized weighting
vector, we get

$$Q^1_{abs} = \sum_{i=1}^{3} \sum_{j=1}^{5} q^1(i,j) \approx 2.1154,$$
$$Q^2_{abs} = \sum_{i=1}^{3} \sum_{j=1}^{5} q^2(i,j) \approx 2.0426,$$
$$Q^3_{abs} = \sum_{i=1}^{3} \sum_{j=1}^{5} q^3(i,j) \approx 2.1091,$$

and we have $Q^1_{abs} + Q^2_{abs} + Q^3_{abs} = 6.2672$. So the
normalized weights are given by

$$w = [w_1\ w_2\ w_3]' = \left[\frac{2.1154}{6.2672}\ \ \frac{2.0426}{6.2672}\ \ \frac{2.1091}{6.2672}\right]'
= [0.3375\ \ 0.3260\ \ 0.3365]'.$$

The weighted average $\Omega_G = \sum_{k=1}^{3} w_k Q^k$ of the quality
matrices $Q^1$, $Q^2$ and $Q^3$ now gives us the following
rewards matrix

$$\Omega_G \approx \begin{bmatrix} 0.27 & 0.17 & 0 & 0 & 0.16 \\ 0 & 0.43 & 0.26 & 0 & 0 \\ 0 & 0 & 0.24 & 0.25 & 0.26 \end{bmatrix}.$$

Now we solve the DA problem to maximize
the global quality reward and we get the following 1st
best and 2nd best DA solutions:

$$\Omega_G \Rightarrow
A_1 = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}, \quad
A_2 = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix},$$

with the 1st and 2nd best global rewards $R(\Omega_G, A_1) \approx
0.97$ and $R(\Omega_G, A_2) \approx 0.96$. Applying the method
described in Section II to estimate the quality of this
optimal DA solution, we obtain the following quality
matrix:

$$Q_G \approx \begin{bmatrix} 0.74 & 0 & 0 & 0 & 0 \\ 0 & 0.84 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0.50 \end{bmatrix}.$$

Because the normalized weights based on the absolute
quality indicators are, in this example, all close to 1/3,
the multicriteria-based optimal DA and its quality are
close to what we get when assuming equal
importance of the criteria in the fusion process, as
expected.


To describe qualitatively the quality of the pairings in the
optimal DA solution, we split the quality range $[0; 1]$ into three
subintervals as follows⁸:

Low quality:     if $q(i,j) \in [0; 1/3)$,
Medium quality:  if $q(i,j) \in [1/3; 2/3)$,
High quality:    if $q(i,j) \in [2/3; 1]$.


⁸Of course, other partitions could be used instead, depending on the
preference of the system designer.

Based on this qualitative scale, we finally get for our
example the final multicriteria-based DA solution

$$A_1 = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix},$$

with the qualitative quality matrix

$$Q_G^{qualitative} \triangleq \begin{bmatrix} \text{High} & - & - & - & - \\ - & \text{High} & - & - & - \\ - & - & - & - & \text{Medium} \end{bmatrix},$$

where the notation “-” means that “the quality evaluation
does not apply”, or is interpreted (by default) as “the worst
quality”.
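The qualitative scale can be applied mechanically to the numerical quality matrix. In the short sketch below the function name `quality_label` is hypothetical, and it is an assumption of this sketch that the boundary value 2/3 is labeled High and the boundary value 1/3 is labeled Medium.

```python
def quality_label(q):
    """Map a quality value in [0, 1] to the three-level qualitative scale."""
    if q < 1.0 / 3.0:
        return "Low"
    if q < 2.0 / 3.0:
        return "Medium"
    return "High"

# Qualities of the pairings (1,1), (2,2) and (3,5) of the final DA solution.
for pairing, q in {(1, 1): 0.74, (2, 2): 0.84, (3, 5): 0.50}.items():
    print(pairing, quality_label(q))
```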


Remark: It is worth noting that this approach does not provide,
in general, the same results as one would get by directly combining
(and weighting) the original reward matrices of each criterion.
In this example, the weighted global reward matrix $\Omega_{direct} =
\sum_{k=1}^{K} w_k \Omega(C_k)$ would be equal to

$$\Omega_{direct} \approx \begin{bmatrix} 69.07 & 45.39 & 41.75 & 29.31 & 40.85 \\ 22.93 & 61.43 & 44.46 & 22.79 & 47.44 \\ 23.13 & 9.98 & 30.78 & 55.75 & 53.53 \end{bmatrix},$$

corresponding to the quality matrix of the optimal DA solution

$$Q_{direct} \approx \begin{bmatrix} 0.73 & 0 & 0 & 0 & 0 \\ 0 & 0.84 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0.47 & 0 \end{bmatrix}.$$

One sees that these high quality solutions are fully consistent
with the high quality solutions of our method. However,
the medium quality solutions mismatch (we get the (3,4) pairing
from the direct optimal assignment versus the (3,5) pairing
obtained by our method). This reflects an ambiguity in the choice
of the assignment of target $T_3$. Therefore, such an assignment is
unreliable because of its lower quality, and should not be used
to update the track of this target.


V. CONCLUSION 


In this paper, we have proposed two methods based on belief
functions for establishing: 1) the quality of the pairings given
by the optimal data association (or assignment) solution obtained
with a chosen algorithm (typically Murty’s algorithm coupled with
the Auction algorithm) with respect to a given criterion, and 2)
the quality of the multicriteria-based optimal data association
solution. Our methods are independent of the choice of the
algorithm used to find the optimal assignment solution and,
in case of multiple optimal solutions, they also provide a way
to select the best optimal assignment solution (the one having
the highest absolute quality factor). The methods developed
in this paper are general in the sense that they can be applied
to different types of association problems corresponding to
different sets of constraints. They can be extended to
SD-assignment problems as well. As perspectives, we would
like to extend our approach to the n-D assignment context,
and then evaluate its performance in a realistic multi-target
tracking scenario.




REFERENCES 


[1] Y. Bar-Shalom, P. Willett, X. Tian, Tracking and Data Fusion: A
Handbook of Algorithms, YBS Publishing, Storrs, CT, USA, 2011.

[2] D.L. Hall, C.Y. Chong, J. Llinas, M. Liggins II, Distributed Data Fusion
for Network-Centric Operations, CRC Press, 2013.

[3] H.W. Kuhn, The Hungarian method for the assignment problem, Naval
Research Logistics Quarterly, Vol. 2, pp. 83-97, 1955.

[4] J. Munkres, Algorithms for the assignment and transportation problems,
SIAM J., Vol. 5(1), pp. 32-38, March 1957.

[5] F. Bourgeois, J.C. Lassalle, An extension of the Munkres algorithm for
the assignment problem to rectangular matrices, Comm. of the ACM,
Vol. 14(12), pp. 802-804, Dec. 1971.

[6] R. Jonker, A. Volgenant, A shortest augmenting path algorithm for
dense and sparse linear assignment problems, Computing,
Vol. 38(4), pp. 325-340, Dec. 1987.

[7] D.P. Bertsekas, The auction algorithm: A distributed relaxation method
for the assignment problem, Ann. of Oper. Res., Vol. 14(1), pp. 105-123,
Dec. 1988.

[8] K.G. Murty, An algorithm for ranking all the assignments in order of
increasing cost, Operations Research, Vol. 16(3), pp. 682-687, 1968.

[9] C.R. Chegireddy, H.W. Hamacher, Algorithms for finding K-best perfect
matching, Discrete Applied Mathematics, Vol. 18, pp. 155-165, 1987.

[10] R. Danchick, G.E. Newnam, A fast method for finding the exact N-best
hypotheses for multitarget tracking, IEEE Trans. on AES, Vol. 29(2),
pp. 555-560, April 1993.

[11] M.L. Miller, H.S. Stone, I.J. Cox, Optimizing Murty’s ranked assignment
method, IEEE Trans. on AES, Vol. 33(3), pp. 851-862, July 1997.

[12] M. Pascoal, M.E. Captivo, J. Climaco, A note on a new variant of
Murty’s ranking assignments algorithm, 4OR Journal, Vol. 1, pp. 243-255,
2003.

[13] Z.J. Ding, D. Vandervies, A modified Murty’s algorithm for multiple
hypothesis tracking, Proc. of SPIE Signal and Data Proc. of Small
Targets, Vol. 6236, May 2006.

[14] E. Fortunato, et al., Generalized Murty’s algorithm with application to
multiple hypothesis tracking, Proc. of Fusion 2007, pp. 1-8, Québec,
July 9-12, 2007.

[15] X. He, R. Tharmarasa, M. Pelletier, T. Kirubarajan, Accurate Murty’s
Algorithm for Multitarget Top Hypothesis, Proc. of Fusion 2011, Chicago,
IL, USA, July 5-8, 2011.

[16] H.W. Hamacher, M. Queyranne, K-best solutions to combinatorial
optimization problems, Ann. of Oper. Res., No. 4, pp. 123-143, 1985/6.

[17] M. Dell’Amico, P. Toth, Algorithms and codes for dense assignment
problems: the state of the art, Discrete Appl. Math., Vol. 100, pp. 17-48,
2000.

[18] G. Shafer, A Mathematical Theory of Evidence, Princeton Univ. Press,
Princeton, NJ, USA, 1976.

[19] F. Smarandache, J. Dezert, Advances and applications of DSmT for
information fusion, Volumes 1-4, American Research Press, 2004-2015.
http://www.gallup.unm.edu/~smarandache/DSmT.htm

[20] P. Smets, R. Kennes, The transferable belief model, Artificial
Intelligence, Vol. 66(2), pp. 191-234, 1994.

[21] J. Dezert, K. Benameur, On the Quality of Optimal Assignment for Data
Association, Proc. of Belief 2014, Oxford, UK, Sept. 26-28, 2014.

[22] T.L. Saaty, A scaling method for priorities in hierarchical structures, J.
of Math. Psych., Vol. 15, pp. 59-62, 1977.

[23] T.L. Saaty, The Analytical Hierarchy Process, McGraw Hill, 1980.

[24] E.H. Forman, S.I. Gass, The analytical hierarchy process: an exposition,
Oper. Res., Vol. 49(4), pp. 469-487, 2001.

[25] J. Dezert, J.-M. Tacnet, M. Batton-Hubert, F. Smarandache,
Multi-criteria decision making based on DSmT/AHP, Proc. of Int. Workshop
on Belief Functions, Brest, France, April 2-4, 2010.

[26] J. Dezert, J.-M. Tacnet, Evidential Reasoning for Multi-Criteria Analysis
based on DSmT-AHP, Proc. of ISAHP 2011, Italy, June 2011.

[27] J.-M. Tacnet, J. Dezert, M. Batton-Hubert, AHP and Uncertainty
Theories for Decision Making using the ER-MCDA Methodology, Proc. of
ISAHP 2011, Italy, June 2011.

[28] F. Smarandache, J. Dezert, On the consistency of PCR6 with the
averaging rule and its application to probability estimation, Proc. of
Fusion 2013 Conf., Istanbul, Turkey, July 9-12, 2013.

[29] J. Dezert, A. Tchamova, On the validity of Dempster’s fusion rule and
its interpretation as a generalization of Bayesian fusion rule, Int. J. of
Intelligent Syst., Special Issue: Advances in Intelligent Systems, Vol. 29,
No. 3, pp. 223-252, March 2014.




Environment Perception Using Grid Occupancy 
Estimation with Belief Functions 


Jean Dezert, Julien Moras, Benjamin Pannetier 
The French Aerospace Lab 
ONERA/DTIM/EVF 
F-91761 Palaiseau, France. 
Emails: jean.dezert@onera.fr, julien.moras@onera.fr, benjamin.pannetier@onera.fr


Originally published as: J. Dezert, J. Moras, B. Pannetier, Environment Perception Using Grid Occupancy
Estimation with Belief Functions, in Proc. of Fusion 2015, Washington D.C., USA, July 6-9, 2015, and
reprinted with permission.


Abstract: A grid map offers a useful representation of the perceived 
world for mobile robotics navigation. It will play a major role in the 
security (obstacle avoidance) of the next generations of terrestrial 
vehicles, as well as in future autonomous navigation systems. In a grid 
map, the occupancy of each cell, representing a small piece of the area 
surrounding the vehicle, must first be estimated from sensor measurements, 
and then be classified into different classes in order to get a complete 
and precise perception of the dynamic environment in which the vehicle 
moves. So far, the estimation and the updating of the grid map have been 
done using fusion techniques based on the probabilistic framework, or on 
the classical belief function framework thanks to an inverse model of the 
sensors and the Dempster-Shafer rule of combination. Recently we have 
shown that the PCR6 rule (Proportional Conflict Redistribution rule #6) 
proposed in DSmT (Dezert-Smarandache Theory) substantially improves the 
quality of the grid map with respect to other techniques, especially when 
the quality of the available information is low and when the sources of 
information appear as conflicting. In this paper, we go further and 
analyze the performance of the improved version of PCR6 with Zhang’s 
degree of intersection. We show through different realistic scenarios 
(based on a 4-layer LIDAR sensor) the benefit of using this new rule of 
combination in a practical application. 

Keywords: Information fusion, grid map, cell occupancy, perception, 
belief functions, DSmT, PCR6, ZPCR6. 


I. INTRODUCTION 


Occupancy Grids (OG) are often used for intelligent vehicle environment 
perception and navigation, which requires techniques for data fusion, 
localization and obstacle avoidance. As OGs manage a representation of 
the environment that does not make any assumption on the geometrical 
shape of the detected elements, they provide a general framework to deal 
with complex perception conditions. In our previous works, we focused on 
the use of a multi-echo and multi-layer LIDAR system in order to 
characterize the dynamic surrounding environment of a vehicle driving in 
common traffic conditions. The perception strategy involved map 
estimation and scan grids [1], [2] based either on the classical Bayesian 
framework, or on the classical evidential framework based on 
Dempster-Shafer theory (DST) [3] of belief functions. The map grid acts 
as a filter that accumulates information and makes it possible to detect 
moving objects. A comparative analysis of the performances of these 
approaches has been published recently in [4]. 

In dynamic environments, it is crucial to have a good model of the 
information flow in the data fusion process in order to avoid adding 
wrong implicit prior knowledge that will need time to be forgotten. In 
this context, evidential OGs are particularly interesting for managing 
the information well, since it is possible to explicitly make the 
distinction between non-explored and moving cells. 

The idea of using the probabilistic framework to estimate the grid 
occupancy was popularized by Elfes in his pioneering works in the 1990s 
[8]. Later, the idea was extended to the fuzzy logic framework by Oriolo 
et al. [10], and in parallel to the belief function (evidential) 
framework as well [11]-[15]. Most of the aforementioned research works 
dealt only with acoustic sensors (i.e. SONAR). Recently, DSmT has also 
been applied to the perception of the environment with acoustic sensors, 
as reported in [16]-[18]. 

The aim of this paper is to analyze, in a realistic perception problem 
using a 4-layer LIDAR sensor, the performance of the improved version of 
PCR6 taking into account Zhang’s degree of intersection of focal elements 
(called ZPCR6 rule), which has been presented in detail in the companion 
paper [7]. We show how the environment perception with non-acoustic 
sensors can be done, and compare the performances of different fusion 
rules (Bayesian, Dempster-Shafer, PCR6 and ZPCR6) in terms of accuracy 
of grid map estimation. 

This paper is organized as follows. After a short presentation of the 
basics of belief functions and of their rules of combination based on DST 
and DSmT in the next section, we present the inverse sensor models in 
section III with the construction of the basic belief assignments (BBA). 
In section IV, we present an illustrative scenario for environment 
perception including a mobile object with a platform equipped with a 
LIDAR, and we compare our new realistic simulation results with those 
obtained by the probabilistic and the classical belief-based approaches. 
We show how static and mobile objects are extracted from the occupancy 
grid map using digital image processing. Finally, conclusions and 
perspectives are given in section V. 


II. BASICS OF BELIEF FUNCTIONS AND THEIR FUSION 


Dempster-Shafer’s theory (DST) of evidence was developed by Shafer in 
1976 from Dempster’s works [3]. DST is also known as the theory of belief 
functions and it is mainly characterized by a frame of discernment (FoD), 
sources of evidence represented by basic belief assignments (BBA), belief 
(Bel) and plausibility (Pl) functions, and Dempster’s rule of combination 
(denoted DS rule^1 in the sequel). DST has been modified and extended into 
Dezert-Smarandache theory [6] (DSmT) to work with quantitative or 
qualitative BBA and to combine the sources of evidence in a more 
efficient way thanks to new proportional conflict redistribution (PCR) 
fusion rules; see [19]-[22] for discussion and examples. We briefly 
recall in the next subsections the basics of the theory of belief 
functions. 


A. Belief functions 


Let us consider a finite discrete FoD $\Omega = \{\omega_1, \omega_2, \ldots, \omega_n\}$, 
with $n > 1$, of the fusion problem under consideration and its fusion 
space $G^\Omega$, which can be chosen either as the power-set $2^\Omega$, 
the hyper-power set^2 $D^\Omega$, or the super-power set $S^\Omega$, 
depending on the model that fits with the problem [6]. A BBA associated 
with a given source of evidence is defined as the mapping 
$m(.) : G^\Omega \to [0,1]$ satisfying $m(\emptyset) = 0$ and 
$\sum_{A \in G^\Omega} m(A) = 1$. The quantity $m(A)$ is called the mass 
of belief of $A$ committed by the source of evidence. Belief and 
plausibility functions are defined by 


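$$\mathrm{Bel}(A) = \sum_{B \subseteq A,\ B \in G^\Omega} m(B) \quad \text{and} \quad \mathrm{Pl}(A) = \sum_{B \cap A \neq \emptyset,\ B \in G^\Omega} m(B) \tag{1}$$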


The degree of belief $\mathrm{Bel}(A)$ given to a subset $A$ quantifies 
the amount of justified specific support to be given to $A$, and the 
degree of plausibility $\mathrm{Pl}(A)$ quantifies the maximum amount of 
potential specific support that could be given to $A$. If for some 
$A \in G^\Omega$, $m(A) > 0$, then $A$ is called a focal element of the 
BBA $m(.)$. When all focal elements are singletons and 
$G^\Omega = 2^\Omega$, the BBA $m(.)$ is called a Bayesian BBA [3] and 
its corresponding belief function $\mathrm{Bel}(.)$ is homogeneous to a 
(possibly subjective) probability measure, and one has 
$\mathrm{Bel}(A) = P(A) = \mathrm{Pl}(A)$; otherwise in general one has 
$\mathrm{Bel}(A) \leq P(A) \leq \mathrm{Pl}(A)$, $\forall A \in G^\Omega$. 
The vacuous BBA representing a totally ignorant source is defined as 
$m_v(\Omega) = 1$. 
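
The definitions in (1) can be illustrated with a minimal sketch. It assumes Shafer's model on the binary frame {F, O} used later in the paper, and it encodes subsets as Python frozensets; the function names `bel` and `pl` and the example masses are illustrative choices, not part of the original work.

```python
def bel(m, a):
    """Bel(A): total mass of the non-empty focal elements B included in A."""
    return sum(v for b, v in m.items() if b and b <= a)


def pl(m, a):
    """Pl(A): total mass of the focal elements B that intersect A."""
    return sum(v for b, v in m.items() if b & a)


# Binary frame Omega = {F, O}; subsets are encoded as frozensets (an assumption).
F, O = frozenset({"F"}), frozenset({"O"})
OMEGA = F | O

# Example BBA: 0.6 committed to "free", 0.4 left on total ignorance.
m = {F: 0.6, OMEGA: 0.4}
for name, a in (("F", F), ("O", O), ("Omega", OMEGA)):
    print(name, bel(m, a), pl(m, a))
```

For this BBA the interval [Bel, Pl] of the free state is narrower than that of the occupied state, which only receives plausibility from the mass left on the whole frame.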


B. Fusion rules 


Many rules have been proposed in the literature in the past decades (see 
[6], Vol. 2 for a detailed list of fusion rules) to combine efficiently 
several distinct sources of evidence represented by the BBA’s $m_1(.)$, 
$m_2(.)$, ..., $m_s(.)$ ($s \geq 2$) defined on the same fusion space 
$G^\Omega$. In this paper, we focus only on DS rule, because it has been 
historically proposed in DST and is still widely used in applications, 
and on the PCR rule no. 6 (i.e. PCR6) proposed in DSmT, because it 
provides a very interesting alternative to DS rule, even if PCR6 is in 
general more complex to implement than DS rule. 

^1 The DS acronym stands for Dempster-Shafer, since Dempster’s rule has been 
widely promoted by Shafer in the development of his mathematical theory of 
evidence [3]. 

^2 which corresponds to a Dedekind’s lattice, see [6] Vol. 1. 

In the DST framework, the fusion space $G^\Omega$ equals the power-set 
$2^\Omega$ because Shafer’s model of the frame $\Omega$ is assumed, which 
means that all elements of the FoD are exhaustive and exclusive. The 
combination of the BBA’s $m_1(.)$ and $m_2(.)$ is done by 
$m^{DS}_{1,2}(\emptyset) = 0$ and, for all $X \neq \emptyset$ in $2^\Omega$, 

$$m^{DS}_{1,2}(X) = \frac{\sum_{X_1, X_2 \in 2^\Omega,\ X_1 \cap X_2 = X} \prod_{i=1}^{2} m_i(X_i)}{1 - m_{1,2}(\emptyset)} \tag{2}$$


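where the numerator of (2) is the mass of belief on the conjunctive 
consensus on $X$. The denominator $1 - m_{1,2}(\emptyset)$ is a 
normalization constant. The total degree of conflict between the two 
sources of evidence is classically defined by 

$$m_{1,2}(\emptyset) = \sum_{X_1, X_2 \in 2^\Omega,\ X_1 \cap X_2 = \emptyset} m_1(X_1)\, m_2(X_2) \tag{3}$$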


According to Shafer [3], the two sources are said to be in total conflict 
if $m_{1,2}(\emptyset) = 1$. In this case the combination of the sources 
by DS rule cannot be done because of the mathematical 0/0 indeterminacy. 
The vacuous BBA $m_v(\Omega) = 1$ is a neutral element for DS rule. This 
rule is commutative and associative, and the formula (2) can be easily 
generalized for the combination of $s > 2$ sources of evidence. DS rule 
remains the milestone fusion rule of DST. 
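
As an illustration of the DS rule for two sources, here is a minimal sketch under Shafer's model. It assumes BBAs stored as dictionaries mapping frozensets to masses; the names `conjunctive` and `ds_rule` and the example masses are hypothetical and are not taken from the authors' Matlab codes.

```python
def conjunctive(m1, m2):
    """Unnormalized conjunctive rule; the mass on frozenset() is the conflict."""
    out = {}
    for x1, a in m1.items():
        for x2, b in m2.items():
            out[x1 & x2] = out.get(x1 & x2, 0.0) + a * b
    return out


def ds_rule(m1, m2):
    """Two-source DS rule: conjunctive consensus divided by 1 - conflict."""
    m12 = conjunctive(m1, m2)
    conflict = m12.pop(frozenset(), 0.0)
    if conflict >= 1.0:
        raise ValueError("total conflict: DS rule is undefined (0/0)")
    return {x: v / (1.0 - conflict) for x, v in m12.items()}


F, O = frozenset({"F"}), frozenset({"O"})
OMEGA = F | O

m1 = {F: 0.6, OMEGA: 0.4}   # a source supporting "free"
m2 = {O: 0.8, OMEGA: 0.2}   # a source supporting "occupied"
fused = ds_rule(m1, m2)
print({"".join(sorted(k)): round(v, 4) for k, v in fused.items()})
```

Combining `m1` with the vacuous BBA `{OMEGA: 1.0}` returns `m1` unchanged, which matches the neutrality property stated above.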

Doubts about the validity of DS rule were raised by Zadeh in 1979 
[28]-[30] on the basis of a very simple example with two highly 
conflicting sources of evidence. Since the 1980s, many criticisms have 
been made about the behavior and the justification of DS rule. More 
recently, Dezert et al. in [19], [20] have brought to light other 
problematic behaviors of DS rule, even in low conflicting cases, and 
showed serious flaws in the logical foundations of DST [21]. To overcome 
the limitations and problems of DS rule of combination, a new family of 
PCR rules has been developed in the DSmT framework. We present the most 
elaborate one, i.e. the PCR6 fusion rule, which has been used in our 
perception application for grid occupancy estimation. 

In PCR rules, instead of following the DS normalization (the division by 
$1 - m_{1,2}(\emptyset)$), we transfer the conflicting mass only to the 
elements involved in the conflict and proportionally to their individual 
masses, so that the specificity of the information is entirely preserved. 
The general principle of PCR consists in: 1) applying the conjunctive 
rule; 2) calculating the total or partial conflicting masses; 3) 
redistributing the (total or partial) conflicting mass proportionally on 
non-empty sets according to the integrity constraints one has for the 
frame $\Omega$. Because the proportional transfer can be done in 
different ways, there exist several versions of PCR rules of combination. 
The PCR6 fusion rule has been proposed by Martin and Osswald in [6] 
Vol. 2, Chap. 2, as a serious alternative to the PCR5 fusion rule 
proposed originally by Smarandache and Dezert in [6] Vol. 2, Chap. 1. 
Martin and Osswald proposed PCR6 based on intuitive considerations and 
they showed through different simulations that PCR6 was more stable than 
PCR5 in terms of decision for combining $s > 2$ sources of evidence. When 
only two sources are combined, PCR6 and PCR5 fusion rules coincide, but 
they differ as soon as more than two sources have to be combined 
altogether. Recently, it has been proved in [22] that only the PCR6 rule 
is consistent with the averaging fusion rule, which allows the estimation 
of the empirical (frequentist) probabilities involved in a discrete 
random experiment. For Shafer’s model of FoD^3, the PCR6 fusion of two 
BBA’s $m_1(.)$ and $m_2(.)$ is defined by $m^{PCR6}_{1,2}(\emptyset) = 0$ 
and, for all $X \neq \emptyset$ in $2^\Omega$, 


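$$m^{PCR6}_{1,2}(X) = \sum_{X_1, X_2 \in 2^\Omega,\ X_1 \cap X_2 = X} m_1(X_1)\, m_2(X_2) + \sum_{Y \in 2^\Omega \setminus \{X\},\ X \cap Y = \emptyset} \left[ \frac{m_1(X)^2\, m_2(Y)}{m_1(X) + m_2(Y)} + \frac{m_2(X)^2\, m_1(Y)}{m_2(X) + m_1(Y)} \right] \tag{4}$$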


where all denominators in (4) are different from zero. If a denominator 
is zero, that fraction is discarded. All propositions/sets are in a 
canonical form [6]. Very basic Matlab codes of PCR rules can be found in 
[6], [23] and from the toolboxes repository on the web [27]. Like the 
averaging fusion rule, the PCR6 fusion rule is commutative but not 
associative. The vacuous belief assignment is a neutral element for this 
rule. 

The PCR6 rule of combination (as well as DS rule) uses only part of the 
whole information available (i.e. the values of the masses of belief 
only), and does not exploit the cardinalities of the focal elements 
entering in the fusion process. Because the cardinalities of focal 
elements are fully taken into account in the computation of the measure 
of degree of intersection between sets, we have recently proposed to 
improve the PCR6 rule using this measure in the companion paper [7]. The 
basic idea is to replace any conjunctive product by its discounted 
version thanks to the measure of degree of intersection $D$ when the 
intersection of focal elements is not empty. The products of partial (or 
total) conflicting masses are not discounted by the measure of degree of 
intersection because the degree of intersection between two (or more) 
conflicting focal elements always equals zero, that is if 
$X \cap Y = \emptyset$, then $D(X,Y) = 0$. In [7], we have shown in 
different examples why Zhang’s degree of intersection [31], denoted 
$D^Z(X_1, \ldots, X_s)$, is more interesting than the classical Jaccard’s 
degree. $D^Z(X_1, \ldots, X_s)$ is mathematically defined by 

$$D^Z(X_1, \ldots, X_s) = \frac{|X_1 \cap X_2 \cap \ldots \cap X_s|}{|X_1| \cdot |X_2| \cdot \ldots \cdot |X_s|} \tag{5}$$

where $|X_1 \cap X_2 \cap \ldots \cap X_s|$ is the cardinality of the 
intersection of the focal elements $X_1, X_2, \ldots, X_s$, and $|X_1|$, 
$|X_2|$, ..., $|X_s|$ are their cardinalities. The improved version of 
PCR6 with Zhang’s degree of intersection (called ZPCR6 rule) is easy to 
get and it corresponds to the following formula^4 


$$m^{ZPCR6}_{1,2}(X) = \frac{1}{K^{ZPCR6}} \Bigg[ \sum_{X_1, X_2 \in 2^\Omega,\ X_1 \cap X_2 = X} D^Z(X_1, X_2)\, m_1(X_1)\, m_2(X_2) + \sum_{Y \in 2^\Omega \setminus \{X\},\ X \cap Y = \emptyset} \left[ \frac{m_1(X)^2\, m_2(Y)}{m_1(X) + m_2(Y)} + \frac{m_2(X)^2\, m_1(Y)}{m_2(X) + m_1(Y)} \right] \Bigg] \tag{6}$$

where $K^{ZPCR6}$ is a normalization constant such that 
$\sum_{X \in 2^\Omega} m^{ZPCR6}_{1,2}(X) = 1$. As for PCR6, the ZPCR6 
rule is commutative but not associative. The advantage of ZPCR6 over PCR6 
and DS rules is its ability to respond to the inputs in a more effective 
way, as has been clearly shown in the very interesting examples detailed 
in [7]. Due to space limitation, these examples will not be presented and 
discussed here again. 

^3 that is when $G^\Omega = 2^\Omega$, and assuming all elements exhaustive 
and exclusive. 

^4 The general ZPCR6 formula for $s > 2$ sources is detailed in [7]. 
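
The two-source PCR6 and ZPCR6 rules can be sketched as follows. This is a minimal illustration under Shafer's model, assuming BBAs stored as dictionaries of frozensets with no mass on the empty set; `zhang_degree` and `pcr6` are hypothetical names, and the code is not the authors' Matlab implementation.

```python
def zhang_degree(x1, x2):
    """Zhang's degree of intersection for two non-empty sets: |X1 n X2| / (|X1| |X2|)."""
    return len(x1 & x2) / (len(x1) * len(x2))


def pcr6(m1, m2, degree=None):
    """Two-source PCR6. If `degree` is given, each conjunctive product is
    weighted by it and the result is renormalized (ZPCR6-style variant)."""
    out = {}
    for x1, a in m1.items():
        for x2, b in m2.items():
            inter = x1 & x2
            if inter:
                # Non-conflicting product goes to the intersection.
                w = degree(x1, x2) if degree else 1.0
                out[inter] = out.get(inter, 0.0) + w * a * b
            elif a + b > 0:
                # Conflicting product a*b is split proportionally to a and b.
                out[x1] = out.get(x1, 0.0) + a * a * b / (a + b)
                out[x2] = out.get(x2, 0.0) + b * b * a / (a + b)
    if degree is None:
        return out
    total = sum(out.values())
    return {x: v / total for x, v in out.items()}


F, O = frozenset({"F"}), frozenset({"O"})
OMEGA = F | O
m1 = {F: 0.6, OMEGA: 0.4}
m2 = {O: 0.8, OMEGA: 0.2}

pcr = pcr6(m1, m2)                       # plain PCR6
z = pcr6(m1, m2, degree=zhang_degree)    # PCR6 with Zhang's degree
print(pcr)
print(z)
```

Passing the degree of intersection as an optional argument keeps a single code path for both rules; a different measure (for instance Jaccard's) could be swapped in the same way.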


C. Discounting 


A discounting effect can be applied to a mass function $m(.)$ if a piece 
of information has its reliability lowered. In this case, a new mass 
function $m_\alpha(.)$ (with $\alpha \in [0,1]$) is computed from $m(.)$ 
and a part of the mass of each element of the FoD is transferred to the 
whole FoD $\Omega$, which represents the total ignorance. 

$$m_\alpha(A) = \begin{cases} (1 - \alpha)\, m(A) & \text{if } A \neq \Omega \\ (1 - \alpha)\, m(\Omega) + \alpha & \text{if } A = \Omega \end{cases} \tag{7}$$
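
A minimal sketch of the discounting operation follows. It assumes the convention used in section IV, where a factor of 0 means no discounting, and BBAs stored as dictionaries of frozensets; the function name `discount` and its signature are illustrative.

```python
def discount(m, alpha, omega):
    """Transfer a fraction alpha of every mass to the whole frame omega."""
    out = {a: (1.0 - alpha) * v for a, v in m.items() if a != omega}
    out[omega] = (1.0 - alpha) * m.get(omega, 0.0) + alpha
    return out


F, O = frozenset({"F"}), frozenset({"O"})
OMEGA = F | O

m = {F: 0.6, OMEGA: 0.4}
print(discount(m, 0.05, OMEGA))  # slightly more ignorance than m
print(discount(m, 1.0, OMEGA))   # fully discounted: vacuous BBA
```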


III. EVIDENTIAL OCCUPANCY GRID 


The basic idea of an Occupancy Grid (OG) is to divide the surrounding 
environment (the ground plane of the 2D world) into a set of cells 
(denoted $C^i$, $i \in [0, n]$) in order to estimate their occupancy 
state. In a probabilistic framework, the aim is to estimate the 
probabilities $P(O^i | z_{1:t})$ and $P(F^i | z_{1:t})$ given a set of 
measures $z_{1:t}$ from the beginning up to the current time $t$. $O^i$ 
(resp. $F^i$) denotes the occupied (resp. free) state of the cell $C^i$. 
Finally, a decision rule is applied in order to select the most likely 
state for each cell. 

In the evidential approach, the occupancy grid represents the information 
using a mass function over the frame of discernment (FoD) 
$\Omega = \{F, O\}$. So the mass functions used in the grid have the 
structure 

$$m^i = [\, m^i(\emptyset) \quad m^i(F) \quad m^i(O) \quad m^i(\Omega) \,] \tag{8}$$


The occupancy mass function is used during the fusion process; the 
decision can then be taken by applying the pignistic transform [26] to 
get a probability measure, and by using the same decision rule as in the 
probabilistic framework. An interesting feature of the evidential 
occupancy grid is that the FoD can be more complex: as the fusion is done 
cell by cell, the fusion scheme remains valid. 
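
The decision step can be sketched as follows, using the standard definition of the pignistic transform in which each focal element shares its mass equally among its elements. The sketch assumes a BBA with no mass on the empty set; `pignistic` and `decide` are illustrative names.

```python
def pignistic(m):
    """Pignistic probability of each singleton (assumes m(emptyset) = 0)."""
    betp = {}
    for a, v in m.items():
        for w in a:
            betp[w] = betp.get(w, 0.0) + v / len(a)
    return betp


def decide(m):
    """Select the state with the highest pignistic probability."""
    betp = pignistic(m)
    return max(betp, key=betp.get)


F, O = frozenset({"F"}), frozenset({"O"})
OMEGA = F | O

cell = {F: 0.2, O: 0.5, OMEGA: 0.3}
print(pignistic(cell), decide(cell))
```

In this example the mass left on the whole frame is split evenly between the free and occupied states before the decision is taken.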
Occupancy grids can be classified into two categories depending on the 
use of a forward or inverse sensor model. The forward model relies on 
Bayes inference. Since this approach takes into account the conditional 
dependency of the cells of the map, it is well adapted to a sensor that 
observes a large domain of cells with only one reading measurement (e.g. 
an ultrasonic sonar). However, it requires heavy processing that can be 
handled by optimized approximation. 

The inverse model approach is well adapted to sensors with a narrow field 
of measurement (e.g. LIDAR). It is composed of two separate steps. First, 
a snapshot map of the sensor reading is built using an inverse sensor 
model $P(O^i | z_t)$. This model can take into account the conditional 
dependency between the sensor reading and the occupancy of the seen 
cells. Then, a fusion process (denoted $\odot$) is done with the previous 
map $P(O^i | z_{1:t-1})$ as an independent opinion poll fusion: 

$$P(O^i | z_{1:t}) = P(O^i | z_t) \odot P(O^i | z_{1:t-1}) \tag{9}$$


In the probabilistic framework, the usual fusion operation between states 
$A$ and $B$ coming from independent measurements uses the independent 
opinion poll [34]: 

$$P(A) \odot P(B) = \frac{P(A)\, P(B)}{P(A)\, P(B) + (1 - P(A))(1 - P(B))} \tag{10}$$

Inverse approaches have very efficient implementations (e.g. log-odds) 
that make them popular in mobile robotics [8], [9], [25]. Maps built 
using inverse models are usually less accurate, since they only take into 
account the dependency of the cells observed in one reading, but this is 
a good approximation with accurate and high resolution sensors observing 
a limited number of cells at a time. Moreover, when the sensor is 
multi-echo and multi-layer, the conditional dependency of the seen cells 
can be modeled in an efficient way. 
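
The link between the independent opinion poll and the log-odds implementation mentioned above can be checked numerically with a short sketch. The function names are illustrative, and the probabilities are assumed to lie strictly between 0 and 1.

```python
import math


def opinion_poll(pa, pb):
    """Independent opinion poll fusion of two occupancy probabilities."""
    return pa * pb / (pa * pb + (1.0 - pa) * (1.0 - pb))


def log_odds(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def from_log_odds(l):
    """Inverse of log_odds."""
    return 1.0 / (1.0 + math.exp(-l))


direct = opinion_poll(0.7, 0.6)
via_sum = from_log_odds(log_odds(0.7) + log_odds(0.6))
print(direct, via_sum)  # the two values agree up to rounding
```

In log-odds form the fusion reduces to one addition per cell, which explains the efficiency of such implementations; an uninformative reading of 0.5 has zero log-odds and leaves the cell unchanged.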


A. Fusion strategy with the inverse model 


When dealing with the inverse model approach, an estimate of the pose of 
the robot has to be available, and a map grid $G^M$ has to be handled. 
This grid is defined in a world-referenced frame (so it does not move 
with the robot) and it is updated when a new sensor reading is available. 
Because of the likely evolution of the world in a dynamic environment, 
the OG update has to be completed by a remanence strategy. The fusion 
architecture is based on a prediction-correction paradigm to fuse the 
observations of one or several sensors. 

a) Prediction step: The prediction step computes the predicted map grid 
at time $t$ from the map grid estimated at time $t - 1$. Depending on the 
available information, this step can be very refined, as done in [24]. 
Because we do not have specific information on the velocity of the 
objects (or cells), the prediction step is done by the classical 
discounting technique. The confidence in past data is controlled by a 
remanence factor $\alpha \in [0, 1]$. The prediction stage is therefore 
governed by 

$$\hat{G}^M_t = \mathrm{discount}(G^M_{t-1}, \alpha) \tag{11}$$

b) Correction step: The correction step consists in the combination of 
the previously estimated map grid with the grid built from the current 
measures thanks to the inverse sensor model (see more details in [1], 
[2]). This one is called ScanGrid $G^S$. As this information is 
referenced in the sensor frame, a 2D warping is applied to reshape this 
grid into the fusion frame. To perform this operation, the current pose 
$q$ is estimated using a GPS sensor and the rigid homogeneous 
transformation matrix $H_t$ is computed. When GPS becomes unavailable, 
the CAN (Controller Area Network) bus is used to get the robot odometric 
data. The motion matrix $H_t$ and the extrinsic calibration matrix $C$ 
are used to compute a remapping function $f(x, y)$ according to Eq. (12) 
below 

$$f(x, y) = C \cdot H_t \cdot \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \tag{12}$$
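
The remapping of Eq. (12) can be sketched with plain 3x3 homogeneous matrices. The sketch assumes that the motion matrix is a planar rigid transform built from a pose (x, y, heading) and that the calibration matrix has the same form; these constructions and the names `rigid_transform` and `remap` are assumptions made for illustration, not details given in the paper.

```python
import math


def rigid_transform(x, y, theta):
    """3x3 homogeneous matrix of a planar rotation by theta followed by a translation."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def remap(c_mat, h_mat, x, y):
    """Apply C * H to the homogeneous point (x, y, 1) and return the 2D result."""
    t = matmul(c_mat, h_mat)
    u = t[0][0] * x + t[0][1] * y + t[0][2]
    v = t[1][0] * x + t[1][1] * y + t[1][2]
    return u, v


IDENTITY = rigid_transform(0.0, 0.0, 0.0)
H = rigid_transform(1.0, 2.0, math.pi / 2)   # hypothetical pose of the platform
print(remap(IDENTITY, H, 1.0, 0.0))          # point one meter ahead of the sensor
```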


Finally, the ScanGrid is remapped with $f$ and fused with the previous 
map grid according to the general formula 

$$G^M_t = G^S_t \odot \hat{G}^M_t \tag{13}$$

where the grid $G^S_t$ represents the BBA produced by the sensor model. 
This BBA is created from the sensor data (e.g. LIDAR points here) and a 
sensor model in order to infer an instantaneous occupancy grid. With the 
probabilistic approach, it refers to the occupancy probability 
$P^S_t(O)$. With the evidential approach it refers to the occupancy mass 
function $m^S_t = [\, m^S_t(\emptyset) \quad m^S_t(F) \quad m^S_t(O) \quad m^S_t(\Omega) \,]$. 
The grid $\hat{G}^M_t$ refers to the previous MapGrid $G^M_{t-1}$, 
predicted at current time using Eq. (11). In the next parts, for each 
approach considered, the fusion rule $\odot$ used in Eq. (13) is 
different. The Bayesian approach uses Eq. (10), the DS approach uses 
Eq. (2), the PCR6 approach uses Eq. (4), and the ZPCR6 approach uses 
Eq. (6). 


B. Discounting in Occupancy Grids 


The main advantage of using discounting is to provide a simple way to 
model the presence of dynamic objects in the scene. This model makes it 
possible to make a prediction without information on the dynamics at the 
cell level (or at the object level), which is generally not directly 
available from sensors and is difficult to estimate without 
time-consuming algorithms [24] (especially when the evidential framework 
is adopted). The main issue with the discounting effect is that it makes 
it impossible to build a persistent static map. Indeed, cells not viewed 
by the sensor will quickly converge to the ignorance state, so this 
strategy cannot be used to build the map of a building for instance. If 
we are interested in building a static map in the presence of moving 
objects, the discounting function is therefore not recommended. We will 
see in the next part of the paper that, in this case, Bayesian and DS 
fusion rules are not very efficient. To handle this case, it is 
recommended to use either PCR6 or ZPCR6 rules. 


IV. SIMULATION RESULTS 


In this section, we present simulation results of grid oc- 
cupancy estimation in a realistic scenario based on different 
rules of combination (Bayesian fusion, Dempster-Shafer rule, 
PCR6 and ZPCR6 fusion rules). 


A. Basic simulation 


Setup: In order to present the basic behavior of the different 
combination rules studied, we first carried out some simple 
1D-simulations, where we consider a grid cell crossed by a moving object. 
In this case, the state of the cell changes from free-state to 
occupied-state at time $t_1$, and from occupied-state to free-state at 
time $t_2$. Figures 1-4 show the results of these simulations under 
different conditions. 

In each subfigure, the top row shows the real state of the cell (i.e. the 
ground truth). The second row shows the simulated sensor data, which 
correspond to the BBA of the state of the cell. This mass function is 
built according to the state of the cell and the level of confidence of 
the sensor, and can possibly be perturbed with additional noise. FA 
indicates the rate of False Alarms and ND the rate of Non Detections. We 
will consider different levels of confidence for $m_{SG}(O)$ when the 
cell is occupied and $m_{SG}(F)$ when the cell is free. The subfigures at 
the bottom represent the level of belief of the cell state obtained with 
Bayesian fusion, Dempster-Shafer (DS) fusion, PCR6 and ZPCR6 fusion 
rules. 

Fig. 2. Case without discounting ($\alpha = 0$). 


Effect of discounting: Fig. 1 shows the results of the classical chain 
using a discounting factor $\alpha = 0.05$, while Fig. 2 is the same case 
without discounting ($\alpha = 0$). With discounting, all the fusion 
rules behave similarly. Without discounting, a lag appears with Bayesian 
and DS fusion rules. The lag is seriously reduced with PCR6 and ZPCR6. 

Performance analysis: To evaluate the performance of our method, we 
performed 10000 Monte Carlo runs for each simulation in order to estimate 
the false alarm and non detection rates. In order to make the decision, 
the pignistic probability has been computed and a MAP estimator has been 
used. Each simulation (from No 0 to No 8) corresponds to different 
conditions (the discounting level, the rate of noise impacting sensor 
observations) reported in Table I. 


TABLE I 
PARAMETERS OF SIMULATIONS (columns: simulation No., discounting factor 
$\alpha$, ND/FA rates, $m_{SG}(O)$/$m_{SG}(F)$). 


For each simulation presented in Table I, we obtain the performances 
(rates of FA and ND in %) shown in Table II. 

TABLE II 
RATES OF FALSE ALARM AND NON DETECTION (IN %). 


Simulations 0 and 1, illustrated by Figures 1 and 2, correspond to the 
noise-free situation. When the discounting operator is removed, the 
Bayesian and DS approaches have a lag in the detection of the change of 
state that clearly impacts their performances. The PCR6 and ZPCR6 
approaches are not concerned by this effect because of the proportional 
redistribution of the conflict. Simulations 2 and 3 (see Fig. 3) include 
10% of wrong measurements caused by noise. The fusion rules behave 
similarly as for simulations 0 and 1, but the performances are a bit 
lower, which shows the effect of noisy measurements in the estimation 
process. For simulations 4, 5 and 6, the noise reaches 15% for ND and 30% 
for FA, which is significant. As we see in Fig. 4, the Bayesian and DS 
fusion rules are not able to detect the second state change during the 
simulation time. This induces bad false alarm rates. In the last 
simulations 7 and 8 the noise is very strong (about 25% of ND and 50% of 
FA). In these conditions, all the methods have poor false alarm rates but 
PCR6 and ZPCR6 keep good non detection rates. Globally, we see an 
improvement of the performances when using ZPCR6, especially for the 
reduction of the FA rates. 
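
The lag discussed above can be reproduced qualitatively with a noise-free, single-cell sketch. It is not the authors' simulation: BBAs on the binary frame are stored as tuples (m(F), m(O), m(Omega)), the sensor masses are borrowed from the LIDAR simulation of the next subsection, the decision compares the masses of F and O instead of using the pignistic probability, and all function names are hypothetical.

```python
SENSOR_FREE = (0.6, 0.0, 0.4)   # assumed reading for a free cell
SENSOR_OCC = (0.0, 0.8, 0.2)    # assumed reading for an occupied cell


def discount(m, alpha):
    """Discounting by alpha: part of every mass goes to Omega."""
    f, o, w = m
    return ((1 - alpha) * f, (1 - alpha) * o, (1 - alpha) * w + alpha)


def ds_rule(m, s):
    """DS rule on {F, O}; never in total conflict here since s keeps mass on Omega."""
    (f1, o1, w1), (f2, o2, w2) = m, s
    k = f1 * o2 + o1 * f2                      # conflicting mass
    f = f1 * f2 + f1 * w2 + w1 * f2
    o = o1 * o2 + o1 * w2 + w1 * o2
    return (f / (1 - k), o / (1 - k), w1 * w2 / (1 - k))


def pcr6_rule(m, s):
    """Two-source PCR6 on {F, O}: conflict is redistributed proportionally."""
    (f1, o1, w1), (f2, o2, w2) = m, s
    f = f1 * f2 + f1 * w2 + w1 * f2
    o = o1 * o2 + o1 * w2 + w1 * o2
    if f1 + o2 > 0:                            # conflict between map F and sensor O
        f += f1 * f1 * o2 / (f1 + o2)
        o += o2 * o2 * f1 / (f1 + o2)
    if o1 + f2 > 0:                            # conflict between map O and sensor F
        o += o1 * o1 * f2 / (o1 + f2)
        f += f2 * f2 * o1 / (o1 + f2)
    return (f, o, w1 * w2)


def lag(rule, alpha=0.0, n_free=30, max_steps=500):
    """Number of 'occupied' readings needed before m(O) exceeds m(F)."""
    m = (0.0, 0.0, 1.0)                        # vacuous initial map cell
    for _ in range(n_free):
        m = rule(discount(m, alpha), SENSOR_FREE)
    for step in range(1, max_steps + 1):
        m = rule(discount(m, alpha), SENSOR_OCC)
        if m[1] > m[0]:
            return step
    return None


print("DS, no discounting:   ", lag(ds_rule))
print("DS, alpha = 0.05:     ", lag(ds_rule, alpha=0.05))
print("PCR6, no discounting: ", lag(pcr6_rule))
```

Under these assumptions, the DS rule without discounting needs many more readings to switch state than PCR6, and a small discounting factor removes most of that lag, which is consistent with the behavior reported for Figures 1 and 2.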


B. LIDAR simulation 


In this simulation, the DS and PCR6 fusion rules are compared on a 2D 
occupancy grid problem close to a real application for robot perception. 
The simulation was carried out in the Robot Operating System (ROS) [32] 
environment, and the Gazebo [33] simulator is used here to simulate a 
Hokuyo LIDAR and a moving object, as shown in Figure 5. The simulated 
sensor has a FoV (Field of View) of about 270° and a max range of about 
10 m. The rate of the scan is 20 Hz and the ranges of the LIDAR points 
are corrupted with a Gaussian noise $\mathcal{N}(0, 0.1)$. 

Fig. 5. Gazebo simulation: the box turns around the LIDAR sensor. 


Figure 6 shows a simulated scan. The beams that do not hit an obstacle 
within the range are considered as max range (as done in the real Hokuyo 
sensor). The moving object is a box which has a circular trajectory and 
moves at 6 rpm around the LIDAR. A ground-truth grid is computed 
according to the real position of the box and its geometry at each scan 
time. The grid used is a square of 10 m by 10 m with a resolution of 
0.1 m, and the ScanGrid BBAs are set to $m_{SG}(O) = 0.8$, 
$m_{SG}(\Omega) = 0.2$ for occupied cells and $m_{SG}(F) = 0.6$, 
$m_{SG}(\Omega) = 0.4$ for free cells. 
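
To make the ScanGrid construction more concrete, here is a simplified sketch of an inverse sensor model along a single beam. It is an assumption-laden illustration, not the model of [1], [2]: the beam is treated as a 1D row of cells, BBAs are tuples (m(F), m(O), m(Omega)) with the masses quoted above, cells beyond the impact are left vacuous, and the name `beam_scan_grid` is hypothetical.

```python
FREE = (0.6, 0.0, 0.4)       # cells crossed by the beam before the impact
OCCUPIED = (0.0, 0.8, 0.2)   # cell containing the impact
UNKNOWN = (0.0, 0.0, 1.0)    # vacuous BBA for cells hidden behind the impact


def beam_scan_grid(r, max_range=10.0, res=0.1):
    """BBAs of the cells along one beam for a measured range r (in meters)."""
    n = int(round(max_range / res))
    # A beam returned at max range hits nothing: every cell along it is free.
    hit = n if r >= max_range else int(r / res)
    cells = []
    for i in range(n):
        if i < hit:
            cells.append(FREE)
        elif i == hit:
            cells.append(OCCUPIED)
        else:
            cells.append(UNKNOWN)
    return cells


cells = beam_scan_grid(2.35)
print(cells[22], cells[23], cells[24])
```

Each resulting BBA would then be warped into the map frame and fused with the corresponding map cell by the chosen rule.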


Fig. 6. Bird view of one LIDAR scan. 


In order to quantify the results, we compute some metrics. However, 
because of occlusion, only the cells located on the edges of the box can 
be considered, which is why we do not consider global metrics. We 
consider here the two following metrics: 1) the number of correct 
occupied cells (proportional to recall in our case), and 2) the number of 
conflicting cells close to the box. The first describes the ability of 
the method to add objects into the map, and also by analogy to remove 
objects from the map. The second describes the ability of the method to 
detect moving objects by generating conflict; this ability is important 
and is one of the improvements of the evidential grid with respect to the 
classical Bayesian grid. 

Figure 7 shows the result over one turn. The number of cells detected for 
both metrics depends a lot on the position around the sensor. This can be 
explained because, in some places, the LIDAR sensor is able to see two 
edges of the box, in other situations the LIDAR sensor detects just one 
edge, and when the box is behind the sensor it is out of the field of 
view of the LIDAR. In Figure 7, we can see that the number of occupied 
cells with ZPCR6 is higher than with PCR6 and DS fusion rules. Contrary 
to ZPCR6 and PCR6, the DS fusion without discounting cannot handle well 
the quick changes of state in the map. The x-axis of Figure 7 is the time 
stamp of the LIDAR scans, and the y-axis is the number of occupied cells. 




Fig. 7. 2D LIDAR Simulation: Number of correct occupied cells (blue=DS 
rule, green=PCR6 rule, red=ZPCR6 rule). 


C. Real data processing 


A real experiment was carried out using a Hokuyo UTM-30LX sensor. This 
experiment took place in an office in which a person was walking. The 
evidential occupancy grid fusion node was implemented within the ROS 
environment. The grid has the same size and resolution as in the previous 
example. The BBA used in the sensor model has been set to 
$m_{SG}(O) = 0.8$, $m_{SG}(\Omega) = 0.2$ for occupied cells and 
$m_{SG}(F) = 0.8$, $m_{SG}(\Omega) = 0.2$ for free cells. No discounting 
was applied. 

Figures 8-10 present the occupancy grid estimation using DS, PCR6 and 
ZPCR6 rules for a typical snapshot of the sequence. The color of cells 
denotes the state having the highest mass value: green for $F$ (free 
state), red for $O$ (occupied state), and black for $\Omega$ (full 
uncertainty). For convenience, we have also displayed in blue all the 
cells that carry a conflicting mass $m(\emptyset) > 0$ before applying 
the normalization step of DS rule, or before applying the proportional 
conflict redistribution with PCR6, or both with ZPCR6. 


Fig. 8. Snapshot 1 - with DS fusion. 


Fig. 9. Snapshot 1 - with PCR6 fusion. 


Fig. 10. Snapshot 1 - with ZPCR6 fusion. 


Figure 8 shows the result using DS rule. The room scanned by the sensor 
is correctly mapped and its bounds (mainly walls and doors) are clearly 
identified by the red pixels. The free space (green pixels) is correctly 
detected in the room, except near the person, whose location is labeled 
as free (with conflicting cells shown in blue for convenience). The 
person moving around the desk in the office room is only detected through 
conflicting cells, the several times when he stops walking. Figure 9 
shows the PCR6 result at the same time stamp. In this case, the person is 
rightly detected, as shown by the red pixels (occupied cells) inside the 
green area (the office room). A conflicting cell is created when he 
starts walking in the room. The static part of the room is also detected 
(as with DS fusion rule). Figure 10 shows the ZPCR6 result at the same 
time stamp. The results of this rule are close to those of the PCR6 rule, 
but the level of ignorance (the mass on $\Omega$) is higher on the cells 
behind the person. This can be understood because the mass $m(\Omega)$ 
is weighted by a factor 0.5 during the transition. 
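
The factor 0.5 can be checked directly from Zhang's degree of intersection for two sources, assuming the binary frame with two elements F and O used throughout this section:

```latex
D^Z(\Omega,\Omega) = \frac{|\Omega \cap \Omega|}{|\Omega| \cdot |\Omega|} = \frac{2}{2 \cdot 2} = 0.5,
\qquad
D^Z(F,\Omega) = D^Z(O,\Omega) = \frac{1}{1 \cdot 2} = 0.5,
\qquad
D^Z(F,F) = D^Z(O,O) = \frac{1}{1 \cdot 1} = 1 .
```

With this frame, every conjunctive product involving the whole frame is therefore halved before normalization, while products of identical singletons are left unchanged.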


V. CONCLUSIONS AND PERSPECTIVES 


In this work we have presented a novel application of the 
belief functions which significantly improves the map building 
process for intelligent vehicles environment perception and 
grid map estimation. This work shows the importance of 
defining an accurate sensor model. We have considered the un- 
certainties of the LIDAR measurements and used the ZPCR6 
rule of DSmT to model and combine sensor information. Our
new method differs from the Bayesian approach by allowing support
for more than one proposition at a time, rather than a single
hypothesis. It is an interval-based approach, as defined by the
lower and upper probability bounds [Bel, Pl], allowing the lack
of measurement to be modeled adequately. This new method
based on ZPCR6 rule differs from the classical evidential
approach based on DS rule and improves in theory the results
based on PCR6, and more substantially the results of DS rule. Our
experimental results with the LIDAR confirm the improvement
of the accuracy of this new grid estimation method w.r.t.
previous methods, but the improvement obtained with ZPCR6
over PCR6 is small because of the overly simplistic
structure of the chosen frame of discernment. As research
perspectives, we will try to implement these fusion rules in
a 3D occupancy grid (Octomap based) and use a stereo camera
with dense disparity map computation as sensor source. We
would also like to deal with refined frames of discernment to
improve the precision of the perception and to emphasize
the advantages of ZPCR6 rule.
Abstract—In this contribution, we propose to improve the grid
map occupancy estimation method developed so far based on 
belief function modeling and the classical Dempster’s rule of com- 
bination. Grid map offers a useful representation of the perceived 
world for mobile robotics navigation. It will play a major role for 
the security (obstacle avoidance) of next generations of terrestrial 
vehicles, as well as for future autonomous navigation systems. In 
a grid map, the occupancy of each cell representing a small 
piece of the surrounding area of the robot must be estimated at 
first from sensors measurements (typically LIDAR, or camera), 
and then it must also be classified into different classes in 
order to get a complete and precise perception of the dynamic 
environment where the robot moves. So far, the estimation and 
the grid map updating have been done using fusion techniques 
based on the probabilistic framework, or on the classical belief 
function framework thanks to an inverse model of the sensors,
mainly because the latter offers an interesting management of
uncertainties when the quality of available information is low, and
when the sources of information appear as conflicting. To improve
the performances of the grid map estimation, we propose in this
paper to replace Dempster’s rule of combination by the PCR6
rule (Proportional Conflict Redistribution rule #6) proposed in
Dezert-Smarandache Theory (DSmT). As an illustrating scenario,
we consider a platform moving in a dynamic area and we compare
our new realistic simulation results (based on a LIDAR sensor)
with those obtained by the probabilistic and the classical
belief-based approaches.


Keywords: Grid map, cell occupancy, perception, belief func- 
tions, DSmT, PCR6, robotics. 


I. INTRODUCTION 


Occupancy Grids (OG) are often used for robot environment 
perception and navigation, which requires techniques for data 
fusion [1], localization [2] and obstacle avoidance [3]. As OGs 
manage a representation of the environment that does not make 
any assumption on the geometrical shape of the detected ele- 
ments, they provide a general framework to deal with complex 
perception conditions. In our previous works, we focused
on the use of a multi-echo and multi-layer LIDAR system
in order to characterize the dynamic surrounding environment
of a robot navigating in an unrestricted area. The perception
strategy involved map estimation and scan grids [4], [5] based
either on the classical Bayesian framework, or on the classical
evidential framework based on Dempster-Shafer theory (DST)
[6] of belief functions. The map grid acts as a filter that
accumulates information and allows moving objects to be detected.
A comparative analysis of the performances of these approaches
has recently been published in [7].

In dynamic environments, it is crucial to have a good 
modeling of the information flow in the data fusion process 
in order to avoid adding wrong implicit prior knowledge that 
will need time to be forgotten. In this context, evidential OGs
are particularly interesting for managing
the information well, since it is possible to explicitly make the
distinction between non-explored and moving cells. In this
paper, we explore the use of Dezert-Smarandache Theory [8]
(DSmT) as an alternative to the classical DST to
provide a more accurate estimation of the grid map occupancy
for robot perception.

The idea of using the probabilistic framework to estimate the
grid occupancy was popularized by Elfes in his pioneering
works in the 1990s [9]-[13]. Later, the idea was extended
with the fuzzy logic theory framework by Oriolo et al. [15]-[21],
and in parallel with the belief function (evidential)
framework as well [22]-[29]. Most of the aforementioned research
works dealt only with acoustic sensors (i.e. SONAR).
Recently, DSmT has also been applied for the perception of
the environment with acoustic sensors as reported in [30]-[32],
[34]-[36].

Our main contribution is to propose a new method to perform
the perception with non-acoustic sensors, and to compare the
performances of the Proportional Conflict Redistribution rule
no. 6 (PCR6) of DSmT with respect to Dempster-Shafer’s
(DS) rule of combination in terms of accuracy of grid map
estimation. In our application, we work with a LIDAR sensor
on board a robot moving in a dynamic environment.

This paper is organized as follows. After a short presentation 
of the basics of belief functions and rules of their combination 
based on DST and DSmT in the next section, we will present 
the inverse sensor models in section III with the construction of 
the basic belief assignments (BBA). In section IV, we present 
an illustrating scenario for environment perception including 
a mobile object with a platform equipped with a LIDAR, and 
we compare our new realistic simulation results with those 
obtained by the probabilistic and the classical belief-based 
approaches. We will show how static and mobile objects are 
extracted from the occupancy grid map using digital image 
processing. Finally, conclusions and perspectives are
given in section V.
II. EVIDENTIAL FRAMEWORK


Dempster-Shafer’s theory (DST) of evidence was developed
by Shafer in 1976 from Dempster’s works [6]. DST is
also known as the theory of belief functions and it is mainly
characterized by a frame of discernment (FoD), sources of
evidence represented by basic belief assignments (BBA), belief
(Bel) and plausibility (Pl) functions, and Dempster’s rule,
denoted as DS rule of combination in the sequel¹. DST has
been modified and extended into Dezert-Smarandache theory
[8] (DSmT) to work with quantitative or qualitative BBA and
to combine the sources of evidence in a more efficient way
thanks to new proportional conflict redistribution (PCR) fusion
rules; see [37]-[40] for discussion and examples. We briefly
recall in the next subsections the basics of the theory of belief
functions.


A. Belief functions 


Let’s consider a finite discrete FoD $\Omega = \{\omega_1, \omega_2, \ldots, \omega_n\}$,
with $n > 1$, of the fusion problem under consideration and
its fusion space $G^\Omega$ which can be chosen either as the power-set
$2^\Omega$, the hyper-power set² $D^\Omega$, or the super-power set $S^\Omega$
depending on the model that fits with the problem [8]. A
BBA associated with a given source of evidence is defined
as the mapping $m(\cdot) : G^\Omega \to [0,1]$ satisfying $m(\emptyset) = 0$
and $\sum_{A \in G^\Omega} m(A) = 1$. The quantity $m(A)$ is called the mass of
belief of A committed by the source of evidence. Belief and
plausibility functions are defined by

$$\mathrm{Bel}(A) = \sum_{\substack{B \subseteq A \\ B \in G^\Omega}} m(B) \quad \text{and} \quad \mathrm{Pl}(A) = \sum_{\substack{B \cap A \neq \emptyset \\ B \in G^\Omega}} m(B) \qquad (1)$$

The degree of belief Bel(A) given to a subset A quantifies
the amount of justified specific support to be given to A,
and the degree of plausibility Pl(A) quantifies the maximum
amount of potential specific support that could be given
to A. If $m(A) > 0$ for some $A \in G^\Omega$, then A is called a
focal element of the BBA $m(\cdot)$. When all focal elements
are singletons and $G^\Omega = 2^\Omega$, the BBA $m(\cdot)$ is called
a Bayesian BBA [6] and its corresponding belief function
Bel(.) is homogeneous to a (possibly subjective) probability
measure, and one has Bel(A) = P(A) = Pl(A); otherwise
in general one has $\mathrm{Bel}(A) \leq P(A) \leq \mathrm{Pl}(A)$, $\forall A \in G^\Omega$. The
vacuous BBA representing a totally ignorant source is defined
as $m_v(\Omega) = 1$.
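As an illustration of Eq. (1), the following minimal Python sketch evaluates Bel and Pl on the two-state frame used later in this paper. The dictionary-of-frozensets representation, the function names and the numerical masses are assumptions made for this example only; they are not taken from the authors' implementation.

```python
# Assumed representation: a BBA is a dict mapping focal elements
# (frozensets of states) to their masses, with no mass on the empty set.
OMEGA = frozenset({"F", "O"})  # two-state frame: free, occupied


def belief(m, a):
    """Bel(A): total mass of the non-empty focal elements included in A."""
    return sum(v for b, v in m.items() if b and b <= a)


def plausibility(m, a):
    """Pl(A): total mass of the focal elements that intersect A."""
    return sum(v for b, v in m.items() if b & a)


# Hypothetical BBA: evidence for "occupied", the rest left as ignorance.
m = {frozenset({"O"}): 0.8, OMEGA: 0.2}
occupied = frozenset({"O"})
free = frozenset({"F"})

print("Bel(O), Pl(O):", belief(m, occupied), plausibility(m, occupied))
print("Bel(F), Pl(F):", belief(m, free), plausibility(m, free))
```

With these masses the interval [Bel, Pl] is [0.8, 1] for the occupied state and [0, 0.2] for the free state: the mass left on $\Omega$ widens both intervals without favoring either state.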


B. Fusion rules 


Many mathematical rules have been proposed in the literature
over the decades (see [8], Vol. 2 for a detailed list of
fusion rules) to combine efficiently several distinct sources of
evidence represented by the BBA’s $m_1(\cdot), m_2(\cdot), \ldots, m_s(\cdot)$
($s \geq 2$) defined on the same fusion space $G^\Omega$. In this paper, we
focus only on DS rule because it has been historically proposed
in DST and it is still widely used in applications, and on the
PCR rule no. 6 (i.e. PCR6) proposed in DSmT because it
provides a very interesting alternative to DS rule, even if PCR6
is more complex to implement in general than DS rule.


¹ DS acronym standing for Dempster-Shafer since Dempster’s rule has been
widely promoted by Shafer in the development of his mathematical theory of
evidence [6].

² which corresponds to a Dedekind’s lattice, see [8] Vol. 1.

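In the DST framework, the fusion space $G^\Omega$ equals the power-set
$2^\Omega$ because Shafer’s model of the frame $\Omega$ is assumed,
which means that all elements of the FoD are exhaustive and
exclusive. The combination of the BBA’s $m_1(\cdot)$ and $m_2(\cdot)$ is
done by: $m^{DS}_{1,2}(\emptyset) = 0$ and for all $X \neq \emptyset$ in $2^\Omega$

$$m^{DS}_{1,2}(X) = \frac{\sum_{\substack{X_1, X_2 \in 2^\Omega \\ X_1 \cap X_2 = X}} \prod_{i=1}^{2} m_i(X_i)}{1 - m_{1,2}(\emptyset)} \qquad (2)$$

where the numerator of (2) is the mass of belief on the
conjunctive consensus on X. The denominator $1 - m_{1,2}(\emptyset)$
is a normalization constant, where the total degree of conflict
denoted $m_{1,2}(\emptyset)$ between the two sources of evidence is
defined by

$$m_{1,2}(\emptyset) = \sum_{\substack{X_1, X_2 \in 2^\Omega \\ X_1 \cap X_2 = \emptyset}} \prod_{i=1}^{2} m_i(X_i) \qquad (3)$$
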

According to Shafer [6], the two sources are said to be in total
conflict if $m_{1,2}(\emptyset) = 1$. In this case the combination of the
sources by DS rule cannot be done because of the mathematical
0/0 indeterminacy. The vacuous BBA $m_v(\Omega) = 1$
is a neutral element for DS rule. This rule is commutative
and associative, and the formula (2) can be easily generalized
for the combination of $s > 2$ sources of evidence. DS rule
remains the milestone fusion rule of DST.
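The DS rule of Eqs. (2)-(3) can be sketched in a few lines of Python for two sources. This is a minimal sketch under assumed conventions (BBAs stored as dictionaries keyed by frozensets, hypothetical function names and masses), not the authors' code.

```python
def conjunctive(m1, m2):
    """Unnormalized conjunctive rule; the key frozenset() collects the conflict."""
    out = {}
    for x1, v1 in m1.items():
        for x2, v2 in m2.items():
            x = x1 & x2
            out[x] = out.get(x, 0.0) + v1 * v2
    return out


def ds_combine(m1, m2):
    """DS rule, Eq. (2): conjunctive consensus divided by 1 - conflict, Eq. (3)."""
    conj = conjunctive(m1, m2)
    conflict = conj.pop(frozenset(), 0.0)
    if conflict >= 1.0 - 1e-12:
        raise ValueError("total conflict: DS rule is undefined (0/0)")
    return {x: v / (1.0 - conflict) for x, v in conj.items()}


OMEGA = frozenset({"F", "O"})
OCC, FREE = frozenset({"O"}), frozenset({"F"})
m_occ = {OCC: 0.8, OMEGA: 0.2}    # hypothetical "occupied" reading
m_free = {FREE: 0.6, OMEGA: 0.4}  # hypothetical "free" reading
vacuous = {OMEGA: 1.0}

fused = ds_combine(m_occ, m_free)
print({"".join(sorted(k)): round(v, 4) for k, v in fused.items()})
```

In this example the conflict is 0.8 x 0.6 = 0.48, so every remaining mass is divided by 0.52. Combining `m_occ` with `vacuous` returns `m_occ` up to rounding, which matches the neutrality property stated above.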

Doubts about the validity of DS rule were raised
by Zadeh in 1979 [47]-[49] based on a very simple example
with two highly conflicting sources of evidence. Since the 1980s,
many criticisms have been made about the behavior and the
justification of DS rule. More recently, Dezert et al. in
[37], [38] have brought to light other counter-intuitive behaviors
of DS rule even in low conflicting cases and showed serious
flaws in the logical foundations of DST [39]. To overcome the
limitations and problems of DS rule of combination, a new
family of PCR rules has been developed in the DSmT framework.
We present the most elaborate one, i.e. the PCR6 fusion rule,
which has been used in our perception application for grid
occupancy estimation.

In PCR rules, instead of following the DS normalization
(the division by $1 - m_{1,2}(\emptyset)$), we transfer the conflicting mass
only to the elements involved in the conflict and proportionally
to their individual masses, so that the specificity of the
information is entirely preserved. The general principle of PCR
consists in: 1) applying the conjunctive rule; 2) calculating the
total or partial conflicting masses; 3) redistributing the (total
or partial) conflicting mass proportionally on non-empty sets
according to the integrity constraints one has for the frame
$\Omega$. Because the proportional transfer can be done in different
ways, there exist several versions of PCR rules of combination.
PCR6 fusion rule has been proposed by Martin and Osswald in
[8] Vol. 2, Chap. 2, as a serious alternative to PCR5 fusion rule
proposed originally by Smarandache and Dezert in [8] Vol. 2,
Chap. 1. Martin and Osswald proposed PCR6 based on
intuitive considerations and they showed through different
simulations that PCR6 was more stable than PCR5 in terms
of decision for combining $s > 2$ sources of evidence. When
only two sources are combined, PCR6 and PCR5 fusion rules
coincide, but they differ as soon as more than two sources
have to be combined altogether. Recently, it has been proved
in [40] that only PCR6 rule is consistent with the averaging
fusion rule which allows one to estimate the empirical (frequentist)
probabilities involved in a discrete random experiment.

For Shafer’s model of FoD³, the PCR6 combination of two
BBA’s $m_1(\cdot)$ and $m_2(\cdot)$ is defined by $m^{PCR6}_{1,2}(\emptyset) = 0$ and for
all $X \neq \emptyset$ in $2^\Omega$

$$m^{PCR6}_{1,2}(X) = \sum_{\substack{X_1, X_2 \in 2^\Omega \\ X_1 \cap X_2 = X}} m_1(X_1) m_2(X_2) + \sum_{\substack{Y \in 2^\Omega \setminus \{X\} \\ X \cap Y = \emptyset}} \left[ \frac{m_1(X)^2 m_2(Y)}{m_1(X) + m_2(Y)} + \frac{m_2(X)^2 m_1(Y)}{m_2(X) + m_1(Y)} \right] \qquad (4)$$

where all denominators in (4) are different from zero. If a
denominator is zero, that fraction is discarded. All
propositions/sets are in a canonical form [8]. Very basic Matlab codes
of PCR rules can be found in [8], [41] and from the toolboxes
repository on the web [46]. Like the averaging fusion rule,
the PCR6 fusion rule is commutative but not associative. The
vacuous belief assignment is a neutral element for this rule.
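To make Eq. (4) concrete, here is a minimal Python sketch of the two-source PCR6 rule under Shafer's model. The data layout (dictionaries keyed by frozensets), the function name and the example masses are assumptions for illustration; the Matlab codes cited above remain the reference implementations.

```python
def pcr6_combine(m1, m2):
    """Two-source PCR6, Eq. (4): conjunctive consensus plus proportional
    redistribution of each partial conflict to the two sets that caused it."""
    out = {}
    for x1, v1 in m1.items():
        for x2, v2 in m2.items():
            if v1 == 0.0 or v2 == 0.0:
                continue  # zero product: nothing to transfer, fraction discarded
            x = x1 & x2
            if x:
                out[x] = out.get(x, 0.0) + v1 * v2
            else:
                # Partial conflict v1*v2 is split proportionally to v1 and v2.
                out[x1] = out.get(x1, 0.0) + v1 * v1 * v2 / (v1 + v2)
                out[x2] = out.get(x2, 0.0) + v2 * v2 * v1 / (v1 + v2)
    return out


OMEGA = frozenset({"F", "O"})
OCC, FREE = frozenset({"O"}), frozenset({"F"})
m_occ = {OCC: 0.8, OMEGA: 0.2}    # hypothetical "occupied" reading
m_free = {FREE: 0.6, OMEGA: 0.4}  # hypothetical "free" reading

fused = pcr6_combine(m_occ, m_free)
print({"".join(sorted(k)): round(v, 4) for k, v in fused.items()})
```

In this example the partial conflict 0.8 x 0.6 = 0.48 is split into 0.384/1.4 for O and 0.288/1.4 for F, while the mass on $\Omega$ stays at its conjunctive value 0.08. Unlike the DS normalization, the redistribution does not touch the mass of sets that are not involved in the conflict.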


C. Discounting 


A discounting effect can be applied on a mass function $m(\cdot)$
if a piece of information has its reliability lowered. In this case,
a new mass function $m_\alpha(\cdot)$ (with $\alpha \in [0,1]$) is computed
from $m(\cdot)$ and a part of the mass of each element of the FoD
is transferred to the whole FoD $\Omega$ which represents the total
ignorance:

$$m_\alpha(A) = \begin{cases} (1-\alpha) \cdot m(A) & \text{if } A \neq \Omega \\ (1-\alpha) \cdot m(A) + \alpha & \text{if } A = \Omega \end{cases} \qquad (5)$$

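The discounting of Eq. (5) can be sketched as follows. The function name, the dictionary representation of a BBA and the example values are assumptions made for this illustration.

```python
def discount(m, alpha, omega):
    """Eq. (5): scale every mass by (1 - alpha) and move alpha to the whole frame."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    out = {a: (1.0 - alpha) * v for a, v in m.items() if a != omega}
    out[omega] = (1.0 - alpha) * m.get(omega, 0.0) + alpha
    return out


OMEGA = frozenset({"F", "O"})
OCC = frozenset({"O"})
m = {OCC: 0.8, OMEGA: 0.2}  # hypothetical cell state

aged = discount(m, 0.05, OMEGA)
print(round(aged[OCC], 4), round(aged[OMEGA], 4))
```

With $\alpha = 0$ the BBA is returned unchanged, and with $\alpha = 1$ it becomes the vacuous BBA. Applying the function repeatedly drives any cell towards total ignorance.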
D. Pignistic transformation 


Finally, the pignistic transformation BetP [45] allows one to
compute a probability measure from a mass function by
distributing the mass of each focal element equally among
the elements it contains:

$$\forall A \in \Omega, \quad \mathrm{BetP}(A) = \sum_{\substack{B \in 2^\Omega \\ B \neq \emptyset}} \frac{|A \cap B|}{|B|} \, m(B) \qquad (6)$$

where $|A \cap B|$ is the cardinality of the subset $A \cap B$, and $|B|$ is
the cardinality of the subset B.

However, this transformation is not bijective (a part of the 
information is lost). So, one can find an infinity of mass 
functions with the same pignistic probability. This issue is 
inherent in the nature of probabilities which are not able to 
distinguish randomness from (epistemic) uncertainty. 
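The loss of information mentioned above can be checked numerically. The sketch below implements Eq. (6) for singletons and applies it to two different, hypothetical BBAs that share the same pignistic probability; the representation and names are assumptions for this example.

```python
def betp(m):
    """Eq. (6) for singletons: each focal element shares its mass equally
    among the states it contains."""
    p = {}
    for b, v in m.items():
        if not b:
            continue  # no mass is expected on the empty set
        for state in b:
            p[state] = p.get(state, 0.0) + v / len(b)
    return p


OMEGA = frozenset({"F", "O"})
OCC, FREE = frozenset({"O"}), frozenset({"F"})

m_ignorant = {OCC: 0.5, OMEGA: 0.5}  # half of the mass is ignorance
m_bayesian = {OCC: 0.75, FREE: 0.25}  # no ignorance at all

print(betp(m_ignorant))
print(betp(m_bayesian))
```

Both calls give BetP(O) = 0.75 and BetP(F) = 0.25, although the first source is half ignorant and the second is fully committed.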


³ i.e. when $G^\Omega = 2^\Omega$, and assuming all elements exhaustive and exclusive.


III. EVIDENTIAL OCCUPANCY GRID 


The basic idea of an Occupancy Grid (OG) is to divide the
surrounding environment (the ground plane of the 2D world) into
a set of cells (denoted $C^i$, $i \in [0, n]$) in order to estimate their
occupancy state. In a probabilistic framework, the aim is to
estimate the probabilities $P(O^i | z_{1:t})$ and $P(F^i | z_{1:t})$ given a
set of measures $z_{1:t}$ from the beginning up to the current time
t. $O^i$ (resp. $F^i$) denotes the occupied (resp. free) state of the
cell $C^i$. Finally, a decision rule is applied in order to select
the most likely state for each cell.

For the evidential approach, the occupancy grid represents the
information using a mass function over the frame of discernment
(FoD) $\Omega = \{F, O\}$. So the mass functions used in the grid have
the structure

$$m_t = [\, m_t(\emptyset) \;\; m_t(F) \;\; m_t(O) \;\; m_t(\Omega) \,] \qquad (7)$$

The occupancy mass function can be used during the fusion
process, then the decision can be taken using the pignistic
transform to get a probability measure and use the same decision
rule. An interesting aspect of the evidential occupancy grid is that
the FoD can be more complex, and as the fusion is done cell
by cell the fusion scheme remains valid.

Occupancy grids can be classified into two categories de- 
pending on the use of a forward, or inverse, sensor model. The 
forward model relies on Bayes inference. Since this approach 
takes into account the conditional dependency of the cells of 
the map, it is well adapted to a sensor that observes a large 
domain of cells with only one reading measurement (e.g. an
ultrasonic SONAR). However, it requires heavy processing
that can be handled by optimized approximation [42] or GPU 
computing [43]. 

The inverse model approach is well adapted to sensors with a
narrow field of measurement (e.g. LIDAR). It is composed of two
separate steps. First, a snapshot map of the sensor reading
is built using an inverse sensor model $P(O^i | z_t)$. This model
can take into account the conditional dependency between the
sensor reading and the occupancy of the seen cells. Then, a
fusion process (denoted $\odot$) is done with the previous map
$P(O^i | z_{1:t-1})$ as an independent opinion poll fusion:

$$P(O^i | z_{1:t}) = P(O^i | z_t) \odot P(O^i | z_{1:t-1}) \qquad (8)$$



In the probabilistic framework, the usual fusion operation
between states A and B coming from independent measurements
uses the independent opinion poll [52]:

$$P(A) \odot P(B) = \frac{P(A) \cdot P(B)}{P(A) \cdot P(B) + (1 - P(A)) \cdot (1 - P(B))} \qquad (9)$$

Inverse approaches have very efficient implementations (e.g.
log-odds) that make them popular in mobile robotics [10], [14],
[44]. Maps built using inverse models are usually less accurate,
since they just take into account the dependency of the cells
observed in one reading, but it is a good approximation
with accurate and high resolution sensors observing a limited
number of cells at a time. Moreover, when the sensor is multi-echo
and multi-layer, the conditional dependency of the seen
cells can be modeled in an efficient way.
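The link between the independent opinion poll of Eq. (9) and the log-odds implementation can be sketched as follows: the odds of the fused probability are the product of the odds of the inputs, so the fusion becomes an addition in log-odds space. The function names and the probabilities used are assumptions for this illustration.

```python
import math


def opinion_poll(pa, pb):
    """Independent opinion poll fusion of two occupancy probabilities, Eq. (9)."""
    return pa * pb / (pa * pb + (1.0 - pa) * (1.0 - pb))


def log_odds(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def from_log_odds(l):
    """Inverse of log_odds."""
    return 1.0 / (1.0 + math.exp(-l))


p_map, p_scan = 0.8, 0.6  # hypothetical prior map value and new reading
direct = opinion_poll(p_map, p_scan)
additive = from_log_odds(log_odds(p_map) + log_odds(p_scan))
print(round(direct, 6), round(additive, 6))
```

Both expressions give the same value, and a probability of 0.5 leaves the other operand unchanged, which is why an unexplored cell is usually initialized at 0.5 in the Bayesian grid.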
A. Fusion strategy with the inverse model


When dealing with the inverse model approach, an estimate
of the pose of the robot has to be available and a map grid
$G^M$ has to be handled. This grid is defined in a
world-referenced frame (so it does not move with the robot) and
is updated when a new sensor reading is available. Because of
the likely evolution of the world in a dynamic environment,
the OG update has to be completed by a remanence strategy.
The fusion architecture then follows a prediction-correction
paradigm and can be used to fuse the observations of one or
several sensors.

a) Prediction step: The prediction step computes the
predicted map grid at time t from the map grid estimated at
time t - 1. Depending on the available information, this step
can be very refined, as done in [43]. As we consider here
that no specific information on the velocity of the objects (or
cells) is available, the prediction is done by discounting. The
confidence in aged data is controlled by a remanence factor
$\alpha \in [0, 1]$. The prediction stage is therefore governed by

$$\hat{G}^M_t = \mathrm{discount}(G^M_{t-1}, \alpha) \qquad (10)$$


b) Correction step: The correction step consists in the
combination of the previously estimated map grid with the
grid built from the current measures thanks to the inverse
sensor model (see more details in [4], [5]). This one is called
ScanGrid $G^S$. As this information is referenced in the sensor
frame, a 2D warping is applied to reshape this grid into the
fusion frame. To perform this operation, the current pose q
is estimated using a GPS sensor and the rigid homogeneous
transformation matrix $H_t$ is computed. When GPS becomes
unavailable, the CAN (Controller Area Network) bus is used
to get the robot odometric data. The motion matrix $H_t$ and the
extrinsic calibration matrix C are used to compute a remapping
function f(x, y) according to Eq. (11) below

$$f(x, y) = C \cdot H_t \cdot \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \qquad (11)$$


Finally, the ScanGrid is remapped with f and fused with the 
previous map grid. 
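The remapping of Eq. (11) is a product of 3x3 homogeneous matrices applied to a point in homogeneous coordinates. The sketch below follows the order C, then H_t, exactly as written in Eq. (11). The helper names, the form of both matrices (planar rotation plus translation) and the numerical values are assumptions for illustration, since the paper does not detail them.

```python
import math


def rigid_transform(theta, tx, ty):
    """3x3 homogeneous matrix of a planar rotation theta and translation (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]]


def matmul3(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def remap(x, y, calib, motion):
    """f(x, y) = C . H . [x, y, 1]^T, returned as Cartesian coordinates."""
    t = matmul3(calib, motion)
    u = t[0][0] * x + t[0][1] * y + t[0][2]
    v = t[1][0] * x + t[1][1] * y + t[1][2]
    return u, v


# Hypothetical values: quarter-turn motion with offset (2, 3), and a
# calibration that shifts the sensor by 0.5 m along the x axis.
H = rigid_transform(math.pi / 2.0, 2.0, 3.0)
C = rigid_transform(0.0, 0.5, 0.0)
u, v = remap(1.0, 0.0, C, H)
print(round(u, 6), round(v, 6))
```

In practice the result would be converted to integer cell indices by dividing by the grid resolution, a step left out here because it depends on the grid origin convention.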

The grid $G^S$ represents the BBA produced by the sensor
model. This BBA is created from the sensor
data (e.g. LIDAR points here) and a sensor model in order to
infer an instantaneous occupancy grid. For the probabilistic approach,
it refers to the occupancy probability $P^S_t(O)$; for the
evidential approach it refers to an occupancy mass function
$m^S_t = [\, m^S_t(\emptyset) \;\; m^S_t(F) \;\; m^S_t(O) \;\; m^S_t(\Omega) \,]$. The grid
$\hat{G}^M_t$ refers to the previous MapGrid $G^M_{t-1}$, predicted at the current
time using Eq. (10). In the following parts, for each approach
considered, the rule $\odot$ used in Eq. (12) is different. The Bayesian
approach uses Eq. (9), the DS approach uses Eq. (2) and the PCR6
approach uses Eq. (4).


B. Discounting in Occupancy Grids 


The main advantage of using discounting is to provide a
simple way to model the presence of dynamic objects in the
scene. This model allows a prediction to be made without
information on the dynamics at the cell level (or at the object level),
which is generally not directly available from sensors and
is difficult to estimate without time-consuming
algorithms [43] (especially when the evidential framework is
adopted). The main issue with the discounting effect is that
it makes it impossible to build a persistent static map. Indeed,
cells not viewed by the sensor will quickly converge to the
ignorance state. Therefore, this strategy cannot be used to
build the map of a building for instance. If we are interested in
building a static map in the presence of moving objects, the discounting
function is then not recommended. We will see in the next
part of the paper that, in this case, the Bayesian and DS fusion
rules are not very efficient, and we will
show why it is recommended to use the PCR6 rule.


IV. RESULTS 


In this section, we present simulation results of grid oc- 
cupancy estimation in a realistic scenario based on different 
rules of combination (Bayesian fusion, Dempster-Shafer rule, 
and PCR6 fusion rule). 


A. Basic simulation 


a) Setup: In order to present the basic behavior of the
different combination rules studied, we first carried out
some simple 1D simulations, where we consider a grid cell
crossed by a moving object. In this case, the state of the
cell changes from free to occupied at time $t_1$ and
from occupied to free at time $t_2$. Figure 1 shows the
results of these simulations under different conditions. On each
subfigure, the top plot shows the real state of the cell (i.e.
the ground truth). The second row of each subfigure shows the
simulated sensor data, i.e. the BBA of the state
of the cell. This mass function is built according to the state
of the cell and the level of confidence of the sensor, and can
be perturbed with additional noise. FA indicates the
rate of False Alarms, and ND the rate of Non Detections. We
will consider different levels of confidence for $m_{SG}(O)$ when
the cell is occupied, and $m_{SG}(F)$ when the cell is free. The
bottom plot of each subfigure represents the level of belief of
the cell state obtained with Bayesian fusion, Dempster-Shafer
(DS) fusion and the PCR6 fusion rules respectively.

Effect of discounting: Figure 1a presents the results of
the classical chain using a discounting factor $\alpha = 0.05$ while
figure 1b presents the same case without discounting ($\alpha = 0$).
If the discounting is applied, all the fusion rules behave
similarly, but if the discounting is not used, a lag appears
with Bayesian and DS fusion rules. The lag effect is seriously
reduced with PCR6 rule.
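The lag described above can be reproduced with a small noise-free sketch. A single cell, initially vacuous, receives a run of "free" readings and then "occupied" readings, without discounting, and we count how many occupied readings are needed before the pignistic probability of O exceeds 0.5. The sensor masses are those of simulation 1 in Table I (0.8/0.6); the 20-step free phase, the decision threshold and all function names are assumptions made for this example, which is not the authors' simulation code.

```python
OMEGA = frozenset({"F", "O"})
FREE, OCC = frozenset({"F"}), frozenset({"O"})


def ds_combine(m1, m2):
    """DS rule, Eq. (2): conjunctive consensus normalized by 1 - conflict."""
    out, conflict = {}, 0.0
    for x1, v1 in m1.items():
        for x2, v2 in m2.items():
            x = x1 & x2
            if x:
                out[x] = out.get(x, 0.0) + v1 * v2
            else:
                conflict += v1 * v2
    return {x: v / (1.0 - conflict) for x, v in out.items()}


def pcr6_combine(m1, m2):
    """Two-source PCR6, Eq. (4): proportional redistribution of conflicts."""
    out = {}
    for x1, v1 in m1.items():
        for x2, v2 in m2.items():
            if v1 == 0.0 or v2 == 0.0:
                continue
            x = x1 & x2
            if x:
                out[x] = out.get(x, 0.0) + v1 * v2
            else:
                out[x1] = out.get(x1, 0.0) + v1 * v1 * v2 / (v1 + v2)
                out[x2] = out.get(x2, 0.0) + v2 * v2 * v1 / (v1 + v2)
    return out


def betp_occupied(m):
    """Pignistic probability of the occupied state on the frame {F, O}."""
    return m.get(OCC, 0.0) + 0.5 * m.get(OMEGA, 0.0)


def detection_lag(combine, n_free=20, n_occ=20):
    """Number of occupied readings needed before BetP(O) exceeds 0.5."""
    free_obs = {FREE: 0.6, OMEGA: 0.4}
    occ_obs = {OCC: 0.8, OMEGA: 0.2}
    cell = {OMEGA: 1.0}  # unexplored cell
    for _ in range(n_free):
        cell = combine(cell, free_obs)
    for k in range(1, n_occ + 1):
        cell = combine(cell, occ_obs)
        if betp_occupied(cell) > 0.5:
            return k
    return None


lags = {"DS": detection_lag(ds_combine), "PCR6": detection_lag(pcr6_combine)}
print(lags)
```

Under these assumptions the DS rule needs 12 occupied readings to switch its decision, against 2 for PCR6. With DS the accumulated mass on F must be undone geometrically, whereas PCR6 hands part of each conflict directly to O.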

Performance analysis: The performance of our method
is summarized in Table I. For each simulation, 10000 Monte
Carlo runs have been performed in order to estimate the
false alarm and non detection rates. In order to make the
decision, the pignistic probability has been computed and a
MAP estimator has been used. For each simulation presented
in Table I, we also mention the discounting level and the rate
of noise impacting sensor observations.


(a) Case with discounting ($\alpha = 0.05$).
(b) Case without discounting ($\alpha = 0$).
(c) Case with noise (FA=10%, ND=10%) and no discounting.
(d) Case with noise (FA=30%, ND=15%) and no discounting.

Figure 1: Evolution of the belief in a cell crossed by an obstacle observed by a sensor.


Simulations 0 and 1, illustrated by figure 1a and
figure 1b, correspond to the noise-free situation. When
the discounting operator is removed, the Bayesian and DS approaches show a lag
in the detection of the change of state that clearly degrades their
performances. The PCR6 approach is much less affected by
this effect because of the proportional conflict redistribution
process. Simulations 2 and 3 (see figure 1c) include 10% of
wrong measurements caused by noise. The fusion rules behave
similarly to simulations 0 and 1, but the performances are
a bit lower, which reflects the effect of noisy measurements on
the grid estimation process. For simulations 4, 5 and 6, the
noise reaches 15% for ND and 30% for FA, which is quite
strong. As we see in figure 1d, the Bayesian and DS fusion
rules are not able to detect the second state change during the
simulation time. This induces high false alarm rates. In the last
simulations 7 and 8, the noise is about 25% of ND and 50% of
FA. In these conditions, all the methods have poor false alarm
rates but PCR6 keeps good (low) non detection rates.


B. LIDAR simulation 


In this simulation, the DS and PCR6 fusion rules are
compared on a 2D occupancy grid problem close to a real
application for robot perception. The simulation was realized
using the Robot Operating System (ROS) [50] environment
and the Gazebo [51] simulator is used here to simulate a
Hokuyo LIDAR and a moving object as shown in Figure 2a.
The simulated sensor has a FoV (Field of View) of about 270°
and a max range of about 10 m. The scan rate is 20 Hz and
the ranges of the LIDAR points are corrupted with a Gaussian
noise $\mathcal{N}(0, 0.1)$.


      discounting  key times  sensor noise  sensor belief       Bayesian       DS           PCR6
N°    α            t1/t2      ND/FA         mSG(O)/mSG(F)      ND    FA     ND    FA     ND    FA
0     0.05         20/40      0             0.8/0.6            10.0   6.0   10.0   6.0   10.0   6.0
1     0            20/40      0             0.8/0.6            65.0  24.0   60.0  32.0   10.0   0.6
2     0.05         20/40      10/10         0.8/0.6            11.2   9.2   10.5  10.0   10.0   9.6
3     0            20/40      10/10         0.8/0.6            77.7  15.2   73.5  18.9   11.5   6.7
4     0.05         20/40      15/30         0.8/0.6             9.2  28.0    8.2  31.5    8.4  28.9
5     0            20/40      15/30         0.8/0.6            33.0  62.7   26.9  65.8    8.4  28.8
6     0            20/40      15/30         0.6/0.4            31.3  63.9   26.0  67.3    9.3  38.7
7     0            20/40      25/50         0.6/0.4            15.0  76.9   11.5  79.4    5.7  64.0
8     0            20/40      25/50         0.4/0.2             7.1  83.9    3.1  85.1    1.9  87.3

Table I: Comparison of false alarm and non detection rates (%).


(a) View of the Gazebo simulation: the box turns around the LIDAR sensor.
(b) Bird view of one LIDAR scan.

Figure 2: Simulation setup.

Figure 2b shows a simulated LIDAR scan. The beams
that do not hit an obstacle within the range are considered as
max range (as done in the real Hokuyo sensor). The moving
object is a box which has a circular trajectory and moves at
6 rpm around the LIDAR. A ground-truth grid is computed
according to the real position of the box and its geometry at
each scan time. The grid used is a square of 10 m by 10 m
with a resolution of 0.1 m, and the ScanGrid BBA are set
to $m_{SG}(O) = 0.8$, $m_{SG}(\Omega) = 0.2$ for occupied cells and
$m_{SG}(F) = 0.6$, $m_{SG}(\Omega) = 0.4$ for free cells.
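The ScanGrid masses given above can be attached to the cells crossed by one LIDAR beam as in the following sketch. The inverse sensor model itself is detailed in [4], [5]; the one-dimensional simplification used here (free cells before the hit, one occupied cell at the hit, vacuous cells behind it, all cells free for a max-range beam) and the function name are assumptions made for illustration only.

```python
OMEGA = frozenset({"F", "O"})
FREE, OCC = frozenset({"F"}), frozenset({"O"})


def beam_to_bbas(hit_range, max_range=10.0, resolution=0.1):
    """BBAs of the cells along one beam, ordered from the sensor outwards."""
    n_cells = round(max_range / resolution)
    if hit_range >= max_range:
        hit_index = n_cells  # no obstacle: no cell is marked occupied
    else:
        hit_index = min(int(hit_range / resolution), n_cells - 1)
    cells = []
    for i in range(n_cells):
        if i < hit_index:
            cells.append({FREE: 0.6, OMEGA: 0.4})  # crossed by the beam
        elif i == hit_index:
            cells.append({OCC: 0.8, OMEGA: 0.2})   # impact cell
        else:
            cells.append({OMEGA: 1.0})             # occluded: total ignorance
    return cells


cells = beam_to_bbas(2.55)
print(len(cells), cells[24], cells[25], cells[26])
```

Cells behind the impact keep the vacuous BBA, so fusing them with the map grid leaves the map unchanged for both the DS and PCR6 rules.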

In order to quantify the results, we compute some metrics.
However, because of occlusion, only the cells located on the
edges of the box can be considered, which is why we do not
consider global metrics. We consider here the two following
metrics: 1) the number of correctly occupied cells (proportional
to recall in our case), and 2) the number of conflicting cells
close to the box. The first describes the ability of the method to
add objects into the map and also, by analogy, to remove objects
from the map. The second describes the ability of the method
to detect moving objects by generating conflict. This ability
is important and is one of the improvements of the evidential grid
over the classical Bayesian grid estimation.


Figures 3 & 4 show the result over one turn. The number of
cells detected for both metrics depends a lot on the position
of the box around the sensor. This can be explained because, in some
places, the LIDAR sensor is able to see two edges of the box.
In other situations, the LIDAR sensor detects just one edge,
and when the box is behind the sensor it is
out of the field of view of the LIDAR. On Fig. 3, we can
see that the number of occupied cells with the PCR6 fusion is
greater than with the DS fusion. Contrary to PCR6, the
DS fusion without discounting cannot estimate well the quick
changes of states in the map. From the motion standpoint,
Fig. 4 shows that the PCR6 approach keeps the same ability
to generate conflict in the presence of a moving object (similar to
DS fusion). The x-axis of Figures 3 & 4 is the time stamp
of the LIDAR scans, and the y-axis is the number of cells in
different states (occupied for Fig. 3, or with conflict for Fig.
4).




Figure 3: 2D LIDAR simulation: Number of correct occupied
cells (green=PCR6, blue=DS).

Figure 4: 2D LIDAR simulation: Number of conflicting cells
in the box shape (green=PCR6, blue=DS).


C. Real data processing 


A real experiment was carried out using a Hokuyo
UTM-30LX sensor. This experiment took place in an
office in which a person was walking. The evidential
occupancy grid fusion node was implemented within the ROS
environment. The grid has the same size and resolution as in
the previous example. The BBA used in the sensor model has
been set to $m_{SG}(O) = 0.8$, $m_{SG}(\Omega) = 0.2$ for occupied
cells and $m_{SG}(F) = 0.86$, $m_{SG}(\Omega) = 0.2$ for free cells. No
discounting was applied.

Figure 5 presents the occupancy grid estimation using DS
and PCR6 rules of combination for two typical snapshots
of the sequence. The color of cells denotes the state having
the highest mass value: green for F (free state), red for
O (occupied state), and black for $\Omega$ (full uncertainty). For
convenience, we have also displayed in blue all the cells
that carry a conflicting mass $m(\emptyset) > 0.1$ before applying
the normalization step of DS rule, or before applying the
proportional conflict redistribution with PCR6. Figures 5a and
5c show the result using DS rule. The room scanned by the
sensor is correctly mapped and its bounds (mainly walls and
doors) are clearly identified by the red pixels. The free space
(green pixels) is correctly detected in the room except near the
person, whose cells are labeled as free (with conflicting cells shown in
blue for convenience). The person moving around the desk in
the office room is only detected from conflicting cells when
he stops walking, which happens several times. Figures 5b and 5d show the
PCR6 result at the same time stamps. In this case, the person
is correctly detected as shown by the red pixels (occupied
cells) inside the green area (the office room). A conflict cell
is created when he starts walking in the room. The static part
of the room is also detected (as with DS fusion rule).


(a) Snapshot 1 - DS fusion.
(b) Snapshot 1 - PCR6 fusion.
(c) Snapshot 2 - DS fusion.
(d) Snapshot 2 - PCR6 fusion.

Figure 5: Result of evidential occupancy grid in real
experimentation.


V. CONCLUSIONS AND PERSPECTIVES 


In this work we have presented a novel application of
belief functions which significantly improves the map building
process for robot environment perception and grid map
estimation. This work shows the importance of defining an
accurate sensor model. We have considered the uncertainties
of the LIDAR measurements and used the PCR6 rule of
DSmT to model and combine sensor information. Our new
method differs from the Bayesian approach by allowing support for
more than one proposition at a time, rather than a single
hypothesis. It is an interval-based approach, as defined by the
lower and upper probability bounds [Bel, Pl], allowing the lack
of measurement to be modeled adequately. This new method
also differs from the classical evidential approach because
the PCR6 rule is used instead of DS rule. Experimental
results with the LIDAR confirm the improvement of the
accuracy of this new grid estimation method with respect to
previous methods. As perspectives, we will try to implement
this fusion rule in a 3D occupancy grid (Octomap based) and
use a stereo camera with dense disparity map computation
as sensor source. In future works, we will consider in this
perception context more classes in the frame of discernment
and we will also test the improved PCR6 rule of combination
including Zhang’s degree of intersection.
Abstract—Civil engineering protection works mitigate natural
risks in mountains, such as torrents. Analysing their effectiveness
at several scales is an essential issue in risk management.
The methods in use, based on expert knowledge, have been developed
for a risky environment. However, decisions are made under
uncertainty because of 1) the lack of information and knowledge
on natural phenomena, 2) the heterogeneity of available
information and 3) the reliability of sources. In this paper, we
propose to help decision-makers with advanced multicriteria
decision making methods (MCDMs). Combining classical MCDM
approaches, belief function, fuzzy sets and possibility theories,
they make it possible to base decisions on heterogeneous, imprecise
and uncertain evaluations of criteria provided by more or less
reliable sources in an uncertain context. COWA-ER (Cautious
Ordered Weighted Averaging with Evidential Reasoning), Fuzzy-Cautious
OWA or ER-MCDA (Evidential Reasoning for Multi
Criteria Decision Analysis) are thus applied to several scales of
effectiveness assessment.


Keywords: Torrent protection, belief functions, MCDM. 


I. INTRODUCTION 


Mountain natural phenomena such as torrential floods put
people and buildings at risk. Protection works act on both the
causes and the effects of these phenomena to limit the induced
risks. For instance, check-dams control the material volume and
flow of torrential floods. Their design allows them to reduce
sediment production (Figure 1). Defining the strategy for investment
and maintenance is an essential issue in the risk management
process, and it relies on assessing the effectiveness of these works.
Decision support tools help to assess their economic efficiency
depending on their structural state and functional effects on
phenomena (stopping, braking, guiding, etc.) [1].

Cost Benefit Analysis (CBA) is the most widely used decision-aid
method in the natural hazard context. It helps to assess the
efficiency of potential actions by comparing, for several scenarios,
investment and maintenance costs with direct and indirect
losses [2]. In practice, natural risk analysis is limited to a set
of scenarios, which is itself open to discussion [3]. Moreover,
probability knowledge (distributions or scenarios) is affected by
the lack of information on phenomena, but also by the heterogeneity
and reliability of available sources [4].

Concepts of failure mode and effects analysis (FMEA), 
already used for hydraulic dams [5], are extended to assess the 
effectiveness of check-dams [6]. Those methodologies elicit 
the expert reasoning process and consider structural, functional 
and economic features [1]: indicators formalise information 
processing to make it repeatable and reproducible [7]. Never- 
theless, assessment is based on heterogeneous and imprecise 
information provided by more or less reliable sources [4]. 

Methods to represent information imperfection are needed 
to aid decisions including check-dam effectiveness assessment. 
Advanced MCDMs combining classical MCDM approaches 
[8], [9], belief function [10], [11], fuzzy sets [12] and possibil- 
ity theories [13] have been developed to help decisions under 
risk or uncertainty such as COWA [14], Fuzzy-Cautious OWA 
[15] and ER-MCDA [16]. 

This paper first recalls the context of information imper- 
fection related to check-dams. We secondly introduce the 
principles of new belief function theory based evolutions of 
MCDMs. We then apply them to cases related to effectiveness 
of check-dams. We finally discuss remaining issues for new 
decision-making methods in risky and uncertain contexts. 


II. EFFECTIVENESS OF PROTECTION WORKS IN AN
UNCERTAIN ENVIRONMENT


Assessing the effectiveness of existing check-dams is based
on their structural state, functional capacity and relative risk
reduction. We describe below the decision context and the
information imperfection throughout the decision process.


A. Formalization of decision context 


1) Several system scales as alternatives: Protecting exposed
elements with check-dams is based on interdependent
systems. A check-dam $E^l$ belongs to a device $D^s$. Several
devices protect exposed elements at the watershed scale ($F$).
Each $E^l$ is considered as an alternative belonging to a set of
$m$ check-dams $D^s = \{E^1, \ldots, E^l, \ldots, E^m\}$. $D^s$ is a device
alternative in the set $F = \{D^1, \ldots, D^s, \ldots, D^t\}$. $F$ represents
all $t$ devices which protect exposed elements in the watershed
(Figure 1).

Figure 1. Multi-system formalization for check-dams in a torrential watershed.

2) Possible actions on systems as alternatives: For each
system scale $E^l$, $D^s$ and $F$, several actions $a_i$ can be proposed:
for example, no action ($a_1$), maintenance of check-dams ($a_2$)
or building new works ($a_3$). $E^l_i$, $D^s_i$ and $F_i$ represent all
possible actions on each system scale (resp. a single check-dam,
a set of check-dams, all sets in the watershed).


3) Decision objects and linked problems: A decision-making
problem consists in choosing, ranking or sorting
alternatives on the basis of quantitative or qualitative criteria
$g_j$ [9]. Effectiveness is the level of objective achievement [17].
Sorting alternatives $E^l$, $D^s$ and $F$ into effectiveness classes
(e.g., optimal, correct, partial, deficient) is a recurrent issue.
Choosing between several alternatives $a_i$, or ranking them, are
other practical issues.


B. Various information is needed but is imperfect 


1) The states of the nature $S$ or $S^s$: Debris flows and
torrential floods with bed-load transport are the two main
torrential processes [18]. A specific criterion of interest
must be chosen for each process (e.g., flow volume or
deposit depth).

The analysis of the states of the nature depends on the location
in the watershed. They can be represented by a finite
or a continuous set according to available information.
For torrential floods, field experts define a finite set
$S = \{S_1, \ldots, S_k, \ldots, S_n\}$ for the $F$ scale and another set
$S^s = \{S^s_1, \ldots, S^s_k, \ldots, S^s_n\}$ for the $D^s$ and $E^l$ scales [1].


2) The decision-maker (DM) preferences on $g_j$: Assessing
each alternative in an MCDM context requires three elements
from the DM about $g_j$: 1) the list of $g_j$, 2) the weights $w_j$
(preferences between the $g_j$), 3) the $g_j$ assessment scale
(preferences between alternative evaluations through a total or
a partial pre-order) [9], [19].


3) Decision-making and imperfect information: To compare
several alternatives, decision support tools are based on
several $g_j$ evaluations of their consequences (payoffs/gains)
under $S$ (or $S^s$). For example, each $F_i$ is evaluated given the
knowledge on $S$ and the payoff matrix defined by $C = [C_{ik}]$
where $i = 1, \ldots, q$ and $k = 1, \ldots, n$ (Eq. (1)). The decision
problem consists in choosing the alternative $F_{i^*} \in F_i$ which
maximizes the payoff to the DM. We assume that the $C_{ik}$
assessment can be based on several $g_j$.


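$$
\begin{array}{c|ccccc}
 & S_1 & \cdots & S_k & \cdots & S_n \\ \hline
F_1 & C_{11} & \cdots & C_{1k} & \cdots & C_{1n} \\
\vdots & \vdots & & \vdots & & \vdots \\
F_i & C_{i1} & \cdots & C_{ik} & \cdots & C_{in} \\
\vdots & \vdots & & \vdots & & \vdots \\
F_q & C_{q1} & \cdots & C_{qk} & \cdots & C_{qn}
\end{array} = C. \qquad (1)
$$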


Whatever the decision context, all decisions are affected by the
imperfection of the information used to assess $S$, $C_{ik}$ and $g_j$ [4]:
inconsistency (conflict between sources); imprecision (e.g.,
interval of numerical values); incompleteness (lack of
information while data exist); aleatory uncertainty (random
events); epistemic uncertainty (lack of knowledge).

Depending on his knowledge about $S$, the DM faces
different decision-making problems [14]: under certainty
(only one $S_k$ is known); under risk (the true state is unknown
but one knows all the probabilities $p_k = P(S_k)$); under
ignorance (one assumes no knowledge about the true state
except that it belongs to $S$); under uncertainty (the knowledge
on $S$ is characterized by a belief structure).


III. NEW BELIEF FUNCTION THEORY BASED EVOLUTION 
OF MCDMs 


Comparing alternatives requires assessment of 1) DM preferences
on $w_j$ and $g_j$, 2) information imperfection to evaluate
$S$ and $g_j$, 3) MCDM choice to aggregate several $g_j$ to define
$C$. In this part, we introduce the principles of recent MCDM
evolutions based on belief function theory.


A. Basics of belief functions 


Shafer [10] originally proposed the basics of belief functions.
One starts with a finite set $\Theta$ (called the frame of
discernment of the decision problem). Each element of $\Theta$ is a
potential answer to the decision problem, and the elements are
assumed exhaustive and exclusive. The powerset of $\Theta$, denoted
$2^\Theta$, is the set of all subsets of $\Theta$, empty set included. A body of
evidence is a source of information that will help the DM to
identify the best element of $\Theta$. The interest of belief functions
is their ability to model epistemic uncertainties. Each body of
evidence is characterized by a basic belief assignment (bba), or
mass of belief, which is a mapping $m(.) : 2^\Theta \to [0, 1]$
that satisfies $m(\emptyset) = 0$ and $\sum_{A \in 2^\Theta} m(A) = 1$.
The belief function $Bel(.)$ and the plausibility function $Pl(.)$
are defined from $m(.)$ by:


$$Bel(A) = \sum_{B \subseteq A \,|\, B \in 2^\Theta} m(B), \qquad (2)$$

$$Pl(A) = \sum_{B \cap A \neq \emptyset \,|\, B \in 2^\Theta} m(B). \qquad (3)$$


$Bel(A)$ and $Pl(A)$ are often interpreted
as lower and upper bounds of the unknown probability
of $A$. The vacuous bba defined as $m_v(\Theta) = 1$ models the fully
ignorant source of evidence. Shafer [10] proposed Dempster's
rule to combine distinct sources of evidence, which has been
subject to strong debate in the fusion community starting from
Zadeh's first criticism in 1979. Since the 1990s many alternatives
have been proposed to combine belief functions more or less
efficiently, as well as an extension of belief functions
in the framework of Dezert-Smarandache Theory (DSmT), as
shown and discussed in [11].

According to the DM attitude, credibilities, plausibilities,
Smets' pignistic probability $BetP$ [20] or the Dezert-Smarandache
probability $DSmP_{\epsilon=0}$ [11] (Vol. 3) can be
computed to compare alternatives.
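The definitions of Eqs. (2) and (3) can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: a bba is stored as a dictionary mapping frozensets to masses, the frame and the masses are hypothetical, and the pignistic transform is written in its usual form for a bba with no mass on the empty set. DSmP is not covered here.

```python
def bel(m, A):
    """Eq. (2): sum of the masses of the non-empty focal elements included in A."""
    return sum(v for B, v in m.items() if B and B <= A)


def pl(m, A):
    """Eq. (3): sum of the masses of the focal elements that intersect A."""
    return sum(v for B, v in m.items() if B & A)


def betp(m, x):
    """Pignistic probability of element x: each mass is split evenly
    among the elements of its focal set (assumes m(empty set) = 0)."""
    return sum(v / len(B) for B, v in m.items() if x in B)


# Hypothetical bba on the frame {a, b, c}; the masses sum to 1.
theta = frozenset("abc")
m = {frozenset("a"): 0.5, frozenset("ab"): 0.3, theta: 0.2}

for x in sorted(theta):
    X = frozenset(x)
    # For each singleton, Bel <= BetP <= Pl.
    print(x, bel(m, X), betp(m, x), pl(m, X))
```

For each singleton the printed values are ordered as Bel, BetP, Pl, which matches the reading of Bel and Pl as lower and upper probability bounds.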


B. ER-MCDA 


Tacnet [4] proposed the ER-MCDA methodology. Its originality
consists in the association of different theories. It
dissociates the imperfect evaluations from their combination in
the fusion process, considering the evaluation imperfection
as well as the heterogeneity and reliability of sources. It uses
developments for MCDM based on the combination of the Analytic
Hierarchy Process (AHP) approach developed by Saaty [8] and DSmT
[11]. AHP makes it possible to build bbas from DM preferences on
solutions, which are established with respect to several $g_j$.
DSmT makes it possible to aggregate efficiently the (possibly highly
conflicting) bbas based on each criterion. The DSmT-AHP method
also takes into account the different importances of
the $g_j$ and/or of the different members of the DM group.
ER-MCDA applies the following general principles in
independent steps:
• The AHP methodology helps to analyze the decision
problem through a hierarchical structure and to define the
evaluation classes for decision through a common frame of
discernment $\Theta$.
• The imprecise evaluation and mapping of $g_j$: qualitative
or quantitative criteria are evaluated through possibility
distributions representing both imprecision and uncertainty [13].
Possibility distributions can be converted into bbas [21]. We use
a mapping process that projects these bbas onto fuzzy
sets expressed on $\Theta$ [12].
• The fusion of mapped evaluations and $g_j$: a first fusion
process is done for all evaluations of the different sources for
a same $g_j$. Bbas can be discounted according to the reliability
level of each source. We finally get bbas for each $g_j$, whose
weights $w_j$ have been defined according to the classical AHP
method. Those $w_j$ are converted into importance discounting
factors. The bbas corresponding to each $g_j$ are then fused a second
time to get the final result, which is called a decision profile.

This profile not only shows the decision to take but also provides
an evaluation of the distribution of knowledge on the
other levels and of the uncertainty. It is possible to check whether all
sources agree about the decision and also to have an idea about
the uncertainty of their evaluation. The quality of the information
leading to the decision is linked to the decision itself. The results
can be bbas or belief and plausibility values, which correspond to a
pessimistic or an optimistic choice of a decision level. With ER-MCDA,
one uses PCR6 (Proportional Conflict Redistribution
Rule no. 6) developed in DSmT [11] (Vol. 3) to overcome the
drawbacks of the classical Dempster fusion rule discussed
in [22]. The importance of criteria is a different concept from
the classical reliability concept developed and used in the
belief theory context. In order to distinguish between the
importance of criteria, the uncertainty related to the evaluations of
criteria and the reliability of the different sources, specific methods
such as DSmT-AHP [23], [24] have extended Saaty's AHP
method.
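The text does not restate the PCR6 formula, so the following Python sketch is only an illustration of the rule as it is usually stated in the DSmT literature [11] (Vol. 3): every product of masses whose focal elements have an empty intersection is given back to the focal elements involved, proportionally to the mass each source committed. The function name `pcr6`, the dictionary representation and the two-source example are assumptions made for this sketch. It enumerates all combinations of focal elements, so its cost grows exponentially with the number of sources, and it does not handle the variant with positive mass on the empty set.

```python
from itertools import product


def pcr6(bbas):
    """Combine a list of bbas (dicts mapping frozenset -> mass) with PCR6."""
    fused = {}
    for combo in product(*(m.items() for m in bbas)):
        sets = [X for X, _ in combo]
        masses = [v for _, v in combo]
        prod = 1.0
        for v in masses:
            prod *= v
        if prod == 0.0:
            continue
        inter = frozenset.intersection(*sets)
        if inter:
            # Non-conflicting product: conjunctive consensus.
            fused[inter] = fused.get(inter, 0.0) + prod
        else:
            # Conflicting product: redistribute it to the focal elements
            # involved, proportionally to the masses committed by each source.
            total = sum(masses)
            for X, v in combo:
                fused[X] = fused.get(X, 0.0) + v * prod / total
    return fused


# Hypothetical two-source example on the frame {A, B}.
A, B = frozenset("A"), frozenset("B")
AB = A | B
m1 = {A: 0.6, AB: 0.4}
m2 = {B: 0.3, AB: 0.7}
fused = pcr6([m1, m2])
print(fused)
```

In this example the only conflicting product is 0.6 x 0.3 = 0.18, which is split between A and B in the ratio 0.6 : 0.3, so no mass is lost and the result still sums to 1.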


C. COWA-ER and Fuzzy Cautious OWA 


Tacnet and Dezert [14] proposed the COWA-ER method
for decision-making under uncertainty, taking into account
imperfect evaluations and unknown beliefs about groups of
the possible states of the world. COWA-ER cautiously combines
the principle of the Ordered Weighted Averaging (OWA) approach
[25] with the fusion of belief functions proposed in DSmT
[11]. Fuzzy Cautious OWA [15] is an improvement of COWA-ER
using fuzzy sets.




1) The OWA approach: To recall it, we consider
the decision-making problems introduced in II-B3 and Eq.
(1).

1 - under certainty: one chooses $F_{i^*}$ with
$i^* = \arg\max_i \{C_{ik}\}$.

2 - under risk: as for the CBA (cf. Sec. I), for each $F_i$, we
compute the expected payoff $E[C_i] = \sum_k p_k \cdot C_{ik}$, then we choose
$F_{i^*}$ with $i^* \triangleq \arg\max_i \{E[C_i]\}$.

3 - under ignorance: Yager [25] uses the OWA operator as a
weighted average of ordered values of a variable. For each $F_i$
and a given criterion of interest $g_j$, one chooses a weighting
vector $W_i = [w_{i1}, w_{i2}, \ldots, w_{in}]$ and computes its OWA value
$V_i \triangleq OWA(C_{i1}, C_{i2}, \ldots, C_{in}) = \sum_k w_{ik} \cdot b_{ik}$, where $b_{ik}$ is
the $k$th largest element in the collection of payoffs $C_{i1}$,
$C_{i2}, \ldots, C_{in}$. Then one chooses $F_{i^*}$ with $i^* = \arg\max_i \{V_i\}$.
$W_i$ depends on the decision attitude of the DM (pessimistic,
optimistic, normative/neutral, etc.).


4 - under uncertainty: one assumes that a priori knowledge
on the frame $S$ is given by a bba $m(.) : 2^S \to [0, 1]$. This case
includes all previous cases depending on the choice of $m(.)$.
Yager's OWA under uncertainty is based on the derivation of a
generalized expected value $C_i$ of payoff for each $F_i$ as follows:

$$C_i = \sum_{l=1}^{r} m(X_l) V_{il}, \qquad (4)$$

where $r$ is the number of focal elements of the belief structure
$(S, m(.))$, $m(X_l)$ is the mass of belief of $X_l \in 2^S$, and $V_{il}$
is the payoff we get when we select $F_i$ and the state of the
nature lies in $X_l$.

For $F_i$ and a focal element $X_l$, instead of using
all payoffs $C_{ik}$, we consider only the payoffs in the
set $M_{il} = \{C_{ik} \,|\, S_k \in X_l\}$ and $V_{il} = OWA(M_{il})$ for some
decision-making attitude chosen a priori. Once the generalized
expected values $C_i$, $i = 1, 2, \ldots, q$, are computed, we compare
alternatives through these results.

The principle of this method is simple, but its implementation
can be quite greedy in computational resources, especially
if one wants to adopt a particular attitude for a given level of
optimism and if the dimension of the frame $S$ is large.
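The four cases above can be summarised with a short Python sketch of the OWA operator and of the generalized expected value of Eq. (4). It is a sketch under assumptions: only the two extreme attitudes are implemented (all the weight on the smallest or on the largest payoff), states are identified by their index in the payoff row, and the payoffs and the bba are hypothetical.

```python
def owa(payoffs, weights):
    """OWA value: the weights are applied to the payoffs sorted in decreasing order."""
    ordered = sorted(payoffs, reverse=True)
    return sum(w * b for w, b in zip(weights, ordered))


def pessimistic(n):
    """Weighting vector putting all the weight on the smallest payoff."""
    return [0.0] * (n - 1) + [1.0]


def optimistic(n):
    """Weighting vector putting all the weight on the largest payoff."""
    return [1.0] + [0.0] * (n - 1)


def generalized_expected_value(row, bba, attitude):
    """Eq. (4): sum over the focal elements X_l of m(X_l) * OWA(M_il)."""
    total = 0.0
    for states, mass in bba.items():
        subset = [row[k] for k in states]  # payoffs of the states in X_l
        total += mass * owa(subset, attitude(len(subset)))
    return total


# Hypothetical payoffs of one alternative over 4 states (indices 0..3)
# and a hypothetical bba over subsets of these states.
row = [5, 3, 4, 2]
bba = {frozenset({0, 2}): 0.6, frozenset({1}): 0.1, frozenset({0, 1, 2, 3}): 0.3}
low = generalized_expected_value(row, bba, pessimistic)
high = generalized_expected_value(row, bba, optimistic)
print(low, high)
```

The pair `(low, high)` is the kind of interval of expected payoffs that a cautious approach can work with when no single attitude is preferred.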


2) The COWA-ER approach: Yager's OWA approach is
based on the choice of a given attitude measured by an
optimistic index in [0, 1] to get the weighting vector $W_i$. What
should be done in practice if we do not know which attitude
to adopt? An answer to this question has been proposed
in Cautious OWA with Evidential Reasoning (COWA-ER),
which exploits the results of the two extreme attitudes jointly
(pessimistic and optimistic ones) to take a decision under
uncertainty based on the imprecise valuation of alternatives.
In COWA-ER, the pessimistic and optimistic OWA are used
respectively to construct the intervals of expected payoffs
for different alternatives. For example, for $q$ alternatives, the
expected payoffs are:

$$
E[C] = \begin{bmatrix} E[C_1] \\ E[C_2] \\ \vdots \\ E[C_q] \end{bmatrix}
\subset \begin{bmatrix} [C_1^{min}, C_1^{max}] \\ [C_2^{min}, C_2^{max}] \\ \vdots \\ [C_q^{min}, C_q^{max}] \end{bmatrix}.
$$

Therefore, one has $q$ sources of information before using
the belief functions framework. Basically, the COWA-ER
methodology requires four steps:

• Step 1: normalization of imprecise values in [0, 1];

• Step 2: conversion of each normalized imprecise value into an
elementary bba $m_i(.)$;

• Step 3: fusion of the bbas $m_i(.)$ with some combination rule
(typically the PCR6 rule);

• Step 4: choice of the final decision based on the resulting
combined bba.

With COWA-ER, we consider as $\Theta$ the finite set of
alternatives $\Theta = \{F_1, F_2, \ldots, F_q\}$ and the sources of belief
associated with them, obtained from the normalized imprecise
expected payoff vector $E^{Imp}[C_i]$. The modeling for computing
a bba associated with the hypothesis $F_i$ from any imprecise value
$[a; b] \subseteq [0; 1]$ is done by:

$$
\begin{cases}
m_i(F_i) = a, \\
m_i(\bar{F}_i) = 1 - b, \\
m_i(F_i \cup \bar{F}_i) = m_i(\Theta) = b - a,
\end{cases} \qquad (5)
$$

where $\bar{F}_i$ is the complement of $F_i$ in $\Theta$.
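Eq. (5) translates directly into code. The following Python sketch is only an illustration: alternatives are identified by the integers 1 to q, and the function name `interval_to_bba` is hypothetical. The numerical check uses the interval [0.22; 0.46] of the first alternative among 5, whose resulting masses can be compared with the first row of Table VIII.

```python
def interval_to_bba(i, a, b, q):
    """Eq. (5): bba of alternative i (1-based) built from the interval [a, b],
    on a frame of q alternatives. Zero masses are left out."""
    theta = frozenset(range(1, q + 1))
    f_i = frozenset({i})
    bba = {f_i: a, theta - f_i: 1.0 - b, theta: b - a}
    return {X: v for X, v in bba.items() if v > 0}


bba_1 = interval_to_bba(1, 0.22, 0.46, 5)
print(bba_1)
```

The width of the interval, b - a, becomes the mass left on the whole frame, so a wider interval means a less committed source.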

COWA-ER can also help to take a decision on a
group/subset of alternatives satisfying a minimum credibility (or
plausibility) level selected by the DM. It can also be extended
directly to the fusion of several sources of information when
each source provides a payoff matrix. Each source can also
easily be discounted if needed.


3) The Fuzzy-COWA-ER approach: Unfortunately, COWA-ER
has a serious limitation because its computational time
grows with the number of alternatives. In COWA-ER, each
expected interval is used as an information source. However,
these expected intervals are jointly obtained, and thus these
information sources are relatively correlated. For these reasons,
a modified version of COWA-ER, called Fuzzy-COWA-ER
(or FCOWA-ER for short), has been developed in [15].
With FCOWA-ER, we consider the 2 columns of the expected
payoff $E[C]$ as two information sources, representing the
pessimistic and optimistic attitudes. The column-wise normalized
expected payoff is:


$$
E^{Fuzzy}[C] = \begin{bmatrix} N_1^{min}, & N_1^{max} \\ N_2^{min}, & N_2^{max} \\ \vdots & \vdots \\ N_q^{min}, & N_q^{max} \end{bmatrix},
$$

where $N_i^{min} \in [0, 1]$ ($i = 1, \ldots, q$) represents the normalized
value in the column of pessimistic attitude, and $N_i^{max} \in [0, 1]$
represents the normalized value in the column of optimistic
attitude. The vectors $[N_1^{min}, \ldots, N_q^{min}]$ and $[N_1^{max}, \ldots, N_q^{max}]$
can be seen as two fuzzy membership functions (FMFs)
representing the possibilities of all the alternatives $F_1, \ldots, F_q$.
The FCOWA-ER method also requires four steps:
• Step 1: normalize each column in $E[C]$, respectively, to
obtain $E^{Fuzzy}[C]$;
• Step 2: convert the two normalized columns, i.e., the two
FMFs, into two bbas $m_{Pess}(.)$ and $m_{Opti}(.)$ using the
α-cut approach introduced in [26];
• Step 3: fuse the bbas $m_{Pess}(.)$ and $m_{Opti}(.)$ with some
combination rule (typically the PCR6 rule);
• Step 4: choose the final decision based on the resulting
combined bba.

In FCOWA-ER, only one combination step is needed.
Furthermore, the bbas obtained by using α-cuts are consonant
(their focal elements are nested).
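Step 2 can be sketched in Python as follows. The exact procedure of [26] is not restated in the text, so this sketch assumes the usual consonant construction: each distinct membership level defines an α-cut, and the mass of that cut is the gap to the next lower level. The function name `fmf_to_consonant_bba` is hypothetical. Applied to the pessimistic column of Eq. (9), this construction gives masses that can be compared with the pessimistic bba of Table XI.

```python
def fmf_to_consonant_bba(membership):
    """Convert a fuzzy membership function {element: degree}, whose largest
    degree is 1, into a consonant bba built from its alpha-cuts."""
    levels = sorted(set(membership.values()), reverse=True)
    bba = {}
    for alpha, lower in zip(levels, levels[1:] + [0.0]):
        # alpha-cut: all elements whose membership degree is at least alpha.
        cut = frozenset(x for x, mu in membership.items() if mu >= alpha)
        if alpha - lower > 0:
            bba[cut] = alpha - lower
    return bba


# Pessimistic column of Eq. (9), alternatives indexed 1..5.
pess = {1: 0.26, 2: 0.35, 3: 0.44, 4: 0.65, 5: 1.00}
m_pess = fmf_to_consonant_bba(pess)
for focal, mass in m_pess.items():
    print(sorted(focal), round(mass, 2))
```

Because every cut contains the cut above it, the focal elements are nested, so the two bbas built this way never conflict with themselves and a single combination step remains.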


IV. APPLICATION TO PROBLEMS OF PROTECTION WORKS 
EFFECTIVENESS 


A. Assessment of structural effectiveness of a single check- 
dam through ER-MCDA 


1) AHP methodology: The problem consists in choosing
the observed structural effectiveness level of a given $E^l$. It is
assessed through 6 criteria $g_j$ [27] (Figure 2). The weights $w_j$
in Table I are defined by experts.


Observed external structural effectiveness,
$\Theta = \{HD_1, HD_2, HD_3, HD_4\}$:
$HD_1$ = Highly effective
$HD_2$ = Moderately effective
$HD_3$ = Hardly effective
$HD_4$ = Not effective

$g_1$: Rotation angle around the vertical axis
$g_2$: Slipping / Translation
$g_3$: Lateral support erosion through bypass
$g_4$: Rotation angle around the horizontal axis
$g_5$: Collapse
$g_6$: Support erosion through scouring

Figure 2. Hierarchical structure to assess observed structural effectiveness of $E^l$.

According to the DST (Dempster-Shafer Theory) framework
[10], $\Theta$ is composed of 4 exclusive elements of effectiveness
levels: $HD_1$ = 'High', $HD_2$ = 'Medium', $HD_3$ = 'Low' and
$HD_4$ = 'None'.

2) Imprecise evaluation and mapping of $g_j$: For $E^l$, we
assume evaluations of $g_j$ by two experts (sources) $s_1$ and $s_2$
through possibility distributions (Table I). Through expert
elicitation, a set of fuzzy intervals L-R links each $g_j$ evaluation
scale and $\Theta$. The ratings used are extracted from [28] (Table
II). Using this mapping process, bbas are established in Table
III.

To take into account the reliability of each source, we
discount the input masses of Table III by applying the classical
Shafer's discounting method [10]. We use here the discounting
factors $\alpha_{s_1} = 0.7$ and $\alpha_{s_2} = 0.5$. We obtain 12 discounted
bbas (noted $m'_1$ and $m'_2$) in Table IV.
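Shafer's discounting, in its classical form, multiplies every mass by the reliability factor and transfers the remainder to the whole frame. The Python sketch below illustrates it; the function name `shafer_discount` and the dictionary representation are assumptions of the sketch, and the input masses are one set of values readable in Table III, discounted with the factor 0.7 used for the first expert.

```python
def shafer_discount(bba, alpha, theta):
    """Classical Shafer discounting with reliability factor alpha in [0, 1]:
    masses are scaled by alpha and the rest goes to the whole frame theta."""
    out = {X: alpha * v for X, v in bba.items() if X != theta}
    out[theta] = alpha * bba.get(theta, 0.0) + (1.0 - alpha)
    return out


HD1, HD2, HD3, HD4 = (frozenset({k}) for k in ("HD1", "HD2", "HD3", "HD4"))
THETA = HD1 | HD2 | HD3 | HD4

m_s1 = {HD1: 0.03125, HD2: 0.78125, HD3: 0.1875}
m_s1_disc = shafer_discount(m_s1, 0.7, THETA)
print(m_s1_disc)
```

With a factor of 0.7, a mass of 0.3 ends up on the whole frame, which is the row of 0.3 values visible in Table IV.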


3) Two steps of fusion: Step 1 consists in combining
the bbas $m'_1(.)$ and $m'_2(.)$ for each $g_j$ with the PCR6 fusion rule
(Table V).

In step 2, we apply to each bba of Table V the importance
discounting method presented in [29]. We use the $w_j$ (Table I) to
get Table VI. After combining its 6 bbas with a variant
of PCR6 that takes into account positive masses on $\emptyset$, noted
$PCR6_\emptyset$, and a normalization procedure [29], we finally get
Table VII.
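The importance discounting of step 2 differs from reliability discounting in that the discounted mass is transferred to the empty set rather than to the whole frame. The Python sketch below illustrates this, together with a final normalization. Both function names are hypothetical, and the normalization shown (removing the mass of the empty set and rescaling the rest) is an assumption about the procedure of [29]. The input masses are one column readable in Table V, discounted with a weight of 0.1.

```python
EMPTY = frozenset()


def importance_discount(bba, beta):
    """Importance discounting with factor beta in [0, 1]: masses are scaled
    by beta and the rest is transferred to the empty set."""
    out = {X: beta * v for X, v in bba.items() if X != EMPTY}
    out[EMPTY] = beta * bba.get(EMPTY, 0.0) + (1.0 - beta)
    return out


def normalize(bba):
    """Assumed normalization: drop the mass of the empty set and rescale."""
    k = 1.0 - bba.get(EMPTY, 0.0)
    return {X: v / k for X, v in bba.items() if X != EMPTY}


HD1, HD2, HD3 = frozenset({"HD1"}), frozenset({"HD2"}), frozenset({"HD3"})
THETA = frozenset({"HD1", "HD2", "HD3", "HD4"})

m_step1 = {HD1: 0.35445, HD2: 0.41628, HD3: 0.07927, THETA: 0.15}
m_imp = importance_discount(m_step1, 0.1)
m_back = normalize(m_imp)
print(m_imp[EMPTY], m_imp[HD1], m_imp[THETA])
```

Normalizing a single discounted bba simply gives back the original one; the normalization only becomes informative after the discounted bbas of all criteria have been fused.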

According to it, $E^l$ is mainly medium effective because the
highest belief mass is $m(HD_2)$. Because $m(HD_3)$ is the second
highest mass, we can say that the effectiveness of $E^l$ lies rather
between low and medium, and is neither high nor none.


B. Comparing actions on A; using (F)COWA-ER 


1) Decision problem elicitation: gj; is the D° effectiveness 
level. In a DST framework, one assumes 7 scenarii such as 
flood with bedload transport ($3) or debris flow (5%). One 
considers 5 possible actions such as repair of all the degraded 
check-dams (a3) or renewal of all check-dams (as). 

2) $C_{ik}$ and $S^s$ evaluations: One rates $C_{ik}$ with an integer
between 0 (not effective) and 10 (very highly effective) [7]. As in
Eq. (1), one assumes $C$ where $q = 5$ and $n = 7$ (Eq. (6)).


5 3 423 11 
a a a 

C=|8 5 74 5 3 1 (6) 
100 7 106 7 5 2 
10 9 10 9 10 10 4 


One considers 4 X7: X; = SPUS§3U SS, Xo = S$U 
SPUSEUS8, X3 = S?, X, = O. One gives m(X,) = 0.6, 
m(X2) = 0.2, m(X3) = 0.01 and m(X4) = 0.19. Applying 
the OWA pessimistic and optimistic operators, one can assess 
the bounds of expected effectiveness levels for each actions 
given by Eq. (7). 


(7) 


$$
E^{Imp}[C] \approx \begin{bmatrix} [0.22; 0.46] \\ [0.30; 0.64] \\ [0.38; 0.74] \\ [0.56; 0.94] \\ [0.87; 1.00] \end{bmatrix}
$$


3) Results through COWA-ER: Steps 1 and 2 make it
possible to assess the bbas of the 5 actions in Table VIII,
passing through the normalized imprecise matrix $E^{Imp}[C]$ given
in Eq. (8).

Step 3 combines the 5 bbas altogether with the
PCR6 fusion rule (Table IX).

A decision-making rule must be chosen to implement
step 4. Results are compared in Table X. One sees that,
based on the max of Bel, of BetP, of DSmP or of Pl, the best
action is always $D_5$.


(8)

Table I
CRITERIA EVALUATIONS OF $E^l$.
[Columns: criterion $g_j$, weight $w_j$, unit, Expert 1 ($s_1$), Expert 2 ($s_2$).]


Table II
MAPPING MODELS FOR EACH CRITERION.


Table III 
E! BBA’S AFTER MAPPING PROCESS IN A DST FRAMEWORK. 


0.03125 

0.78125 0.9912 

0.1875 0.00037 
0 0 


0.005 


0.1875 
0.8125 
0 
0 


Table IV
SHAFER'S DISCOUNTING OF INPUT MASSES WITH RELIABILITY FACTORS
$\alpha_{s_1} = 0.7$ AND $\alpha_{s_2} = 0.5$.


Criterion gj | g1 92 93 G4 95 I6 
HD1 0.021875 0.0035 
HD2 0.546875 0.69384 
HD3 E 0.13125 0.00259 
HD4 ; 0 0 

(2) H ; E 5 0.3 0.3 
HD1 0.09375 
HD2 0.05625 0.40625 
HD3 0.44375 
HD4 

iS) 


4) Results through Fuzzy COWA-ER: Step 1 makes it
possible to get from Eq. (7) a normalized imprecise matrix
$E^{Fuzzy}[C]$ in Eq. (9).

Table V


mMstep1(-) 
0.57656 
0.0198 


0.85 0.85 0 


0 0.0674 


0.35445 0.03820 

0.41628 0.81046 

0.07927 ~—- 0.00131 
0 0 


0.15 0.15003 


Table VI 
E! BBAS AFTER IMPORTANCE DISCOUNTING. 


Ji g2 93 g4 95 96 


0.9 
0.085 


0.9 0.7 
0.085 0 


0.9 
0.03544 
0.02022 0.04163 0.24314 
0.20973. 0.02537 ~—- 0.00793 -~—:0.00039 
0.02505 0 0 0 

0.045 0.015 0.015 0.04501 


0.7 
0.01146 


0.015 


0.015 


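Table VII
$E^l$ BBAS AFTER STEP 2 OF PCR6-MCDA.

Focal element | $m_{PCR6_\emptyset}(.)$ | $m^{normalized}_{PCR6_\emptyset}(.)$
$\emptyset$ | | 0
$HD_1$ | 0.00547 | 0.16094
$HD_2$ | 0.01560 | 0.45901
$HD_3$ | 0.01141 | 0.33559
$HD_4$ | 0.00017 | 0.00494
$\Theta$ | 0.00134 | 0.03951


$$
E^{Fuzzy}[C] \approx \begin{bmatrix} [0.26; 0.46] \\ [0.35; 0.64] \\ [0.44; 0.74] \\ [0.65; 0.94] \\ [1.00; 1.00] \end{bmatrix} \qquad (9)
$$


Table VIII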
BASIC BELIEF ASSIGNMENTS OF THE 5 ACTIONS.

Alternatives $D_i$ | $m_i(D_i)$ | $m_i(\bar{D}_i)$ | $m_i(D_i \cup \bar{D}_i)$
$D_1$ | 0.22 | 0.54 | 0.24
$D_2$ | 0.30 | 0.36 | 0.34
$D_3$ | 0.38 | 0.26 | 0.36
$D_4$ | 0.56 | 0.06 | 0.38
$D_5$ | 0.86 | 0 | 0.14

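Table IX
FUSION OF THE 5 ELEMENTARY BBAS WITH PCR6.

Focal element | $m_{PCR6}(.)$
$D_1$ | 0.02835
$D_2$ | 0.04805
$D_3$ | 0.07318
$D_4$ | 0.15185
$D_5$ | 0.39179
$D_1 \cup D_5$ | 0.00019
$D_2 \cup D_5$ | 0.0004
$D_3 \cup D_5$ | 0.00059
$D_4 \cup D_5$ | 0.00269
$D_1 \cup D_4 \cup D_5$ | 0.0012
$D_2 \cup D_3 \cup D_5$ | 0.00056
$D_2 \cup D_4 \cup D_5$ | 0.00254
$D_3 \cup D_4 \cup D_5$ | 0.00372
$D_1 \cup D_2 \cup D_5$ | 0.00018
$D_1 \cup D_3 \cup D_5$ | 0.00026
$D_1 \cup D_2 \cup D_3 \cup D_5$ | 0.00138
$D_1 \cup D_2 \cup D_4 \cup D_5$ | 0.02194
$D_1 \cup D_3 \cup D_4 \cup D_5$ | 0.04123
$D_2 \cup D_3 \cup D_4 \cup D_5$ | 0.09063
$D_1 \cup D_2 \cup D_3 \cup D_4 \cup D_5$ | 0.13927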


Table X
BEL, BETP, DSMP AND PL OF EFFECTIVENESS LEVELS OF ACTIONS
$D_i$ BASED ON COWA-ER.

$D_i$ | $Bel(D_i)$ | $BetP(D_i)$ | $DSmP_{\epsilon=0}(D_i)$ | $Pl(D_i)$
$D_1$ | 0.028 | 0.073 | 0.037 | 0.234
$D_2$ | 0.048 | 0.106 | 0.066 | 0.305
$D_3$ | 0.073 | 0.136 | 0.103 | 0.351
$D_4$ | 0.152 | 0.222 | 0.221 | 0.455
$D_5$ | 0.392 | 0.463 | 0.572 | 0.699


For step 2, by using a 5 α-cut approach, we convert
$E^{Fuzzy}[C]$ into 2 bbas $m_{Pess}(.)$ and $m_{Opti}(.)$. Step 3 combines
them with the PCR6 fusion rule. Results are
given in Table XI.


Table XII shows the (approximate) values of $Bel(.)$,
$BetP(.)$, $DSmP_\epsilon(.)$ and $Pl(.)$ based on the $m_{PCR6}(.)$
values of Table XI. One sees that, based on the max of Bel, of
BetP, of DSmP or of Pl, the best action is always $D_5$ (same
decision as with COWA-ER).

Table XI
THE 2 BBAS TO COMBINE AND THE RESULT OF PCR6 FUSION.

Focal element | $m_{Pess}(.)$ | $m_{Opti}(.)$ | $m_{PCR6}(.)$
$D_5$ | 0.35 | 0.06 | 0.3895
$D_4 \cup D_5$ | 0.21 | 0.20 | 0.2847
$D_3 \cup D_4 \cup D_5$ | 0.09 | 0.10 | 0.1033
$D_2 \cup D_3 \cup D_4 \cup D_5$ | 0.09 | 0.18 | 0.1051
$D_1 \cup D_2 \cup D_3 \cup D_4 \cup D_5$ | 0.26 | 0.46 | 0.1174

Table XII
CREDIBILITY, BETP, DSMP AND PLAUSIBILITY OF EFFECTIVENESS
LEVELS OF $D_i$ BASED ON FCOWA-ER.

$D_i$ | $Bel(D_i)$ | $BetP(D_i)$ | $DSmP(D_i)$ | $Pl(D_i)$
$D_1$ | 0 | 0.023 | 0 | 0.117
$D_2$ | 0 | 0.050 | 0 | 0.222
$D_3$ | 0 | 0.084 | 0 | 0.326
$D_4$ | 0 | 0.227 | 0 | 0.611
$D_5$ | 0.389 | 0.616 | 1 | 1


V. CONCLUSIONS AND PERSPECTIVES 


In this paper, we have both formalized the decision problem
and applied recent advanced MCDMs (ER-MCDA, COWA-ER,
FCOWA-ER) to assess the effectiveness of torrent protective
check-dams in a context of imperfect information and more
or less reliable sources. This application, based on expert
knowledge, provides a class evaluation related to available
knowledge. Other outranking methods such as the Soft-Electre
Tri (SET) methodology [30] can also be applied to
sort protection systems into predefined effectiveness classes.
Defining uncertain states of nature and the corresponding belief
mass $m(.)$ remains challenging. Comparing belief function
theory with Bayesian probabilities or Choquet capacities in
this context is a next step [31]. The effects on the results of
the fusion rules and of the order of combination also have to be
compared.

From an operational point of view, the next steps will consist
in the complete elicitation of the DM and of the decision problem:
criteria, importance, and preferences on evaluation scale
assessments. Afterwards, these methods will be combined in a global
process taking into account all the system scales related to
protection system devices.


Abstract: The main purpose of this paper is to apply, and to
test the performance of, the new method based on belief functions
proposed by Dezert et al. to evaluate the quality of individual
association pairings provided in the optimal data association
solution, in order to improve the performance of
multisensor-multitarget tracking systems. We discuss the advantages
of its implementation in an illustrative realistic surveillance
context where some of the association decisions are unreliable and
doubtful and lead to potentially critical mistakes. A comparison
with the results obtained on the basis of Generalized Data
Association is made.


Keywords: data association, belief functions, PCR6 fusion
rule, multitarget tracking.


I. INTRODUCTION 


The problem of Data Association (DA) is central to the design of
modern multi-target tracking (MTT) systems [1,2]. It
relates to the process of associating uncertain measurements
(observations) to known tracks, and it is conditioned and
motivated by the most important function of each surveillance
system: to maintain and improve target track maintenance
performance. In the monosensor context it corresponds to
properly partitioning the sensor observations (at a given scan)
among the predicted states of the targets in such a way that their
track updates are as precise, correct and reasonable as
possible.

There are several approaches developed to resolve correlation
ambiguities and to select the best observation-track pairings,
based on different models. Some of them establish the reward
matrix based on Kinematic-only Data Association (KDA) and
on a probabilistic framework [3,4]. Others rely on
Belief Functions (BF) [5-9] and motivate the incorporation
of advanced concepts for Generalized Data Association
(GDA) [6-8], allowing the introduction of target attributes
(target type, radar cross section, etc.) into the association
logic in order to improve track maintenance performance in
complicated situations (closely spaced/crossing targets), when
kinematic data are insufficient for coherent decision making.
The main peculiarity consists in applying the Dezert-Smarandache
theory (DSmT) of plausible and paradoxical reasoning [8]
to model and process the attribute data used. In the most
common case, when the surveillance system provides kinematic-only
information (such as range, azimuth, elevation) obtained
during a given scan, the usual practice is
to solve the optimal DA problem and to use all its
pairings to update tracks, even if some of the pairings have
poor quality. This can lead to bad or wrong track
updating, and, as a result, the overall tracking performance
can be degraded substantially.

The most recent method, proposed by Dezert & Benameur
[10] for the Quality Assessment of Data Association
(QADA) encountered in multiple target tracking applications
in a mono-criterion context, and recently extended in [11]
to the multi-criteria context, deals precisely with the case above.
It assumes that the rewards matrix is known and has been
obtained by a method chosen by the user. It is based on belief
functions for establishing the quality of pairings (interpreted as
a confidence score) belonging to the optimal data assignment
solution, based on its consistency (stability) with respect to all
the second best solutions provided by a chosen algorithm.

The main purpose of our paper is to serve as a preliminary
study of MTT performance evaluation based on the QADA-KDA
approach, and to discuss its advantages in an illustrative
multi-target tracking scenario. We also compare
its performance with the results obtained on the basis
of GDA. The paper is organized as follows. Section II describes
the problem of DA in the multitarget tracking context.
Section III provides the details of the new method [10] for
quality assessment of the optimal DA solution. In Section IV the
simulation scenario and results are presented and discussed.
The conclusion is given in Section V.


II. DATA ASSOCIATION PROBLEM IN MULTITARGET 
TRACKING CONTEXT 


Data association is a very important, and the most decisive,
step in the multitarget tracking surveillance process. The DA
problem consists in finding the global optimal assignment of
the targets $T_i$ ($i = 1,\dots,m$) to some measurements $z_j$
($j = 1,\dots,n$) at a given time $k$ by maximizing the overall gain
in such a way that no more than one target is assigned to a
measurement, and reciprocally.

The so-called $m \times n$ rewards (gain/payoff) matrix
$\Omega = [\omega(i,j)]$ is defined with elements $\omega(i,j) \ge 0$,
representing the gain of the association of target $T_i$ with
measurement $z_j$. These elements are usually homogeneous to the
likelihood ratios. In some cases $\omega(i,j) \ge 0$ represent normalized
distances between measurement $j$ and target $i$, and in this
case the DA problem consists in finding the best assignment,
minimizing the overall cost.

The goal of the optimal assignment problem is to find the $m \times n$
binary association matrix $A = [a(i,j)]$, where:

$$a(i,j) = \begin{cases} 1, & \text{if measurement } z_j \text{ is assigned to track } T_i, \\ 0, & \text{otherwise.} \end{cases} \tag{1}$$

The association matrix maximizes the global reward
$R(\Omega, A)$, given by:

$$R(\Omega, A) \triangleq \sum_{i=1}^{m} \sum_{j=1}^{n} \omega(i,j)\, a(i,j). \tag{2}$$

The importance of the assignment problem is quite clear,
and various successful methods for solving it already exist.
Among the well-known ones are the Kuhn-Munkres algorithm
(also known as the Hungarian algorithm) [12,13] and its extension
to rectangular matrices proposed by Bourgeois and Lassalle in [14].
The more sophisticated Murty's method [15] provides not only the
first best assignment, but also the m-best assignments in order of
increasing cost, as shown in the examples of [10,11]. The best
optimal assignment solution is not necessarily unique, and neither
is the second best one. Usually in MTT algorithms the first best
assignment solution is taken as a hard decision for association.
But in some real practical cases of dense multi-target and
cluttered environments, the DA problem is difficult to solve,
because some of the association decisions $a(i,j)$ are unreliable
and doubtful, so they could lead to potentially critical mistakes.
For example, when the incoming measurements for two tracks are
too close to each other, the solution of the assignment problem,
which is the core of GNN, cannot be sufficiently clear-cut. In such
a case, it is more cautious not to rely on all the pairings
confirmed in the first best solution, but only on those which are
trustworthy enough according to an a priori defined threshold
level. Utilizing the already obtained m-best assignment solutions,
Dezert et al. [10,11] provide a very efficient method for
obtaining this important knowledge.
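
To make the notions of first and second best assignments concrete, the following minimal sketch ranks all one-to-one assignments of a small square reward matrix by their global reward. It uses exhaustive enumeration as a plain stand-in for the Munkres and Murty algorithms, which is only practical for tiny problems. The function names and the numerical values of the reward matrix are hypothetical and chosen for illustration only.

```python
from itertools import permutations


def global_reward(omega, assignment):
    """Sum of omega[i][j] over the pairings (i, j) selected by the assignment."""
    return sum(omega[i][j] for i, j in enumerate(assignment))


def ranked_assignments(omega):
    """Return all one-to-one assignments of a square matrix, best reward first.

    assignment[i] = j means that target T_i is paired with measurement z_j.
    Exhaustive enumeration costs O(n!), so this is for small examples only.
    """
    n = len(omega)
    return sorted(permutations(range(n)),
                  key=lambda a: global_reward(omega, a),
                  reverse=True)


# Hypothetical 3x3 reward matrix (values invented for illustration).
omega = [[9.0, 2.0, 1.0],
         [1.5, 6.0, 5.8],
         [1.0, 5.9, 6.1]]

ranked = ranked_assignments(omega)
best, second = ranked[0], ranked[1]
print("first best :", best, global_reward(omega, best))
print("second best:", second, global_reward(omega, second))
```

In this example the two best solutions agree on the pairing of the first target and differ on the other two, with almost equal global rewards. This is the kind of situation in which a hard decision based on the first best solution alone is questionable.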


III. QUALITY ASSESSMENT OF OPTIMAL DA 


The first and the second best assignment matrices $A_1$
and $A_2$ are used [10] in order to establish the quality of
the specific associations (pairings) satisfying the condition
$a_1(i,j) = 1$. The main idea behind the QADA method is to
compare the values $a_1(i,j) = 1$ in $A_1$ with the corresponding
values $a_2(i,j)$ in $A_2$, and to identify the change (if
any) of the optimal pairing $(i,j)$. In our MTT context, $(i,j)$
means that measurement $z_j$ is associated with target $T_i$. A
quality indicator is established, depending on both the stability
of the pairing and its relative impact on the global reward.
The proposed method also works when the 1st- and 2nd-best
optimal assignments $A_1$ and $A_2$ are not unique, i.e. there
are multiplicities available. The construction of the quality
indicators is based on BF theory and the Proportional Conflict
Redistribution fusion rule no. 6 (PCR6), defined within DSm
theory [8]. It depends on the type of pairing matching, as
described below:


• When $a_1(i,j) = a_2(i,j) = 0$, one has a full
agreement on the “non-association” of the given pairing $(i,j)$
in $A_1$ and $A_2$. This “non-association” has no impact on
the global reward values $R_1(\Omega, A_1)$ and $R_2(\Omega, A_2)$, so
it is useless to utilize it in DA. Hence, the quality
indicator value is set to $q(i,j) = 0$.


• When $a_1(i,j) = a_2(i,j) = 1$, one has a full
agreement on the “association” of the given pairing $(i,j)$
in $A_1$ and $A_2$. This “association” has different impacts
on the global reward values $R_1(\Omega, A_1)$ and $R_2(\Omega, A_2)$.
In order to estimate the quality of this matching association,
one establishes two basic belief assignments
(bba) $m_s(\cdot)$ ($s = 1,2$) according to both sources of
information ($A_1$ and $A_2$). The frame of discernment one
reasons on consists of a single hypothesis $X = (T_i, z_j)$:
measurement $z_j$ belongs to track $T_i$, and its negation
$\bar{X}$: measurement $z_j$ does not belong to track $T_i$. The
ignorance is modelled by the proposition $X \cup \bar{X}$.

$$\begin{aligned} m_s(X) &= a_s(i,j)\,\omega(i,j)/R_s(\Omega, A_s), \\ m_s(X \cup \bar{X}) &= 1 - m_s(X). \end{aligned} \tag{3}$$

Applying the conjunctive rule of combination, denoted by
$m_1(\cdot) \oplus m_2(\cdot)$, one gets:

$$\begin{aligned} m_{12}(X) &= m_1(X)\,m_2(X) + m_1(X)\,m_2(X \cup \bar{X}) + m_1(X \cup \bar{X})\,m_2(X), \\ m_{12}(X \cup \bar{X}) &= m_1(X \cup \bar{X})\,m_2(X \cup \bar{X}). \end{aligned} \tag{4}$$

The pignistic transformation [15] is applied in order
to obtain the pignistic probabilities, built on the basis
of the fused basic belief assignments, as
$BetP(X) = m_{12}(X) + \frac{1}{2} m_{12}(X \cup \bar{X})$ and
$BetP(\bar{X}) = m_{12}(\bar{X}) + \frac{1}{2} m_{12}(X \cup \bar{X})$.
Then, one chooses the quality indicator
of the association $(T_i, z_j)$ as $q(i,j) = BetP(X)$.


• When $a_1(i,j) = 1$ and $a_2(i,j) = 0$, a disagreement
(conflict) on the association $(T_i, z_j)$ in $A_1$ and
$A_2$ is detected. One can find the association $(T_i, z_{j_2})$
in $A_2$, where $j_2$ is the measurement index such that
$a_2(i,j_2) = 1$. In order to define the quality of such a
conflicting association $(T_i, z_j)$, one establishes two basic
belief assignments (bba) $m_s(\cdot)$ ($s = 1,2$) according to
both sources of information ($A_1$ and $A_2$). The frame
of discernment one reasons on consists of the following
two propositions: $X = (T_i, z_j)$ and $Y = (T_i, z_{j_2})$. The
ignorance is modelled by the proposition $X \cup Y$. Then
one obtains:

$$\begin{aligned} m_1(X) &= a_1(i,j)\,\omega(i,j)/R_1(\Omega, A_1), \\ m_1(X \cup Y) &= 1 - m_1(X), \end{aligned} \tag{5}$$

$$\begin{aligned} m_2(Y) &= a_2(i,j_2)\,\omega(i,j_2)/R_2(\Omega, A_2), \\ m_2(X \cup Y) &= 1 - m_2(Y). \end{aligned} \tag{6}$$

Different rules of combination (Dempster-Shafer's,
Dubois-Prade's, Yager's [16]) could be chosen to work
with a normalized combined BBA. The method [10]
recommends using the Proportional Conflict Redistribution
rule no. 6 (PCR6), proposed originally in the DSmT
framework [8], because it has proved very efficient
in practice. With PCR6, the following fusion result
$m_{PCR6}(\cdot) = m_1(\cdot) \oplus m_2(\cdot)$ is obtained:

$$\begin{aligned} m_{PCR6}(X) &= m_1(X)\,m_2(X \cup Y) + m_1(X)\,\frac{m_1(X)\,m_2(Y)}{m_1(X) + m_2(Y)}, \\ m_{PCR6}(Y) &= m_1(X \cup Y)\,m_2(Y) + m_2(Y)\,\frac{m_1(X)\,m_2(Y)}{m_1(X) + m_2(Y)}, \\ m_{PCR6}(X \cup Y) &= m_1(X \cup Y)\,m_2(X \cup Y). \end{aligned} \tag{7}$$


The decision is taken on the basis of the pignistic
transformation:

$$BetP(X) = m_{PCR6}(X) + \frac{1}{2}\, m_{PCR6}(X \cup Y), \tag{8}$$

$$BetP(Y) = m_{PCR6}(Y) + \frac{1}{2}\, m_{PCR6}(X \cup Y). \tag{9}$$

The quality indicators are chosen as $q(i,j) = BetP(X)$
and $q(i,j_2) = BetP(Y)$. The absolute quality factor
becomes:

$$Q_{abs}(A_1, A_2) = \sum_{i=1}^{m} \sum_{j=1}^{n} a_1(i,j)\, q(i,j). \tag{10}$$
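
The three cases above can be sketched in a few lines of code. The following is a minimal illustration, not the reference implementation of [10]: it assumes that each target is assigned exactly one measurement in both solutions, that the best solutions are unique, and that an assignment is stored as a tuple with `a[i] = j`. The function name and the reward values are hypothetical.

```python
def qada_quality(omega, a1, a2):
    """Quality q[i][j] of the pairings, following the agreement/conflict cases.

    omega  : reward matrix as a list of rows
    a1, a2 : first and second best assignments, a[i] = j pairs T_i with z_j
    """
    m, n = len(omega), len(omega[0])
    r1 = sum(omega[i][a1[i]] for i in range(m))   # global reward R_1
    r2 = sum(omega[i][a2[i]] for i in range(m))   # global reward R_2
    q = [[0.0] * n for _ in range(m)]             # non-associations keep q = 0
    for i in range(m):
        j, j2 = a1[i], a2[i]
        m1x = omega[i][j] / r1                    # m_1(X), source A_1
        if j == j2:
            # Agreement on the association: conjunctive rule, then BetP(X).
            m2x = omega[i][j] / r2
            m12_x = m1x * m2x + m1x * (1 - m2x) + (1 - m1x) * m2x
            m12_ign = (1 - m1x) * (1 - m2x)
            q[i][j] = m12_x + 0.5 * m12_ign
        else:
            # Conflict: X = (T_i, z_j), Y = (T_i, z_j2); PCR6, then BetP.
            m2y = omega[i][j2] / r2               # m_2(Y), source A_2
            conflict = m1x * m2y
            denom = m1x + m2y
            share = conflict / denom if denom > 0 else 0.0
            m_x = m1x * (1 - m2y) + m1x * share
            m_y = (1 - m1x) * m2y + m2y * share
            m_ign = (1 - m1x) * (1 - m2y)
            q[i][j] = m_x + 0.5 * m_ign
            q[i][j2] = m_y + 0.5 * m_ign
    return q


# Hypothetical rewards and assignments (values invented for illustration).
omega = [[9.0, 2.0, 1.0],
         [1.5, 6.0, 5.8],
         [1.0, 5.9, 6.1]]
a1, a2 = (0, 1, 2), (0, 2, 1)
q = qada_quality(omega, a1, a2)
q_abs = sum(q[i][a1[i]] for i in range(len(a1)))  # absolute quality factor
for row in q:
    print([round(v, 3) for v in row])
```

With these values the pairing on which both solutions agree receives a clearly higher quality than the two conflicting pairings, whose qualities stay close to one half.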


IV. SIMULATION SCENARIO AND RESULTS 


The noise-free multitarget tracking simulation scenario
(Fig. 1) consists of three air targets moving in parallel from
West to East with a constant velocity of 100 m/sec and a distance
of 150 m between them. The stationary sensor is located at
the origin. The sampling period is $T_{scan} = 5$ sec, and the
measurement standard deviations are 0.5 deg and 65 m for
azimuth and range respectively. The surveillance of the moving
targets is performed during 15 scans. Figure 2 shows the
respective noised scenario.

The classical target tracking algorithm was run, consisting
of two basic steps: (i) data association to associate the proper
measurements (distance, angle) with the correct targets and (ii)
track filtering to update the targets' state vectors, once the
optimal assignment is found. In our simulation the Converted
Measurement Kalman Filter [1] is used.

Fig. 1. Noise-free MTT scenario.

Fig. 2. Noised MTT scenario.

In this work we focus our attention on the DA step, which
is very important, and the most decisive one in multitarget
tracking. The Global Nearest Neighbour (GNN) [1] approach
is used in order to make a decision for data association. One
obtains the assignment matrix $AMat(i,j)$ ($i = 1,\dots,m$;
$j = 1,\dots,n$) based on normalized distances between measurement
$j$ and target $i$. In order to eliminate unlikely (kinematics-based)
observation-to-track pairings, the classical validation
test $d^2(i,j) \le \gamma$ is carried out on the Mahalanobis distance [1,2]
$d(i,j)$ computed from the measurement $z_j(k)$ at a given time
moment $k$ and its prediction $\hat{z}_i(k|k-1)$ by

$$d^2(i,j) \triangleq (z_j(k) - \hat{z}_i(k|k-1))'\, S^{-1}(k)\, (z_j(k) - \hat{z}_i(k|k-1)). \tag{11}$$


Assuming a measurement vector of size $M$, the quantity
$d^2(i,j)$ can be interpreted as a sum of the squares of
$M$ independent Gaussian random variables with zero means
and unit standard deviations. For that reason $d^2(i,j)$ has a
$\chi^2_M$ distribution with $M$ degrees of freedom, and the gate
threshold follows from the allowable probability of a valid
observation falling outside the gate. In our case a probability
of 1% is accepted, so from the table of the chi-square
distribution [2] one obtains the threshold
$\gamma = 9.21$. In fact, this value represents the largest possible
distance value associated with observation-to-track pairings.
Based on this, one assumes that if the $j$-th measurement does not
fall in the gate of target $i$, then the value associated with
this pairing $(i,j)$ in the assignment matrix can be set to a
sufficiently large value (in our case equal to 100), in order to
prepare the assignment matrix for the next step. The classical
Munkres and Katta-Murty methods [15] are used in order to obtain
the first and second best assignment solutions for
measurement-to-track associations. By minimizing the sum of the
chosen pairings' distances, a binary association matrix $A = [a(i,j)]$
is obtained. Figure 3 shows the typical MTT performance,
based on the classical GNN approach with Kinematic only DA
(KDA), when one does not utilize additional procedures to
improve the quality of DA.
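
The gating step can be sketched as follows, under the assumption that the measurement vector has size $M = 2$. This is consistent with the threshold $\gamma = 9.21$, since for two degrees of freedom the chi-square distribution function is $1 - e^{-x/2}$, so a 1% tail gives $-2\ln(0.01) \approx 9.21$. The function names, the positions and the innovation covariance below are hypothetical values used for illustration only.

```python
import math


def chi2_gate_2d(prob_outside):
    """Gate threshold for M = 2, where the chi-square CDF is 1 - exp(-x/2)."""
    return -2.0 * math.log(prob_outside)


def mahalanobis_sq(z, z_pred, s):
    """d^2 = (z - z_pred)' S^{-1} (z - z_pred) for a 2x2 covariance matrix S."""
    dx, dy = z[0] - z_pred[0], z[1] - z_pred[1]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    # Inverse of the 2x2 matrix, written out explicitly.
    i00, i01 = s[1][1] / det, -s[0][1] / det
    i10, i11 = -s[1][0] / det, s[0][0] / det
    return dx * (i00 * dx + i01 * dy) + dy * (i10 * dx + i11 * dy)


def assignment_matrix(predictions, measurements, s, gamma, big=100.0):
    """AMat[i][j] = d^2(i, j) inside the gate of target i, `big` outside it."""
    amat = []
    for z_pred in predictions:
        row = []
        for z in measurements:
            d2 = mahalanobis_sq(z, z_pred, s)
            row.append(d2 if d2 <= gamma else big)
        amat.append(row)
    return amat


gamma = chi2_gate_2d(0.01)
# Hypothetical predicted positions, measurements and covariance (illustration).
predictions = [(0.0, 0.0), (0.0, 150.0)]
measurements = [(30.0, 10.0), (500.0, 150.0)]
s = [[65.0 ** 2, 0.0], [0.0, 65.0 ** 2]]
amat = assignment_matrix(predictions, measurements, s, gamma)
print(round(gamma, 2), amat)
```

The second measurement falls outside both gates, so its column is filled with the large value and it cannot be selected unless no better pairing exists.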

In the case of noised measurements, it is evident that at scan
number 9 tracks 2 and 3 change their directions and cross,
instead of continuing their parallel motion.
This happens because the incoming measurements are so close to
each other that the solution of the assignment problem, which is
the core of GNN, cannot be categorical. The problem stems from
the proximity of the targets (inter-distance of 150 m) and from
the poor sensor range resolution of $\sigma_D = 65$ m. It leads to
a wrong GNN association decision.

Fig. 3. Typical MTT performance with KDA.


A criterion for the minimal admissible distance between
measurements is chosen here as $d_{min} < \sigma_D/2$. During scan no. 9 one
has: $d_{23} < d_{min}$.


In such a critical case, one needs to utilize some additional
information in order to avoid association miscorrelations.
Here the method in [10] is applied. The goal is to estimate
the quality of the questionable pairings $(T_2, z_2)$ and $(T_3, z_3)$. One
obtains the corresponding reward matrix $\Omega = [\omega(i,j)]$ with
elements $\omega(i,j)$ representing the gain of the association of
target $T_i$ ($i = 1,\dots,m$) with measurement $z_j$ ($j = 1,\dots,n$).
It is obtained as $\omega(i,j) = 10 - AMat(i,j)$. The reason
for this expression relates to the already determined maximal
normalized distance $\gamma = 9.21$, according to the table of the
chi-square distribution. The data association deals with finding
the global optimal assignment of the targets to some
measurements by maximizing the overall gain in such a way that
no more than one target is assigned to a measurement, and
reciprocally. This is a measure of optimality equivalent to
the global minimum of the distances.
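
The equivalence between the two optimality measures can be checked numerically on a small example. The sketch below assumes a square matrix in which every target receives exactly one measurement, so that the total reward equals $10\,n$ minus the total distance for every assignment. The distance values and the helper names are hypothetical, and exhaustive enumeration is used only to keep the example short.

```python
from itertools import permutations


def to_rewards(amat, offset=10.0):
    """Convert normalized distances into rewards: omega = offset - AMat."""
    return [[offset - d for d in row] for row in amat]


def total(matrix, assignment):
    """Sum of the matrix entries selected by the assignment (a[i] = j)."""
    return sum(matrix[i][j] for i, j in enumerate(assignment))


def best_assignment(matrix, pick):
    """Exhaustive search; `pick` is min for distances or max for rewards."""
    n = len(matrix)
    return pick(permutations(range(n)), key=lambda a: total(matrix, a))


# Hypothetical gated distance matrix (100 marks pairings outside the gate).
amat = [[0.4, 7.9, 100.0],
        [8.1, 1.2, 1.5],
        [100.0, 1.4, 1.1]]
omega = to_rewards(amat)

by_distance = best_assignment(amat, min)
by_reward = best_assignment(omega, max)
print(by_distance, by_reward)
```

Both searches select the same assignment, because the reward of any complete assignment differs from its total distance only by a sign and a constant.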


The algorithm based on [10] was automatically applied
during scan no. 9, because the minimum distance between
observations no. 2 and no. 3 is under the accepted limit:
$d(2,3) = 15.83$ m $< \sigma_D/2$. The quality matrix at scan no.
9, containing the quality levels associated with the chosen
pairings in the first best solution

$$A_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},$$

characterizing the set $\{(T_1, z_1), (T_2, z_2), (T_3, z_3)\}$ of
associations, with respect to the second best solution

$$A_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix},$$

characterizing the set $\{(T_1, z_1), (T_2, z_3), (T_3, z_2)\}$ of
associations, is given in Table I below.


TABLE I
QUALITY MATRIX AT SCAN 9.


     | T1    | T2    | T3
z1   | 0.773 | 0.000 | 0.000
z2   | 0.000 | 0.504 | 0.000
z3   | 0.000 | 0.000 | 0.498


According to the first best assignment
solution, one has: $q(T_1, z_1) = 0.773$, $q(T_2, z_2) = 0.504$, and
$q(T_3, z_3) = 0.498$. The quality threshold admissible for a
correct association is set to $q_T = 0.7$.

Based on the association quality assessment (Table I) and
the accepted quality threshold $q_T = 0.7$, one can make the
following decision. The only pairing, among those chosen
by the Munkres algorithm in the first best assignment, that can be
trusted is $(T_1, z_1)$, because its quality level exceeds the
accepted threshold for a correct association:
$q(T_1, z_1) = 0.773 > q_T$.
Following the decision logic in [10], only the $(T_1, z_1)$ pairing
is used in the updating process, while the second and third
tracks stay in prediction mode until the next
measurements become available, because $q(T_2, z_2) = 0.504 < q_T$
and $q(T_3, z_3) = 0.498 < q_T$. The performance of the MTT
algorithm based on QADA-KDA is shown in Fig. 4. The
informed decision taken at scan no. 9 on the basis of the
QADA-KDA method resolves the miscorrelation conflict.
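
The decision logic at scan no. 9 can be reproduced with the values reported in Table I. The following short sketch is only an illustration of that logic; the function name and the 1-based track numbering in the output are choices made for this example.

```python
def select_updates(q, a1, threshold):
    """Split tracks into those updated with their measurement and those
    left in prediction mode, using the quality of the first best pairings."""
    update, predict = [], []
    for i, j in enumerate(a1):
        if q[i][j] > threshold:
            update.append(i + 1)    # track is updated with measurement z_j
        else:
            predict.append(i + 1)   # track waits for the next scan
    return update, predict


# Values taken from the text: quality matrix of Table I and threshold 0.7.
q = [[0.773, 0.000, 0.000],
     [0.000, 0.504, 0.000],
     [0.000, 0.000, 0.498]]
a1 = (0, 1, 2)                      # first best solution: (T1,z1),(T2,z2),(T3,z3)
sigma_d, d_23 = 65.0, 15.83         # range standard deviation and d(2,3) in metres

qada_triggered = d_23 < sigma_d / 2
update, predict = select_updates(q, a1, threshold=0.7)
print(qada_triggered, update, predict)  # prints: True [1] [2, 3]
```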


Fig. 4. MTT performance with QADA-KDA.


In order to compare the result obtained with QADA, the
simulation was also run with GDA (Fig. 5), where the
target attribute (target type) is introduced into the association
logic in order to improve track maintenance performance in
the same MTT scenario, with the additional assumption that the
targets go from West to East in a group with the following
type order: {Fighter, Cargo, Fighter}.

Fig. 5. MTT performance with GDA.

GDA-MTT improves the process of DA by utilizing the target
type decision (based on the confusion matrix $C = [c_{ij}]$) coupled
with the classical kinematic measurements. The construction of
the confusion matrix relies on an underlying
decision-making process based on measurements of specific
attribute features. Its elements represent the probability of the
decision $T_d$ ($T_1$ = Fighter, $T_2$ = Cargo) that the target type is $j$
when its real type is $i$, more precisely

$$c_{ij} = P(T_d = T_j \mid \text{True Target Type} = T_i).$$

In our simulation we have chosen the following confusion
matrix

$$C = \begin{bmatrix} 0.95 & 0.05 \\ 0.05 & 0.95 \end{bmatrix}.$$


GDA is applied at each scan during the whole surveillance 
process, in order to prevent observation-to-track miscorrela- 
tions. 


V. CONCLUSION 


This work is a preliminary study of MTT performance
evaluation based on the new Quality Assessment of optimal
DA method proposed by Dezert et al. It supports the stability
of MTT performance and could be applied in all cases
in which DA cannot produce an association
decision of high quality. This concerns cases when only
kinematics measurements are available, as well as cases when
attribute and kinematic data are both available, because QADA
is totally independent of the logic applied to obtain the best
DA solution. Future work concerns a Monte Carlo
based evaluation of different, more critical MTT scenarios in
a multi-sensor context.


Abstract: The main objective of this paper is to present, to
apply, and to test the effectiveness of the new method, based on
belief functions, proposed by Dezert et al. in order to evaluate
the quality of the individual association pairings provided in
the classical optimal data association solution for improving
the performances of multitarget tracking systems in clutter,
when some of the association decisions given in the optimal
assignment solution are unreliable and doubtful and lead to
potentially critical mistakes. This evaluation is based on a Monte
Carlo simulation for particular difficult maneuvering and
non-maneuvering MTT problems in clutter. A comparison with the
results obtained on the basis of Kinematic only Data Association
and Generalized Data Association is made.


Keywords: Data association, Belief Functions, PCR6 fusion 
rule, multitarget tracking. 


I. INTRODUCTION 


Data association (DA) is a fundamental and central problem
in modern multitarget tracking (MTT) systems [1]-[2].
It entails selecting the most trustable associations between
uncertain sensor measurements and existing targets at a given
time. In a dense MTT environment, with false
alarms and a sensor detection probability less than unity, the
problem of DA becomes more complex, because it has to
contend with many possible pairings, some of which are
in practice very doubtful, unreliable, and could lead to critical
association mistakes in the overall tracking process. To avoid such
cases, it is sometimes better to wait for new measurements
during the next scan, instead of taking a hard DA decision,
which actually is not always unique.

Several methods have been devised over the years in
order to properly resolve the DA problem. They originate
from different models. Some rely on the established reward
matrix based on Kinematic only Data Association (KDA)
and on a probabilistic framework [3]-[4]. Other studies
are based on Belief Functions (BF) [5]-[9], motivating
the incorporation of advanced concepts for Generalized
Data Association (GDA) [6]-[8], where a particular target
attribute is introduced into the association logic in order to
cope with complicated cluttered cases, when kinematics
data are insufficient for adequate decision making.
Dezert-Smarandache Theory (DSmT) of plausible and paradoxical
reasoning [8] is used to model and to process the utilized
attribute data. Although interesting and proven, all these
methods share the following limitation: all of them solve the
optimal DA problem and use all optimal observation-to-track
pairings selected in the first best DA solution to update tracks,
even if some of them have poor quality. In consequence the
overall tracking performance could be degraded substantially.
In order to deal with this case, a recent method to evaluate the
Quality Assessment of Data Association (QADA) encountered in
multiple target tracking applications in a mono-criterion context
was proposed by Dezert and Benameur [10]. It is extended in [11]
to the multi-criteria context. This novel method assumes the
reward matrix is known, regardless of the manner in which
it is obtained by the user. It is based on BF for establishing
the quality of pairings (interpreted as a confidence score)
belonging to the optimal data assignment solution, based on
its consistency (stability) with respect to all the second best
solutions provided by a chosen algorithm.


This paper is an extension of our preliminary study on the
effect of applying the QADA method in MTT presented in [17].
The main purpose of our paper is to assess the efficiency of
the QADA method in a critical, conflicting MTT situation. The
evaluation is based on a Monte Carlo simulation for particular
difficult maneuvering and non-maneuvering MTT problems
in clutter. The QADA based MTT performance is compared
with the results obtained for KDA and GDA based MTT
on the same scenarios. The paper is organised as
follows. For readability, we recall in Section II the data
association problem within the MTT context, and in Section III
the details of the new method proposed by Dezert et al. [10]
for quality assessment of the pairings chosen in the optimal DA
solution. In Section IV we discuss and propose the way in which
Kalman filtering could be modified in order to reflect the
knowledge obtained on the basis of the QADA method. In Section V,
two simulation MTT scenarios (with non-maneuvering and
maneuvering targets) are presented and the results obtained on the
basis of QADA-, KDA-, and GDA-based MTT are discussed.
Conclusions are made in Section VI.


II. DATA ASSOCIATION PROBLEM IN MTT CONTEXT


The DA problem consists in finding the global optimal
assignment of targets $T_i$, $i = 1,\dots,m$, to some measurements
$z_j$, $j = 1,\dots,n$, at a given time $k$ by maximizing the overall
gain in such a way that no more than one target is assigned
to a measurement, and reciprocally.

The $m \times n$ reward (gain/payoff) matrix $\Omega = [\omega(i,j)]$ is
defined by its elements $\omega(i,j) \ge 0$, representing the gain of
the association of target $T_i$ with the measurement $z_j$. These
values are usually homogeneous to the likelihood ratios. In our
case $\omega(i,j)$ represents the normalized distance between the
measurement $z_j$ and target $T_i$,
$d^2(i,j) = (z_j(k) - \hat{z}_i(k|k-1))'\, S^{-1}(k)\, (z_j(k) - \hat{z}_i(k|k-1)) \le \gamma$,
computed from the measurement $z_j(k)$, its prediction
$\hat{z}_i(k|k-1)$ computed by the tracker of target $i$ (see [2] for
details), and the inverse of the covariance matrix $S(k)$ of the
innovation computed by the tracking filter. In this case the DA
problem consists in finding the best assignment, minimizing the
overall cost.

The optimal DA problem consists in finding the $m \times n$
binary association matrix $A = [a(i,j)]$ with $a(i,j) \in \{0,1\}$,
maximizing the global reward $R(\Omega, A)$, given by:

$$R(\Omega, A) \triangleq \sum_{i=1}^{m} \sum_{j=1}^{n} \omega(i,j)\, a(i,j). \tag{1}$$

If $a(i,j) = 1$, it means that one has an association between
target $T_i$ and measurement $z_j$. The association indicator value
$a(i,j) = 0$ means that they are not associated:

$$a(i,j) = \begin{cases} 1, & \text{if } z_j \text{ is associated to track } T_i, \\ 0, & \text{otherwise.} \end{cases} \tag{2}$$

The importance of the assignment problem is quite clear,
and various successful methods for solving it already exist.
Among the well-known ones are the Kuhn-Munkres algorithm
(also known as the Hungarian algorithm) [12]-[13], and its
extension to rectangular matrices proposed by Bourgeois and
Lassalle in [14]. The more sophisticated Murty's method [15]
provides not only the first best assignment, but also the m-best
assignments in order of increasing cost, as shown in the examples
in [10]-[11]. The best optimal assignment solution is not
necessarily unique, and neither is the second best one. Usually in
MTT algorithms the first best assignment solution is taken as a
hard decision for association. But in some real practical cases of
dense multi-target and cluttered environments, the DA problem is
difficult to solve, because some of the association decisions
$a(i,j)$ are unreliable, so they could lead to potential mistakes.

For example, when the incoming measurements for two tracks are
too close to each other, the solution of the assignment problem,
which is the core of the Global Nearest Neighbour (GNN) approach,
cannot be sufficiently clear-cut. In such a case, it is
more cautious not to rely on all the pairings confirmed in
the first best solution, because only some of them may be
trustworthy enough. Utilizing the already obtained
m-best assignment solutions, Dezert et al. [10], [11] provided
an appealing method for taking this knowledge into account.


III. QUALITY ASSESSMENT OF PAIRINGS IN DA


In order to establish the quality of the particular associations
belonging to the optimal assignment matrix $A_1$ and
satisfying the condition $a_1(i,j) = 1$, the QADA method proposes to
utilize both the first and second assignment solutions $A_1$ and $A_2$.
To keep the paper self-contained, this section briefly recalls the
principle of QADA, which has already been detailed in [10], [11]
with a tracking application in [17].

The main idea behind it is to compare the values $a_1(i,j)$
in $A_1$ with the corresponding values $a_2(i,j)$ in $A_2$, and to
identify whether there is a change of the optimal pairing $(i,j)$. In
our MTT context $(i,j)$ means an association between
measurement $z_j$ and target $T_i$. One establishes a quality indicator
associated with this pairing, depending on the stability of
the pairing and also on its relative impact on the global
reward. The proposed method also works when the 1st and
2nd optimal assignments $A_1$ and $A_2$ are not unique, i.e., there
are multiplicities available. The construction of the quality
indicator is based on BF theory and the Proportional Conflict
Redistribution Rule no. 6 (PCR6), defined within DSmT [8]. It
depends on the type of the pairing matching, as described
below:


• If $a_1(i,j) = a_2(i,j) = 0$, one has a full agreement on the
hypothesis ‘non-association’ of the given pairing $(T_i, z_j)$
in $A_1$ and $A_2$. This ‘non-association’ has no impact
on the global reward values $R_1(\Omega, A_1)$ and $R_2(\Omega, A_2)$,
therefore it is useless to utilize it in DA. Hence,
in this case, the quality indicator is set to zero,
$q(i,j) = 0$.

• If $a_1(i,j) = a_2(i,j) = 1$, one has a full agreement on
the hypothesis ‘association’ of the pairing $(T_i, z_j)$ in $A_1$
and $A_2$. This ‘association’ $(T_i, z_j)$ has different impacts
on the global reward values $R_1(\Omega, A_1)$ and $R_2(\Omega, A_2)$.
In order to estimate the quality of this matching pairing,
one establishes two Basic Belief Assignments (BBAs),
$m_s(\cdot)$, $s = 1,2$, according to both sources of information
(1st and 2nd optimal assignment matrices $A_1$ and $A_2$).
The frame of discernment consists of a single hypothesis
$X = (T_i, z_j)$: measurement $z_j$ belongs to the track $T_i$.
The ignorance is modelled by the proposition $X \cup \bar{X}$,
where $\bar{X}$ is the negation of hypothesis $X$:

$$\begin{aligned} m_s(X) &= a_s(i,j) \cdot \omega(i,j)/R_s(\Omega, A_s), \\ m_s(X \cup \bar{X}) &= 1 - m_s(X). \end{aligned} \tag{3}$$

Applying the conjunctive rule of combination [8] (Vol. 1),
one gets:

$$\begin{aligned} m_{12}(X) &= m_1(X)\,m_2(X) + m_1(X)\,m_2(X \cup \bar{X}) + m_1(X \cup \bar{X})\,m_2(X), \\ m_{12}(X \cup \bar{X}) &= m_1(X \cup \bar{X})\,m_2(X \cup \bar{X}). \end{aligned} \tag{4}$$

The pignistic transformation [16] is applied in order
to obtain pignistic probabilities, built on the basis of the
combined belief assignments, such as:
$BetP(X) = m_{12}(X) + \frac{1}{2} \cdot m_{12}(X \cup \bar{X})$ and
$BetP(\bar{X}) = \frac{1}{2} \cdot m_{12}(X \cup \bar{X})$.
Then one chooses the quality indicator associated
with the pairing $(i,j)$ as $q(i,j) = BetP(X)$.

• If $a_1(i,j) = 1$ and $a_2(i,j) = 0$, then a conflict is
encountered on the association $(T_i, z_j)$ in $A_1$ and $A_2$. Then one
can find the association $(T_i, z_{j_2})$ in $A_2$, where $j_2$ is
the index such that $a_2(i,j_2) = 1$. In order to define the
quality of such a conflicting association, one establishes
two BBAs, $m_s(\cdot)$, $s = 1,2$, according to both sources
of information ($A_1$ and $A_2$). The frame of discernment
consists of two propositions: $\Theta = \{X = (T_i, z_j), Y = (T_i, z_{j_2})\}$,
and the BBAs are defined by [10]:

$$\begin{aligned} m_1(X) &= a_1(i,j) \cdot \omega(i,j)/R_1(\Omega, A_1), \\ m_1(X \cup Y) &= 1 - m_1(X), \end{aligned} \tag{5}$$

$$\begin{aligned} m_2(Y) &= a_2(i,j_2) \cdot \omega(i,j_2)/R_2(\Omega, A_2), \\ m_2(X \cup Y) &= 1 - m_2(Y). \end{aligned} \tag{6}$$

Applying the PCR6 fusion rule [8] (Vol. 3), one gets:

$$\begin{aligned} m(X) &= m_1(X) \cdot m_2(X \cup Y) + m_1(X) \cdot \frac{m_1(X)\,m_2(Y)}{m_1(X) + m_2(Y)}, \\ m(Y) &= m_1(X \cup Y) \cdot m_2(Y) + m_2(Y) \cdot \frac{m_1(X)\,m_2(Y)}{m_1(X) + m_2(Y)}, \\ m(X \cup Y) &= m_1(X \cup Y)\,m_2(X \cup Y). \end{aligned} \tag{7}$$

Applying again the pignistic transformation, one gets
$BetP(X) = m(X) + \frac{1}{2} \cdot m(X \cup Y)$ and
$BetP(Y) = m(Y) + \frac{1}{2} \cdot m(X \cup Y)$. Hence, the quality
indicators here are chosen as: $q(i,j) = BetP(X)$ and
$q(i,j_2) = BetP(Y)$. The absolute quality factor becomes:
$Q_{abs}(A_1, A_2) = \sum_{i=1}^{m} \sum_{j=1}^{n} a_1(i,j) \cdot q(i,j)$.

Once obtained, the quality matrix $Q = [q(i,j)]$, $i = 1,\dots,m$;
$j = 1,\dots,n$, whose elements $q(i,j) \in [0,1]$ define the quality
of the particular associations chosen
in the optimal assignment matrix $A_1$, is utilized
in the next step of the classical MTT algorithm: Kalman
filtering (KF).


IV. KALMAN FILTERING INFLUENCED BY QADA METHOD 


The classical target tracking algorithm was run, consisting
of two basic steps: (i) data association to associate the proper
measurements (distance, angle) with the correct targets and (ii)
track filtering to update the targets' state vectors, once the
optimal assignment is found. In our simulation the Global
Nearest Neighbour (GNN) [1] approach is applied in order
to make a decision for data associations. The GNN approach is
a DA method that provides the assignment matrix used for the
quality assessment of data association.

The Converted Measurement Kalman Filter (CMKF) is used
for track filtering. We will not recall it in detail, since it
can be found in many standard textbooks [1]-[2], but will
focus on the manner in which the obtained quality
assessment of pairings in the optimal assignment solution
influences the updating of the target's state.

In order to derive the KF equations, the goal is to find an
equation computing an a posteriori state estimate $\hat{x}(k+1|k+1)$
at time $(k+1)$ as a linear combination of an a priori estimate
$\hat{x}(k+1|k)$ and a weighted difference between the true
measurement $z(k+1)$ and a measurement prediction:

$$\hat{x}(k+1|k+1) = \hat{x}(k+1|k) + W(k+1)\,\tilde{z}(k+1) \tag{8}$$

The difference $\tilde{z}(k+1) = z(k+1) - H\hat{x}(k+1|k)$, called the
measurement innovation (or residual), reflects the discrepancy
between the predicted measurement
$\hat{z}(k+1|k) = H(k+1)\hat{x}(k+1|k)$ and the true one $z(k+1)$,
where $H(k+1)$ is the so-called observation matrix. If
$\tilde{z}(k+1)$ is equal to zero, the true measurement and the
predicted one are in full agreement, which is the perfect case.
The matrix $W(k+1)$ is the filter's gain matrix obtained by
minimizing the a posteriori estimate error covariance. It is
given by the following formulae, where $R$ is the measurement
error covariance, and $P(k+1|k)$ is the predicted covariance matrix
of the state estimate error:

$$W(k+1) = P(k+1|k)\,H^T(k+1)\,S^{-1}(k+1) \tag{9}$$

$$\phantom{W(k+1)} = P(k+1|k)\,H^T(k+1)\,[H(k+1)\,P(k+1|k)\,H^T(k+1) + R]^{-1}. \tag{10}$$


From Eqs. (8) and (10) one can conclude that the value
of the measurement error covariance $R$ influences the gain
$W(k+1)$, and consequently the state estimate, in the following way:

• If the measurement error covariance $R \to 0$, the true
measurement $z(k+1)$ is trusted more, and at the same
time the predicted measurement $H\hat{x}(k+1|k)$ is trusted less.

• If the measurement error covariance $R$ increases, the true
measurement $z(k+1)$ is trusted less, and at the same time
the predicted measurement $H\hat{x}(k+1|k)$ is trusted more.


Let us now recall what kind of information one obtains from
the quality matrix derived by the QADA method
[10]. It gives us knowledge about the confidence $q(i,j)$ in
all pairings $(T_i, z_j)$, $i = 1,\dots,m$; $j = 1,\dots,n$, chosen in the
first best assignment solution. A smaller quality (confidence)
of the hypothesis “$z_j$ belongs to $T_i$” means that the corresponding
measurement error covariance $R$ should be increased and one should
not fully trust the actual (true) measurement $z(k+1)$.

Having this conclusion in mind, in this work we propose
to model such a behaviour of the measurement error covariance
by $R = \frac{R}{q(T_i, z_j)}$ for every pairing chosen in the
first best assignment, on the basis of the corresponding quality
value obtained. Then the Kalman filter gain decreases and, as a
result, the true measurement $z_j(k+1)$ is trusted less in the
updated state estimate $\hat{x}(k+1|k+1)$.
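
The effect of the quality value on the filter gain can be illustrated with a scalar Kalman update. The sketch below is a simplified illustration, not the CMKF used in the simulations: it assumes a one-dimensional state, an observation matrix $H = 1$ by default, and an inflation of the measurement error covariance of the form $R/q$. The function names, the numerical values, and the choice to skip the update when $q \le 0$ are assumptions made for this example.

```python
def kalman_update(x_pred, p_pred, z, r, h=1.0):
    """Scalar Kalman update; returns the updated state, covariance and gain."""
    s = h * p_pred * h + r                   # innovation covariance S
    w = p_pred * h / s                       # gain W = P H' S^{-1}
    x_upd = x_pred + w * (z - h * x_pred)    # state update with the innovation
    p_upd = (1.0 - w * h) * p_pred
    return x_upd, p_upd, w


def qada_kalman_update(x_pred, p_pred, z, r, q, h=1.0):
    """Same update with R inflated to R / q (assumed form of the scaling)."""
    if q <= 0.0:
        # No confidence in the pairing: keep the prediction (assumption).
        return x_pred, p_pred, 0.0
    return kalman_update(x_pred, p_pred, z, r / q, h)


# Hypothetical numbers: predicted state 0, variance 100, measurement 10, R = 25.
for q in (1.0, 0.773, 0.504):
    x_upd, p_upd, w = qada_kalman_update(0.0, 100.0, 10.0, 25.0, q)
    print(q, round(w, 3), round(x_upd, 3))
```

As the quality decreases, the gain decreases and the updated state stays closer to the prediction, which is the intended behaviour described above.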

The MTT algorithm tested in this paper is based on the
classical one (using Kalman Filters based on kinematics
measurements) because, for now, we are only concerned with the
impact of QADA on the performance of this type of tracking filter.
Our aim is not to compare this QADA-MTT to other more
sophisticated MTT algorithms¹, but we believe that the QADA
approach could also be useful for improving the performance of
more sophisticated MTT algorithms. This is left for
future research.


¹In fact, we will just compare QADA-MTT to KDA-MTT and GDA-MTT
based on CMKF in Section V.
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V. SIMULATION SCENARIOS AND RESULTS 


Two MTT simulation scenarios, one with maneuvering and one with non-maneuvering targets, are presented, and the results obtained with QADA-, KDA-, and GDA-based MTT are discussed.


A. Maneuvering targets simulation scenario 


The simulation scenario (Fig. 1) consists of three air targets of two classes. The stationary sensor is located at the origin. The sampling period is T_scan = 5 sec, and the measurement standard deviations are 0.4 deg and 25 m for azimuth and range respectively. The targets go from West to East in the type order CFC (C = Cargo, F = Fighter) with a constant velocity of 100 m/sec. At the beginning the targets move from different directions. The first target moves from North-West with heading 120 degrees from North. At scan no. 8 it performs a maneuver, lasting until scan no. 15, with transversal acceleration +1.495 m/s², and settles towards East, moving parallel to the X axis. The second target moves parallel to the X axis from West to East during the whole scenario, without maneuvering. The third target at the beginning moves from South-West with heading 60 degrees from North. At scan no. 8 it performs a maneuver, lasting until scan no. 15, with transversal acceleration -1.495 m/s², and settles towards East, moving parallel to the X axis. The inter-distance between the targets during scans 15 to 18 (the parallel segment) is approximately 150 m. From scan no. 18 to scan no. 25 the first and the third targets make new maneuvers: the first one is directed to North-East and the third one to South-East. The process noise standard deviations for the two nested constant-velocity models of the IMM (Interacting Multiple Models) filter [1], [3] are 0.1 m/s² and 7 m/s² respectively. The number of false alarms (FA) follows a Poisson distribution, and FA are uniformly distributed in the surveillance region.


Figure 1. Noise-free maneuvering MTT Scenario.


Fig. 2 shows the respective noised scenario. 

GDA-MTT [6], [7] improves the DA process by utilizing the target type decision, based on the confusion matrix C = [c_ij], coupled with the classical kinematic measurements, where c_ij = P(T_d = T_j | TrueTargetType = T_i) represents the probability of the decision T_d (with T_1 ≜ Fighter, T_2 ≜ Cargo) that the target type is j when its real type is i. In our simulation

$$C = \begin{bmatrix} 0.95 & 0.05 \\ 0.05 & 0.95 \end{bmatrix}.$$


Figure 2. Noised maneuvering MTT Scenario.


Monte Carlo (MC) simulations for the considered MTT scenario are made with 200 MC runs, applying KDA, QADA, and GDA. Our goal is to evaluate, show, and discuss the effect of the Quality Assessment of Optimal Assignment for Data Association on the overall target tracking performance, in comparison to the results obtained for the same scenario by Kinematic only Data Association and Generalized Data Association based MTT. We use an idealized track initiation in order to prevent an uncontrolled impact of this stage on the statistical parameters of the tracking process during the Monte Carlo tests of the newly developed algorithm. The true target positions (known in our simulations) for the first two scans are used for track initiation.

The evaluation of MTT performance is based on the criteria of track purity, track life, and percentage of miscorrelation. The track purity criterion examines the ratio between the number of particular performed (jth observation - ith track) associations (in case of a detected target) and the total number of all possible associations during the tracking scenario. Track life is evaluated as the average number of scans before the track's deletion. In our simulations, a track is cancelled and deleted from the list of tracked tracks when, during 3 consecutive scans, it cannot be updated with some measurement because there is no validated measurement in the validation gate. We call this the “cancelling/deletion condition”. The status of the tracked tracks is denoted “alive”.

The percentage of miscorrelation examines the relative number of incorrect (observation-to-track) associations during the scans.
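The two bookkeeping rules above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the per-scan boolean flags and the use of `None` for "no association" are hypothetical conventions, not taken from the paper.

```python
def track_life(updated_flags, max_misses=3):
    """Number of scans a track stays alive.

    updated_flags[k] is True when the track was updated with a validated
    measurement at scan k + 1. The track is deleted at the scan where the
    count of consecutive misses reaches max_misses (3 in the text).
    """
    misses = 0
    for scan, updated in enumerate(updated_flags, start=1):
        misses = 0 if updated else misses + 1
        if misses >= max_misses:
            return scan            # deletion condition met at this scan
    return len(updated_flags)      # never deleted: alive to the end


def miscorrelation_percentage(assigned, truth):
    """Share (in %) of incorrect observation-to-track associations.

    assigned[k] is the observation index chosen at scan k (None if no
    association was made); truth[k] is the correct observation index.
    """
    pairs = [(a, t) for a, t in zip(assigned, truth) if a is not None]
    if not pairs:
        return 0.0
    wrong = sum(1 for a, t in pairs if a != t)
    return 100.0 * wrong / len(pairs)
```

Averaging `track_life` over tracks and Monte Carlo runs would give a track-life figure; how the paper normalizes it to a percentage is not stated in this section, so that step is left out.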

The results for the less noised case (with 0.2 FA on average in the filter validation gate) are given in Table 1.


Table I
MANEUVERING SCENARIO: COMPARISON BETWEEN KDA, QADA, GDA
BASED MTT PERFORMANCES FOR FA = 0.2.

                              KDA-MTT   QADA-MTT   GDA-MTT
Average Track Life [%]          86.65      92.82     91.06
Average Miscorrelation [%]
Track Purity [%]                71.44      88.20     85.74


QADA-MTT exceeds KDA-MTT in average track life and track purity, and shows a better average track life than GDA-MTT. Figure 3 shows the most informative result, the percentage of miscorrelations encountered during the consecutive scans. One can see that QADA-MTT performs almost two times better than KDA-MTT, and is close to the GDA-MTT performance.


Figure 3. Maneuvering scenario: Average miscorrelations in KDA-MTT, QADA-MTT, GDA-MTT for noised case FA = 0.2.


The respective results for the more noised case (with 0.4 FA on average in the filter validation gate) are given in Table 2 below.


Table II
MANEUVERING SCENARIO: COMPARISON BETWEEN KDA, QADA, GDA
BASED MTT PERFORMANCES FOR FA = 0.4.

                              KDA-MTT   QADA-MTT   GDA-MTT
Average Track Life [%]          74.27      86.61     86.52
Average Miscorrelation [%]      10.58 (single surviving entry; its column is not identifiable)
Track Purity [%]                60.42      77.96     79.35


Figures 5 and 6 show typical performances of the QADA-MTT and KDA-MTT systems.

Figure 5. Maneuvering scenario: Typical performance of QADA based MTT.

Figure 6. Maneuvering scenario: Typical performance of KDA based MTT.


As a whole, the results for FA = 0.4 deteriorate in comparison to the less noised case, but QADA-MTT still performs consistently better than KDA-MTT. Its average track life remains slightly higher than in the GDA-MTT case.

Fig. 4, showing the percentage of miscorrelations in the more difficult noised case, confirms that QADA-MTT outperforms KDA-MTT.

Figure 4. Maneuvering scenario: Average miscorrelations in KDA-MTT, QADA-MTT, GDA-MTT for noised case FA = 0.4.


Figures 7 and 8 show the averaged filtered errors along the X (designated by asterisks) and Y (designated by circles) axes, and the distance error associated with the maneuvering track 1 in the considered scenario.

Figure 7. Filtered errors along X,Y for maneuvering track 1 - KDA-MTT, QADA-MTT, GDA-MTT.


For the maneuvering target 1, the errors along the X axis obtained with QADA-MTT are definitely smaller than those encountered with KDA-MTT. The errors along Y are slightly bigger than the respective errors along X, but as a whole the distance error encountered with QADA-MTT is smaller than with KDA-MTT. MC errors are evaluated on the basis of the averaged errors associated with all “alive” tracks. Some of the errors that occur (for example in Fig. 7 and Fig. 8) could be explained by the unrealized cancelling of tracks at the end of the scenario, when some tracks go toward cancelling but cannot satisfy the cancelling condition because of lack of time. As a result they are not cancelled (and not deleted), which leads to an increasing error.

Figure 8. Maneuvering scenario: Distance errors for maneuvering track 1 - KDA-MTT, QADA-MTT, GDA-MTT.

Figures 9 and 10 show the behaviour of the same errors, but now associated with the nearby non-maneuvering target 2.

Figure 9. Maneuvering scenario: Filtered errors along X,Y for non-maneuvering track 2 - KDA-MTT, QADA-MTT, GDA-MTT.

Figure 10. Maneuvering scenario: Distance errors for non-maneuvering track 2 - KDA-MTT, QADA-MTT, GDA-MTT.

For the non-maneuvering target 2, the filtered errors along the X and Y axes obtained with QADA-MTT are smooth and definitely smaller than those encountered with KDA-MTT. As a consequence, the distance error associated with QADA-MTT is smaller than in KDA- and GDA-MTT. The errors are calculated on the basis of the “alive” tracks only.


B. Non-maneuvering targets simulation scenario

The noise-free non-maneuvering targets simulation scenario (see Fig. 11) consists of three air targets moving in parallel from West to East in the type order CFC (C = Cargo, F = Fighter) with a constant velocity of 100 m/sec and a distance of 150 m between them. The stationary sensor is located at the origin. The sampling period is T_scan = 5 sec, and the measurement standard deviations are 0.5 deg and 65 m for azimuth and range respectively. The surveillance of the moving targets is performed during 15 scans. The confusion matrix utilized by GDA is C = [...]. Fig. 12 shows the noised scenario.

Figure 11. Noise-free non-maneuvering MTT Scenario.

Figure 12. Noised non-maneuvering MTT Scenario.


As reported in Table 3, QADA-MTT shows again almost 
2 times better performance, in comparison to KDA-MTT, 
according to the average miscorrelations, and also better 
performance regarding the average track life and track purity. 


Table III
NON-MANEUVERING SCENARIO: COMPARISON BETWEEN KDA, QADA,
GDA BASED MTT PERFORMANCES FOR FA = 0.2.

                              KDA-MTT   QADA-MTT   GDA-MTT
Average Track Life [%]          89.79      94.21     97.59
Average Miscorrelation [%]      21.36      10.77      5.82
Track Purity [%]                64.46      81.72     90.15


Fig. 13 shows the percentage of miscorrelations in the less noised case (with 0.2 FA on average per gate).


Figure 13. Non-maneuvering scenario: Average miscorrelations in KDA-MTT, QADA-MTT, GDA-MTT.


The same QADA-MTT behaviour is valid in the more densely cluttered environment with 0.4 FA on average per gate (see Table 4 and Fig. 14).


Table IV
NON-MANEUVERING SCENARIO: COMPARISON BETWEEN KDA, QADA,
GDA BASED MTT PERFORMANCES FOR FA = 0.4.

Average Track Life [%]          92.18      96.77
Average Miscorrelation [%]      20.69      12.15
Track Purity [%]                65.46      77.38


Figure 14. Non-maneuvering scenario: Average miscorrelations in KDA-MTT, QADA-MTT, GDA-MTT.


Figures 15 and 16 show typical performances of the QADA-MTT and KDA-MTT systems.

Figures 17-20 show the encountered filtered errors along the X and Y axes and the distance errors associated with the intermediate track 2 for both noised cases (when the number of FA per gate is 0.2 and 0.4).

One observes (for example in Fig. 9 and Fig. 17) that the errors associated with this simpler (non-maneuvering) scenario sometimes appear to be greater than in the previous, more complicated (maneuvering) one. This is because the sensor's errors are deliberately set larger in the non-maneuvering scenario. This provokes complex situations in which the impact of the QADA method is better demonstrated.


Figure 15. Non-maneuvering scenario: Typical performance of QADA based MTT.

Figure 16. Non-maneuvering scenario: Typical performance of KDA based MTT.

Figure 17. Non-maneuvering scenario: Filtered errors along X,Y for track 2 - KDA-MTT, QADA-MTT, GDA-MTT.

Figure 18. Non-maneuvering scenario: Filtered errors along X,Y for track 2 - KDA-MTT, QADA-MTT, GDA-MTT.

Figure 19. Non-maneuvering scenario: Distance errors for non-maneuvering track 2 - KDA-MTT, QADA-MTT, GDA-MTT.

Figure 20. Non-maneuvering scenario: Distance errors for non-maneuvering track 2 - KDA-MTT, QADA-MTT, GDA-MTT.


VI. CONCLUSIONS 


This work assesses the efficiency of MTT performance in 
cluttered conflicting situations, based on the recent QADA 
method. The QADA based MTT performance is compared 
with the results, obtained for KDA and GDA based MTT, 
concerning two (maneuvering and non-maneuvering targets) 
scenarios. Our Monte Carlo simulation results show that 
QADA-MTT performs better than KDA-MTT for all measures 
of performances in all scenarios under low or heavy clutter 
conditions with target detection probabilities less than one, 
which is the main result of this paper. 

Concerning the comparison of performances of QADA- 
MTT (using kinematics measurements only) with respect to 
GDA-MTT, we observe that the performances of GDA-MTT 
are slightly better than those of QADA-MTT. This conclu- 
sion is not very surprising because GDA-MTT uses more 
information (kinematics and attributes) than KDA-MTT or 
QADA-MTT (which are based on kinematics measurements 
only). Therefore, the ability of GDA-MTT to provide better 
tracking performances is what we naturally expect. However, 
we must emphasize that QADA method could also be used to 
improve GDA-MTT as well in a similar manner as it has been 
used to improve the performances of KDA-MTT. This possible 
improvement of GDA-MTT with QADA is under investigation 
and will be reported in a forthcoming publication. 

Bearing in mind that MTT problems in general cannot utilize additional target attribute information (i.e. only kinematic measurements are available), applying QADA instead of KDA leads to better MTT performance, because of its ability to estimate the quality of the individual pairings given in the optimal assignment solution. QADA is totally independent of the logic applied to obtain the best DA solution. Hence, it could be applied successfully in all cases when attribute and/or kinematic data are available.
Abstract: This paper presents a performance evaluation of two
types of multi-target tracking algorithms: 1) classical Kalman 
Filter based algorithms for multi-target tracking improved with 
Quality Assessment of Data Association (QADA) method using 
optimal data association, and 2) the Joint Probabilistic Data 
Association Filter (JPDAF). QADA technique is improved by 
using new basic belief assignment (bba) modelling, and also 
modified by means of the new Belief interval distance applied 
for computing the quality indicator associated with the pairings 
in the optimal data association solution. The evaluation is based 
on Monte Carlo simulations for maneuvering multiple-target 
tracking (MTT) problem in clutter. 


Keywords: Data association, JPDAF, Belief Functions, PCR6 
fusion rule, QADA, Multitarget Tracking. 


I. INTRODUCTION 


The main function of each radar surveillance system is to maintain target tracks. This becomes a crucial and challenging problem, especially in complicated situations of closely spaced or crossing targets. The main objective of multiple-target tracking (MTT) is to estimate jointly, at each observation time, the number of targets continuously moving in a given region and their trajectories from the noisy sensor data.

Data Association (DA) is a central problem in MTT sys- 
tems’ design [1], [2]. It relates to the process of associating 
uncertain measurements (observations) to the tracked tracks. 
In the presence of a dense MTT environment, with false alarms 
and sensor detection probabilities less than unity, the problem 
of DA becomes more complex, because it should contend 
with many possibilities of pairings, some of which are in 
practice very imprecise, unreliable, and could lead to critical 
association mistakes in the overall tracking process. 

In order to deal with these complex associations, the most recent method to evaluate the Quality Assessment of Data Association (QADA) encountered in multiple target tracking applications in a mono-criterion context was proposed by Dezert and Benameur [4], and extended in [5] to the multi-criteria context. It is based on belief functions (BF) for establishing the quality of the pairings belonging to the optimal data assignment solution, based on its consistency with respect to all the second best solutions provided by a chosen algorithm. Recently, in [6], [19] the authors discussed and proposed the way in which the Kalman filter (KF) could be enhanced in order to reflect the knowledge obtained from the QADA method, called the QADA-KF method. QADA assumes that the reward matrix is known, regardless of the manner in which it is obtained by the user. In this paper the QADA method is improved by using a new BBA modelling, and also modified by means of the new Belief Interval distance (BId) [18] applied for computing the quality indicator associated with the pairings in the optimal DA solution. The results are compared with those obtained by using Pignistic Probabilities [16]. We propose and test
the performance of two versions of QADA-KF. The first one 
utilizes the assignment matrix, provided by the Global Nearest 
Neighbor (GNN) method, called QADA-GNN KF approach. 
The second one utilizes the assignment matrix, provided by the 
Probabilistic Data Association (PDA) method, called QADA- 
PDA KF method. These two QADA-KF methods are com- 
pared with the well-known Joint Probabilistic Data Association 
Filter (JPDAF) [7], [8], [9] which is an extension of the 
Probabilistic Data Association Filter (PDAF) [1] to a fixed 
and known number of targets. JPDAF uses joint association 
events and joint association probabilities in order to avoid 
conflicting measurement-to-track assignments by making a 
soft (probabilistic) assignment of all validated measurements 
to multiple targets. 

The main objective of this paper is to: (1) improve QADA 
method by using new bba modelling; (2) modify the improved 
QADA method by means of the new Belief interval distance 
for computing the quality indicator; (3) compare the perfor- 
mances of: (a) classical MTT algorithms based on the GNN 
approach for data association, utilizing Kinematic only Data 
(KDA) based MTT; (b) QADA-GNN KF based MTT; (c) 
QADA-PDA KF based MTT; (d) JPDAF based MTT. 

The evaluation is based on a Monte Carlo simulation for a particularly difficult maneuvering MTT problem in clutter. This paper is organized as follows. Section II is devoted to the improved QADA method. Section III discusses the Kalman Filter improved by QADA. The two variants of the assignment matrix utilized by QADA are discussed in Section IV. In Section V the JPDAF is described and discussed. A particular MTT simulation scenario and results are presented for the KDA, QADA-GNN KF, QADA-PDA KF, and JPDAF in Section VI. Conclusions are made in Section VII.


II. THE IMPROVED QUALITY ASSESSMENT OF OPTIMAL 
DATA ASSOCIATION 


A. Improvement of QADA bba modelling 


DA is a decisive step in MTT systems [1], [2]. It consists in finding the global optimal assignments of targets T_i, i = 1, ..., m, to some measurements z_j, j = 1, ..., n, at a given time k by maximizing the overall gain in such a way that no more than one target is assigned to a measurement, and reciprocally. The m × n reward matrix Ω = [ω(i,j)] is defined by its elements ω(i,j) ≥ 0, representing the gain of the association of target T_i with the measurement z_j.


The first and the second best assignment matrices A_1 and A_2 are used [4] in order to establish the quality of the specific associations (pairings) satisfying the condition a_1(i,j) = 1. The main idea behind the QADA method is to compare the values a_1(i,j) in A_1 with the corresponding ones a_2(i,j) in A_2, and to identify the change (if any) of the optimal pairing (i,j). In our MTT context, (i,j) means that measurement z_j is associated with target T_i. A quality indicator is established, depending on both the stability of the pairing and its relative impact on the global reward. The proposed method also works when the 1st and 2nd best optimal assignments A_1 and A_2 are not unique, i.e. when there are multiplicities available. The construction of the quality indicators is based on Belief Functions (BF) theory and the Proportional Conflict Redistribution fusion rule no. 6 (PCR6), defined within DSm theory [16]. It depends on the type of pairing matching, as described below:


• When a_1(i,j) = a_2(i,j) = 0, one has a full agreement on the “non-association” of the given pairing (i,j) in A_1 and A_2. This “non-association” has no impact on the global reward values R_1(Ω, A_1) and R_2(Ω, A_2), so it is useless to utilize it in DA. Hence, the quality indicator value is set to q(i,j) = 1.

• When a_1(i,j) = a_2(i,j) = 1, one has a full agreement on the “association” of the given pairing (i,j) in A_1 and A_2. This “association” has different impacts on the global reward values R_1(Ω, A_1) and R_2(Ω, A_2). In order to estimate the quality of this matching association, one establishes two basic belief assignments (BBA) m_s(·) (s = 1, 2), according to the two sources of information (A_1 and A_2). The Frame of Discernment (FoD) one reasons on consists of a single hypothesis X = (T_i, z_j): measurement z_j belongs to track T_i. The ignorance is modeled by the proposition X ∪ X̄, where X̄ is the negation of hypothesis X. The BBA m_s(·) is defined by

$$m_s(X) = a_s(i,j)\cdot\omega(i,j)/R_s(\Omega, A_s), \qquad m_s(X\cup\bar{X}) = 1 - m_s(X).$$

Applying the conjunctive rule of combination of m_1(·) with m_2(·), we get

$$
\begin{aligned}
m_{12}(X) &= m_1(X)m_2(X) + m_1(X)m_2(X\cup\bar{X}) + m_1(X\cup\bar{X})m_2(X),\\
m_{12}(X\cup\bar{X}) &= m_1(X\cup\bar{X})\,m_2(X\cup\bar{X}).
\end{aligned}
\qquad (1)
$$

In our previous works [6], [17], [19], we proposed to use the pignistic transform BetP to establish the quality indicator.

• When a_1(i,j) = 1 and a_2(i,j) = 0, a disagreement (conflict) on the association (T_i, z_j) in A_1 and A_2 is detected. One can find the association (T_i, z_{j_2}) in A_2, where j_2 is the measurement index such that a_2(i,j_2) = 1. In order to define the quality of such a conflicting association (T_i, z_j), one establishes two basic belief assignments (BBA) m_s(·) (s = 1, 2), according to the two sources of information (A_1 and A_2). The FoD one reasons on consists of the following two propositions: X = (T_i, z_j) and Y = (T_i, z_{j_2}). The ignorance is modeled by the proposition X ∪ Y. In our previous works [4], we defined the BBAs by:

$$m_1(X) = a_1(i,j)\cdot\omega(i,j)/R_1(\Omega, A_1), \qquad m_1(X\cup Y) = 1 - m_1(X), \qquad (2)$$

$$m_2(Y) = a_2(i,j_2)\cdot\omega(i,j_2)/R_2(\Omega, A_2), \qquad m_2(X\cup Y) = 1 - m_2(Y). \qquad (3)$$

This modeling in fact does not work efficiently in some cases, and that is why we need to revise it to make the QADA approach work more efficiently. For example, let us consider only one target T and two validated measurements z_1 and z_2 with the payoff matrix Ω = [100 1]. The two possible associations are represented by A_1 = [1 0], providing a reward R_1(Ω, A_1) = 100, and A_2 = [0 1], providing a reward R_2(Ω, A_2) = 1. In this simple case, one has Θ = {X = (T, z_1), Y = (T, z_2)}. Applying formulas (2)-(3), one gets

$$m_1(X) = a_1(i,j)\cdot\omega(i,j)/R_1(\Omega, A_1) = 1 \times \frac{100}{100} = 1, \qquad m_1(X\cup Y) = 1 - m_1(X) = 0, \qquad (4)$$

$$m_2(Y) = a_2(i,j_2)\cdot\omega(i,j_2)/R_2(\Omega, A_2) = 1 \times \frac{1}{1} = 1, \qquad m_2(X\cup Y) = 1 - m_2(Y) = 0. \qquad (5)$$

The conjunctive combination rule gives:

$$m_{12}(X) = m_{12}(Y) = m_{12}(X\cup Y) = 0, \qquad m_{12}(\emptyset) = m_1(X)\,m_2(Y) = 1.$$

Applying the PCR6 fusion rule [16] (Vol. 3):

$$
\begin{aligned}
m_{12}^{PCR6}(\emptyset) &= 0,\\
m_{12}^{PCR6}(X) &= m_{12}(X) + m_1(X)\cdot\frac{m_1(X)\,m_2(Y)}{m_1(X)+m_2(Y)} = 0.5,\\
m_{12}^{PCR6}(Y) &= m_{12}(Y) + m_2(Y)\cdot\frac{m_1(X)\,m_2(Y)}{m_1(X)+m_2(Y)} = 0.5,\\
m_{12}^{PCR6}(X\cup Y) &= m_{12}(X\cup Y) = 0,
\end{aligned}
\qquad (6)
$$

which yields (using the pignistic transformation) BetP(X) = 0.5 and BetP(Y) = 0.5. This result is counter-intuitive (not realistic), because in this very simple case one knows that (T, z_1) is obviously the best data association solution. To circumvent this serious problem, we propose to modify the bba modeling by taking a new model of bba construction as follows:
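$$m_1(X) = a_1(i,j)\cdot\frac{\omega(i,j)}{R_1(\Omega, A_1) + R_2(\Omega, A_2)}, \qquad m_1(X\cup Y) = 1 - m_1(X), \qquad (7)$$

$$m_2(Y) = a_2(i,j_2)\cdot\frac{\omega(i,j_2)}{R_1(\Omega, A_1) + R_2(\Omega, A_2)}, \qquad m_2(X\cup Y) = 1 - m_2(Y). \qquad (8)$$

If we apply this modeling to the previous example, we obtain

$$m_1(X) = 1 \times \frac{100}{100+1} \approx 0.99, \qquad m_1(X\cup Y) = 1 - m_1(X) = 0.01,$$

$$m_2(Y) = 1 \times \frac{1}{100+1} \approx 0.01, \qquad m_2(X\cup Y) = 1 - m_2(Y) = 0.99.$$

Hence, one gets now

$$
\begin{aligned}
m_{12}(X) &= m_1(X)\,m_2(X\cup Y) = 0.9801,\\
m_{12}(Y) &= m_1(X\cup Y)\,m_2(Y) = 0.0001,\\
m_{12}(X\cup Y) &= m_1(X\cup Y)\,m_2(X\cup Y) = 0.0099,\\
m_{12}(\emptyset) &= m_1(X)\,m_2(Y) = 0.0099.
\end{aligned}
$$

Applying the PCR6 redistribution principle, one finally gets

$$
\begin{aligned}
m_{12}^{PCR6}(X) &= 0.9801 + (0.99 \times 0.0099)/1 = 0.989901,\\
m_{12}^{PCR6}(Y) &= 0.0001 + (0.01 \times 0.0099)/1 = 0.000199,\\
m_{12}^{PCR6}(X\cup Y) &= 0.0099,
\end{aligned}
$$

which yields BetP(X) = 0.989901 + 0.0099/2 = 0.994851 and BetP(Y) = 0.000199 + 0.0099/2 = 0.005149. This result now fits perfectly with what we expect, that is, X = (T, z_1) is obviously the best data association solution.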
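The worked example can be replayed with a few lines of code. The sketch below is an illustration only, not the authors' code: it implements the two-source PCR6 rule (which coincides with PCR5 for two sources) on bbas stored as dictionaries keyed by `frozenset`, and a pignistic transform that splits the mass of a non-singleton focal element equally among its elements. All function and variable names are assumptions.

```python
def new_model_mass(reward, r1, r2):
    """Mass committed to a pairing under the new bba model: w / (R1 + R2)."""
    return reward / (r1 + r2)


def pcr6_two_sources(m1, m2):
    """Conjunctive rule followed by PCR6 redistribution for two bbas.

    m1, m2 map frozenset focal elements to masses. The product of two
    conflicting focal elements is redistributed back to them in
    proportion to the masses involved.
    """
    fused = {}
    for a, va in m1.items():
        for b, vb in m2.items():
            prod = va * vb
            if prod == 0.0:
                continue
            inter = a & b
            if inter:                      # consistent pair: conjunctive mass
                fused[inter] = fused.get(inter, 0.0) + prod
            else:                          # conflicting pair: proportional split
                fused[a] = fused.get(a, 0.0) + va * prod / (va + vb)
                fused[b] = fused.get(b, 0.0) + vb * prod / (va + vb)
    return fused


def betp(m, theta):
    """Pignistic probability of the singleton theta."""
    return sum(v / len(a) for a, v in m.items() if theta in a)


X, Y = frozenset({"X"}), frozenset({"Y"})
XY = X | Y

# Rounded masses used in the text for the new model (100/101 and 1/101).
m1 = {X: 0.99, XY: 0.01}
m2 = {Y: 0.01, XY: 0.99}
fused = pcr6_two_sources(m1, m2)
```

With the rounded masses, `fused[X]` reproduces the value 0.989901 of the example, and `betp(fused, "X")` is far larger than `betp(fused, "Y")`, whereas the old model (`{X: 1.0}` against `{Y: 1.0}`) gives 0.5 for both.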


B. Improvement of the quality indicator calculation by using the Belief Interval (BI) distance


In [11], [20] the Euclidean belief interval distance between two bbas m_1(·) and m_2(·) is defined on the powerset of a given FoD Θ = {θ_1, ..., θ_n} as follows

$$d_{BI}(m_1, m_2) = \sqrt{N_c \cdot \sum_{X \in 2^{\Theta}} d_W^2\bigl(BI_1(X), BI_2(X)\bigr)}, \qquad (9)$$

where N_c = 1/2^{|Θ|-1} is a normalization factor to have d_BI(m_1, m_2) ∈ [0, 1], and d_W(BI_1(X), BI_2(X)) is the Wasserstein distance [22] between the belief intervals BI_1(X) = [Bel_1(X), Pl_1(X)] = [a_1, b_1] and BI_2(X) = [Bel_2(X), Pl_2(X)] = [a_2, b_2]. More specifically,

$$d_W([a_1, b_1], [a_2, b_2]) = \sqrt{\left[\frac{a_1 + b_1}{2} - \frac{a_2 + b_2}{2}\right]^2 + \frac{1}{3}\left[\frac{b_1 - a_1}{2} - \frac{b_2 - a_2}{2}\right]^2}. \qquad (10)$$

In [20], we have proved that d_BI(x, y) is a true distance metric because it satisfies the properties of non-negativity (d_BI(x, y) ≥ 0), non-degeneracy (d_BI(x, y) = 0 ⇔ x = y), symmetry (d_BI(x, y) = d_BI(y, x)), and the triangle inequality (d_BI(x, y) + d_BI(y, z) ≥ d_BI(x, z)), for any bbas x, y and z defined on 2^Θ. The choice of the Wasserstein distance in the definition of d_BI is justified by the fact that it is a true distance metric and it fits well with our needs, because we have to compute a distance between [Bel_1(X), Pl_1(X)] and [Bel_2(X), Pl_2(X)].

For notational convenience, we denote by m_X the categorical bba having only X as focal element, where X ≠ ∅ is an element of the powerset of Θ. More precisely, m_X is the particular (categorical) bba defined by m_X(X) = 1 and m_X(Y) = 0 for any Y ≠ X. Such a basic bba plays an important role in our new decision scheme because its corresponding belief interval reduces to the degenerate interval [1, 1], which represents the certainty on X. The basic principle of the new decision scheme we propose is very simple and intuitively makes sense. It consists in selecting as the final decision (denoted by X̂) the element of the powerset for which the belief interval distance between the bba m(·) and m_X, X ∈ 2^Θ \ {∅}, is the smallest one. Therefore, the final decision X̂ is given by

$$\hat{X} = \arg\min_{X \in 2^{\Theta} \setminus \{\emptyset\}} d_{BI}(m, m_X), \qquad (11)$$

where d_BI(m, m_X) is computed according to (9), m(·) is the bba under test, and m_X is the categorical bba focused on X defined above.

This decision scheme is very general in the sense that the decision making can be done on any type of element of the powerset 2^Θ, and not necessarily only on the elements (singletons) of the FoD. This method not only provides the final decision X̂ to make, but also evaluates how good this decision is with respect to its alternatives, if we define the quality indicator q(X̂) as follows

$$q(\hat{X}) = 1 - \frac{d_{BI}(m, m_{\hat{X}})}{\sum_{X \in 2^{\Theta} \setminus \{\emptyset\}} d_{BI}(m, m_X)}. \qquad (12)$$

One sees that the quality indicator q(X̂) of the decision X̂ becomes maximum (equal to one) when the distance between the bba m(·) and m_X̂ is zero, which means that the bba m(·) is in fact focused only on the element X̂. The higher q(X̂) is, the more confident in the decision X̂ we should be.
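A minimal sketch of this decision scheme is given below, assuming the Euclidean form of d_BI with the Wasserstein interval distance of Eq. (10), bbas stored as dictionaries keyed by `frozenset`, and a frame with at least two elements. The helper names are hypothetical and the code is an illustration, not the authors' implementation.

```python
from itertools import combinations
from math import sqrt


def nonempty_subsets(frame):
    """All non-empty subsets of the frame, as frozensets."""
    items = sorted(frame)
    return [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]


def belief_interval(m, a):
    """[Bel(a), Pl(a)] of the bba m for the subset a."""
    bel = sum(v for b, v in m.items() if b and b <= a)
    pl = sum(v for b, v in m.items() if b & a)
    return bel, pl


def wasserstein(i1, i2):
    """Wasserstein distance between two intervals [a1, b1] and [a2, b2]."""
    (a1, b1), (a2, b2) = i1, i2
    mid = (a1 + b1) / 2 - (a2 + b2) / 2      # difference of midpoints
    half = (b1 - a1) / 2 - (b2 - a2) / 2     # difference of half-widths
    return sqrt(mid ** 2 + half ** 2 / 3)


def d_bi(m1, m2, frame):
    """Euclidean belief interval distance, normalized to [0, 1]."""
    nc = 1 / 2 ** (len(frame) - 1)
    total = sum(wasserstein(belief_interval(m1, a),
                            belief_interval(m2, a)) ** 2
                for a in nonempty_subsets(frame))
    return sqrt(nc * total)


def decide(m, frame):
    """Return (decision, quality) following the minimum-distance rule."""
    dists = {a: d_bi(m, {a: 1.0}, frame) for a in nonempty_subsets(frame)}
    best = min(dists, key=dists.get)
    quality = 1 - dists[best] / sum(dists.values())
    return best, quality


frame = {"X", "Y"}
m = {frozenset({"X"}): 0.989901,
     frozenset({"Y"}): 0.000199,
     frozenset({"X", "Y"}): 0.0099}
decision, quality = decide(m, frame)   # decision is {"X"}
```

A categorical bba focused on X gives a distance of zero to m_X and therefore a quality of exactly one, which matches the remark above about the maximum of the indicator.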

Of course, if a decision must be made with some extra constraint defined by one (or several) condition(s), denoted c(X), then we must take c(X) into account in (11), that is

$$\hat{X} = \arg\min_{X \in 2^{\Theta} \setminus \{\emptyset\} \,|\, c(X)\ \text{is true}} d_{BI}(m, m_X),$$

and also in the derivation of the quality indicator, by taking $\sum_{X \in 2^{\Theta} \setminus \{\emptyset\} \,|\, c(X)\ \text{is true}} d_{BI}(m, m_X)$ as the denominator in (12). Theoretically, any other strict distance metric, for instance Jousselme's distance [23], [24], could be used instead of d_BI(·,·). We have chosen the d_BI distance because of its ability to provide good and reasonable behavior [20], as will be shown. When there exists a tie between multiple decisions {X̂_j, j > 1}, the prudent decision corresponding to their disjunction X̂ = ∪_j X̂_j should be preferred (if allowed); otherwise the final decision X̂ is made by a random selection among the elements of {X̂_j, j > 1}.


III. QADA BASED KALMAN FILTER 


The aim of this paper is to compare the performance of the 
JPDAF based MTT algorithm with the classical MTT algo- 
rithm, using the CMKF based on kinematics measurements, 
but improved by the QADA method. 

In [6], the authors discuss and propose the way in which 
Kalman filter (KF) could be improved in order to reflect the 
knowledge obtained based on the QADA method. 

Let’s briefly recall what kind of information is obtained, 
having in hand the quality matrix, derived by QADA, in the 
MTT context. It gives knowledge about the confidence q(?, 7) 
in all pairings (Tj, z;), 7 = 1,...,m; j = 1,...,n, chosen 
in the first best assignment solution. The smaller quality 
(confidence) of hypothesis “z; belongs to T;” means, that the 
particular measurement error covariance R was increased and 
the filter should not trust fully in the actual (true) measurement 
z2(k+1). 

With this conclusion in mind, the authors propose to model this
behavior of the measurement error covariance
by $R = R/q(T_i, z_j)$ for every pairing chosen in the first
best assignment, based on the corresponding quality value
obtained. The Kalman filter gain then decreases, and the
true measurement $z_j(k+1)$ is trusted less in the updated state
estimate $\hat{x}(k+1|k+1)$.
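
The effect of the quality-scaled covariance can be sketched on a scalar filter. The snippet below is only an illustration under stated assumptions, not the implementation of [6]: the state is observed directly (H = 1), the numerical values are invented, and the function names and the `q_floor` guard are hypothetical helpers introduced here.

```python
def inflated_measurement_variance(r, q, q_floor=1e-3):
    """Return R / q; q_floor is a hypothetical guard against q close to zero."""
    return r / max(q, q_floor)


def scalar_kalman_update(x_pred, p_pred, z, r):
    """One scalar Kalman update with measurement matrix H = 1."""
    s = p_pred + r                      # innovation variance
    k = p_pred / s                      # Kalman gain
    x_upd = x_pred + k * (z - x_pred)   # updated state estimate
    p_upd = (1.0 - k) * p_pred          # updated variance
    return x_upd, p_upd, k


# Illustrative numbers: predicted variance 100 m^2, measurement noise std 40 m.
x_pred, p_pred, z, r = 0.0, 100.0, 10.0, 40.0 ** 2

_, _, k_trusted = scalar_kalman_update(
    x_pred, p_pred, z, inflated_measurement_variance(r, q=1.0))
_, _, k_doubtful = scalar_kalman_update(
    x_pred, p_pred, z, inflated_measurement_variance(r, q=0.25))
print(k_trusted, k_doubtful)  # the low-quality pairing gets the smaller gain
```

With q = 1 the standard gain is recovered; as q decreases, the inflated variance pulls the gain toward zero, so the update relies more on the prediction.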


IV. BUILDING ASSIGNMENT MATRIX FOR QADA 


QADA assumes the reward matrix is known, regardless 
of the manner in which it is obtained by the user. In this 
paper we propose two versions of QADA-KF. The first one 
utilizes the assignment matrix built from the single normalized 
distances, provided by the Global Nearest Neighbor method, 
called QADA-GNN KF method. The second one utilizes 
the assignment matrix, built from the posterior association 
probabilities, provided by the Probabilistic Data Association 
(PDA) method, called QADA-PDA KF method. 


A. Assignment matrix based on GNN method 


The GNN method finds and propagates the single most
likely hypothesis during each scan to update the KF. It is a
hard (i.e., binary) decision approach, as compared to the
JPDAF, which is a soft (i.e., probabilistic) decision approach
using all validated measurements with their probabilities of
association. The GNN method was applied in [6] and [17] to
obtain the assignment matrix utilized in QADA. In this case
the elements of the assignment matrix $\omega(i,j)$ ($i = 1,\ldots,m$;
$j = 1,\ldots,n$) represent the normalized distances

$$d(i,j) = \left[ (z_j(k) - \hat{z}_i(k|k-1))' S^{-1}(k) (z_j(k) - \hat{z}_i(k|k-1)) \right]^{1/2}$$

between the validated measurement $z_j(k)$ and target $T_i$ satisfying
the condition $d^2(i,j) \leq \gamma$. The distance $d(i,j)$ is computed
from the measurement $z_j(k)$ and its prediction $\hat{z}_i(k|k-1)$
(see [1] for details), and the inverse of the covariance matrix
$S(k)$ of the innovation, computed by the tracking filter. The
threshold $\gamma$, for which the probability of a given observation
falling in the gate is 0.99, could be defined from the table of
the Chi-square distribution with $M$ degrees of freedom and
allowable probability of a valid observation falling outside the
gate. In this case the DA problem consists in finding the best
assignment that minimizes the overall cost.
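
A minimal sketch of this gating and minimum-cost assignment step is given below. It rests on several assumptions that are not in the paper: 2-D measurements (so the Chi-square quantile has the closed form -2 ln(1 - P)), one innovation covariance shared by all tracks, invented numerical values, and a brute-force search over permutations instead of a dedicated assignment solver. Missed detections are not handled.

```python
import math
from itertools import permutations


def gate_threshold_2d(p_gate):
    """Chi-square quantile for 2 degrees of freedom (closed form)."""
    return -2.0 * math.log(1.0 - p_gate)


def squared_normalized_distance(z, z_pred, S):
    """d^2 = (z - z_pred)' S^-1 (z - z_pred) for a 2x2 innovation covariance S."""
    (a, b), (c, d) = S
    det = a * d - b * c
    v0, v1 = z[0] - z_pred[0], z[1] - z_pred[1]
    # S^-1 = (1/det) * [[d, -b], [-c, a]]
    return (v0 * (d * v0 - b * v1) + v1 * (-c * v0 + a * v1)) / det


def gnn_assignment(dist2, gamma):
    """Brute-force minimum-cost one-to-one assignment (needs m <= n).

    Pairings whose squared distance exceeds the gate gamma are forbidden.
    Returns (None, inf) when no fully gated assignment exists.
    """
    m, n = len(dist2), len(dist2[0])
    best, best_cost = None, math.inf
    for perm in permutations(range(n), m):
        if any(dist2[i][j] > gamma for i, j in enumerate(perm)):
            continue
        cost = sum(math.sqrt(dist2[i][j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return best, best_cost


# Two predicted measurements (tracks) and three received measurements.
predictions = [(0.0, 0.0), (100.0, 0.0)]
measurements = [(5.0, 5.0), (95.0, -10.0), (400.0, 400.0)]
S = ((100.0, 0.0), (0.0, 100.0))

gamma = gate_threshold_2d(0.99)
dist2 = [[squared_normalized_distance(z, zp, S) for z in measurements]
         for zp in predictions]
best, best_cost = gnn_assignment(dist2, gamma)
print(best)  # index of the measurement assigned to each track
```

The third measurement lies far outside both gates and is therefore left unassigned, which is the role the gate plays before the cost minimization.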


B. Assignment matrix based on PDA method 


The Probabilistic Data Association (PDA) method [1] calculates
the association probabilities of the validated measurements
at the current time to the target of interest. PDA
assumes the following hypotheses for each validated
measurement:


• $H_j(k)$: $z_j$ is a measurement originated from the target
$T_i$ of interest, $i = 1,\ldots,m$;

• $H_0(k)$: none of the validated measurements originated
from the target of interest.


If $N$ observations fall within the gate of track $i$, $N+1$ hypotheses
will be formed. The probability of $H_0$ is proportional to
$p_{i0} = \lambda_{FA}^{N} (1 - P_g P_d)$, and the probability of $H_j$ ($j = 1,\ldots,N$)
is proportional to

$$p_{ij} = \lambda_{FA}^{N-1} P_d \cdot \frac{e^{-d_{ij}^2/2}}{(2\pi)^{M/2} \sqrt{|S_{ij}|}} \qquad (13)$$

where $P_g$ is the a priori probability that the correct measurement
is in the validation gate [1]; $P_d$ is the target detection
probability; $\lambda_{FA}$ is the spatial density of false alarms (FA).
The probabilities $p_{ij}$ can be rewritten as [1]

$$p_{ij} = \begin{cases} \dfrac{b}{b + \sum_{l=1}^{N} \alpha_{il}} & \text{for } j = 0 \text{ (no valid obs.)}, \\[2ex] \dfrac{\alpha_{ij}}{b + \sum_{l=1}^{N} \alpha_{il}} & \text{for } 1 \leq j \leq N, \end{cases} \qquad (14)$$

where

$$b \triangleq (1 - P_g P_d) \lambda_{FA} (2\pi)^{M/2} \sqrt{|S_{ij}|}, \qquad (15)$$

and

$$\alpha_{ij} \triangleq P_d \cdot e^{-d_{ij}^2/2}. \qquad (16)$$

The assignment matrix used in the QADA method is established
from all $p_{ij}$ given by (14), related to all association hypotheses.
This matrix has $m$ rows (where $m$ is the number of
targets of interest) and $N+1$ columns for the generated
hypotheses. The $(N+1)$-th column includes the values $p_{i0}$
associated with $H_0(k)$.
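
As a rough illustration of (14)-(16), one row of this assignment matrix could be computed as follows. The function name `pda_row`, its argument names, and all numerical values are assumptions made for the example; following the text, the probability of the "no valid observation" hypothesis is placed in the last column.

```python
import math


def pda_row(d2_list, det_S, p_d, p_g, lam_fa, M=2):
    """Row [p_i1, ..., p_iN, p_i0] of the PDA-based assignment matrix.

    d2_list : squared normalized distances d_ij^2 of the N gated measurements
    det_S   : determinant |S| of the innovation covariance
    M       : measurement dimension
    """
    # Equation (15): weight of the "no valid observation" hypothesis.
    b = (1.0 - p_g * p_d) * lam_fa * (2.0 * math.pi) ** (M / 2.0) * math.sqrt(det_S)
    # Equation (16): weight of each gated measurement.
    alphas = [p_d * math.exp(-0.5 * d2) for d2 in d2_list]
    # Equation (14): normalization over the N + 1 hypotheses.
    denom = b + sum(alphas)
    return [a / denom for a in alphas] + [b / denom]


# Illustrative values only.
row = pda_row([0.5, 4.0], det_S=1.0e4, p_d=0.9, p_g=0.99, lam_fa=1.0e-4)
print(row)  # probabilities of H_1, H_2 and, last, H_0
```

Each row sums to one by construction, and the measurement closest to the prediction receives the largest probability.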


V. JOINT PROBABILISTIC DATA ASSOCIATION FILTER


The Joint Probabilistic Data Association Filter (JPDAF) 
is an extension of the Probabilistic Data Association Filter 
(PDAF) for tracking multiple targets in clutter [1], [2], [10], 
[11], [12]. This Bayesian tracking filter uses the probabilistic 
assignment of all validated measurements belonging to the 
target gate to update its estimate. The preliminary version 
of JPDAF was proposed by Bar-Shalom in 1974 [13], then 
updated and finalized in [7], [8], [9]. The assumptions of 
JPDAF are the following: 


• the number $N_T$ of established targets in clutter is known;
• all the information available from the measurements $Z^k$
up to time $k$ is summarized by the sufficient statistic
$\hat{x}^t(k)$ (the approximate conditional mean), and covariance
$P^t(k|k)$ for each target $t$, $t = 1,\ldots,N_T$;
• the real state $x^t(k)$ of a target $t$ at time $k$ is modeled by
a Gaussian pdf $\mathcal{N}(x^t(k); \hat{x}^t(k), P^t(k|k))$;
• each target $t$ follows its own dynamic model;
• each target generates at most one measurement at each
observation time and there are no merged measurements;
• each target is detected with some known detection
probability $P_d^t$;
• the false alarms (FA) are uniformly distributed in the
surveillance area and their number follows a Poisson pmf with
FA density $\lambda_{FA}$.


In JPDAF, the measurement-to-target association probabilities
are computed jointly across the targets and only for the
latest set of measurements. This theoretically appealing approach
can however give rise to very high combinatorial complexity
if there are several persistent interferences, typically when
several targets are crossing or if they move closely during
several consecutive scans. Moreover, some track coalescence
effects may also appear, which substantially degrade the
JPDAF performance, as will be shown in Section VI. These
limitations of JPDAF have already been reported in [14]. Let’s
consider a cluster (a cluster is a group of targets which have
some measurements in common in their validation gates, i.e.
non-empty intersections) of $T \geq 2$ targets $t = 1,\ldots,T$.
The set of $m_k$ measurements available at scan $k$ is denoted
$Z(k) = \{z_i(k), i = 1,\ldots,m_k\}$. Each measurement $z_i(k)$ of
$Z(k)$ either originates from a target or from a FA. Denote
$\hat{z}^t(k|k-1)$ as the predicted measurement for target $t$;
all the possible innovations that could be used in the
Kalman Filter to update the target state estimate are denoted
$\tilde{z}_i^t(k) = z_i(k) - \hat{z}^t(k|k-1)$. Instead of using a
particular innovation $\tilde{z}_i^t(k)$, JPDAF uses the weighted innovation
$\tilde{z}^t(k) = \sum_{i=1}^{m_k} \beta_i^t(k) \tilde{z}_i^t(k)$, where $\beta_i^t(k)$ is the probability
that the measurement $z_i$ originates from target $t$, and $\beta_0^t(k)$ is the
probability that none of the measurements originates from target
$t$. The core of JPDAF is the computation of the a posteriori
association probabilities $\beta_i^t(k)$ ($i = 0,1,\ldots,m_k$) based
on all possible joint association events $\Theta(k) = \bigcap_{i=1}^{m_k} \Theta_i^{t_i}(k)$,
where $\Theta_i^{t_i}(k)$ is the event that measurement $z_i(k)$ originates
from target $t_i$ (by convention and for notational convenience,
$t_i = 0$ means that the origin of measurement $z_i$ is a FA),
$0 \leq t_i \leq N_T$. More precisely, one has to compute, for
$i = 1,\ldots,m_k$, $\beta_i^t(k) = \sum_{\Theta(k)} P\{\Theta(k)|Z^k\} \hat{\omega}_{it}(\Theta(k))$ and
$\beta_0^t(k) = 1 - \sum_{i=1}^{m_k} \beta_i^t(k)$, where $Z^k$ is the set of all measurements
available up to time $k$, and $\hat{\omega}_{it}(\Theta(k))$ are the
corresponding components of the association matrix characterizing
the possible joint association $\Theta(k)$.

JPDAF is theoretically well founded and it does not require
high memory. It provides pretty good results on simple MTT
scenarios (with non-persisting interferences) with moderate FA
densities. However, the number of feasible joint association
matrices increases exponentially with the problem dimensions
($m_k$ and $N_T$), which makes the JPDAF intractable for complex
dense MTT scenarios. For more details about JPDAF, please
refer to [1], [2], [10]-[12], and [15].


VI. SIMULATION RESULTS 


The Converted Measurement KF (CMKF) is used in our
MTT algorithm. We assume a constant velocity target model.
The process noise covariance matrix is $Q = \sigma_\nu^2 Q_T$, where
$T$ is the sampling period, $\sigma_\nu$ is the standard deviation of the
process noise, and $Q_T$ is as given in [3]. Here are the results
of KDA KF, QADA-GNN KF, QADA-PDA KF, and JPDAF
for the MTT scenario with maneuvering targets.

The noise-free group of targets simulation scenario (Fig. 1)
consists of four air targets moving from left to right (or from
West to East). For a clear explanation of the results, the targets
are numbered starting with the 1st target, which has
the greatest y-coordinate, and continuing to the 4th target, which has the
smallest y-coordinate. The stationary sensor is located at the
origin with range 10000 m. The sampling period is $T_{scan} = 5$
sec, and the measurement standard deviations are 0.2 deg and
40 m for azimuth and range respectively. The targets move
with constant velocity $V = 100$ m/s. The first target moves
without maneuvering for the first 8 scans, keeping azimuth 120
deg from North. The group of two targets in the middle, i.e. the
2nd and 3rd, moves without maneuvering, keeping azimuth 90
deg from North, that is, horizontally from West to East. This
is the main direction of the group movement. The 4th target
starts with azimuth 60 deg and moves towards the middle
group of rectilinearly moving targets. When it approaches the
group, it starts a turn to the right with 30 deg. Its initial
azimuth of 120 deg is decreased by the angle of turn and
becomes 90 deg, i.e. coincides with the main direction. From
the 15th scan, the four targets move rectilinearly in parallel. The
distance between them is 150 m. The absolute value of the
corresponding transversal acceleration for the two maneuvers
is 1.495 m/s². The total number of scans for the simulations
is 30. Figure 2 shows the noised scenario for a FA density yielding
0.15 FA per gate on average.

Our results are based on Monte Carlo (MC) simulations with
200 independent runs in applying KDA based KF, QADA-GNN
KF, QADA-PDA KF, and JPDAF. We compare the performance
of these methods with different criteria, and we use
an idealized track initiation in order to prevent an uncontrolled
impact of this stage on the statistical parameters of the tracking
process during MC simulations.

Fig. 1. Noise-free group of targets Scenario.

Fig. 2. Noised group of targets Scenario.


The true target positions (known in our simulations) for the
first two scans are used for track initiation. The evaluation of
MTT performance is based on the criteria of Track Purity (TP),
Track Life (TL), and percentage of miscorrelation (pMC):


1) TP examines the ratio between the number of
particular performed (jth observation - ith track) associations
(in case of detected target) over the total number
of all possible associations during the tracking scenario.
TP cannot be used with JPDAF because JPDAF
is a soft assignment method. Instead of TP, we define
the Probabilistic Purity Index (PPI). It considers the
measurement that has the highest association probability
computed by the JPDAF and checks (and counts) whether this
measurement originated from the target or not. PPI
measures the ability of JPDAF to commit the highest
probability to the correct target measurement in the soft
assignment of all validated measurements.

2) TL is evaluated as the average number of scans before
a track’s deletion. In our simulations, a track is canceled
and deleted from the list of tracked tracks when, during
3 consecutive scans, it cannot be updated with any
measurement because there is no validated measurement
in the validation gate. When using JPDAF, the track
is canceled and deleted from the list of tracked tracks
when, during 3 consecutive scans, its own measurement
does not fall in its gate. We call this the
“canceling/deletion condition”. The status of the tracked tracks
is denoted “alive”.

3) pMC examines the relative number of incorrect
observation-to-track associations during the scans.

The MTT performance results for KDA only KF,
QADA-GNN KF, QADA-PDA KF, and JPDAF for an average
of FA = 0.15 false alarms in gate are given in Table I.
The MTT performance for QADA-PDA KF and QADA-GNN
KF is estimated for both the Pignistic probabilities
and the minimum Belief Interval distance principles used to
compute the quality indicator.
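
The three criteria can be sketched as small counting routines. This is only one possible reading of the definitions above, with hypothetical function names and data layouts: here the track life counts scans up to and including the scan at which the deletion condition is met, and pMC is taken over the scans where an association was actually made.

```python
def track_life(updated_flags, max_misses=3):
    """Scans a track stays alive before `max_misses` consecutive missed updates."""
    misses = 0
    for scan, updated in enumerate(updated_flags, start=1):
        misses = 0 if updated else misses + 1
        if misses >= max_misses:
            return scan              # canceling/deletion condition reached
    return len(updated_flags)        # the track survived the whole scenario


def percent_miscorrelation(assigned, truth):
    """Percentage of incorrect observation-to-track associations.

    `assigned[k]` is the measurement label associated at scan k (None if no
    association was made) and `truth[k]` is the label of the true measurement.
    """
    pairs = [(a, t) for a, t in zip(assigned, truth) if a is not None]
    if not pairs:
        return 0.0
    wrong = sum(1 for a, t in pairs if a != t)
    return 100.0 * wrong / len(pairs)


def probabilistic_purity_index(beta_rows, true_indices):
    """Percentage of scans where the highest association probability
    falls on the measurement that truly originated from the target."""
    hits = 0
    for betas, true_idx in zip(beta_rows, true_indices):
        top = max(range(len(betas)), key=betas.__getitem__)
        hits += 1 if top == true_idx else 0
    return 100.0 * hits / len(beta_rows)


life = track_life([True, True, False, True, False, False, False, True])
pmc = percent_miscorrelation([1, 2, None, 3], [1, 3, 2, 3])
ppi = probabilistic_purity_index([[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]], [1, 1])
print(life, pmc, ppi)
```

In a Monte Carlo study these per-track values would then be averaged over tracks and runs.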


TABLE I 
GROUP OF TARGETS SCENARIO: COMPARISON BETWEEN MTT 
PERFORMANCE RESULTS FOR 0.15 FA PER GATE. 


JPDAF 


(in %) QADA-PDA QADA-GNN 
BetP Bld BetP Bld 
L 


88.12 
2.67 
84.54 


89.39 
2.45 
86.14 


84.31 
3.28 
79.86 


89.13 
2.39 
85.92 


78.42 70.02 
5.71 


61.95 


32.96 (PPI) 


According to all criteria, the QADA-PDA KF method shows the
best performance, followed by QADA-GNN KF and JPDAF.
The KDA based KF approach, as one could expect, shows the
worst performance. The minimum Belief Interval distance
principle for computing the quality indicator leads to
improved MTT performance (compared to the results based on
Pignistic probabilities, BetP) for both QADA-PDA KF and
QADA-GNN KF. Still, QADA-PDA KF outperforms QADA-GNN
KF based MTT.

In order to make a fair comparison between QADA KF
and JPDAF, we also discuss the root mean square errors
(RMSE) associated with the filtered X and Y values, presented
in Figs. 3-6. The results for QADA-GNN KF and QADA-PDA
KF are obtained on the basis of the improved QADA method
using the minimal Belief Interval distance criterion and with the
new bba modeling proposed in the paper.

Figs. 3 and 4 show the RMSE of the filtered X and Y values
associated with target 1, compared for KDA KF, QADA-GNN
KF, QADA-PDA KF, and JPDAF. Figs. 5 and 6 consider
the same errors for the middle track 3. All the results are
compared to the sensor’s errors along the X and Y axes.

As a whole, one can see that the rms errors associated with
QADA-PDA KF and QADA-GNN KF are slightly smaller
than the sensor’s measurement errors, except around the 15th
scan, where all the targets move in parallel. We see that the
RMSE on the Y filtered error for track 1 associated with KDA-JPDAF
grows sharply after scan 12. This behavior can
be explained by the fact that, from this scan on, target 1 starts
moving in parallel with the rest of the targets, thereby causing
spatially persisting interferences and track coalescence effects
in JPDAF. These effects significantly degrade the quality of
JPDAF performance, as already reported in [14]. The same
effect of track coalescence can be observed for track 3, which
moves in parallel during all the scans. The RMSE on the Y
filtered error associated with JPDAF is high during
the whole tracking region.


Fig. 3. RMSE on X for track 1 with the four tracking methods.

Fig. 4. RMSE on Y for track 1 with the four tracking methods.

Fig. 5. RMSE on X for track 3 with the four tracking methods.

Fig. 6. RMSE on Y for track 3 with the four tracking methods.


VII. CONCLUSIONS


This work evaluated with Monte Carlo simulations the
efficiency of MTT performance in a cluttered environment of
four methods: (a) the classical MTT algorithm based on the GNN
approach for data association, utilizing Kinematic only Data;
(b) QADA-GNN KF; (c) QADA-PDA KF; and (d) JPDAF.
The QADA technique was improved by using a new BBA modelling.
It was also modified by means of the new Belief Interval
distance applied for computing the quality indicator associated
with the pairings in the optimal DA solution. The results were
compared with those obtained by using Pignistic Probabilities.
It was shown that this new approach leads to better MTT
performance. The implemented group of targets scenario
shows the advantages of applying QADA-KF. According to
all performance criteria, the QADA-PDA KF gives the best
performance, followed by QADA-GNN KF and JPDAF. The
KDA KF approach shows the worst performance (as expected).
This scenario is particularly difficult for JPDAF because of
several closely spaced and rectilinearly moving targets in
clutter during many consecutive scans, which leads to track
coalescence effects due to persisting interferences. As a result,
the tracking performance of JPDAF is degraded. Because the
complexity of the calculation of the joint association probabilities
grows exponentially with the number of targets, JPDAF
requires almost 3 times more computational time in comparison
to the other methods in the first (complex) scenario.


Abstract—This paper presents a comparative analysis of the performances
of two types of multi-target tracking algorithms: 1)
the Joint Probabilistic Data Association Filter (JPDAF), and 
2) classical Kalman Filter based algorithms for multi-target 
tracking improved with Quality Assessment of Data Association 
(QADA) method using optimal data association. The evaluation 
is based on Monte Carlo simulations for difficult maneuvering 
multiple-target tracking (MTT) problems in clutter. 


Keywords: Data association, JPDAF, Belief Functions, 
QADA, PCR6 rule, Multitarget Tracking. 


I. INTRODUCTION 


Multiple-target tracking (MTT) is a principal component
of surveillance systems. The main objective of MTT is to
estimate jointly, at each observation time, the number
of targets continuously moving in a given region and their
trajectories from the noisy sensor data. In a single-sensor case,
the multitarget tracker receives a random number of measurements
because of the uncertainty that results from imperfect detection and
from false alarms, which arise independently of the targets of interest.
Because the detection probability is less than one,
some targets may go undetected at some sampling intervals.
Besides the process and measurement noises, additional
complications arise from measurement origin
uncertainty, missed detections, cancelling (death) of targets, etc.

Data association (DA) is a primary task of modern MTT systems
[1]-[3]. It entails selecting the most trustworthy associations
between uncertain sensor measurements and existing targets
at a given time. In a dense MTT environment,
with false alarms and sensor detection probabilities less than
unity, the DA problem becomes more complex, because it
must contend with many possible pairings, some of
which are in practice very imprecise and unreliable, and could lead
to critical association mistakes in the overall tracking process.

In order to deal with these complex associations, a
recent method to evaluate the Quality Assessment of Data
Association (QADA) encountered in multiple target tracking
applications in a mono-criterion context was proposed by
Dezert and Benameur [4], and extended in [5] for the
multi-criteria context. It is based on belief functions (BF) for
establishing the quality of pairings belonging to the optimal data
assignment solution based on its consistency with respect to
all the second best solutions, provided by a chosen algorithm.
Most recently, in [6] the authors discussed and proposed the
way in which the Kalman filter (KF) could be enhanced in order
to reflect the knowledge obtained based on the QADA method,
called the QADA-KF method.


Taking into account that QADA assumes the reward matrix 
is known, regardless of the manner in which it is obtained by 
the user, in this paper we propose and test the performance of 
two possible versions of QADA-KF. The first one utilizes the 
assignment matrix, provided by the Global Nearest Neighbour 
(GNN) method, called QADA-GNN KF approach. The second 
one utilizes the assignment matrix, provided by the Probabilis- 
tic Data Association (PDA) method, called QADA-PDA KF 
method. 


These two QADA-KF methods are compared with the Joint
Probabilistic Data Association Filter (JPDAF) [7]-[9] which
is an extension of the Probabilistic Data Association Filter 
(PDAF) [1] to a fixed and known number of targets. JPDAF 
uses joint association events and joint association probabilities 
in order to avoid conflicting measurement-to-track assignments 
by making a soft (probabilistic) assignment of all validated 
measurements to multiple targets. 


The main objective of this paper is to compare the performances
of: (i) classical MTT algorithms based on the GNN
approach for data association, utilizing Kinematic only Data
(KDA) and Converted Measurement Kalman Filter (CMKF);
(ii) QADA-GNN KF based MTT; (iii) QADA-PDA KF based
MTT; (iv) JPDAF based MTT. The evaluation is based on
a Monte Carlo simulation for particularly difficult maneuvering
MTT problems in clutter.


This paper is organised as follows. In Section II the JPDAF 
is described and discussed. Section III is devoted to QADA 
based KF. Data association methods, providing an assignment 
matrix for QADA are discussed in Section IV. Two particular 
simulation MTT scenarios and results are presented for the 
KDA, QADA-GNN KF, QADA-PDA KF, and JPDAF in 
Section V. Conclusions are made in Section VI. 


II. JOINT PROBABILISTIC DATA ASSOCIATION FILTER


The Joint Probabilistic Data Association Filter (JPDAF) 
is an extension of the Probabilistic Data Association Filter 
(PDAF) for tracking multiple targets in clutter [1], [2], [10]- 
[12]. This Bayesian tracking filter uses the probabilistic as- 
signment of all validated measurements belonging to the target 
gate to update its estimate. The preliminary version of JPDAF 
was proposed by Bar-Shalom in 1974 [13], then updated 
and finalized in [7]-[9]. The assumptions of JPDAF are the 
following: 


• the number $N_T$ of established targets in clutter is known;
• all the information available from the measurements $Z^k$
up to time $k$ is summarized by the sufficient statistic
$\hat{x}^t(k|k)$ (the approximate conditional mean), and covariance
$P^t(k|k)$ for each target $t$;
• the real state $x^t(k)$ of a target $t$ at time $k$ is modeled by
a Gaussian pdf $\mathcal{N}(x^t(k); \hat{x}^t(k|k), P^t(k|k))$;
• each target $t$ follows its own dynamic model;
• each target generates at most one measurement at each
observation time and there are no merged measurements;
• each target is detected with some known detection
probability $P_d^t$;
• the false alarms (FA) are uniformly distributed in the
surveillance area and their number follows a Poisson pmf with
FA density $\lambda_{FA}$.


In JPDAF, the measurement-to-target association probabilities
are computed jointly across the targets and only for
the latest set of measurements. This theoretically appealing
(0-scan-back) approach can however give rise to very high
combinatorial complexity if there are several persistent
interferences, typically when several targets are crossing or if they
move closely during several consecutive scans. Moreover, some
track coalescence effects may also appear, which substantially
degrade the JPDAF performance, as will be shown in
Section V. These limitations of JPDAF have already been
reported in [14]. Here we briefly recall the basics of JPDAF.
For more details, please refer to [1], [2], [10]-[12], [15].


A. JPDAF principle 


Let’s consider a cluster¹ of $T \geq 2$ targets $t = 1,\ldots,T$.
The set of $m_k$ measurements available at scan $k$ is denoted
$Z(k) = \{z_i(k), i = 1,\ldots,m_k\}$. Each measurement $z_i(k)$ of
$Z(k)$ either originates from a target or from a FA. Denote
$\hat{z}^t(k|k-1)$ as the predicted measurement for target $t$;
all the possible innovations that could be used in the Kalman
Filter to update the target state estimate are denoted $\tilde{z}_i^t(k) =
z_i(k) - \hat{z}^t(k|k-1)$, $i = 1,\ldots,m_k$. Instead of using
a particular innovation $\tilde{z}_i^t(k)$, JPDAF uses the weighted innovation
$\tilde{z}^t(k) = \sum_{i=1}^{m_k} \beta_i^t(k) \tilde{z}_i^t(k)$, where $\beta_i^t(k)$ is the probability that
the measurement $z_i(k)$ originates from target $t$, and $\beta_0^t(k)$ is the
probability that none of the measurements originates from target
$t$. The core of JPDAF is the computation of the a posteriori
association probabilities $\beta_i^t(k)$, $i = 0,1,\ldots,m_k$, based on all
possible joint association events $\Theta(k) = \bigcap_{i=1}^{m_k} \Theta_i^{t_i}(k)$, where
$\Theta_i^{t_i}(k)$ is the event that measurement $z_i(k)$ originates from
target² $t_i$, $0 \leq t_i \leq N_T$. More precisely, one has to compute,
for $i = 1,\ldots,m_k$, $\beta_i^t(k) = \sum_{\Theta(k)} P\{\Theta(k)|Z^k\} \hat{\omega}_{it}(\Theta(k))$
and $\beta_0^t(k) = 1 - \sum_{i=1}^{m_k} \beta_i^t(k)$, where $Z^k$ is the set of all
measurements available up to time $k$, and $\hat{\omega}_{it}(\Theta(k))$ are the
corresponding components of the association matrix characterizing
the possible joint association $\Theta(k)$.

¹A cluster is a group of targets which have some measurements in common
in their validation gates (i.e. non-empty intersections).


B. Feasible joint association events 


Validation gates are used for finding the feasible joint
events but not in the evaluation of their probabilities [12] (pp.
388-389). To describe the observation situation, JPDAF uses the
validation matrix $\Omega = [\omega_{it}]$, $i = 1,\ldots,m_k$ and $t = 0,\ldots,N_T$,
with elements $\omega_{it} \in \{0,1\}$ indicating whether or not the
measurement $z_i$ lies in the validation gate of target $t$. Because
each measurement can potentially originate from a FA, all
elements of the first column of $\Omega$, corresponding to index $t = 0$
(meaning FA, or none of the targets), are equal to one. From
this validation matrix, all possible feasible joint association
events $\hat{\Omega}(\Theta(k)) = [\hat{\omega}_{it}(\Theta(k))]$, where $\hat{\omega}_{it}(\Theta(k)) = 1$ if
$\Theta_i^t(k) \in \Theta(k)$ and zero otherwise, are realized satisfying the
following feasibility conditions:

• a measurement can have only one origin, that is, for all $i$

$$\sum_{t=0}^{N_T} \hat{\omega}_{it}(\Theta(k)) = 1. \qquad (1)$$

• at most one measurement can originate from a target

$$\sum_{i=1}^{m_k} \hat{\omega}_{it}(\Theta(k)) \leq 1, \quad \text{for } t = 1,\ldots,N_T. \qquad (2)$$


The generation of all possible feasible joint association events
is computationally expensive for complicated MTT scenarios,
which is a serious limitation of JPDAF for real-world scenarios.
A simple Matlab™ algorithm for the generation of the matrices
$\hat{\Omega}(\Theta(k))$ is given in [15] (pp. 56-57), which is based on DFS
(Depth First Search) detailed by Zhou in [16], [17], previously
coded in FORTRAN in [18].
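
The enumeration under conditions (1) and (2) can be illustrated with a short recursive depth-first search. The sketch below is not the algorithm of [15] or [16]; it is a generic DFS written for this example, and it represents each joint event by the tuple $(t_1, \ldots, t_{m_k})$ of measurement origins rather than by the matrix $\hat{\Omega}(\Theta(k))$.

```python
def feasible_joint_events(validation):
    """Enumerate feasible joint association events by depth-first search.

    validation[i][t] is 1 if measurement i may originate from origin t,
    where t = 0 is a false alarm (always allowed) and t >= 1 is a target.
    Returns a list of tuples (t_1, ..., t_m) giving each measurement's origin.
    """
    m = len(validation)
    events = []

    def dfs(i, origins, used_targets):
        if i == m:
            events.append(tuple(origins))
            return
        for t, allowed in enumerate(validation[i]):
            if not allowed:
                continue                # measurement i is outside gate t
            if t != 0 and t in used_targets:
                continue                # condition (2): one measurement per target
            origins.append(t)           # condition (1): exactly one origin chosen
            if t != 0:
                used_targets.add(t)
            dfs(i + 1, origins, used_targets)
            origins.pop()
            if t != 0:
                used_targets.discard(t)

    dfs(0, [], set())
    return events


# Two measurements, two targets: z_1 is in the gate of target 1 only,
# z_2 is in the gates of targets 1 and 2.
validation = [[1, 1, 0],
              [1, 1, 1]]
events = feasible_joint_events(validation)
print(events)
```

Even in this tiny case five joint events are feasible; the count grows quickly with the number of measurements and overlapping gates, which is the combinatorial burden mentioned above.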


C. Feasible joint association probabilities 


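Thanks to Bayes formula, the computation of the a posteriori
joint association probabilities $P\{\Theta(k)|Z^k\}$ involved in the
derivation of $\beta_i^t(k)$ can be expressed as (see [1], [2], [10]-[12],
[15] for full derivations)

$$\begin{aligned}
P\{\Theta(k)|Z^k\} &= \frac{1}{c} \, p[Z(k)|\Theta(k), m_k, Z^{k-1}] \, P\{\Theta(k)|m_k\} \\
&= \frac{1}{c} \cdot \frac{\phi(\Theta(k))!}{m_k!} \, \mu_F(\phi(\Theta(k))) \, V^{-\phi(\Theta(k))} \\
&\quad \times \prod_{i=1}^{m_k} [f_{t_i}(z_i(k))]^{\tau_i(\Theta(k))} \\
&\quad \times \prod_{t=1}^{N_T} (P_d^t)^{\delta_t(\Theta(k))} (1 - P_d^t)^{1 - \delta_t(\Theta(k))} \qquad (3)
\end{aligned}$$

where $c$ is a normalization constant, $V$ is the volume of the
surveillance region, and the indicators $\delta_t(\Theta(k))$ (target detection
indicator), $\tau_i(\Theta(k))$ (measurement association indicator),
and $\phi(\Theta(k))$ (FA indicator) are defined by

$$\delta_t(\Theta(k)) \triangleq \sum_{i=1}^{m_k} \hat{\omega}_{it}(\Theta(k)) \leq 1, \quad t = 1,\ldots,N_T, \qquad (4)$$

$$\tau_i(\Theta(k)) \triangleq \sum_{t=1}^{N_T} \hat{\omega}_{it}(\Theta(k)), \qquad (5)$$

$$\phi(\Theta(k)) \triangleq \sum_{i=1}^{m_k} [1 - \tau_i(\Theta(k))]. \qquad (6)$$

$\mu_F(\phi(\Theta(k)))$ is the prior pmf of the number of false measurements
(the clutter model) and

$$f_{t_i}(z_i(k)) = \mathcal{N}[z_i(k); \hat{z}^{t_i}(k|k-1), S^{t_i}(k)], \qquad (7)$$

where $S^{t_i}(k)$ is the predicted covariance matrix of the innovation
$z_i(k) - \hat{z}^{t_i}(k|k-1)$.

Two versions of JPDAF have been proposed [1], [7]-[9]:

• Parametric JPDAF: Knowing the spatial density $\lambda_{FA}$
of the false measurements, and using a Poisson pmf
$\mu_F(\phi(\Theta(k))) = \frac{(\lambda_{FA} V)^{\phi(k)} e^{-\lambda_{FA} V}}{\phi(k)!}$, results in

$$P\{\Theta(k)|Z^k\} = \frac{1}{c_1} \prod_{i=1}^{m_k} [\lambda_{FA}^{-1} f_{t_i}(z_i(k))]^{\tau_i(\Theta(k))} \times \prod_{t=1}^{N_T} (P_d^t)^{\delta_t(\Theta(k))} (1 - P_d^t)^{1 - \delta_t(\Theta(k))} \qquad (8)$$

where $c_1$ is a normalization constant.

• Non parametric JPDAF: Using a diffuse prior pmf of the
number of FA, $\mu_F(\phi(k)) = \epsilon$, $\forall \phi(k)$, results in

$$P\{\Theta(k)|Z^k\} = \frac{1}{c_2} \, \phi(k)! \prod_{i=1}^{m_k} [V f_{t_i}(z_i(k))]^{\tau_i(\Theta(k))} \times \prod_{t=1}^{N_T} (P_d^t)^{\delta_t(\Theta(k))} (1 - P_d^t)^{1 - \delta_t(\Theta(k))} \qquad (9)$$

where $c_2$ is a new normalization constant.

²By convention and for notational convenience, $t_i = 0$ means that the origin of
measurement $z_i$ is a FA.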
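
To make the parametric form (8) concrete, the following sketch scores a given list of feasible joint events and then accumulates the marginal association probabilities. It is a simplified illustration under assumptions not made in the paper: scalar (1-D) measurements, the same detection probability for every target, invented numerical values, and hypothetical function names. Events are given as tuples of measurement origins (0 for FA).

```python
import math


def gaussian_pdf(z, mean, var):
    """Scalar Gaussian density N(z; mean, var)."""
    return math.exp(-0.5 * (z - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def joint_event_probabilities(events, z, z_pred, s_var, p_d, lam_fa):
    """Normalized probabilities of joint events, parametric form (8).

    events : tuples (t_1, ..., t_m); t_i = 0 is a FA, t_i >= 1 a target index
    z_pred, s_var : predicted measurement and innovation variance per target
    """
    n_targets = len(z_pred)
    weights = []
    for ev in events:
        w = 1.0
        for i, t in enumerate(ev):
            if t > 0:   # tau_i = 1: measurement i is assigned to target t
                w *= gaussian_pdf(z[i], z_pred[t - 1], s_var[t - 1]) / lam_fa
        for t in range(1, n_targets + 1):
            w *= p_d if t in ev else (1.0 - p_d)   # delta_t = 1 or 0
        weights.append(w)
    c = sum(weights)    # normalization constant
    return [w / c for w in weights]


def marginal_betas(events, probs, n_meas, n_targets):
    """beta[t-1][i] for measurements i = 1..m, with beta[t-1][0] = beta_0."""
    beta = [[0.0] * (n_meas + 1) for _ in range(n_targets)]
    for ev, p in zip(events, probs):
        for i, t in enumerate(ev):
            if t > 0:
                beta[t - 1][i + 1] += p
    for row in beta:
        row[0] = 1.0 - sum(row[1:])
    return beta


# Feasible events for 2 measurements and 2 targets (z_1 gated by target 1 only).
events = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 2)]
z, z_pred, s_var = [1.0, 9.0], [0.0, 10.0], [4.0, 4.0]

probs = joint_event_probabilities(events, z, z_pred, s_var, p_d=0.9, lam_fa=0.01)
betas = marginal_betas(events, probs, n_meas=2, n_targets=2)
print(events[probs.index(max(probs))], betas)
```

With these numbers the event assigning each measurement to its nearby target dominates, so the corresponding marginals are close to one.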


D. JPDAF state estimation 


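Once all feasible joint association events $\Theta(k)$ have been
generated and their a posteriori probabilities $P\{\Theta(k)|Z^k\}$
determined, all the marginal association probabilities $\beta_i^t(k) =
\sum_{\Theta(k)} P\{\Theta(k)|Z^k\} \hat{\omega}_{it}(\Theta(k))$ and $\beta_0^t(k) = 1 - \sum_{i=1}^{m_k} \beta_i^t(k)$
are computed. The state update and prediction are done with the
PDAF equations³ given by

$$\hat{x}^t(k|k) = \sum_{i=0}^{m_k} \beta_i^t(k) \hat{x}_i^t(k|k), \qquad (10)$$

with $\hat{x}_i^t(k|k)$ given by

$$\hat{x}_{i>0}^t(k|k) = \hat{x}^t(k|k-1) + K^t(k) \tilde{z}_i^t(k), \qquad (11)$$

$$\hat{x}_{i=0}^t(k|k) = \hat{x}^t(k|k-1). \qquad (12)$$

Using (11) and (12) in (10), then

$$\hat{x}^t(k|k) = \hat{x}^t(k|k-1) + K^t(k) \tilde{z}^t(k), \qquad (13)$$

$$P^t(k|k) = \beta_0^t(k) P^t(k|k-1) + (1 - \beta_0^t(k)) P_c^t(k) + \tilde{P}^t(k), \qquad (14)$$

with

$$P_c^t(k) = [I - K^t(k) H(k)] P^t(k|k-1), \qquad (15)$$

$$\tilde{P}^t(k) = K^t(k) \left[ \sum_{i=1}^{m_k} \beta_i^t(k) \tilde{z}_i^t(k) \tilde{z}_i^t(k)' - \tilde{z}^t(k) \tilde{z}^t(k)' \right] K^t(k)', \qquad (16)$$

and

$$K^t(k) \triangleq P^t(k|k-1) H'(k) S^t(k)^{-1}, \qquad (17)$$

$$\tilde{z}_i^t(k) \triangleq z_i(k) - \hat{z}^t(k|k-1), \qquad (18)$$

$$\tilde{z}^t(k) \triangleq \sum_{i=1}^{m_k} \beta_i^t(k) \tilde{z}_i^t(k). \qquad (19)$$

It has been proved in [1] that $\tilde{P}^t(k)$ is always a positive semi-definite
matrix. The target state prediction $\hat{x}^t(k+1|k)$ and $P^t(k+1|k)$
are obtained by the classical Kalman Filter (KF) equations
[1] (assuming linear kinematic models), or by Extended KF
equations. They will not be repeated here [2], [10].

³For the decoupled version of JPDAF. For the coupled version of JPDAF,
see [10], [12].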
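
For intuition, equations (13)-(19) can be written out for a scalar state observed directly (H = 1). This is a simplified sketch with invented numbers and a hypothetical function name; a real tracker would use the matrix forms given above.

```python
def pdaf_update_scalar(x_pred, p_pred, r, z_list, betas):
    """Scalar PDAF-style update for one target with H = 1.

    betas[0] is beta_0 (no measurement originates from the target) and
    betas[i] is the association probability of measurement z_list[i - 1].
    """
    s = p_pred + r                                    # innovation variance S
    k = p_pred / s                                    # gain, equation (17)
    innovations = [z - x_pred for z in z_list]        # equation (18)
    nu = sum(b * v for b, v in zip(betas[1:], innovations))   # equation (19)
    x_upd = x_pred + k * nu                           # equation (13)
    p_c = (1.0 - k) * p_pred                          # equation (15)
    spread = sum(b * v * v for b, v in zip(betas[1:], innovations)) - nu * nu
    p_tilde = k * spread * k                          # equation (16)
    p_upd = betas[0] * p_pred + (1.0 - betas[0]) * p_c + p_tilde   # equation (14)
    return x_upd, p_upd


# Two gated measurements with association probabilities 0.6 and 0.2.
x_upd, p_upd = pdaf_update_scalar(
    x_pred=0.0, p_pred=4.0, r=1.0, z_list=[1.0, -2.0], betas=[0.2, 0.6, 0.2])
print(x_upd, p_upd)

# With a single measurement and beta_1 = 1, the standard KF update is recovered.
x_kf, p_kf = pdaf_update_scalar(0.0, 4.0, 1.0, [1.0], [0.0, 1.0])
```

The spread term in (16) inflates the updated variance when the weighted innovations disagree, which reflects the measurement origin uncertainty.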

In summary, JPDAF is theoretically well founded and it
does not require high memory (0-scan-back). It provides pretty
good results on simple MTT scenarios (with non-persisting
interferences) with moderate FA densities. However, the number
of feasible joint association matrices increases exponentially
with the problem dimensions ($m_k$ and $N_T$), which makes the
JPDAF intractable for complex dense MTT scenarios.


III. QADA BASED KALMAN FILTER 


The aim of this paper is to compare the performance of the
JPDAF based MTT algorithm with the classical⁴ MTT algorithm,
using the CMKF based on kinematics measurements,
but improved by the QADA method.

The main idea behind the QADA method, proposed recently
by Dezert and Benameur [4], is to compare the values $a_1(i,j)$
in the first optimal DA solution $A_1$ with the corresponding
values $a_2(i,j)$ in the second assignment solution $A_2$, and to
identify if there is a change of the optimal pairing $(i,j)$. In
the MTT context, $(i,j)$ means an association between measurement
$z_j$ and target $T_i$. QADA establishes a quality indicator
associated with this pairing, depending on the stability of
the pairing and also on its relative impact in the global
reward. The proposed method also works when the 1st and 2nd
optimal assignments $A_1$ and $A_2$ are not unique, i.e., there are
multiplicities available. In such a situation, the establishment
of quality indicators could help in selecting one particular
optimal assignment solution among multiple possible choices.

⁴Classical MTT algorithms are those based on hard assignment of a chosen
measurement to a given target.

The construction of the quality indicator is based on belief 
functions (BF) and the Proportional Conflict Redistribution 
fusion rule no.6 (PCR6), defined within Dezert-Smarandache 
Theory (DSmT) [19]. It depends on the type of the pairing 
matching, and it is described in detail in [4]. 

In [6], the authors discuss and propose the way in which 
Kalman filter could be improved in order to reflect the knowl- 
edge obtained based on the QADA method. 

Let’s briefly recall what kind of information is obtained from
the quality matrix derived by QADA in the
MTT context. It gives knowledge about the confidence $q(i,j)$
in all pairings $(T_i, z_j)$, $i = 1,\ldots,m$; $j = 1,\ldots,n$, chosen in the
first best assignment solution. A smaller quality (confidence)
of the hypothesis “$z_j$ belongs to $T_i$” means that the particular
measurement error covariance $R$ was increased and the filter
should not fully trust the actual (true) measurement $z_j(k+1)$.

With this conclusion in mind, the authors propose to model this
behaviour of the measurement error covariance
by $R = R/q(T_i, z_j)$ for every pairing chosen in the first best
assignment, based on the corresponding quality value
obtained. The Kalman filter gain then decreases, and the
true measurement $z_j(k+1)$ is trusted less in the updated state
estimate $\hat{x}(k+1|k+1)$.
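
As a small worked illustration (not taken from [6]), consider a scalar state observed directly ($H = 1$) with predicted variance $P$ and nominal measurement variance $R$; these simplifications are assumptions made here. The gain obtained with the quality-scaled covariance $R/q$ is then:

```latex
K(q) = \frac{P}{P + R/q} = \frac{qP}{qP + R},
\qquad
\frac{dK}{dq} = \frac{PR}{(qP + R)^2} > 0 .
```

The gain therefore grows monotonically with $q$: for $q = 1$ the standard gain $P/(P+R)$ is recovered, while $q \to 0$ drives $K$ toward 0. For instance, with the assumed values $P = 100$ and $R = 1600$, $K(1) = 1/17 \approx 0.059$ and $K(0.25) = 1/65 \approx 0.015$.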


IV. BUILDING ASSIGNMENT MATRIX FOR QADA 


Data Association (DA) is a central problem in modern
MTT systems [1], [2]. It consists in finding the global optimal
assignment of the targets $T_i$, $i = 1,\ldots,m$, to some
measurements $z_j$, $j = 1,\ldots,n$, at a given time $k$ by
maximizing the overall gain in such a way that no more than one
target is assigned to a measurement, and reciprocally. The
$m \times n$ reward (gain/payoff) matrix $\Omega = [\omega(i,j)]$
is defined by its elements $\omega(i,j) > 0$, representing the
gain of the association of target $T_i$ with the measurement $z_j$.

These values are usually homogeneous to the likelihood 
ratios and could be established in different ways, described 
below. They provide the assignment matrix utilized by QADA 
in order to obtain the quality of pairings (interpreted as a 
confidence score) belonging to the optimal data assignment 
solution based on its consistency (stability) with respect to all 
the second best solutions, provided for a chosen algorithm. 
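
Since the quality indicator compares the best assignment with the second best one, it can help to see how both are obtained from a reward matrix. The sketch below simply enumerates all one-to-one assignments, which is only practical for very small matrices; the function name `ranked_assignments` and the 3 x 3 reward values are hypothetical, and the sketch assumes every target receives exactly one measurement. It is not the ranked-assignment algorithm used by the authors.

```python
from itertools import permutations


def ranked_assignments(omega):
    """Return all one-to-one assignments sorted by decreasing total reward.

    omega[i][j] is the reward of pairing target i with measurement j.
    Each result is (total_reward, cols) where cols[i] is the measurement
    index assigned to target i. Assumes len(omega) <= len(omega[0]).
    """
    m, n = len(omega), len(omega[0])
    if m > n:
        raise ValueError("this sketch needs at least as many measurements as targets")
    scored = []
    for cols in permutations(range(n), m):
        total = sum(omega[i][j] for i, j in enumerate(cols))
        scored.append((total, cols))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


# Hypothetical reward matrix for 3 targets and 3 measurements.
omega = [[9.0, 2.0, 1.0],
         [3.0, 8.0, 4.0],
         [2.0, 5.0, 7.0]]
ranking = ranked_assignments(omega)
best, second_best = ranking[0], ranking[1]
```

Here the best solution pairs each target with the measurement of the same index, and the second best one swaps the measurements of the last two targets; pairings that survive in both solutions are the stable ones.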

QADA assumes the reward matrix is known, regardless 
of the manner in which it is obtained by the user. In this 
paper we propose two versions of QADA-KF. The first one 
utilizes the assignment matrix built from the single normalized 
distances, provided by the Global Nearest Neighbour method, 
called QADA-GNN KF method. The second one utilizes 
the assignment matrix, built from the posterior association 
probabilities, provided by the Probabilistic Data Association 
(PDA) method, called QADA-PDA KF method. 


A. Assignment matrix based on GNN method 


The GNN method finds and propagates the single most
likely hypothesis during each scan to update the KF. It is a
hard (i.e., binary) decision approach, as compared to the
JPDAF, which is a soft (i.e., probabilistic) decision approach
using all validated measurements with their probabilities of
association. The GNN method was applied in [6] and [20] to
obtain the assignment matrix utilized in QADA. In this case
the elements $\omega(i,j)$, $i = 1,\ldots,m$; $j = 1,\ldots,n$,
of the assignment matrix represent the normalized distances

$$d(i,j) = \left[ (z_j(k) - \hat{z}_i(k|k-1))' \, S^{-1}(k) \, (z_j(k) - \hat{z}_i(k|k-1)) \right]^{1/2}$$

between the validated measurement $z_j$ and the target $T_i$,
satisfying the condition $d^2(i,j) \leq \gamma$. The distance
$d(i,j)$ is computed from the measurement $z_j(k)$, its prediction
$\hat{z}_i(k|k-1)$ (see [1] for details), and the inverse of the
covariance matrix $S(k)$ of the innovation, computed by the
tracking filter. The threshold $\gamma$, for which the probability
of a given observation to fall in the gate is 0.99, can be obtained
from the table of the Chi-square distribution with $M$ degrees of
freedom and the allowable probability of a valid observation
falling outside the gate. In this case the DA problem consists in
finding the best assignment, i.e., the one that minimizes the
overall cost.
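
The gating test described above can be sketched for a two-dimensional measurement as follows. The function names are hypothetical, and the sketch assumes $M = 2$, in which case the Chi-square threshold has the closed form $\gamma = -2\ln(1 - P)$ for a gate probability $P$, so no table lookup is needed.

```python
import math


def gate_threshold_2d(p_gate=0.99):
    """Chi-square threshold for 2 degrees of freedom (closed form)."""
    return -2.0 * math.log(1.0 - p_gate)


def normalized_distance_sq(z, z_pred, s):
    """Squared normalized distance v' S^{-1} v for a 2x2 innovation covariance s."""
    (a, b), (c, d) = s
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("innovation covariance must be positive definite")
    v0, v1 = z[0] - z_pred[0], z[1] - z_pred[1]
    # S^{-1} = (1/det) * [[d, -b], [-c, a]]
    return (v0 * (d * v0 - b * v1) + v1 * (-c * v0 + a * v1)) / det


gamma = gate_threshold_2d(0.99)
d2 = normalized_distance_sq((2.0, 3.0), (0.0, 0.0), ((4.0, 0.0), (0.0, 9.0)))
is_validated = d2 <= gamma
```

In this illustrative case the squared distance is 2, well inside the 0.99 gate, so the measurement would be validated for the track.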


B. Assignment matrix based on PDA method 


The Probabilistic Data Association (PDA) method [1] calculates
the association probabilities of the validated measurements
at the current time to the target of interest. PDA
assumes the following hypotheses for each validated
measurement:


• $H_j(k)$: $z_j(k)$ is a measurement originated from the target
of interest, $j = 1,\ldots,m$

• $H_0(k)$: none of the validated measurements originated
from the target of interest


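If $N$ observations fall within the gate of track $i$, $N + 1$
hypotheses will be formed.

The probability of $H_0$ is proportional to
$p_{i0} = \lambda_{FA}^{N} (1 - P_g P_d)$, and the probability of
$H_j$ ($j = 1, 2, \ldots, N$) is proportional to

$$p_{ij} = \frac{\lambda_{FA}^{N-1} P_g P_d \, e^{-d_{ij}^2/2}}{(2\pi)^{M/2} \sqrt{|S_{ij}|}} \qquad (20)$$

where $P_g$ is the a priori probability that the correct measurement
is in the validation gate [1]; $P_d$ is the target detection
probability; $\lambda_{FA}$ is the spatial density of FA. The
probabilities $p_{ij}$ can be rewritten as [1]

$$p_{ij} = \begin{cases} \dfrac{b}{b + \sum_{l=1}^{N} \alpha_{il}} & \text{for } j = 0 \text{ (no valid observ.)}, \\[2mm] \dfrac{\alpha_{ij}}{b + \sum_{l=1}^{N} \alpha_{il}} & \text{for } 1 \leq j \leq N, \end{cases} \qquad (21)$$

where
$$b \triangleq (1 - P_g P_d) \lambda_{FA} (2\pi)^{M/2} \sqrt{|S_{ij}|}, \qquad (22)$$
and
$$\alpha_{ij} \triangleq P_d \, e^{-d_{ij}^2/2}. \qquad (23)$$


In our simulations, we use $P_d = 0.99$ and $P_g = 0.99$.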
The assignment matrix used in the QADA method is established
from all $p_{ij}$ given by (21) related to all association
hypotheses. This matrix has $m$ rows (where $m$ is the number of
all targets of interest) and $N + 1$ columns for the hypotheses
generated. The $(N + 1)$th column includes the values $p_{i0}$
associated with $H_0(k)$.
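
One row of this assignment matrix can be computed as in the sketch below, which follows the normalized form of the PDA probabilities as we read it from (21)-(23). The function name, the argument names and the numerical inputs are hypothetical; the false alarm density and the innovation covariance determinant in particular are arbitrary illustrative values.

```python
import math


def pda_probabilities(d2_list, det_s, lam_fa, p_d=0.99, p_g=0.99, m_dim=2):
    """Association probabilities for one track.

    d2_list : squared normalized distances of the N validated measurements
    det_s   : determinant of the innovation covariance
    lam_fa  : spatial density of false alarms
    Returns [p_0, p_1, ..., p_N], where p_0 is the probability that no
    validated measurement originated from the target.
    """
    if det_s <= 0.0:
        raise ValueError("det_s must be positive")
    b = (1.0 - p_g * p_d) * lam_fa * (2.0 * math.pi) ** (m_dim / 2.0) * math.sqrt(det_s)
    alphas = [p_d * math.exp(-d2 / 2.0) for d2 in d2_list]
    denom = b + sum(alphas)
    return [b / denom] + [alpha / denom for alpha in alphas]


# Two validated measurements; the first one is closer to the prediction.
probs = pda_probabilities([1.0, 4.0], det_s=100.0, lam_fa=1e-4)
```

The returned list sums to one, and the closer measurement receives the larger probability. In the matrix described above, the first entry would be stored in the last column.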


V. SIMULATION SCENARIOS AND RESULTS 


The Converted Measurement KF is used in our MTT
algorithm. We assume a constant velocity target model. The
process noise covariance matrix is $Q = \sigma_v^2 Q_T$, where $T$
is the sampling period, $\sigma_v$ is the standard deviation of the
process noise and $Q_T$ is as given in [3]. Here are the results of
KDA KF, QADA-GNN KF, QADA-PDA KF, and JPDAF for two
interesting MTT scenarios.


A. Groups of targets simulation scenario 


The noise-free groups of targets simulation scenario (Fig. 1)
consists of five air targets moving from North-West to
South-East. For a clear explanation of the results, the targets are
numbered from the 1st target, which has the greatest y-coordinate
at the beginning, to the 5th target, which has the smallest
y-coordinate. The 2nd, 3rd, and 4th targets move together$^6$.
The stationary sensor is located at the origin, with range 20000 m.
The sampling period is $T_{scan} = 5$ sec and the measurement
standard deviations are 0.2 deg and 35 m for azimuth and range
respectively. The targets move with constant velocity
$V = 100$ m/sec. The group of three targets in the middle (2nd to
4th) moves without maneuvering, keeping an azimuth of 135 deg from
North. This is the main direction of the group's movement. The
first target starts with azimuth 165 deg and moves towards the
middle group of rectilinearly moving targets. When it approaches
the group, it starts a turn to the left of −30 deg. Its initial
azimuth of 165 deg is decreased by the angle of turn and becomes
135 deg, the main direction. The fifth target makes a similar
maneuver but in the opposite direction, to the right. Its initial
azimuth of 105 deg is increased by the turn of 30 deg and becomes
135 deg, which also coincides with the main direction. From the
21st scan to the 48th scan all the targets move rectilinearly in
parallel. The distance between them is 150 m. From the 48th scan,
the first target makes a left turn to an azimuth of 105 deg, that
is −30 deg with respect to the main direction, and starts to move
away from the middle group. The fifth target makes a right turn to
an azimuth of 165 deg, that is +30 deg from the main direction,
and also starts to move away. All maneuvers have the same angle
(30 deg in absolute value), the same time duration and the same
linear velocity. The absolute value of the corresponding
transversal acceleration for all maneuvers is 1.163 m/s².
The total number of scans for the simulations is
65. Fig. 2 shows the noised scenario for Apa = 16-107!°m~? 
yielding to 0.2 FA per gate on average. 

Our results are based on Monte Carlo (MC) simulations with 
200 independent runs in applying KDA based KF, QADA- 


6Note that three targets move together in the center.
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Figure 1. Noise-free groups of targets Scenario. 
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Figure 2. Noised groups of targets Scenario with Aga = 16 - 10~19m-?. 


GNN KF, QADA-PDA KF, and JPDAF$^7$. We compare the
performance of these methods with different criteria, and we 
use an idealized track initiation in order to prevent uncon- 
trolled impact of this stage on the statistical parameters of 
the tracking process during MC simulations. The true targets 
positions (known in our simulations) for the first two scans are 
used for track initiation. The evaluation of MTT performance 
is based on the criteria of Track Purity (TP), Track Life (TL), 
and percentage of miscorrelation (pMC): 

1) The TP criterion examines the ratio of the number of
particular (jth observation to ith track) associations performed
(in case of a detected target) to the total number of all possible
associations during the tracking scenario. TP cannot be
used with JPDAF, because JPDAF is a soft assignment method.
Instead of TP, we define the Probabilistic Purity Index (PPI).
It considers the measurement that has the highest association
probability computed by the JPDAF and checks (and counts) whether
this measurement originated from the target or not. PPI measures
the ability of JPDAF to commit the highest probability
to the correct target measurement in the soft assignment of all
validated measurements.

2) TL is evaluated as the average number of scans before a track's
deletion. In our simulations, a track is cancelled and deleted
from the list of tracked tracks when, during 3 consecutive
scans, it cannot be updated with some measurement because
there is no validated measurement in the validation gate. When
using JPDAF, the track is cancelled and deleted from the
list of tracked tracks when, during 3 consecutive scans, its
own measurement does not fall in its gate. We call this
the “cancelling/deletion condition”. The status of the tracked
tracks is denoted “alive”.

3) pMC examines the relative number of incorrect
observation-to-track associations during the scans.


7We have used the non parametric version of JPDAF in our simulations.

The MTT performance results for KDA only KF, QADA- 
GNN KF, QADA-PDA KF, and JPDAF for a low-noise case 
(0.2 FA per gate on average) are given in Table 1. 


            | KDA   | JPDAF     | QADA-GNN | QADA-PDA
Average TL  | 50.27 | 66.46     | 81.94    | 90.85
Average pMC | 3.35  | 2.98      | 2.10     | 1.75
Average TP  | 45.61 | PPI=29.14 | 79.32    | 87.61

Table I
GROUPS OF TARGETS SCENARIO: COMPARISON BETWEEN MTT
PERFORMANCE RESULTS FOR 0.2 FA PER GATE.


According to all criteria, the QADA-PDA KF method shows 
the best performance, followed by QADA-GNN KF, and 
JPDAF. The KDA based KF approach, as one could expect, 
shows the worst performance. Performance results for a more 


noisy scenario with 0.4 FA per gate on average are given in 
Table 2. 


QADA-GNN | _JPDAF | QADA-PDA 
qd 70.51 70.94 R417 


Average pMC 3.90 3.33 3.11 
Average TP 38.22 66.43 PPI=25.65 78.51 
Table II


GROUPS OF TARGETS SCENARIO: COMPARISON BETWEEN MTT 
PERFORMANCE RESULTS FOR 0.4 FA PER GATE. 


As we see, the results for the 0.4 FA per gate scenario are
degraded in comparison to the low-noise case. The average
miscorrelation for QADA-PDA is slightly higher than for
JPDAF, probably because the QADA method is based on the
1st and 2nd best solutions only; more information (i.e.
the 3rd best assignment solution) should be used in such a
case to improve QADA performance, which is left for further
research. According to TL and TP, the QADA-PDA KF based
MTT still shows consistently better performance than JPDAF.

JPDAF based MTT outperforms the QADA-GNN KF and KDA
KF based MTT approaches according to the considered criteria.
In order to make a fair comparison between QADA KF
and JPDAF, we also discuss the root mean square errors
(RMSE) associated with the filtered X and Y values, presented
in Figs. 3-7. Figs. 3 and 4 show the RMSE on the filtered X and Y
values associated with target 1, compared for KDA
KF, QADA-GNN KF, QADA-PDA KF, and JPDAF. Figs. 5
and 6 show the same errors for the middle track 3. All
the results are compared to the sensor's errors along the X and
Y axes. We see that the RMSE on X filtered associated with
KDA KF, QADA-GNN KF, and QADA-PDA KF is slightly
above the sensor's error in the regions where target
1 maneuvers. For scans [20,50], target 1 moves
in parallel to the group of other targets moving rectilinearly,
and these errors are then smaller than the respective sensor's
errors. The RMSE on X filtered associated with JPDAF
is three times bigger in the region between the 20th and
30th scans, where target 1 starts moving in parallel to the rest of
the rectilinearly moving targets. The RMSE on Y filtered is high
during the whole region where target 1 moves in parallel.


Figure 3. RMSE on X for track 1 with the four tracking methods
(groups of targets scenario, 0.2 FA per gate).


Figure 4. RMSE on Y for track 1 with the four tracking methods.


The RMSE on Y filtered by JPDAF is especially critical
for the middle track 3, which shows the poor performance of JPDAF
in state estimation along the Y direction. The RMSEs are more than
5 times bigger (in the region between the 20th and 50th scans,
where all five targets move in parallel) than the respective
errors obtained by KDA KF, QADA-GNN KF, and QADA-PDA
KF, which are smaller than the sensor's error. The RMSE on X
filtered obtained with JPDAF is below the sensor's error, as are
those of the KDA KF, QADA-GNN KF, and QADA-PDA KF methods.


Figure 5. RMSE on X for track 3 with the four tracking methods.


The large value of the RMSE on Y using the JPDAF can be
explained by the specificity of the scenario: five targets move
closely during more than 30 consecutive scans in the presence of
sensor measurement errors and false alarms, which leads to
spatially persisting interferences and track coalescence effects
in JPDAF, as shown in Fig. 7, where the red and green plots are
the track estimates. These effects significantly degrade the
quality of JPDAF performance, as already reported in [14].


Figure 6. RMSE on Y for track 3 with the four tracking methods.


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Figure 7. JPDAF track coalescence (one run) with Apa = 16 - 10-19 m-?. 


B. Crossing targets simulation scenario 


The second (crossing targets) scenario (Fig. 8)
consists of two maneuvering targets moving with constant
velocity 38 m/sec. At the beginning, both targets move from
West to East. The stationary sensor is located at the origin,
with range 1200 m. The sampling period is $T_{scan} = 1$ sec
and the measurement standard deviations are 0.2 deg and 25
m for azimuth and range respectively.

The first target, which has the greater y-coordinate at the
beginning, moves straight from West to East. Between the 8th and
12th scans it makes a 50 deg right turn, and then it moves
straight during 8 scans. From the 20th scan to the
24th scan it makes a 50 deg left turn, and then it moves in the
East direction until the 41st scan. It makes a second 50 deg
left turn between the 41st and 45th scans, and then it moves
straight during 8 scans. From the 53rd scan it makes a
second 50 deg right turn until the 57th scan, and then it moves
in the East direction. The trajectory of target 1 corresponds to
the red plot of Fig. 8.

The second target follows a mirrored trajectory, corresponding
to the green plot of Fig. 8. From scan 1 to 8 it moves from
West to East. During the 8th to 12th scans it makes a 50 deg left
turn. Then it moves straight during 8 scans. During the
20th to 24th scans it makes a 50 deg right turn, and then it
moves in the East direction until the 41st scan. It makes a second
50 deg right turn between the 41st and 45th scans, and then
it moves straight during 8 scans. From the 53rd scan
it makes a second 50 deg left turn until the 57th scan, and then
it moves in the East direction. The total number of scans for the
simulations is 65. Fig. 9 shows the respective noised scenario 
for Apa = 4-107%m7?. 
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Figure 8. Noise-free Crossing targets Scenario. 
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Figure 9. Noised Crossing targets Scenario with Aga = 4-10~7m~?. 


The MTT performance results obtained on the base of KDA 
only KF, QADA-GNN KF, QADA-PDA KF, and JPDAF for 
less noised case corresponding to 0.2 FA per gate are given in 
Table 3, and the performance results for a more noisy scenario 
with 0.4 FA per gate on average are given in Table 4. 


            | KDA   | QADA-GNN | JPDAF    | QADA-PDA
Average TL  | 77.06 | 88.93    | 91.25    | 93.47
Average pMC |       |          |          |
Average TP  | 72.78 | 85.64    | PPI=86.2 | 87.96


Table III
CROSSING TARGETS SCENARIO: COMPARISON BETWEEN MTT
PERFORMANCE RESULTS FOR FA IN GATE = 0.2.


QADA-GNN JPDAF QADA-PDA 

55.80 SzaT S18 
Average pMC 340 
Average TP 52.90 72.01 PPI=76.94 T1AS 


Table IV
CROSSING TARGETS SCENARIO: COMPARISON BETWEEN MTT 
PERFORMANCE RESULTS FOR FA IN GATE = 0.4. 


According to all criteria, the QADA-PDA KF again shows
the best performance, but now the JPDAF based MTT performs
closer to QADA-PDA KF than in the previous scenario, and
exceeds the performance of QADA-GNN KF. Although the
performance of all methods deteriorates in the noisier case,
with 0.4 FA in gate on average, this tendency is preserved.
JPDAF performs better than in the previous scenario, but
QADA-PDA KF still exceeds its performance.


Figures 10-13 show that the RMS errors associated with 
X and Y filtered are below the sensor’s error. They confirm 
the better performance of JPDAF in this particular scenario 
with only two maneuvering targets, which is simpler than the 
groups of targets scenario. 


Figure 10. RMSE on X for track 1 with the four tracking methods.

Figure 12. RMSE on X for track 2 with the four tracking methods.

Figure 13. RMSE on Y for track 2 with the four tracking methods.


VI. CONCLUSIONS


This work evaluated, with Monte Carlo simulations, the
efficiency of MTT performance in a cluttered environment for
four methods: (i) the classical MTT algorithm based on the GNN
approach for data association, utilizing the Kinematic only Data
based Kalman Filter; (ii) QADA-GNN KF; (iii) QADA-PDA
KF; and (iv) JPDAF. The first scenario (groups of targets)
shows the advantages of applying QADA-KF. According to 
all performance criteria, the QADA-PDA KF gives the best 
performance, followed by QADA-GNN KF, and JPDAF. The 
KDA KF approach shows the worst performance (as expected). 
This scenario is particularly difficult for JPDAF because of 
several closely spaced and rectilinearly moving targets in 
clutter during many consecutive scans, and it leads to track 
coalescence effects due to persisting interferences. As a result, 
the tracking performance of JPDAF is degraded. Because the 
complexity of the calculation for joint association probabilities 
grows exponentially with the number of targets, JPDAF re- 
quires almost 3 times more computational time in comparison 
to other methods in the first (complex) scenario. In the second 
(only two crossing targets) MTT scenario, JPDAF shows better 
tracking performances in comparison to QADA-GNN KF. It is 
able to track more precisely these only two targets, because of 
non persisting interferences. Overall, our analysis shows that 
QADA-PDA KF method is the best of the four approaches to 
track multiple targets in clutter with a tractable complexity. 
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Abstract—In 2016 we developed a new approach for Multi- 
Criteria Decision-Making (MCDM) inspired by the technique 
for order preference by similarity to ideal solution (TOPSIS) and 
based on belief functions (BF). Our BF-TOPSIS (Belief Function 
based TOPSIS) approach assumes that the input score of each 
hypothesis for each criterion was a real precise number which 
is a quite restrictive assumption. In this paper we extend our 
BF-TOPSIS to deal with imprecise score values (intervals of real 
numbers) and we call it Imp-BF-TOPSIS. This new approach 
follows main ideas of BF-TOPSIS but extends its applicability for 
more realistic MCDM problems where the scores are given with a 
finite precision. Imp-BF-TOPSIS is based on Interval Arithmetic 
(IA), new probabilistic order relations between intervals and 
belief functions. We also present results of Imp-BF-TOPSIS for 
simple examples for illustrating its effectiveness. 

Keywords: Information fusion, multi-criteria decision-making,
MCDM, belief functions, TOPSIS.


I. INTRODUCTION 


The Multi-Criteria Decision-Making (MCDM) aims to 
choose an alternative among a known set of alternatives 
based on their quantitative or qualitative evaluations (scores) 
obtained with respect to different criteria. MCDM can be
considered as a decision-level information fusion, and it has 
been widely used in many decision-making applications. In 
classical MCDM problem, all the criteria and all alternatives 
are known, and the score values are usually real numbers 
(precisely known). Depending on the context of the MCDM 
problem, the score can be interpreted either as a cost/expense 
or as a reward/benefit. In the sequel, by convention and without 
loss of generality we will interpret the score as a reward 
having monotonically increasing preference. Thus, the best 
alternative with respect to a given criterion will be the one
providing the highest reward/benefit. The set of score values
is represented by a quantitative benefit or payoff matrix. Each 
criterion can also have a relative importance weight. Many 
methods have been proposed in the literature to solve the 
classical MCDM [1]. When the score values are incomplete or 
imprecise (quantitative or qualitative), traditional approaches 
for classical MCDM problems do not work. In this paper,
we focus on these non-classical MCDM problems. We propose
to extend the BF-TOPSIS approach to deal with imprecise
score values to cover a broader spectrum of real MCDM
applications. We use the theory of belief functions and
interval arithmetic. This extension of the BF-TOPSIS method is
referred to as the Imp-BF-TOPSIS method in the sequel, where Imp
is an abbreviation standing for Imprecise, to specify that
BF-TOPSIS will work with imprecise score values (or more
generally with imprecise basic belief assignments (BBAs)).
The rest of this paper is organized as follows. In section II, 
the formulation of classical MCDM problem is provided. In 
Section III, we introduce Interval Arithmetic and propose new 
(probabilistic) order relations for intervals as well as distances 
between intervals. In section IV, basics of belief functions are 
recalled. In section V we recall the principle of BF-TOPSIS 
for classical (precise scores) MCDM. The Imp-BF-TOPSIS for 
imprecise score values is presented in section VI, with simple 
examples in section VII. Section VIII concludes this paper.


II. FORMULATION OF CLASSICAL MCDM 


A classical MCDM problem has a given set of alternatives
$\mathbf{A} \triangleq \{A_1, A_2, \ldots, A_M\}$ ($M > 2$), and a
given set of criteria $\mathbf{C} \triangleq \{C_1, C_2, \ldots, C_N\}$
($N > 1$). Each alternative $A_i$ represents a possible choice (a
possible decision to make). In a general context, each criterion
is also characterized by a relative importance weighting factor
$w_j \in [0,1]$, $j = 1,\ldots,N$; these factors are normalized by
imposing the condition $\sum_j w_j = 1$. The set of normalized
weighting factors is denoted by $\mathbf{w} = \{w_1, w_2, \ldots, w_N\}$.
The score of each alternative $A_i$ with respect to each criterion
$C_j$ is expressed by a real number $S_{ij}$, called the score value
of $A_i$ based on $C_j$. We denote by $\mathbf{S}$ the $M \times N$
score matrix, defined as $\mathbf{S} = [S_{ij}]$. The MCDM problem
aims to select the best alternative $A^* \in \mathbf{A}$ given
$\mathbf{S}$ and the weighting factors $\mathbf{w}$ of the criteria.


III. CALCULUS WITH INTERVALS 


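A closed interval $\mathbf{x}$ is denoted by
$\mathbf{x} = [\underline{x}, \overline{x}] = \{x \mid \underline{x} \leq x \leq \overline{x}\}$.
$\underline{x} = \inf(\mathbf{x})$ is the infimum (lower endpoint) of
$\mathbf{x}$ and $\overline{x} = \sup(\mathbf{x})$ is the supremum
(upper endpoint) of $\mathbf{x}$, both taking values in $\mathbb{R}$.
The set of intervals over $\mathbb{R}$ is denoted by $\mathbb{IR}$.
An interval in which one endpoint is included and the other is
excluded is called a half-closed interval (or half-open interval),
and it is called an open interval if both endpoints are excluded.
Any precise number $x$ can be expressed with imprecise number
notation as the degenerate interval $\mathbf{x} = [x, x]$. A
non-degenerate interval is called a proper interval. The numbers
$\mathrm{wid}(\mathbf{x}) = \overline{x} - \underline{x}$,
$\mathrm{rad}(\mathbf{x}) = \mathrm{wid}(\mathbf{x})/2$ and
$\mathrm{mid}(\mathbf{x}) = \frac{1}{2}(\underline{x} + \overline{x})$
are respectively the width, the radius and the midpoint of
$\mathbf{x}$. If $\mathbf{x}$ is a precise number (i.e. a degenerate
interval), then $\mathrm{wid}(\mathbf{x}) = 0$ and
$\mathbf{x} = [\mathrm{mid}(\mathbf{x}), \mathrm{mid}(\mathbf{x})]$.
The number
$\mathrm{mag}(\mathbf{x}) = \max\{|x| \mid x \in \mathbf{x}\} = \max\{|\underline{x}|, |\overline{x}|\}$
is the magnitude of $\mathbf{x}$, and
$\mathrm{mig}(\mathbf{x}) = \min\{|x| \mid x \in \mathbf{x}\}$ is the
mignitude of $\mathbf{x}$; it equals
$\min\{|\underline{x}|, |\overline{x}|\}$ when $0 \notin \mathbf{x}$,
and 0 otherwise. If $\mathbf{x}$ and $\mathbf{y}$ are overlapping
intervals, then $\mathbf{x} \cap \mathbf{y}$ and
$\mathbf{x} \cup \mathbf{y}$ are also intervals, defined by
$\mathbf{x} \cap \mathbf{y} = [\max\{\underline{x}, \underline{y}\}, \min\{\overline{x}, \overline{y}\}]$
and
$\mathbf{x} \cup \mathbf{y} = [\min\{\underline{x}, \underline{y}\}, \max\{\overline{x}, \overline{y}\}]$.
If $\mathbf{x}$ and $\mathbf{y}$ do not overlap, then
$\mathbf{x} \cap \mathbf{y}$ is empty, and $\mathbf{x} \cup \mathbf{y}$
is not a proper interval but the union of two disjoint$^1$ intervals.
In this case, the interval
$[\min\{\underline{x}, \underline{y}\}, \max\{\overline{x}, \overline{y}\}]$
is the tightest interval that includes $\mathbf{x} \cup \mathbf{y}$,
and it is called the interval hull of $\mathbf{x}$ and $\mathbf{y}$.
The interval $\mathbf{x}$ is a subset of $\mathbf{y}$ if
$(\underline{y} \leq \underline{x}) \wedge (\overline{x} \leq \overline{y})$.
The interval $\mathbf{x}$ is equal to $\mathbf{y}$ if
$(\underline{x} = \underline{y}) \wedge (\overline{x} = \overline{y})$.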
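
The quantities defined above translate directly into a few helper functions. In the sketch below a closed interval is represented as a `(lower, upper)` tuple; this representation and all function names are choices made for illustration, not part of the paper.

```python
def wid(x):
    """Width of the interval x = (lower, upper)."""
    return x[1] - x[0]


def mid(x):
    """Midpoint of x."""
    return 0.5 * (x[0] + x[1])


def mag(x):
    """Magnitude: largest absolute value taken inside x."""
    return max(abs(x[0]), abs(x[1]))


def mig(x):
    """Mignitude: smallest absolute value taken inside x (0 if x contains 0)."""
    if x[0] <= 0.0 <= x[1]:
        return 0.0
    return min(abs(x[0]), abs(x[1]))


def intersection(x, y):
    """Intersection of x and y, or None when they do not overlap."""
    lo, hi = max(x[0], y[0]), min(x[1], y[1])
    return (lo, hi) if lo <= hi else None


def hull(x, y):
    """Tightest interval containing both x and y."""
    return (min(x[0], y[0]), max(x[1], y[1]))


def is_subset(x, y):
    """True when x is contained in y."""
    return y[0] <= x[0] and x[1] <= y[1]
```

For example, `intersection((0.0, 1.0), (2.0, 3.0))` is `None` while `hull((0.0, 1.0), (2.0, 3.0))` is `(0.0, 3.0)`, which illustrates the difference between the empty intersection and the interval hull of two disjoint intervals.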


A. Interval Arithmetic 


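Interval Arithmetic (IA) is an arithmetic defined on the intervals
of $\mathbb{IR}$. Its modern development started with Moore's works
[4]-[7] and recently led to an IEEE Standard [8]. The
INTLAB Matlab™ toolbox for IA has been developed and
proposed by Rump in [9], with a tutorial in [10]. Other tools
implementing IA are listed in [7], with more resources available
on Kreinovich's interval computation web site [11]. The basic
operations$^2$ on intervals are:

• Addition: $\mathbf{x} + \mathbf{y} = [\underline{x} + \underline{y}, \overline{x} + \overline{y}]$.

• Subtraction: $\mathbf{x} - \mathbf{y} = [\underline{x} - \overline{y}, \overline{x} - \underline{y}]$.
In particular, $-\mathbf{x} = [-\overline{x}, -\underline{x}]$,
because $-\mathbf{x} = [0,0] - [\underline{x}, \overline{x}]$.

• Multiplication: $\mathbf{x} \times \mathbf{y} = [\min\{S_{\times}(\mathbf{x}, \mathbf{y})\}, \max\{S_{\times}(\mathbf{x}, \mathbf{y})\}]$,
where $S_{\times}(\mathbf{x}, \mathbf{y}) = \{\underline{x}\,\underline{y}, \underline{x}\,\overline{y}, \overline{x}\,\underline{y}, \overline{x}\,\overline{y}\}$
is the set of all possible products$^3$ of endpoints of $\mathbf{x}$
and $\mathbf{y}$. In particular, $-\mathbf{x} = [-\overline{x}, -\underline{x}]$
because $-\mathbf{x} = [-1,-1] \times [\underline{x}, \overline{x}] = [\underline{x}, \overline{x}] \times [-1,-1]$.

• Division: $\mathbf{x} / \mathbf{y} = [\min\{S_{\div}(\mathbf{x}, \mathbf{y})\}, \max\{S_{\div}(\mathbf{x}, \mathbf{y})\}]$
if $0 \notin \mathbf{y}$, where
$S_{\div}(\mathbf{x}, \mathbf{y}) = \{\underline{x}/\underline{y}, \underline{x}/\overline{y}, \overline{x}/\underline{y}, \overline{x}/\overline{y}\}$
is the set of all possible divisions of endpoints of $\mathbf{x}$ and
$\mathbf{y}$. If $0 \in \mathbf{y}$, the division by $\mathbf{y}$ can
be handled with more effort using extended interval arithmetic
[7], [12], not detailed in this paper.

• Inverse: if $\underline{x} > 0$ or $\overline{x} < 0$,
$1/\mathbf{x} = [1/\overline{x}, 1/\underline{x}]$.

The following algebraic properties hold for all
$\mathbf{x}, \mathbf{y}, \mathbf{z} \in \mathbb{IR}$:

• Associativity: $(\mathbf{x} + \mathbf{y}) + \mathbf{z} = \mathbf{x} + (\mathbf{y} + \mathbf{z})$
and $(\mathbf{x}\mathbf{y})\mathbf{z} = \mathbf{x}(\mathbf{y}\mathbf{z})$.

• Commutativity: $\mathbf{x} + \mathbf{y} = \mathbf{y} + \mathbf{x}$
and $\mathbf{x}\mathbf{y} = \mathbf{y}\mathbf{x}$.

• Neutral elements: $\mathbf{0} + \mathbf{x} = \mathbf{x} + \mathbf{0} = \mathbf{x}$
where $\mathbf{0} \triangleq [0,0]$;
$\mathbf{0} \cdot \mathbf{x} = \mathbf{x} \cdot \mathbf{0} = \mathbf{0}$
and $\mathbf{1} \cdot \mathbf{x} = \mathbf{x} \cdot \mathbf{1} = \mathbf{x}$
where $\mathbf{1} \triangleq [1,1]$.

Proper intervals do not have additive or multiplicative
inverses, and the distributivity law does not hold for intervals.
Instead, the following sub-distributivity law (a weaker version
of distributivity) holds: $\forall \mathbf{x}, \mathbf{y}, \mathbf{z} \in \mathbb{IR}$,
$\mathbf{x}(\mathbf{y} + \mathbf{z}) \subseteq \mathbf{x}\mathbf{y} + \mathbf{x}\mathbf{z}$.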
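
The four basic operations and the sub-distributivity law can be checked with a small class. This is a minimal sketch rather than a substitute for a rigorous library such as INTLAB: the class name `Interval` and its methods are hypothetical, and no outward rounding of floating-point results is performed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))

    def __truediv__(self, other):
        if other.lo <= 0.0 <= other.hi:
            raise ZeroDivisionError("divisor interval contains 0")
        quotients = [self.lo / other.lo, self.lo / other.hi,
                     self.hi / other.lo, self.hi / other.hi]
        return Interval(min(quotients), max(quotients))

    def contains(self, other):
        """True when other is a subset of self."""
        return self.lo <= other.lo and other.hi <= self.hi


# Sub-distributivity: x(y + z) is contained in xy + xz, possibly strictly.
x = Interval(-1.0, 1.0)
y = Interval(1.0, 1.0)
z = Interval(-1.0, -1.0)
left = x * (y + z)       # y + z is the degenerate interval [0, 0]
right = x * y + x * z    # each product is [-1, 1]
```

Here `left` is the degenerate interval [0, 0] while `right` is [-2, 2], so the inclusion holds but equality does not.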

Although interval arithmetic is appealing and looks
simple for basic operations with intervals, the so-called
dependency problem is a major obstacle to its application when
complicated expressions have to be calculated to find the tightest
range enclosure. In fact, we must take care of the dependencies
of the variables involved in formulas before applying IA in order
to get the tightest results. To reduce the dependency effect in the
result, we need to replace (if possible) the original expression
by an equivalent one in which each variable occurs fewer times
(ideally only once). For example, the expression
$\mathbf{x}/(\mathbf{x} + \mathbf{y})$ for $0 \notin \mathbf{x}$ must
be computed with IA as $1/(1 + \mathbf{y}/\mathbf{x})$ to get the
tightest result. Also, the power 2 of $\mathbf{x}$ must not be
computed by $[\underline{x}, \overline{x}] \times [\underline{x}, \overline{x}]$,
because the unknown precise value of $x$ in
$[\underline{x}, \overline{x}]$ must be exactly the same (strong
dependency) in both factors of the multiplication in the derivation
of $\mathbf{x}^2$. Hence, for $\mathbf{x} = [-2, 2]$,
$\mathbf{x}^2 = \{x^2 \mid -2 \leq x \leq 2\} = [0, 4]$ is different
from $[-2, 2] \times [-2, 2] = [-4, 4]$.


1 $\mathbf{x} = [\underline{x}, \overline{x}]$ and $\mathbf{y} = [\underline{y}, \overline{y}]$ are disjoint if $(\overline{x} < \underline{y}) \vee (\overline{y} < \underline{x})$.

2 For simplicity, we use operations on closed intervals.

3 The product of $\mathbf{x}$ and $\mathbf{y}$ will also be denoted $\mathbf{x} \cdot \mathbf{y}$, or $\mathbf{x}\mathbf{y}$ for simplicity.
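
The dependency effect on the quotient example can be reproduced numerically. The sketch below evaluates both forms of the expression with endpoint arithmetic on `(lower, upper)` tuples; the helper names and the sample intervals [1, 2] are illustrative assumptions.

```python
def add(x, y):
    """Interval sum of two (lower, upper) tuples."""
    return (x[0] + y[0], x[1] + y[1])


def div(x, y):
    """Interval quotient x / y, defined only when y does not contain 0."""
    if y[0] <= 0.0 <= y[1]:
        raise ZeroDivisionError("divisor interval contains 0")
    quotients = [x[0] / y[0], x[0] / y[1], x[1] / y[0], x[1] / y[1]]
    return (min(quotients), max(quotients))


x = (1.0, 2.0)
y = (1.0, 2.0)   # a second, independent quantity with the same bounds
one = (1.0, 1.0)

# x appears twice: IA treats the two occurrences as unrelated values.
naive = div(x, add(x, y))

# Equivalent form in which x appears only once.
tight = div(one, add(one, div(y, x)))
```

The naive form returns [0.25, 1], whereas the rewritten form returns [1/3, 2/3], which is the exact range of the expression over these two intervals.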


B. Basic interval functions 


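Here are several functions that are used in the sequel. More
interval functions can be found in [7], [13].

• Absolute value [13]:

$$|\mathbf{x}| = \begin{cases} [|\overline{x}|, |\underline{x}|], & \text{if } \overline{x} \leq 0 \\ [|\underline{x}|, |\overline{x}|], & \text{if } \underline{x} \geq 0 \\ [0, \max\{|\underline{x}|, |\overline{x}|\}], & \text{if } \underline{x} < 0 \text{ and } \overline{x} > 0 \end{cases} \qquad (1)$$

• Power [7]:

- If $n > 0$ is an odd number: $\mathbf{x}^n = [\underline{x}^n, \overline{x}^n]$.

- If $n > 0$ is an even number:

$$\mathbf{x}^n = \begin{cases} [\underline{x}^n, \overline{x}^n], & \text{if } \underline{x} > 0 \\ [\overline{x}^n, \underline{x}^n], & \text{if } \overline{x} < 0 \\ [0, \max\{\underline{x}^n, \overline{x}^n\}], & \text{if } 0 \in \mathbf{x}. \end{cases}$$

- If $\underline{x} > 0$ and $a > 0$: $\mathbf{x}^a = [\underline{x}^a, \overline{x}^a]$.

• Square root [7]:

$$\sqrt{\mathbf{x}} = \begin{cases} [\sqrt{\underline{x}}, \sqrt{\overline{x}}], & \text{for } \underline{x} \geq 0 \\ [0, \sqrt{\overline{x}}], & \text{if } 0 \in \mathbf{x}. \end{cases}$$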

C. Order relations for intervals 


The real numbers are ordered by the relation < (or >), and
comparing two real numbers is in general not a difficult task.
In the methods developed in this paper, we need to compare
imprecise numbers represented by intervals. We interpret an
interval to mean “there is a point that lies between the bounds”,
and the relation between two intervals is a relation between
two points belonging to the intervals (i.e. a possible relation).
Comparing intervals is not obvious in the general case when
the intervals have a non-empty intersection. For this, we
propose a method for comparing intervals, and we then explain
how to find the min (or max) element of a set of intervals. To
make comparisons, we assume that the unknown precise value
belonging to an imprecise number is uniformly distributed in
the interval under concern. This assumption is motivated by
the principle of insufficient reason. The comparative test that
we propose does not provide a true or false answer (boolean
function), but only the probability that the test is satisfied.
To implement the comparison between two intervals $\mathbf{x}$
and $\mathbf{y}$ of $\mathbb{IR}$, we define
$W \triangleq \mathrm{wid}(\mathbf{x})\,\mathrm{wid}(\mathbf{y})$ for
notational convenience, and we need to distinguish all possible
situations as follows:


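• Case 1: $\underline{x} \leq \overline{x} < \underline{y} \leq \overline{y}$.
In this case, $\mathbf{x} < \mathbf{y}$ with probability
$P(\mathbf{x} < \mathbf{y}) = 1$.

• Case 2: $\underline{y} \leq \overline{y} < \underline{x} \leq \overline{x}$.
In this case, $\mathbf{x} < \mathbf{y}$ with probability
$P(\mathbf{x} < \mathbf{y}) = 0$.

• Case 3: $\underline{x} \leq \underline{y} \leq \overline{x} \leq \overline{y}$.
In this case, $\mathbf{x} < \mathbf{y}$ with probability

$$P(\mathbf{x} < \mathbf{y}) = \frac{1}{W}\left[\mathrm{wid}(\mathbf{a})\mathrm{wid}(\mathbf{b}) + \mathrm{wid}(\mathbf{a})\mathrm{wid}(\mathbf{c}) + (\mathrm{wid}(\mathbf{b})^2/2) + \mathrm{wid}(\mathbf{b})\mathrm{wid}(\mathbf{c})\right] \qquad (2)$$

where $\mathbf{a} \triangleq [\underline{x}, \underline{y}]$,
$\mathbf{b} \triangleq [\underline{y}, \overline{x}]$ and
$\mathbf{c} \triangleq [\overline{x}, \overline{y}]$.

• Case 4: $\underline{x} \leq \underline{y} \leq \overline{y} \leq \overline{x}$.
In this case, $\mathbf{x} < \mathbf{y}$ with probability

$$P(\mathbf{x} < \mathbf{y}) = \frac{1}{W}\left[\mathrm{wid}(\mathbf{a})\mathrm{wid}(\mathbf{b}) + (\mathrm{wid}(\mathbf{b})^2/2)\right] \qquad (3)$$

where $\mathbf{a} \triangleq [\underline{x}, \underline{y}]$ and
$\mathbf{b} \triangleq [\underline{y}, \overline{y}]$.

• Case 5: $\underline{y} \leq \underline{x} \leq \overline{y} \leq \overline{x}$.
In this case, $\mathbf{x} < \mathbf{y}$ with probability

$$P(\mathbf{x} < \mathbf{y}) = \frac{1}{W}(\mathrm{wid}(\mathbf{b})^2/2) \qquad (4)$$

where $\mathbf{b} \triangleq [\underline{x}, \overline{y}]$.

• Case 6: $\underline{y} \leq \underline{x} \leq \overline{x} \leq \overline{y}$.
In this case, $\mathbf{x} < \mathbf{y}$ with probability

$$P(\mathbf{x} < \mathbf{y}) = \frac{1}{W}\left[\mathrm{wid}(\mathbf{b})\mathrm{wid}(\mathbf{c}) + (\mathrm{wid}(\mathbf{b})^2/2)\right] \qquad (5)$$

where $\mathbf{b} \triangleq [\underline{x}, \overline{x}]$ and
$\mathbf{c} \triangleq [\overline{x}, \overline{y}]$.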

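Formulae (2)-(5) are obtained by probability calculus using
uniform distributions over the intervals and the total probability
theorem. For case 3, one has

$$P(\mathbf{x} < \mathbf{y}) = P(x < y, x \in \mathbf{a}, y \in \mathbf{b}) + P(x < y, x \in \mathbf{a}, y \in \mathbf{c}) + P(x < y, x \in \mathbf{b}, y \in \mathbf{b}) + P(x < y, x \in \mathbf{b}, y \in \mathbf{c})$$

with

$$P(x < y, x \in \mathbf{a}, y \in \mathbf{b}) = 1 \cdot \frac{\mathrm{wid}(\mathbf{a})}{\mathrm{wid}(\mathbf{x})} \frac{\mathrm{wid}(\mathbf{b})}{\mathrm{wid}(\mathbf{y})}, \quad P(x < y, x \in \mathbf{a}, y \in \mathbf{c}) = 1 \cdot \frac{\mathrm{wid}(\mathbf{a})}{\mathrm{wid}(\mathbf{x})} \frac{\mathrm{wid}(\mathbf{c})}{\mathrm{wid}(\mathbf{y})},$$

$$P(x < y, x \in \mathbf{b}, y \in \mathbf{b}) = \frac{1}{2} \cdot \frac{\mathrm{wid}(\mathbf{b})}{\mathrm{wid}(\mathbf{x})} \frac{\mathrm{wid}(\mathbf{b})}{\mathrm{wid}(\mathbf{y})}, \quad P(x < y, x \in \mathbf{b}, y \in \mathbf{c}) = 1 \cdot \frac{\mathrm{wid}(\mathbf{b})}{\mathrm{wid}(\mathbf{x})} \frac{\mathrm{wid}(\mathbf{c})}{\mathrm{wid}(\mathbf{y})},$$

which gives formula (2).

The value of $P(\mathbf{x} > \mathbf{y})$ can be computed by a similar
approach. Of course, $P(\mathbf{x} > \mathbf{y}) = 1 - P(\mathbf{x} \leq \mathbf{y})$
and $P(\mathbf{x} < \mathbf{y}) = 1 - P(\mathbf{x} \geq \mathbf{y})$.
Also, one has
$P(\mathbf{x} \neq \mathbf{y}) = P(\mathbf{x} < \mathbf{y}) + P(\mathbf{x} > \mathbf{y}) = 1 - P(\mathbf{x} = \mathbf{y})$.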
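Example 1: $\mathbf{x} = [-3, 0]$, $\mathbf{y} = [-1, 4]$, then
$P(\mathbf{x} < \mathbf{y}) = 0.9667$. This is case 3, with
$\mathrm{wid}(\mathbf{a}) = 2$, $\mathrm{wid}(\mathbf{b}) = 1$,
$\mathrm{wid}(\mathbf{c}) = 4$ and $W = 15$, so that
$P(\mathbf{x} < \mathbf{y}) = 14.5/15$.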
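
The six cases can be gathered into one function. The sketch below follows the case analysis as reconstructed here, for proper intervals given as `(lower, upper)` tuples; the function name `prob_less` is hypothetical, and degenerate intervals (zero width) are rejected because $W$ would vanish.

```python
def prob_less(x, y):
    """P(x < y) for independent values uniformly distributed in x and y."""
    xl, xu = x
    yl, yu = y
    w = (xu - xl) * (yu - yl)
    if w <= 0.0:
        raise ValueError("both intervals must be proper (non-degenerate)")
    if xu < yl:                        # case 1: x entirely below y
        return 1.0
    if yu < xl:                        # case 2: x entirely above y
        return 0.0
    if xl <= yl and xu <= yu:          # case 3: x starts first, y ends last
        a, b, c = yl - xl, xu - yl, yu - xu
        return (a * b + a * c + b * b / 2.0 + b * c) / w
    if xl <= yl:                       # case 4: y inside x
        a, b = yl - xl, yu - yl
        return (a * b + b * b / 2.0) / w
    if yu <= xu:                       # case 5: y starts first, x ends last
        b = yu - xl
        return (b * b / 2.0) / w
    b, c = xu - xl, yu - xu            # case 6: x inside y
    return (b * c + b * b / 2.0) / w


p_example = prob_less((-3.0, 0.0), (-1.0, 4.0))
```

Evaluating Example 1 this way gives 14.5/15, and swapping the two arguments gives the complementary value 0.5/15.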

Because we know how to compute the probability
$P(\mathbf{x} < \mathbf{y})$ for any two imprecise numbers
$\mathbf{x}$ and $\mathbf{y}$, we are able to find the min (or max)
element of a set of imprecise numbers
$\mathbf{X} = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_M\}$
with a given associated probability. For instance, to find the min
element of $\mathbf{X}$ we proceed as follows:


• Calculate the $M \times M$ square matrix$^4$:

$$\mathbf{P} \triangleq [P_{ij} = P(\mathbf{x}_i < \mathbf{x}_j)] \qquad (6)$$

• Calculate the likelihood $\lambda_i = \lambda(\mathbf{x}_i)$ of
$\mathbf{x}_i$ to be the min of $\mathbf{X}$ as the sum of $P_{ij}$
for $j \neq i$, that is

$$\lambda_i = \sum_{j=1,\ldots,M \mid j \neq i} P_{ij} = \sum_{j \neq i} P(\mathbf{x}_i < \mathbf{x}_j) \qquad (7)$$

• The index of the most likely min element of $\mathbf{X}$ is

$$i_{\min} = \arg\max_i \lambda_i \qquad (8)$$

The most likely min element of $\mathbf{X}$ is given by
$\mathbf{x}_{i_{\min}}$, with the probability
$P(\mathbf{x}_{i_{\min}} = \min\{\mathbf{X}\}) = \lambda_{i_{\min}}/(M - 1)$.

A similar approach is applied to find the max element
of $\mathbf{X}$, using the likelihood
$\lambda_i = \sum_{j \neq i} P(\mathbf{x}_i > \mathbf{x}_j)$ and
$i_{\max} = \arg\max_i \lambda_i$. The max element of $\mathbf{X}$ is
given by $\mathbf{x}_{i_{\max}}$, with the associated probability
$P(\mathbf{x}_{i_{\max}} = \max\{\mathbf{X}\}) = \lambda_{i_{\max}}/(M - 1)$.
Moreover, if needed, we can also sort (probabilistically) all the
elements of $\mathbf{X}$ in decreasing (or increasing) order based
on the likelihood values $\lambda_i$.


4By construction all diagonal elements $P_{ii}$ equal zero.
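Example 2: Let's consider the set of intervals $\mathbf{X} = \{\mathbf{x}_1 = [-2, 2], \mathbf{x}_2 = [-3, 0], \mathbf{x}_3 = [0, 5], \mathbf{x}_4 = [-1, 3]\}$. From
the formulas of $P(\mathbf{x} < \mathbf{y})$ given for the aforementioned cases 1-6,
one obtains

$$\mathbf{P} = [P(\mathbf{x}_i < \mathbf{x}_j)] = \begin{bmatrix} 0 & 0.1667 & 0.9000 & 0.7188 \\ 0.8333 & 0 & 1.0000 & 0.9583 \\ 0.1000 & 0 & 0 & 0.2250 \\ 0.2812 & 0.0417 & 0.7750 & 0 \end{bmatrix}$$

with the likelihood values

$$\begin{bmatrix} \lambda_1 = \lambda(\mathbf{x}_1) \\ \lambda_2 = \lambda(\mathbf{x}_2) \\ \lambda_3 = \lambda(\mathbf{x}_3) \\ \lambda_4 = \lambda(\mathbf{x}_4) \end{bmatrix} = \begin{bmatrix} 1.7854 \\ 2.7917 \\ 0.3250 \\ 1.0979 \end{bmatrix}$$

The maximum likelihood is $\lambda_2 = 2.7917$ and the corresponding
index is $i_{\min} = 2$. This means that $\mathbf{x}_2 = [-3, 0]$
is most likely the min element of $\mathbf{X}$ with the probability
$P(\mathbf{x}_2 = \min\{\mathbf{X}\}) = 2.7917/3 = 0.9306$. Using a similar
approach, one finds that the max of $\mathbf{X}$ is $\mathbf{x}_3 = [0, 5]$ with
the probability $P(\mathbf{x}_3 = \max\{\mathbf{X}\}) = 2.6750/3 = 0.8917$.
Based on the likelihood values of the min element of $\mathbf{X}$ sorted
in decreasing order, we obtain $\mathbf{x}_2 < \mathbf{x}_1 < \mathbf{x}_4 < \mathbf{x}_3$, which
corresponds to what we intuitively expect in such an example.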
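
The min/max search of Example 2 can be sketched in a few lines of Python. This is a hedged illustration, not the authors' code: `prob_less` is written here in a compact single-integral form that is algebraically equivalent to cases 1-6 under the uniform assumption, and the names `likelihoods`, `lam_min` and `lam_max` are hypothetical.

```python
def prob_less(x, y):
    """P(x < y) for independent uniform values on intervals (lo, hi)."""
    (xl, xu), (yl, yu) = x, y
    if xu <= yl:
        return 1.0
    if yu <= xl:
        return 0.0
    w = xu - xl

    def G(t):  # antiderivative of the CDF of a uniform value on x
        if t <= xl:
            return 0.0
        if t >= xu:
            return w / 2 + (t - xu)
        return (t - xl) ** 2 / (2 * w)

    if yu == yl:                       # y is a single point inside x
        return (yl - xl) / w
    return (G(yu) - G(yl)) / (yu - yl)


def likelihoods(X, kind="min"):
    """Likelihood of each interval to be the min (or max) of the set X."""
    lam = []
    for i, xi in enumerate(X):
        total = 0.0
        for j, xj in enumerate(X):
            if j == i:                 # diagonal terms are zero by construction
                continue
            p = prob_less(xi, xj)
            total += p if kind == "min" else 1.0 - p
        lam.append(total)
    return lam


X = [(-2, 2), (-3, 0), (0, 5), (-1, 3)]          # intervals of Example 2
lam_min = likelihoods(X, "min")
lam_max = likelihoods(X, "max")
i_min = max(range(len(X)), key=lam_min.__getitem__)   # 0-based index
i_max = max(range(len(X)), key=lam_max.__getitem__)
p_min = lam_min[i_min] / (len(X) - 1)
p_max = lam_max[i_max] / (len(X) - 1)
print(i_min + 1, round(p_min, 4), i_max + 1, round(p_max, 4))
```

Sorting the indices by decreasing `lam_min` gives the probabilistic increasing order of the intervals.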


D. Distances between intervals 


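There are many ways to define strict distance metrics
between two intervals. The simplest one is the Hausdorff
distance between two intervals $\mathbf{x}$ and $\mathbf{y}$ of IR, which is the
maximum distance $d(x, y)$ of $x \in \mathbf{x}$ to its nearest point
$y \in \mathbf{y}$, where $d(x, y)$ is any chosen metric [14]; more precisely
$d_H(\mathbf{x}, \mathbf{y}) = \max_{x \in \mathbf{x}}\{\min_{y \in \mathbf{y}} d(x, y)\}$. For simplicity, if we
choose the $L_1$ distance metric $d_{L_1}(x, y) = |x - y|$, then
Hausdorff's distance is given by

$$d_H(\mathbf{x}, \mathbf{y}) = \max\{|\underline{x} - \underline{y}|, |\overline{x} - \overline{y}|\} = |\mathrm{mid}(\mathbf{x}) - \mathrm{mid}(\mathbf{y})| + |\mathrm{rad}(\mathbf{x}) - \mathrm{rad}(\mathbf{y})| \tag{9}$$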


Another interesting distance successfully used for decision-making
under uncertainty in the belief functions framework
[1], [15], [16], is Wasserstein's distance metric [17], [18]
$d_W(\mathbf{x}, \mathbf{y})$ defined as

$$d_W(\mathbf{x}, \mathbf{y}) \triangleq \sqrt{[\mathrm{mid}(\mathbf{x}) - \mathrm{mid}(\mathbf{y})]^2 + \frac{1}{3}[\mathrm{rad}(\mathbf{x}) - \mathrm{rad}(\mathbf{y})]^2} \tag{10}$$

which corresponds to Mallows' distance [19] between two
probability distributions when we assume that each interval
is the support of a uniform distribution.

Example 3: If $\mathbf{x} = [-3, 0]$ and $\mathbf{y} = [-1, 4]$, then $d_H(\mathbf{x}, \mathbf{y}) = 4$
whereas $d_W(\mathbf{x}, \mathbf{y}) \approx 3.0551$.
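
Both distances are straightforward to compute from the midpoint and radius of each interval. The sketch below is a minimal illustration assuming intervals are stored as `(lo, hi)` tuples; the function names are hypothetical.

```python
import math


def mid(x):
    """Midpoint of the interval x = (lo, hi)."""
    return (x[0] + x[1]) / 2


def rad(x):
    """Radius (half-width) of the interval x = (lo, hi)."""
    return (x[1] - x[0]) / 2


def hausdorff(x, y):
    """Hausdorff distance of Eq. (9) with the L1 metric on the reals."""
    return max(abs(x[0] - y[0]), abs(x[1] - y[1]))


def wasserstein(x, y):
    """Wasserstein distance of Eq. (10) between two intervals."""
    return math.sqrt((mid(x) - mid(y)) ** 2 + (rad(x) - rad(y)) ** 2 / 3)


x, y = (-3, 0), (-1, 4)                      # intervals of Example 3
print(hausdorff(x, y), round(wasserstein(x, y), 4))  # 4 3.0551
```

The second form of Eq. (9) can be checked on the same pair: |mid(x) - mid(y)| + |rad(x) - rad(y)| = 3 + 1 = 4.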


IV. BASICS OF BELIEF FUNCTIONS 


Belief functions have been introduced by Shafer in [20] to
model epistemic uncertainty. We assume that the answer⁵ of
the problem under concern belongs to a known (or given) finite
discrete frame of discernment (FoD) $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$,
with $n > 1$, and where all elements of $\Theta$ are exclusive⁶.
The set of all subsets of $\Theta$ (including the empty set $\emptyset$ and $\Theta$) is
the power-set of $\Theta$ denoted by $2^\Theta$. A basic belief assignment
(BBA) associated with a given source of evidence is defined
[20] as the mapping $m(\cdot) : 2^\Theta \to [0, 1]$ satisfying $m(\emptyset) = 0$
and $\sum_{A \in 2^\Theta} m(A) = 1$. The quantity $m(A)$ is called the
mass of $A$ committed by the source of evidence. Belief and
plausibility functions are respectively defined by

$$Bel(A) = \sum_{B \subseteq A,\ B \in 2^\Theta} m(B), \quad \text{and} \quad Pl(A) = 1 - Bel(\bar{A}). \tag{11}$$

If $m(A) > 0$, $A$ is called a focal element of $m(\cdot)$. When all
focal elements are singletons then $m(\cdot)$ is called a Bayesian
BBA [20] and its corresponding $Bel(\cdot)$ function is homogeneous
to a (subjective) probability measure. The vacuous BBA,
or VBBA for short, representing a totally ignorant source is
defined as⁷ $m_v(\Theta) = 1$.
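
As a small illustration of Eq. (11), the following Python sketch computes belief and plausibility from a BBA stored as a dictionary mapping `frozenset` focal elements to masses. The frame `{"a", "b", "c"}` and the mass values are hypothetical and chosen only for the example.

```python
def bel(m, A):
    """Bel(A): total mass of the non-empty focal elements included in A."""
    return sum(mass for B, mass in m.items() if B and B <= A)


def pl(m, A, frame):
    """Pl(A) = 1 - Bel(complement of A in the frame)."""
    return 1.0 - bel(m, frozenset(frame) - A)


frame = frozenset({"a", "b", "c"})
# Hypothetical BBA: masses are non-negative and sum to one.
m = {frozenset({"a"}): 0.5, frozenset({"a", "b"}): 0.3, frame: 0.2}
# Vacuous BBA: all the mass is committed to the whole frame.
m_vacuous = {frame: 1.0}

A = frozenset({"a"})
print(bel(m, A), pl(m, A, frame))                  # 0.5 1.0
print(bel(m_vacuous, A), pl(m_vacuous, A, frame))  # 0 1.0
```

For the vacuous BBA every proper non-empty subset gets the belief interval [0, 1], which expresses total ignorance.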

Shafer [20] proposed to combine $s \geq 2$ distinct sources of
evidence represented by BBAs $m_1(\cdot), \ldots, m_s(\cdot)$ over the same
FoD with Dempster's rule (i.e. the normalized conjunctive
rule). The justification and behavior of Dempster's rule have
been disputed over the years through many counter-examples
involving high or low conflicting sources (from both theoretical
and practical standpoints) as reported in [22]-[25]. Many rules
of combination exist⁸, and we recommend the new interesting
rules based on the proportional conflict redistribution (PCR)
principle, see [21], Vol. 3 for details.

A true distance metric between two BBAs $m_1(\cdot)$ and $m_2(\cdot)$
defined on the same FoD has been defined in [15] as follows⁹

$$d_{BI}(m_1, m_2) \triangleq \sqrt{N_c \cdot \sum_{X \in 2^\Theta} d_W^2(BI_1(X), BI_2(X))} \tag{12}$$


⁵ i.e. the solution, or the decision to take.

⁶ This is so-called Shafer's model of FoD [21].

⁷ The complete ignorance is denoted $\Theta$ in Shafer's book [20].

⁸ See [21], Vol. 2 for a detailed list of fusion rules.

⁹ Another well-known distance metric $d_J(m_1, m_2)$ had been proposed
before by Jousselme et al. in [26], which could also be used, but we prefer to
work with the $d_{BI}(m_1, m_2)$ distance for reasons explained in [27].


where the belief intervals are defined by $BI_1(X) \triangleq [Bel_1(X), Pl_1(X)]$
and $BI_2(X) \triangleq [Bel_2(X), Pl_2(X)]$, and
where $d_W(BI_1(X), BI_2(X))$ is Wasserstein's distance between
intervals calculated by (10). $N_c = 1/2^{n-1}$ is a
normalization factor to get $d_{BI}(m_1, m_2) \in [0, 1]$.

Making a decision on an element of the FoD from a given BF
($Bel(\cdot)$, $Pl(\cdot)$, or $m(\cdot)$) can be done in many manners. For
instance,

• in taking the argument of the max of $\{Bel(\theta_i), i = 1, \ldots, n\}$.
This is a pessimistic decisional attitude.

• in taking the argument of the max of $\{Pl(\theta_i), i = 1, \ldots, n\}$.
This is an optimistic decisional attitude.

• in approximating the BBA $m(\cdot)$ by a subjective probability
measure $P(\cdot)$ and taking the argument of the max of
$\{P(\theta_i), i = 1, \ldots, n\}$. This is a compromise decisional
attitude.

• in taking the argument of the min of $\{d_{BI}(m(\cdot), m_{\theta_i}), i = 1, \ldots, n\}$,
where $m_{\theta_i}$ is the BBA entirely focused on $\theta_i$
defined by $m_{\theta_i}(X) = 1$ if $X = \theta_i$, and $m_{\theta_i}(X) = 0$ if
$X \neq \theta_i$.

In the sequel, we will use the latter method, which has been
proved very effective in [16], [27].
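
The belief-interval distance of Eq. (12) and the minimum-distance decision rule can be sketched as follows. This is an illustrative implementation under stated assumptions: BBAs are dictionaries keyed by `frozenset`, the normalization factor is taken as N_c = 1/2^(n-1), the empty set is skipped because it contributes a zero distance, and the example BBA is hypothetical.

```python
import math
from itertools import combinations


def subsets(frame):
    """All non-empty subsets of the frame (the empty set adds no distance)."""
    items = sorted(frame)
    return [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]


def belief_interval(m, X):
    """[Bel(X), Pl(X)] computed from the BBA m."""
    bel = sum(v for B, v in m.items() if B <= X)
    pl = sum(v for B, v in m.items() if B & X)
    return bel, pl


def d_w(x, y):
    """Wasserstein distance of Eq. (10) between two intervals."""
    dm = (x[0] + x[1]) / 2 - (y[0] + y[1]) / 2
    dr = (x[1] - x[0]) / 2 - (y[1] - y[0]) / 2
    return math.sqrt(dm * dm + dr * dr / 3)


def d_bi(m1, m2, frame):
    """Belief-interval distance of Eq. (12), assuming N_c = 1 / 2**(n - 1)."""
    nc = 1.0 / 2 ** (len(frame) - 1)
    total = sum(d_w(belief_interval(m1, X), belief_interval(m2, X)) ** 2
                for X in subsets(frame))
    return math.sqrt(nc * total)


def decide(m, frame):
    """Element whose categorical BBA is the closest to m in the d_BI sense."""
    return min(sorted(frame),
               key=lambda t: d_bi(m, {frozenset([t]): 1.0}, frame))


frame = frozenset({"a", "b", "c"})
m = {frozenset({"a"}): 0.5, frozenset({"a", "b"}): 0.3, frame: 0.2}
print(decide(m, frame))  # a
```

With this normalization, two BBAs entirely focused on different singletons are at distance one, which is the largest possible value.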




V. BF-TOPSIS WITH PRECISE SCORES 


Four BF-TOPSIS methods have been proposed in [1] with
increasing complexity and robustness to the rank reversal
phenomenon for MCDM support. In this section we briefly
recall the main ideas of BF-TOPSIS. For further mathematical 
details, please refer to [1]. All these methods start with 
constructing BBAs from the precise score values of the score 
matrix S as briefly explained. Only the way those BBAs are 
processed differs from one BF-TOPSIS method to another one. 


A. From precise scores to precise BBAs 


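In [1], one has proved that BBAs can be consistently built
from the precise score matrix $\mathbf{S}$ as follows:

$$Bel_{ij}(A_i) \triangleq \begin{cases} \dfrac{Sup_j(A_i)}{A^j_{\max}} & \text{if } A^j_{\max} \neq 0 \\ 0 & \text{if } A^j_{\max} = 0 \end{cases} \tag{13}$$

$$Bel_{ij}(\bar{A}_i) \triangleq \begin{cases} \dfrac{Inf_j(A_i)}{A^j_{\min}} & \text{if } A^j_{\min} \neq 0 \\ 0 & \text{if } A^j_{\min} = 0 \end{cases} \tag{14}$$

where $\bar{A}_i$ is the complement of $A_i$ in the FoD $\Theta \triangleq \{A_1, A_2, \ldots, A_M\}$ ($M > 2$), and

$$Sup_j(A_i) \triangleq \sum_{k \in \{1, \ldots, M\} | S_{kj} \leq S_{ij}} |S_{ij} - S_{kj}| \tag{15}$$

$$Inf_j(A_i) \triangleq - \sum_{k \in \{1, \ldots, M\} | S_{kj} \geq S_{ij}} |S_{ij} - S_{kj}| \tag{16}$$

The denominators involved in Eqs. (13)-(14) are defined by
$A^j_{\max} \triangleq \max_i Sup_j(A_i)$ and $A^j_{\min} \triangleq \min_i Inf_j(A_i)$, and
they are supposed different from zero¹⁰. Therefore, the belief
interval of choosing hypothesis $A_i$ considering criterion $C_j$ is
given by:

$$[Bel_{ij}(A_i); Pl_{ij}(A_i)] \triangleq \left[\frac{Sup_j(A_i)}{A^j_{\max}},\ 1 - \frac{Inf_j(A_i)}{A^j_{\min}}\right] \tag{17}$$

¹⁰ If $A^j_{\max} = 0$ then $Bel_{ij}(A_i) = 0$, and if $A^j_{\min} = 0$ then $Pl_{ij}(A_i) = 1$,
so that $Bel_{ij}(\bar{A}_i) = 0$.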


From this belief interval, we deduce the BBA $m_{ij}(\cdot)$ which
is the triplet $(m_{ij}(A_i), m_{ij}(\bar{A}_i), m_{ij}(A_i \cup \bar{A}_i))$ defined by:

$$m_{ij}(A_i) \triangleq Bel_{ij}(A_i) \tag{18}$$

$$m_{ij}(\bar{A}_i) \triangleq Bel_{ij}(\bar{A}_i) = 1 - Pl_{ij}(A_i) \tag{19}$$

$$m_{ij}(A_i \cup \bar{A}_i) \triangleq m_{ij}(\Theta) = Pl_{ij}(A_i) - Bel_{ij}(A_i) \tag{20}$$

If a numerical value $S_{ij}$ is missing in $\mathbf{S}$, one uses $m_{ij}(\cdot) \triangleq (0, 0, 1)$,
i.e. one takes the vacuous belief assignment.
Using the formulae (13)-(20), we obtain from any $M \times N$
precise score matrix¹¹ $\mathbf{S}$ the general $M \times N$ matrix $\mathbf{M} \triangleq [m_{ij}(\cdot)]$
of BBAs that are involved in BF-TOPSIS methods.
This construction of BBAs is very interesting for applications
because it is invariant to the bias and scaling effects of score
values [1]. Also, it allows us to model our lack of evidence (if
any) with respect to one (or several) alternative(s) when their
corresponding score values are missing for any reason.
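
A compact way to see formulae (13)-(20) at work is to apply them to one column of scores. The Python sketch below is an illustration only: the function name `bbas_from_scores` and the use of `None` for a missing score are assumptions, and the seven scores are the precise values used in the mono-criterion example of Section VII.

```python
def bbas_from_scores(col):
    """Triplets (m(A_i), m(not A_i), m(Theta)) for one criterion column.

    A missing score is encoded as None and yields the vacuous BBA (0, 0, 1).
    At least one score is assumed to be known.
    """
    known = [s for s in col if s is not None]
    sup = [None if s is None else sum(s - k for k in known if k <= s)
           for s in col]                                   # Eq. (15)
    inf = [None if s is None else -sum(k - s for k in known if k >= s)
           for s in col]                                   # Eq. (16)
    a_max = max(v for v in sup if v is not None)
    a_min = min(v for v in inf if v is not None)
    out = []
    for s, f in zip(sup, inf):
        if s is None:
            out.append((0.0, 0.0, 1.0))
            continue
        bel = s / a_max if a_max != 0 else 0.0             # Eq. (13)
        bel_bar = f / a_min if a_min != 0 else 0.0         # Eq. (14)
        out.append((bel, bel_bar, 1.0 - bel - bel_bar))    # Eqs. (18)-(20)
    return out


scores = [10, 20, -5, 0, 100, -11, 0]
bbas = bbas_from_scores(scores)
for i, triplet in enumerate(bbas, start=1):
    print(f"A{i}", tuple(round(v, 4) for v in triplet))

# Invariance to bias and scaling: an affine map with positive slope
# leaves every BBA unchanged (up to floating-point rounding).
shifted = bbas_from_scores([3 * s + 7 for s in scores])
```

The best-scored alternative receives the mass triplet (1, 0, 0) and the worst-scored one (0, 1, 0), while intermediate alternatives keep some mass on the whole frame.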


B. BF-TOPSIS1 method 


From the BBA matrix $\mathbf{M}$ and for each alternative $A_i$, one
computes the distances $d_{BI}(m_{ij}, m_{ij}^{best})$ between $m_{ij}(\cdot)$ and the
ideal best BBA defined by $m_{ij}^{best}(A_i) \triangleq 1$, and the distances
$d_{BI}(m_{ij}, m_{ij}^{worst})$ between $m_{ij}(\cdot)$ and the ideal worst BBA
defined by $m_{ij}^{worst}(\bar{A}_i) \triangleq 1$. Then, one computes the weighted
average distances with the relative importance weighting factor $w_j$
of criteria $C_j$ as follows:

$$d^{best}(A_i) \triangleq \sum_{j=1}^{N} w_j \cdot d_{BI}(m_{ij}, m_{ij}^{best}) \tag{21}$$

$$d^{worst}(A_i) \triangleq \sum_{j=1}^{N} w_j \cdot d_{BI}(m_{ij}, m_{ij}^{worst}) \tag{22}$$

The relative closeness of the alternative $A_i$ with respect to the
ideal best solution $A^{best}$ defined by

$$C(A_i, A^{best}) \triangleq \frac{d^{worst}(A_i)}{d^{worst}(A_i) + d^{best}(A_i)} \tag{23}$$

is used to make the preference ordering according to
the descending order of $C(A_i, A^{best}) \in [0, 1]$, where a
larger $C(A_i, A^{best})$ value means a better alternative (higher
preference).


¹¹ Note that each element $m_{ij}(\cdot)$ is in fact a 3-tuple of masses given by
(18)-(20).


C. BF-TOPSIS2 method 


For each criterion $C_j$, one computes at first the relative
closeness of each alternative $A_i$ w.r.t. its ideal best solution
$A^{best}$ by

$$C_j(A_i, A^{best}) \triangleq \frac{d_{BI}(m_{ij}, m_{ij}^{worst})}{d_{BI}(m_{ij}, m_{ij}^{worst}) + d_{BI}(m_{ij}, m_{ij}^{best})} \tag{24}$$

The global relative closeness $C(A_i, A^{best})$ of each alternative
$A_i$ with respect to its ideal best solution $A^{best}$, used to make
the final preference ordering, is then obtained by the weighted
average of $C_j(A_i, A^{best})$, that is

$$C(A_i, A^{best}) \triangleq \sum_{j=1}^{N} w_j \cdot C_j(A_i, A^{best}) \tag{25}$$
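
The difference between BF-TOPSIS1 and BF-TOPSIS2 lies only in where the weighted average is taken. The sketch below assumes the distances to the ideal best and worst BBAs have already been computed and stored in two M x N nested lists; the numerical values and function names are hypothetical and serve only to show that the two methods can give different closeness values.

```python
def closeness_topsis1(d_best, d_worst, w):
    """Eqs. (21)-(23): average the distances first, then form the ratio."""
    out = []
    for db_row, dw_row in zip(d_best, d_worst):
        db = sum(wj * d for wj, d in zip(w, db_row))
        dw = sum(wj * d for wj, d in zip(w, dw_row))
        out.append(dw / (dw + db))
    return out


def closeness_topsis2(d_best, d_worst, w):
    """Eqs. (24)-(25): form one ratio per criterion, then average them."""
    out = []
    for db_row, dw_row in zip(d_best, d_worst):
        out.append(sum(wj * dw / (dw + db)
                       for wj, db, dw in zip(w, db_row, dw_row)))
    return out


# Hypothetical distances for M = 2 alternatives and N = 2 criteria.
d_best = [[0.2, 0.5], [0.6, 0.1]]
d_worst = [[0.6, 0.5], [0.2, 0.3]]
w = [0.6, 0.4]

c1 = closeness_topsis1(d_best, d_worst, w)
c2 = closeness_topsis2(d_best, d_worst, w)
ranking = sorted(range(len(c1)), key=lambda i: -c1[i])  # best alternative first
print(c1, c2, ranking)
```

Both variants rank the alternatives by descending closeness; here they agree on the order even though the closeness values differ.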


D. BF-TOPSIS3 method 


For each alternative $A_i$, one fuses the $N$ precise BBAs
$m_{ij}(\cdot)$ discounted with the importance factor $w_j$ (see [28]) with the
PCR6 rule of combination [21] (Vol. 3) to get the precise
fused BBA $m_i^{PCR6}$, from which one computes the distance
$d^{best}(A_i) = d_{BI}(m_i^{PCR6}, m_i^{best})$ between $m_i^{PCR6}(\cdot)$ and its
ideal best BBA $m_i^{best}(A_i) \triangleq 1$. Similarly, one computes
the distance $d^{worst}(A_i) = d_{BI}(m_i^{PCR6}, m_i^{worst})$ between
$m_i^{PCR6}(\cdot)$ and $m_i^{worst}(\bar{A}_i) \triangleq 1$. The relative closeness
$C(A_i, A^{best})$ of each $A_i$ with respect to the ideal best solution is
computed by (23), and is used to make the preference ordering
according to the descending order of $C(A_i, A^{best})$.


E. BF-TOPSIS4 method 


This method is similar to BF-TOPSIS3 except that we use
the more complicated ZPCR6 fusion rule, which takes into account
Zhang's degree of intersection of focal elements in the conjunctive
consensus operator, see [29] for details.


VI. BF-TOPSIS WITH IMPRECISE SCORES 


In this section we present the extension of BF-TOPSIS
methods to deal with imprecise score values $\mathbf{S}_{ij} = [\underline{S}_{ij}, \overline{S}_{ij}]$,
$i = 1, \ldots, M$ and $j = 1, \ldots, N$. These extensions will be
referred to as Imp-BF-TOPSIS in the sequel. The basic idea is
to follow the principles of BF-TOPSIS using Interval Arithmetic
(IA) instead of classical arithmetic on reals.


A. From imprecise scores to imprecise BBAs 


The application of formulae (13)-(20) using IA operations
does not work directly because of potential division by intervals
including zero, and because of comparison tests involving the
boolean $\leq$ and $\geq$ functions. To circumvent these problems, we
need to avoid intervals including zero, and replace the boolean $\leq$
and $\geq$ functions by their probabilistic counterparts presented
in section III-C. This is done as follows:


• Step 1 (Offset correction): To work only with positive
intervals, we first apply an offset correction of the imprecise
score values $\mathbf{S}_{ij} = [\underline{S}_{ij}, \overline{S}_{ij}]$ for each column $j$ of the
imprecise score matrix $\mathbf{S} = [\mathbf{S}_{ij}]$. This is a preprocessing
step. We are allowed to do this because, by construction,
the BBAs based on formulae (13)-(20) are invariant to
bias and scaling effects [1]. Therefore, we can always
replace the original imprecise score $[\underline{S}_{ij}, \overline{S}_{ij}]$ by

$$[\underline{S}'_{ij}, \overline{S}'_{ij}] = [\underline{S}_{ij}, \overline{S}_{ij}] + [\delta_j + \epsilon, \delta_j + \epsilon] \tag{26}$$

where $\epsilon > 0$ is an arbitrary positive number to ensure
the strict positivity of intervals, and the offset correction
value $\delta_j$ is given for $j = 1, \ldots, N$ by

$$\delta_j \triangleq - \min_i \{\underline{S}_{ij}\} \tag{27}$$

Example 4: Let's consider the FoD $\Theta \triangleq \{A_1, A_2, A_3, A_4\}$
with four alternatives, a criterion
$C_1$ and the following associated imprecise scores
$\mathbf{S}_{11} = [-2, 2]$, $\mathbf{S}_{21} = [-3, 0]$, $\mathbf{S}_{31} = [0, 5]$
and $\mathbf{S}_{41} = [-1, 3]$. The offset correction is then
$\delta_1 = -\min\{-2, -3, 0, -1\} = 3$. If we take $\epsilon = 1$, then
we get the corrected (positive) imprecise scores

$\mathbf{S}'_{11} = [-2, 2] + [3 + 1, 3 + 1] = [2, 6]$

$\mathbf{S}'_{21} = [-3, 0] + [3 + 1, 3 + 1] = [1, 4]$

$\mathbf{S}'_{31} = [0, 5] + [3 + 1, 3 + 1] = [4, 9]$

$\mathbf{S}'_{41} = [-1, 3] + [3 + 1, 3 + 1] = [3, 7]$

• Step 2: Replace the $S_{kj} \leq S_{ij}$ and $S_{kj} \geq S_{ij}$ tests
involved in (15) and (16) by their probabilistic counterparts
$P(\mathbf{S}_{kj} < \mathbf{S}_{ij}) > 0.5$ and $P(\mathbf{S}_{kj} > \mathbf{S}_{ij}) > 0.5$ because
$\mathbf{S}_{kj}$ and $\mathbf{S}_{ij}$ are imprecise numbers (i.e. intervals), where
$P(\mathbf{S}_{kj} < \mathbf{S}_{ij})$ and $P(\mathbf{S}_{kj} > \mathbf{S}_{ij})$ are computed as in
section III-C.

In the sequel, we assume that the offset correction of scores
has been applied (step 1 done) and for notational simplicity we
denote these (corrected) strictly positive imprecise scores $\mathbf{S}_{ij}$.
The imprecise BBAs can now be computed from the (offset-corrected)
imprecise score values as follows¹²


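$$[\underline{Bel}_{ij}(A_i), \overline{Bel}_{ij}(A_i)] \triangleq \begin{cases} \dfrac{\mathbf{Sup}_j(A_i)}{\mathbf{A}^j_{\max}} & \text{if } \mathbf{A}^j_{\max} \neq [0, 0] \\ [0, 0] & \text{if } \mathbf{A}^j_{\max} = [0, 0] \end{cases} \tag{28}$$

$$[\underline{Bel}_{ij}(\bar{A}_i), \overline{Bel}_{ij}(\bar{A}_i)] \triangleq \begin{cases} \dfrac{\mathbf{Inf}_j(A_i)}{\mathbf{A}^j_{\min}} & \text{if } \mathbf{A}^j_{\min} \neq [0, 0] \\ [0, 0] & \text{if } \mathbf{A}^j_{\min} = [0, 0] \end{cases} \tag{29}$$

where

$$\mathbf{Sup}_j(A_i) = [\underline{Sup}_j(A_i), \overline{Sup}_j(A_i)] \triangleq \sum_{k \in \{1, \ldots, M\} | P(\mathbf{S}_{kj} < \mathbf{S}_{ij}) > 0.5} |\mathbf{S}_{ij} - \mathbf{S}_{kj}| \tag{30}$$

$$\mathbf{Inf}_j(A_i) = [\underline{Inf}_j(A_i), \overline{Inf}_j(A_i)] \triangleq - \sum_{k \in \{1, \ldots, M\} | P(\mathbf{S}_{kj} > \mathbf{S}_{ij}) > 0.5} |\mathbf{S}_{ij} - \mathbf{S}_{kj}| \tag{31}$$

The denominators involved in Eqs. (28)-(29) are defined
by $\mathbf{A}^j_{\max} = [\underline{A}^j_{\max}, \overline{A}^j_{\max}] \triangleq \max_i \mathbf{Sup}_j(A_i)$ and
$\mathbf{A}^j_{\min} = [\underline{A}^j_{\min}, \overline{A}^j_{\min}] \triangleq \min_i \mathbf{Inf}_j(A_i)$, and they are supposed
different from $[0, 0]$¹³. Therefore, in the non-degenerate case (when
$\mathbf{A}^j_{\max} \neq [0, 0]$ and $\mathbf{A}^j_{\min} \neq [0, 0]$) the belief interval of
hypothesis $A_i$ considering criterion $C_j$ has now imprecise
bounds given by

$$\mathbf{Bel}_{ij}(A_i) = [\underline{Bel}_{ij}(A_i), \overline{Bel}_{ij}(A_i)] = \frac{\mathbf{Sup}_j(A_i)}{\mathbf{A}^j_{\max}} \tag{32}$$

$$\mathbf{Pl}_{ij}(A_i) = [\underline{Pl}_{ij}(A_i), \overline{Pl}_{ij}(A_i)] = [1, 1] - \frac{\mathbf{Inf}_j(A_i)}{\mathbf{A}^j_{\min}} \tag{33}$$

¹² Remember that operations involved in the formulas of this section are IA
operations defined in section III.

From these imprecise bounds, we calculate the imprecise
BBA $\mathbf{m}_{ij}(\cdot) = [\underline{m}_{ij}(\cdot), \overline{m}_{ij}(\cdot)]$ which is the triplet
of intervals ($\mathbf{m}_{ij}(A_i) = [\underline{m}_{ij}(A_i), \overline{m}_{ij}(A_i)]$,
$\mathbf{m}_{ij}(\bar{A}_i) = [\underline{m}_{ij}(\bar{A}_i), \overline{m}_{ij}(\bar{A}_i)]$,
$\mathbf{m}_{ij}(A_i \cup \bar{A}_i) = [\underline{m}_{ij}(A_i \cup \bar{A}_i), \overline{m}_{ij}(A_i \cup \bar{A}_i)]$) defined by:

$$\mathbf{m}_{ij}(A_i) \triangleq \mathbf{Bel}_{ij}(A_i) \tag{34}$$

$$\mathbf{m}_{ij}(\bar{A}_i) \triangleq \mathbf{Bel}_{ij}(\bar{A}_i) = [1, 1] - \mathbf{Pl}_{ij}(A_i) \tag{35}$$

$$\mathbf{m}_{ij}(A_i \cup \bar{A}_i) \triangleq \mathbf{m}_{ij}(\Theta) = \mathbf{Pl}_{ij}(A_i) - \mathbf{Bel}_{ij}(A_i) \tag{36}$$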


If a numerical (imprecise) value $\mathbf{S}_{ij}$ is missing in $\mathbf{S}$, one uses
$\mathbf{m}_{ij}(\cdot) = ([0, 0], [0, 0], [1, 1])$, i.e. one takes the vacuous belief
assignment expressed in its degenerate interval form.

Using the formulae (28)-(36), we obtain from any $M \times N$
imprecise score matrix $\mathbf{S}$ the general $M \times N$ matrix
$\mathbf{M} \triangleq [\mathbf{m}_{ij}(\cdot) = [\underline{m}_{ij}(\cdot), \overline{m}_{ij}(\cdot)]]$ of imprecise BBAs that are
necessary in Imp-BF-TOPSIS methods.

It is worth noting that formulae (28)-(36) are fully consistent
with (13)-(20) when all the elements $\mathbf{S}_{ij}$ of the imprecise
score matrix are degenerate (are precise numbers), that is
when $\underline{S}_{ij} = \overline{S}_{ij}$. By choosing the midpoints of imprecise
score values, we can always build a precise BBA that satisfies
Shafer's BBA definition [1]. This midpoint-based BBA is
always included in the imprecise BBA bounds because of IA.
Therefore, imprecise BBAs are always admissible and they
can be combined by Dempster's or PCR6 rules thanks to IA
operations. This has to be implemented with caution to avoid
the dependency effect [7].
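
The construction (28)-(36) can be sketched for one criterion column as follows. This is an illustration under stated assumptions rather than a reference implementation: intervals are `(lo, hi)` tuples, `prob_less` is a compact form equivalent to the six comparison cases of section III-C, the max and min over intervals are taken bound by bound, and bounds are clamped to [0, 1]. The input column is the offset-corrected set of scores used in the mono-criterion example of Section VII.

```python
def prob_less(x, y):
    """P(x < y) for independent uniform values on intervals (lo, hi)."""
    (xl, xu), (yl, yu) = x, y
    if xu <= yl:
        return 1.0
    if yu <= xl:
        return 0.0
    w = xu - xl

    def G(t):  # antiderivative of the CDF of a uniform value on x
        if t <= xl:
            return 0.0
        if t >= xu:
            return w / 2 + (t - xu)
        return (t - xl) ** 2 / (2 * w)

    if yu == yl:
        return (yl - xl) / w
    return (G(yu) - G(yl)) / (yu - yl)


def add(x, y): return (x[0] + y[0], x[1] + y[1])
def sub(x, y): return (x[0] - y[1], x[1] - y[0])


def mul(x, y):
    p = [a * b for a in x for b in y]
    return (min(p), max(p))


def div(x, y):
    if y[0] <= 0 <= y[1]:
        raise ZeroDivisionError("divisor interval contains zero")
    return mul(x, (1 / y[1], 1 / y[0]))


def iabs(x):
    if x[0] >= 0:
        return x
    if x[1] <= 0:
        return (-x[1], -x[0])
    return (0, max(-x[0], x[1]))


def clamp01(x):
    return (min(max(x[0], 0.0), 1.0), min(max(x[1], 0.0), 1.0))


def imprecise_bbas(col):
    """Interval triplets (m(A_i), m(not A_i), m(Theta)) for one column."""
    sup, inf = [], []
    for i, si in enumerate(col):
        s, f = (0.0, 0.0), (0.0, 0.0)
        for k, sk in enumerate(col):
            if k == i:
                continue
            d = iabs(sub(si, sk))
            if prob_less(sk, si) > 0.5:        # Eq. (30)
                s = add(s, d)
            if prob_less(si, sk) > 0.5:        # Eq. (31)
                f = sub(f, d)
        sup.append(s)
        inf.append(f)
    # Assumption: interval max/min are taken bound by bound.
    a_max = (max(s[0] for s in sup), max(s[1] for s in sup))
    a_min = (min(f[0] for f in inf), min(f[1] for f in inf))
    zero = (0.0, 0.0)
    out = []
    for s, f in zip(sup, inf):
        bel = clamp01(div(s, a_max)) if a_max != zero else zero
        bel_bar = clamp01(div(f, a_min)) if a_min != zero else zero
        pl = sub((1.0, 1.0), bel_bar)
        out.append((bel, bel_bar, clamp01(sub(pl, bel))))
    return out


col = [(21, 25), (31, 35), (6, 10), (12, 14), (110, 116), (1, 3), (12, 14)]
bbas = imprecise_bbas(col)
for i, t in enumerate(bbas, start=1):
    print(f"A{i}", [tuple(round(v, 4) for v in iv) for iv in t])
```

Dividing an interval by itself does not return [1, 1] in IA, which is why the upper bound of the best alternative must be clamped to one.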


B. Imp-BF-TOPSIS1 and Imp-BF-TOPSIS2 methods 


These methods are similar to BF-TOPSIS1 and BF-TOPSIS2
except that we use IA operations. The distances
$d_{BI}(\mathbf{m}_{ij}, \mathbf{m}_{ij}^{best})$ and $d_{BI}(\mathbf{m}_{ij}, \mathbf{m}_{ij}^{worst})$ become imprecise numbers
computed with formula (12) adapted for interval calculus,
where $\mathbf{m}_{ij}^{best}(A_i) \triangleq [1, 1]$ and $\mathbf{m}_{ij}^{worst}(\bar{A}_i) \triangleq [1, 1]$. Of course,
all scalars involved in the formulae (12), (21), (22), and (25)
must be expressed in their degenerate interval form in order to
apply IA operations; for instance $N_c$ is replaced by $[N_c, N_c]$,
$w_j$ by $[w_j, w_j]$, etc. The final preference ordering is found
according to the descending order of the imprecise $\mathbf{C}(A_i, A^{best})$
obtained by the method explained at the end of section III-C,
where a larger $\mathbf{C}(A_i, A^{best})$ means a better alternative (higher
preference).


¹³ If $\mathbf{A}^j_{\max} = [0, 0]$ then $\mathbf{Bel}_{ij}(A_i) = [0, 0]$, and if $\mathbf{A}^j_{\min} = [0, 0]$ then
$\mathbf{Pl}_{ij}(A_i) = [1, 1]$, so that $\mathbf{Bel}_{ij}(\bar{A}_i) = [0, 0]$.


C. Imp-BF-TOPSIS3 and Imp-BF-TOPSIS4 methods


These methods are similar to BF-TOPSIS3 and BF-TOPSIS4¹⁴
but with a special adaptation of the PCR6 and ZPCR6
formulae to reduce dependency effects with IA operations.
For example, the expression $\frac{m_1^2(X)\, m_2(Y)}{m_1(X) + m_2(Y)}$ involved in the PCR6
formula [21] (Vol. 3) for the fusion of two BBAs must be
computed as $\left[\frac{1}{m_1(X) m_2(Y)} + \frac{1}{m_1^2(X)}\right]^{-1}$ with IA to get the
tightest range enclosure. The implementation of the conjunctive
rule must also be done with precaution when using IA to
reduce the dependency effect in the derivation.
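
The dependency effect is easy to reproduce numerically. The Python sketch below evaluates the same PCR6 term in two algebraically equivalent ways with naive interval arithmetic; the two mass intervals `m1` and `m2` are hypothetical values chosen for illustration, and the helper names are assumptions.

```python
def add(x, y):
    return (x[0] + y[0], x[1] + y[1])


def mul(x, y):
    p = [a * b for a in x for b in y]
    return (min(p), max(p))


def inv(x):
    """Reciprocal of a strictly positive interval."""
    if x[0] <= 0:
        raise ZeroDivisionError("interval must be strictly positive")
    return (1 / x[1], 1 / x[0])


m1, m2 = (0.2, 0.4), (0.3, 0.5)   # hypothetical imprecise masses

# Direct evaluation of m1^2 * m2 / (m1 + m2): m1 and m2 each occur in the
# numerator and the denominator, so IA treats the occurrences as independent.
naive = mul(mul(mul(m1, m1), m2), inv(add(m1, m2)))

# Rewritten form [1/(m1*m2) + 1/m1^2]^(-1): every term is monotone in m1 and
# m2, so the interval evaluation returns the exact range here.
tight = inv(add(inv(mul(m1, m2)), inv(mul(m1, m1))))

print(naive)  # wider enclosure
print(tight)  # tighter enclosure
```

Since the term increases with both masses on positive values, its exact range is obtained at the lower and upper corners, which is what the rewritten form returns.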


VII. EXAMPLES 
A. Example 5 (mono-criterion) 


• Precise scores case [1]: Let's consider a criterion $C_1$
and seven alternatives $A_i$ ($i = 1, \ldots, 7$) with the precise
score values $S_{11} = 10$, $S_{21} = 20$, $S_{31} = -5$, $S_{41} = 0$,
$S_{51} = 100$, $S_{61} = -11$, and $S_{71} = 0$. The direct ranking
with the preference "greater is better" yields¹⁵ $A_5 \succ A_2 \succ A_1 \succ (A_4 \sim A_7) \succ A_3 \succ A_6$.
In applying formulas (13)-(20), we get the BBAs listed in Table I.
Using BF-TOPSIS methods¹⁶, we get the distances and the relative
closeness measures of Table II. In sorting $C(A_i, A^{best})$ by
descending order, we get the correct preference order
$A_5 \succ A_2 \succ A_1 \succ (A_4 \sim A_7) \succ A_3 \succ A_6$, which is consistent
with the direct ranking result.


Table I
BBAS CONSTRUCTED FROM PRECISE SCORE VALUES.


Table II
DISTANCES AND RELATIVE CLOSENESS MEASURES.


¹⁴ See their mathematical derivations in [1].

¹⁵ where the symbol $\succ$ means better than (or is preferred to).

¹⁶ In the mono-criterion case, all BF-TOPSIS methods are equivalent because
there is no need for fusion.


• Imprecise scores case: For simplicity, consider now
imprecise score values with midpoints consistent with the previous
example. For instance, suppose $\mathbf{S}_{11} = [8, 12]$, $\mathbf{S}_{21} = [18, 22]$,
$\mathbf{S}_{31} = [-7, -3]$, $\mathbf{S}_{41} = [-1, 1]$, $\mathbf{S}_{51} = [97, 103]$,
$\mathbf{S}_{61} = [-12, -10]$, and $\mathbf{S}_{71} = [-1, 1]$. The offset factor is equal to
$\delta_1 = 12$. After offset correction with $\epsilon = 1$, we get the
corrected positive imprecise scores $\mathbf{S}_{11} = [21, 25]$, $\mathbf{S}_{21} = [31, 35]$,
$\mathbf{S}_{31} = [6, 10]$, $\mathbf{S}_{41} = [12, 14]$, $\mathbf{S}_{51} = [110, 116]$,
$\mathbf{S}_{61} = [1, 3]$, and $\mathbf{S}_{71} = [12, 14]$. In applying formulas (28)-(36),
we get the imprecise BBAs listed in Table III.


Table III
IMPRECISE BBAS CONSTRUCTED FROM IMPRECISE SCORE VALUES.

        m_i1(A_i)          m_i1(Ā_i)          m_i1(A_i ∪ Ā_i)
A1      [0.0701,0.1234]    [0.4375,0.6264]    [0.2501,0.4924]
A2      [0.1452,0.2200]    [0.3606,0.4885]    [0.2915,0.4942]
A3      [0.0049,0.0161]    [0.6538,1.0000]    [0,0.3413]
A4      [0.0179,0.0376]    [0.5769,0.8046]    [0.1578,0.4051]
A5      [0.9119,1.0000]    [0,0]              [0,0.0881]
A6      [0,0]              [0.8365,1.0000]    [0,0.1635]
A7      [0.0179,0.0376]    [0.5769,0.8046]    [0.1578,0.4051]

As we see, all imprecise BBA values of Table III include
the precise BBA values of Table I. Note that all negative bounds
encountered in derivations (if any) are set to zero, and all
bounds greater than one in derivations (if any) are set to one
because mass values must belong to $[0, 1]$. Each imprecise
BBA represented by a row of Table III is said to be admissible
because for a given hypothesis $A_i$ one can find at least a point
(a precise mass value) in each interval $\mathbf{m}_{ij}(A_i)$, $\mathbf{m}_{ij}(\bar{A}_i)$ and
$\mathbf{m}_{ij}(A_i \cup \bar{A}_i)$ such that the sum of the masses equals one. If all
imprecisions of scores reduce to zero, the results of Table III
will coincide with the results of Table I. Using Imp-BF-TOPSIS
methods, we get the imprecise distances and the imprecise
relative closeness measures listed in Table IV.


Table IV
IMPRECISE DISTANCES AND RELATIVE CLOSENESS MEASURES.

        d_BI(m_i1, m_i1^best)   d_BI(m_i1, m_i1^worst)   C(A_i, A^best)
A1      [0.5421,0.9338]         [0.0301,0.2864]          [0.0312,0.3457]
A2      [0.5034,0.8313]         [0.0511,0.3253]          [0.0579,0.3926]
A3      [0.5324,1.0000]         [0.0006,0.2977]          [0.0006,0.3586]
A4      [0.5960,0.9956]         [0.0131,0.2317]          [0.0130,0.2800]
A5      [0,0.1135]              [0.7176,0.8591]          [0.8634,1.0000]
A6      [0.6900,0.9533]         [0,0.1360]               [0,0.1647]
A7      [0.5960,0.9956]         [0.0131,0.2317]          [0.0130,0.2800]


For each element of $\mathbf{C} = \{\mathbf{C}(A_i, A^{best}), i = 1, \ldots, M\}$, we
compute its likelihood $\lambda_i = \lambda(\mathbf{C}(A_i, A^{best}))$ to be the max of
$\mathbf{C}$ by the method explained in section III-C. Here, one gets

$$[\lambda_1, \lambda_2, \lambda_3, \lambda_4, \lambda_5, \lambda_6, \lambda_7] \approx [3.00, 3.57, 2.80, 2.29, 6.00, 1.02, 2.29]$$

In sorting $\lambda_i$ by descending order, we get $A_5 \succ A_2 \succ A_1 \succ A_3 \succ (A_4 \sim A_7) \succ A_6$.
This result is of course a bit
different from what we obtain with the precise midpoints of scores
because of the imprecision of the input scores, which is
normal. However, when the imprecision degree (i.e. the width
of each score interval) of the input scores reduces to zero, we
always obtain the same result as with the (precise) midpoints of the
score intervals because of the consistency of interval arithmetic
operators with arithmetic on real numbers.
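
The likelihood-based ranking of imprecise closeness values can be checked with a short Python sketch. It takes as input the seven closeness intervals listed in the last column of Table IV and computes, for each one, the likelihood of being the max; `prob_less` is a compact form equivalent to the six comparison cases of section III-C, and the variable names are hypothetical. Because the inputs are rounded to four decimals, the likelihoods agree with the reported ones only approximately.

```python
def prob_less(x, y):
    """P(x < y) for independent uniform values on intervals (lo, hi)."""
    (xl, xu), (yl, yu) = x, y
    if xu <= yl:
        return 1.0
    if yu <= xl:
        return 0.0
    w = xu - xl

    def G(t):  # antiderivative of the CDF of a uniform value on x
        if t <= xl:
            return 0.0
        if t >= xu:
            return w / 2 + (t - xu)
        return (t - xl) ** 2 / (2 * w)

    if yu == yl:
        return (yl - xl) / w
    return (G(yu) - G(yl)) / (yu - yl)


# Imprecise relative closeness C(A_i, A_best), i = 1..7 (Table IV).
C = [(0.0312, 0.3457), (0.0579, 0.3926), (0.0006, 0.3586), (0.0130, 0.2800),
     (0.8634, 1.0000), (0.0, 0.1647), (0.0130, 0.2800)]

# Likelihood of each interval to be the max: sum over j != i of P(C_i > C_j),
# with P(C_i > C_j) = 1 - P(C_i < C_j).
lam = [sum(1.0 - prob_less(ci, cj) for j, cj in enumerate(C) if j != i)
       for i, ci in enumerate(C)]

ranking = sorted(range(len(C)), key=lambda i: -lam[i])
print([round(v, 2) for v in lam])
print(["A%d" % (i + 1) for i in ranking])
```

The two alternatives with identical closeness intervals obtain the same likelihood up to floating-point rounding, so their relative order in the sorted list is not meaningful.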


B. Example 6 (Multi-criteria) 


An investor wants to invest some money in a company
to get the highest profit. He considers four companies $\mathbf{A} = \{A_1, A_2, A_3, A_4\}$
and must take a decision according to the
following two criteria: $C_1$ is the risk analysis (the min is better)
with weight $w_1 = 0.6$ and $C_2$ is the growth analysis (the max
is better) with $w_2 = 0.4$. Assume the imprecise scores are

S ≜
        C1 (in %)    C2 (in %)
A1      [8,12]       [8,10]
A2      [17,19]      [15,19]
A3      [3,5]        [8,12]
A4      [4,8]        [5,7]

We get the final preference orderings with:

• Imp-BF-TOPSIS1: we get $A_3 \succ A_4 \succ A_1 \succ A_2$ because
$[\lambda_1, \lambda_2, \lambda_3, \lambda_4] \approx [1.4939, 1.0013, 1.9396, 1.5652]$

• Imp-BF-TOPSIS2: we get $A_3 \succ A_4 \succ A_1 \succ A_2$ because
$[\lambda_1, \lambda_2, \lambda_3, \lambda_4] \approx [1.4968, 0.9323, 2.0535, 1.5175]$

• Imp-BF-TOPSIS3: we get $A_3 \succ A_4 \succ A_1 \succ A_2$ because
$[\lambda_1, \lambda_2, \lambda_3, \lambda_4] \approx [1.5300, 1.3487, 1.5639, 1.5574]$

• Imp-BF-TOPSIS4: we get $A_4 \succ A_3 \succ A_1 \succ A_2$ because
$[\lambda_1, \lambda_2, \lambda_3, \lambda_4] \approx [1.5093, 1.4686, 1.5101, 1.5120]$

Imp-BF-TOPSIS1-3 methods provide here the same preference
order $A_3 \succ A_4 \succ A_1 \succ A_2$, hence $A_3$ is the
preferred choice. One sees that Imp-BF-TOPSIS3 and
Imp-BF-TOPSIS4 have difficulty providing clearly distinct likelihood
values because insurmountable dependency effects arise
in IA operations when applying the PCR6 and ZPCR6 rules,
which substantially degrade the final precision of the result.
Based on this analysis, we recommend using either
Imp-BF-TOPSIS1 or Imp-BF-TOPSIS2 because they provide the tightest
enclosure results and they are much simpler to implement than
Imp-BF-TOPSIS3 and Imp-BF-TOPSIS4.


VIII. CONCLUSIONS 


Four new methods (Imp-BF-TOPSIS1 to Imp-BF-TOPSIS4)
for MCDM have been proposed. We have shown how to
calculate imprecise BBAs from imprecise scores, and how to
evaluate the imprecise relative closeness of each alternative to
the ideal best and worst solutions for making the preference
ordering. These methods avoid score normalization, and they
can deal with imprecise scores, with missing scores, and with the
reliability of the sources as well, and they could also work with
imprecise weightings of criteria. They are more complicated to
implement (and slower) than their precise counterparts because
of IA. They are consistent with BF-TOPSIS1-4 when the
imprecision of scores reduces to zero. However, because IA
suffers from dependency effects, IA is not a universal panacea
for working with imprecise values to get the best results, especially
for combining imprecise BBAs. More research efforts need to
be made to circumvent these problems (if possible) by better
implementations (or by a Monte-Carlo approach) in order to
improve the performance of the Imp-BF-TOPSIS3 and
Imp-BF-TOPSIS4 methods. Application of these methods for natural
risk assessment in mountains is under development and will
be reported in future publications.


Abstract—Multimodal images encompass diverse information
that can be both complementary and redundant, thereby ad- 
dressing the challenges associated with unimodal classification. 
By modeling and integrating these different pieces of informa- 
tion, multimodal approaches offer improved solutions. However, 
despite yielding acceptable results, this classification approach 
still falls short of the level achieved by the human brain’s 
powerful mechanism for effortlessly classifying visually observed 
scenes. To enhance the classification task within the realm of 
multimodal images, we present a methodology that leverages the 
Dezert-Smarandache Theory (DSmT). This approach involves 
fusing spectral and dense SURF features extracted from each 
modality, which are then pre-classified by the SVM classifier. 
Additionally, we incorporate the visual perception model into 
the fusion process. In order to demonstrate the effectiveness of 
incorporating salient features into the fusion process using DSmT, 
we conducted tests and validation on extensive datasets obtained 
from cultural heritage wall paintings. Each dataset consists of 
four imaging modalities, namely UV, IR, Visible, and fluorescence. 
The results obtained from these experiments show great promise. 


Keywords: Visual saliency model, fusion, DSmT, SVM, 
Dense SURF features, Spectral features, Multimodal images, 
Classification. 


I. INTRODUCTION 


Nowadays, multimodal imaging has gained increasing importance
in computer vision applications, and significant efforts
have been put into developing methods for different tasks,
such as registration [1]-[4], data fusion [5], representation
learning [6], classification [7] and so on. In the classification task,
a unimodal image presents various problems such as noisy data,
incomplete information and distortions, which often lead
to misclassification. These limitations are overcome by using
multimodal images, which are acquired from multiple sensors
and taken of the same object or scene. Each image or modality
provides different information that can be
redundant, because the same area or scene is observed by
different sensors, and complementary, owing to
the diversity of sensor technologies and their physical
interaction mechanisms. Using this set of images together
brings a real benefit for solving a given problem with
the various information available. The fusion of these data
yields a better-quality classification.


However, these data are affected by imperfections
such as conflict, ignorance, uncertainty and so on, which must
be handled and taken into account by a dedicated formalism
since they reflect an aspect of reality. To address this
problem, several formalisms exist, such as probability theory [8],
fuzzy theory [9], the belief function formalism [10] and the
Dezert-Smarandache Theory (DSmT) formalism [11], [12]. In this work, we
use the last of these, which is the most recent one;
it was introduced in order to deal with highly conflicting
and uncertain data thanks to its rich modeling and the
combination operators (PCR5 and PCR6) that it integrates.

In the classification task, belief function theory is widely
exploited in many works [13]-[16]. DSmT, the so-called
plausible and paradoxical reasoning, has also shown its efficiency in
many applications. It was applied to multi-source remote
sensing [17] for supervised classification
by integrating contextual information obtained from an ICM
classifier with constraints and temporal information in a hybrid
DSmT process with an adaptive decision rule. The authors also
proposed a new decision rule based on the DSmP transformation
for change detection [18]. In [19], the authors present
an effective use of DSmT for multi-class classification by
combining two SVM OAA (One-Against-All) implementations
using the PCR6 combination rule. A new method, based on fusing
the attribute type information obtained from a Ground Moving
Target Indicator and an imagery sensor using DSmT for tracking
and classification, has been presented in [20]. Multidate fusion
has been proposed in [21], [22] for the short-term prediction of
the winter land cover. DSmT is also used in medical case
retrieval by [23], where the authors used DSmT to fuse heterogeneous
features of several sensors to be included in CBR
systems.

According to our study of the state of the art, all the
research works studied disregard the power of perceptual attention,
which allows the human brain to classify
any scene well. We benefit from this ability in our approach by integrating
the visual perception model, using DSmT, with spectral and
dense SURF features obtained from SVM classification for a
significant classification improvement.

The paper is organized as follows. After a brief presentation
of the mathematical background of the DSmT formalism in section
II, we present the overall system of the proposed method in
section III. Data and experiments are then given in section IV
in order to evaluate the performance of our approach on real
image datasets. A conclusion is given in section V.


II. MATHEMATICAL BACKGROUND OF DSMT


Dezert-Smarandache theory was proposed jointly by Jean
Dezert and Florentin Smarandache [24], as an attempt
to overcome the limitations of belief functions in handling highly
uncertain and conflicting information. This theory can be
described as follows:

We denote $\Theta = \{\theta_1, \theta_2, \ldots, \theta_N\}$ the discernment space of
the $N$-class classification problem, and $D^\Theta$ the hyper-power-set
[25], that is the set of subsets of $\Theta$ built with the union of
classes and also their intersection, so that if $X, Y \in D^\Theta$, then
$X \cup Y \in D^\Theta$ and $X \cap Y \in D^\Theta$. Each source $S_i$ contributes its
belief mass $m_i$ to $X$, known as the generalized basic belief
assignment (gbba), which satisfies the following properties:

$$m_i(X) : D^\Theta \to [0, 1], \tag{1}$$

$$m_i(\emptyset) = 0, \tag{2}$$

$$\sum_{X \in D^\Theta} m_i(X) = 1. \tag{3}$$


The size of the hyper-power-set is a real limit in DSmT
when $N > 6$ ($N$ being the number of classes) in the free model [26],
which corresponds to the full hyper-power-set without any
constraints, in contrast to the hybrid model [26], which allows
integrating constraints that can be exclusive and refined, and
therefore reduces the size of $D^\Theta$. The generalized masses
obtained from the different sources are then combined and a
new mass distribution is provided over the elements of $D^\Theta$. The
combination step is the kernel of the fusion process and
each formalism proposes several combination operators. In the
DSmT formalism, all combination operators can be found
in detail in [27]; we quote the most used ones: Smets' rule, the
Dempster-Shafer (normalized) operator, the Yager operator, the Zhang
operator, the DSmH rule, the Dubois and Prade rule, the PCR5 operator
for $N = 2$ and the PCR6 operator for $N > 2$. To deal with the
large number of sources used in this work and the highly
uncertain and conflicting information provided, we benefit
from the performance of the PCR6 combination rule in handling
such problems.

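The generalized belief functions, namely the credibility, denoted
Bel(.) or Cr(.), the plausibility, denoted Pl(.), and the DSmP
transformation, are derived from the basic mass function and are
respectively defined from $D^\Theta$ to [0, 1] by:

$$\mathrm{Bel}(X) = \sum_{Z \in D^\Theta,\, Z \subseteq X} m(Z), \tag{4}$$

$$\mathrm{Pl}(X) = \sum_{Z \in D^\Theta,\, Z \cap X \neq \emptyset} m(Z), \tag{5}$$

$\mathrm{DSmP}_\epsilon(\emptyset) = 0$, and $\forall X \in G^\Theta \setminus \{\emptyset\}$

$$\mathrm{DSmP}_\epsilon(X) = \sum_{Y \in G^\Theta} \frac{\sum_{Z \subseteq X \cap Y,\, C(Z)=1} m(Z) + \epsilon \cdot C(X \cap Y)}{\sum_{Z \subseteq Y,\, C(Z)=1} m(Z) + \epsilon \cdot C(Y)}\, m(Y), \tag{6}$$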


where $G^\Theta$ can be the full $D^\Theta$ or a reduced $D^\Theta$ with constraints,
depending on the model used (free or hybrid), $\epsilon$ is an
adjustment parameter, and $C(X \cap Y)$ and $C(Y)$ are respectively
the cardinalities of $X \cap Y$ and $Y$.
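
Equations (4)-(6) can be illustrated with a minimal Python sketch. It assumes the simplest case, Shafer's model, in which $G^\Theta$ reduces to the power set and every element can be coded as a `frozenset` of class labels, so that the cardinality C(.) is the set size. The function names `bel`, `pl` and `dsmp` and the toy masses are illustrative assumptions, not part of the original work.

```python
def bel(m, X):
    """Credibility (4): total mass of the non-empty elements included in X."""
    return sum(v for Z, v in m.items() if Z and Z <= X)


def pl(m, X):
    """Plausibility (5): total mass of the elements that intersect X."""
    return sum(v for Z, v in m.items() if Z & X)


def dsmp(m, X, eps=0.001):
    """DSmP transformation (6); eps > 0 keeps every denominator non-zero."""
    total = 0.0
    for Y, mass_Y in m.items():
        if not Y or mass_Y == 0.0:
            continue
        # Singleton masses inside X & Y (numerator) and inside Y (denominator).
        num = sum(m.get(frozenset([t]), 0.0) for t in X & Y) + eps * len(X & Y)
        den = sum(m.get(frozenset([t]), 0.0) for t in Y) + eps * len(Y)
        total += num / den * mass_Y
    return total


# Toy mass function on a two-class frame (assumed values).
A, B = frozenset("a"), frozenset("b")
m = {A: 0.5, B: 0.2, A | B: 0.3}
scores = {X: dsmp(m, X) for X in (A, B)}
decision = max(scores, key=scores.get)  # maximum-of-DSmP decision
```

For each singleton the DSmP value lies between the credibility and the plausibility, which is why it is used below as a compromise decision rule.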

The last step in the DSmT process is making a final decision,
which is a real challenge in many applications. In this
work, we are interested in improving classification, so we have
to decide to which simple class, also called singleton class,
each pixel belongs. In this case there are two ways: taking the
decision based on the maximum of the generalized basic belief
mass (gbba), or based on the generalized belief functions already
computed, as follows:

• Maximum of credibility Cr(.), which is widely used in many
applications [28] and is considered a pessimistic decision.

• Maximum of plausibility Pl(.), which is considered an
optimistic decision.

• Maximum of DSmP, which is a compromise between the two
decisions above, since it relies on a probabilistic
transformation P(.) lying in the interval [Cr(.), Pl(.)].


III. OVERALL SYSTEM
A. Pre-processing 


Generally, the pre-processing that precedes classification
aims to eliminate the imperfections that taint the information
through a set of operations such as filtering, gradient
operations, etc. However, in classification based on uncertainty
theories, these imperfections are preserved, modeled and
combined to help make a decision.

Registration is the pre-processing usually used in the
fusion process. It aims at establishing the correspondence
between two or more images of a scene obtained from one or
several sensors, potentially at different spatial positions and
scales, by using optimal spatial and radiometric transformations
between the images.

In the case of multimodal images, registration is an issue
because of the significant difference between images [29],
[30]. An original methodology was proposed in a previous
work to address the particular issue of registration with
multimodal imaging inputs: we exploit the scale- and
rotation-invariant SURF descriptors for the identification
and the description of the interest points, and we introduce a
relevance filtering based on both SURF distance and orientation
features in the matching step [1].


B. Feature Extraction 


Feature extraction is a pivotal step in the classification
process. It aims to highlight the relevant features that
correspond to the various classes. It is worth stating that an
appropriate choice of extracted features improves the
performance of the classification step. Spectral, spatial and
perceptual features are extracted in this work.

1) Spectral Information: Spectral information is widely
used in a large number of classification methods. In this work,
we have extracted the spectral values of each pixel as a vector
of attributes and then converted them to the CIELAB color space
for a better correlation with human color perception.

2) Dense SURF Description: Speeded-Up Robust Features
(SURF), proposed by Herbert Bay [31], is a spatial descriptor
that originally consists of two phases: detection and
description of keypoints. We proposed in a previous work [32] to
skip the detection phase and to perform the description phase on
each pixel of the image. This is done first by assigning to each
pixel the dominant orientation, calculated by combining the
Haar wavelet responses within a circular neighborhood around
the pixel, and then by creating 4 × 4 sub-regions around the
pixel. In each sub-region, pixel-wise Haar wavelet responses are
computed and then summed up to form a 64-element descriptor.

3) Saliency Information: Based on a comparative analysis
of saliency detection on our multimodal data [33], we extract
the saliency features using the method proposed by Rahtu
et al. [34]. This method uses the contrast of local features
such as illuminance and color, mapped to a feature space F(x)
that is divided into disjoint bins. A saliency measure is
calculated by applying a sliding window w divided into an inner
window K and a border B, under the hypothesis that points in K
are salient and points in B are not. The measure can be defined
as a conditional probability and computed through the Bayes
formula as

$$S_0(x) = \frac{h_K(x)\, p_0}{h_K(x)\, p_0 + h_B(x)\,(1 - p_0)}, \tag{7}$$

with $0 < p_0 < 1$ and $h_K(x) = P(F(x) \mid H_1)$, $h_B(x)$ being the
corresponding feature distribution in the border B. A regularized
saliency measure is then introduced to make it more robust to
noise.

The motivation for integrating saliency information in the
fusion process is that visual perception usually succeeds
easily in classifying any object or scene.
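
The Bayes-rule saliency measure of equation (7) can be sketched per feature bin as follows. This is only an illustration under stated assumptions: `h_K` and `h_B` are taken to be normalized histograms of the feature values inside the inner window and the border, the function name `saliency_measure` is hypothetical, and the regularization step mentioned above is not reproduced.

```python
def saliency_measure(h_K, h_B, p0=0.5):
    """Posterior probability of the 'salient' hypothesis for each feature bin.

    h_K, h_B: normalized histograms (same bins) of the inner window K
    and of the border B; p0 is the prior probability with 0 < p0 < 1.
    """
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must lie strictly between 0 and 1")
    scores = []
    for hk, hb in zip(h_K, h_B):
        num = hk * p0
        den = num + hb * (1.0 - p0)
        # A bin that is empty in both windows carries no evidence: score 0.
        scores.append(num / den if den > 0.0 else 0.0)
    return scores
```

A bin that is frequent in K and rare in B receives a score close to 1, whereas a bin equally frequent in both windows simply returns the prior `p0`.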


C. SVM Pre-Classification 


Support vector machine (SVM) is a supervised classification
method introduced by Vapnik [35], [36], widely used in
classification applications thanks to its ability to deal
with high-dimensional data. Basically, it is designed for binary
classification: it finds an optimal hyperplane that separates
two linearly separable classes. For classes that are not
linearly separable, the feature space is mapped to a
higher-dimensional feature space where the classes are separable,
using a kernel function K that should fulfill Mercer's
conditions. The most used kernel is the Radial Basis Function
(RBF), for which the decision function is expressed as follows:

$$h(x) = \mathrm{Sign}\Big(\sum_{i=1}^{N} \alpha_i \exp\{-\|x - x_i\|^2 / (2\sigma^2)\}\Big), \tag{8}$$

where $\alpha_i$ are Lagrange multipliers, and the associated kernel
function is

$$K(x, x_i) = \exp\{-\|x - x_i\|^2 / (2\sigma^2)\}. \tag{9}$$

In the case of a multiclass problem, two main approaches have
been proposed: One-Versus-Rest, in which k binary SVM
classifiers are constructed for a k-class classification, and
One-Versus-One, in which $k(k-1)/2$ binary classifiers are
applied, one for each pair of classes.
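
The RBF kernel (9), a decision function in the form of (8) and the number of binary classifiers needed by the two multiclass strategies can be sketched in plain Python. The sketch assumes that the coefficients `alphas` already carry the sign of the class label of each support vector and that there is no bias term, as in the form of (8) given above; the function names are illustrative only.

```python
import math


def rbf_kernel(x, xi, sigma):
    """RBF kernel (9) between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def decision(x, support_vectors, alphas, sigma):
    """Sign of the kernel expansion, following the form of (8)."""
    s = sum(a * rbf_kernel(x, xi, sigma)
            for a, xi in zip(alphas, support_vectors))
    return 1 if s >= 0 else -1


def n_binary_classifiers(k):
    """Binary SVMs required for a k-class problem under each strategy."""
    return {"one_vs_rest": k, "one_vs_one": k * (k - 1) // 2}
```

For the four classes used later in the experiments, One-Versus-Rest needs 4 binary classifiers and One-Versus-One needs 6.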


In order to generate the probabilities for DSmT, we have
performed a pre-classification [32] based on combining spectral
information (see III-B1) and Dense SURF information (see
III-B2), using an SVM classifier with an RBF kernel to handle
the non-linear high-dimensional data of our multimodal dataset,
and the One-Versus-Rest approach to deal with the incomplete
information provided by the diverse modalities.


D. DSmT Classification 


1) Mass function estimation: The mass function estimation
step is crucial in the fusion process, because it is where
imperfections such as uncertainty, imprecision and paradox are
introduced. These masses are most commonly generated from the
probabilities given by the pre-classification. The SVM
classification of k images generates the matrices of the
probabilities $P(x_s \mid \theta_i)$ of pixels belonging to the singleton
classes of the frame of discernment $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$, and the
same holds for the k saliency maps generated using the method
proposed in [34]. Each source (modality/saliency map), denoted
$S_i$ (i = 1, ..., K), gives the probability of belonging to one
or two classes and to their complementary classes, which
represents the mass of the partial ignorance. Based on [19], we
denote $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$, and the gbba mass of each source is
given by:

$$\begin{cases} m_S(\theta_i) = \dfrac{P(x_s \mid \theta_i)}{z}, & \forall \theta_i \in \Theta, \\[2mm] m_S(\bar{\theta}_i) = \dfrac{P(x_s \mid \cup_{j \neq i}\, \theta_j)}{z}, & \forall \theta_i \in \Theta, \\[2mm] m_S(\emptyset) = 0, \end{cases} \tag{10}$$

where $z = \sum_j P(x_s \mid \theta_j)$ is a normalization term used in
order to make sure that the sum of masses is equal to one.


2) Combination of masses and decision: The estimated
masses must be combined with an appropriate rule that handles
the conflict generated by the different sources $S_i$. In this
work, we have used the PCR6 rule [37] in the combination step
because it shows a better performance than all the combination
rules cited in the previous section and tested on our datasets.
PCR6 is computed as follows.

Considering S independent sources, the combined masses
$m_{PCR6}(\cdot)$ obtained from S > 2 sources are computed as
$m_{PCR6}(\emptyset) = 0$ and, $\forall X \in D^\Theta \setminus \{\emptyset\}$, as

$$m_{PCR6}(X) = m_{12 \ldots S}(X) + \sum_{\substack{X_1, X_2, \ldots, X_S \in D^\Theta \setminus \{\emptyset\} \\ X_1 \cap X_2 \cap \ldots \cap X_S = \emptyset}} \Big[\sum_{r=1}^{S} \delta_{X_r}^{X}\, m_r(X_r)\Big] \cdot \frac{m_1(X_1)\, m_2(X_2) \ldots m_S(X_S)}{m_1(X_1) + m_2(X_2) + \ldots + m_S(X_S)}, \tag{11}$$

where

$$\delta_{X_r}^{X} = \begin{cases} 1, & \text{if } X = X_r, \\ 0, & \text{if } X \neq X_r, \end{cases} \tag{12}$$

and where the mass $m_{12 \ldots S}(X)$ corresponds to the conjunctive
consensus on X between the S > 2 sources, which is given by

$$m_{12 \ldots S}(X) = \sum_{\substack{X_1, X_2, \ldots, X_S \in D^\Theta \setminus \{\emptyset\} \\ X_1 \cap X_2 \cap \ldots \cap X_S = X}} m_1(X_1)\, m_2(X_2) \ldots m_S(X_S).$$
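
A compact way to see how equation (11) works is to enumerate every tuple of focal elements, one per source. The following Python sketch does this under simplifying assumptions: the classes are exclusive (Shafer's model), so each element is a `frozenset` of labels, and no hybrid-model constraint is handled. The function name `pcr6` and the toy masses are illustrative, not taken from the experiments.

```python
from itertools import product


def pcr6(masses):
    """Fuse S >= 2 mass functions (dict: frozenset -> mass) with PCR6 (11)."""
    fused = {}
    focal = [[(X, v) for X, v in m.items() if X and v > 0.0] for m in masses]
    for combo in product(*focal):           # one focal element per source
        sets = [X for X, _ in combo]
        vals = [v for _, v in combo]
        joint = 1.0
        for v in vals:
            joint *= v
        inter = frozenset.intersection(*sets)
        if inter:
            # Non-empty intersection: conjunctive consensus term.
            fused[inter] = fused.get(inter, 0.0) + joint
        else:
            # Partial conflict: source r gets back a share of the product
            # proportional to its own mass m_r(X_r), as in (11).
            total = sum(vals)
            for X, v in combo:
                fused[X] = fused.get(X, 0.0) + v * joint / total
    return fused


# Toy example with two exclusive classes (assumed values).
A, B = frozenset("a"), frozenset("b")
m1 = {A: 0.6, B: 0.4}
m2 = {A: 0.2, B: 0.8}
m3 = {A: 0.5, A | B: 0.5}
fused_two = pcr6([m1, m2])
fused_three = pcr6([m1, m2, m3])
```

Every product of masses is either assigned to the intersection or entirely redistributed to the sets that caused the conflict, so the fused masses always sum to one. The exhaustive enumeration grows exponentially with the number of sources, which is acceptable only for small frames such as the four-class frame used in this work.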


Once the combination step is achieved, we calculate the 
generalized belief function and we use a probabilistic trans- 
formation DSmP that converts the combined masses measure 
to a probability measure using (6) to make a final decision. 


IV. DATA AND EXPERIMENTS 
A. Data 


Large sets of multimodal images acquired on wall paintings
of the Germolles palace are used to demonstrate our proposed
method. This palace was offered by Philip, Duke of Burgundy,
to his wife Margaret of Flanders in 1380, and it is the only
remaining castle of the Dukes of Burgundy that is so well
preserved. Its wall paintings were restored between 1989 and
1991; however, there are no conservation reports of the
applied restoration. In order to distinguish the original from
the restored areas, the conservator of Germolles used
multimodal imaging, which has the advantage of being a fast and
relatively inexpensive solution for the examination of large
areas of wall paintings. This technical photography consists of
recording a set of images with a commercial digital photographic
camera that has been modified by removing the thermal filter
normally positioned in front of the CCD. In this way it is
possible to record images of reflected visible light (Vis),
reflected infrared light (IRr), reflected ultraviolet light
(UVr) and UV fluorescence (UVf). This set of images provides
information about the optical behaviour of the surface under
the different types of light, and therefore helps to distinguish
the original portions of the wall paintings from recent
repainting.

For illustration purposes, we select an area of the south wall
of the dressing room of Margaret, represented in Figure 1.
This area presents a large white P (for Philip), a motif that
covers the walls, which are painted in green. It is captured
in four modalities: VIS, UVF, UVR and IRR. Each modality
measures 3744 × 5616 pixels. The IRR modality shows very well
the non-original parts of the green surface. The image of the
UV-induced fluorescence modality shows a relatively strong
fluorescence corresponding to the remains of an old/original
paint layer over the white. The UVR image helps to identify the
repainted and original areas over the white of the letter P.

Figure 1. Multi-modal images of the same area.


B. Experiments 


The adopted methodology can be divided into four steps, as
illustrated in Figure 2. It starts with the pre-processing
step, which aligns each image with the VIS image used as the
reference image.


Figure 2. A representative illustration of the workflow:
registration; feature extraction (spectral, Dense-SURF and
saliency information); combination of the spectral and
Dense-SURF information; SVM classification; mass function
estimation; PCR6 combination rule; DSmP decision rule; final
classified image.


In the second step, four classes have been identified: White
original (WO), White repainted (WR), Green original (GO)
and Green repainted (GR). Then spectral and Dense-SURF
information is extracted and used jointly as the input of the
SVM classifier with the RBF kernel. In parallel, saliency
information is extracted using the method proposed in [34];
the resulting maps are shown in Figure 3.


Figure 3. Saliency maps. 


The third step is the pre-classification, in which the SVM
classifier is applied to the images in order to obtain the
probability matrices of pixels belonging to the classes. Each
modality highlights the presence of one or two classes. The
UV-induced fluorescence modality shows a relatively strong
fluorescence corresponding to the remains of an old painted
layer of the white (WO), which reaches an accuracy of 92% using
SVM; the UVR modality also emphasizes the WO class, with a
classification accuracy of 98%. Infrared light shows very well
the original and repainted parts of the green surface and
reaches an accuracy of 94% [32]. The resulting maps are
presented in Figure 4.


Figure 4. Multimodal SVM classification (panels: UVF, UVR, IRR;
legend: WO, WR, GO, GR).


The VIS modality reaches an accuracy of 98% when classifying
the two classes GO and GR, whereas this accuracy decreases
when classifying four classes because of the increased
conflict. The classified image is presented in Figure 5.

The last step is the fusion process, which starts with the
definition of the frame of discernment $\Theta = \{WO, WR, GO, GR\}$.
Given the information obtained from the SVM classification and
the saliency maps, some constraints can be taken into account
to reflect the real situation and to reduce the hyper-power-set
$D^\Theta$, for example $WO \cap GR = \emptyset$.




Figure 5. SVM classification of VIS modality. 


Then the mass functions associated with the emphasized class
and its complement in each modality are computed using
equation (10). The PCR6 combination rule is used to combine the
calculated masses based on equation (11), and, as a final task,
the decision is taken using the maximum of DSmP.

The final classified map provided by DSmT only is given
in Figure 6, and the final classified map obtained using
DSmT-Salience is shown in Figure 7.

The results improve with the integration of the perceptual
model in the DSmT process. The visual analysis of the
classification maps shows that the result of the proposed
method agrees much better with the ground truth over the WR and
WO classes, and appears closer to reality, than the result
obtained using DSmT only for the same classes, while the map
obtained using a unimodal image presents a degraded result in
terms of smoothness and connectivity between classes.

In this work, in order to evaluate the performance of the
methods used and to compare the results, we have used the
overall accuracy (OA), which is the percentage of correctly
classified pixels, and the Mean Error Rate (MER), which is
the percentage of misclassified pixels. Table I summarizes the
results obtained with the different methods. From these results,
we can note that the proposed method produces a better overall
accuracy of 95.39%, compared with the DSmT classification,
which provides an overall accuracy of 91.46%, and the SVM
classification, which gives an overall accuracy of 86.43%. In
terms of error rate, the proposed method gives the lowest MER
of 4.61%, compared with DSmT-Classification and
SVM-Classification, which give a MER of 8.53% and 12.60%
respectively.

Figure 6. DSmT classification of multimodal images.

Figure 7. DSmT-Salience classification of multimodal images.

In conclusion, the use of DSmT with the PCR6 combination rule
provides a better result thanks to its effectiveness in
correctly managing the conflicting information provided by the
different sources, and shows a significant classification
improvement compared with the unimodal SVM classification. In
addition, the integration of saliency information in the fusion
process brings a real benefit, owing to the powerful mechanism
of the human brain in classification tasks.


Methods                      | OA      | MER
SVM-Classification           | 86.43 % | 12.60 %
DSmT-Classification          | 91.46 % | 8.53 %
DSmT-Salience-Classification | 95.39 % | 4.61 %
Table I
ACCURACY AND ERRORS OF CLASSIFICATION RESULTS FROM DIFFERENT
METHODS.


V. CONCLUSION 


In this paper, we have proposed a new method for multimodal
image classification. As a first step, we have extracted
spatial (Dense-SURF), spectral and saliency information. The
extracted spatial and spectral information is combined and
passed to the SVM classifier for the pre-classification step.
The SVM classification results obtained from each modality are
then fused using DSmT; the joint use of DSmT and SVM provides
better performance than the unimodal SVM classification. In the
second step, the extracted saliency information is modeled and
combined with the SVM classification results using the DSmT
process based on the PCR6 combination rule and the DSmP decision
rule. The proposed method yields the best performance in terms
of accuracy and error rate compared with DSmT-SVM classification and
unimodal SVM classification.

Abstract—The study of alternative probabilistic transformations
(PT) in DS theory has emerged recently as an interesting
topic, especially in decision making applications. These recent
studies have mainly focused on investigating various schemes for
assigning the mass of compound focal elements to each
singleton in order to obtain a Bayesian belief function for
real-world decision making problems. In this paper, our work
also takes inspiration from both Bayesian transformation camps,
with a novel evolutionary-based probabilistic transformation
(EPT) that selects the qualified Bayesian belief function with
the maximum value of probabilistic information content (PIC),
benefiting from the global optimizing capabilities of
evolutionary algorithms. EPT is verified by testing it on a set
of numerical examples on 4D frames. On each problem instance,
comparisons are made between the novel method and existing
approaches, which illustrate the superiority of the method
proposed in this paper. Moreover, a simple constraint-handling
strategy with EPT is proposed to tackle the target type tracking
(TTT) problem; simulation results of the constrained EPT on the
TTT problem support the rationality of this modification.


Keywords: Evidence Reasoning, Probabilistic Transforma- 
tion, Evolutionary Algorithm, Target Type Tracking problems, 
Decision Making. 


I. INTRODUCTION 


Since the pioneering work of Dempster and Shafer [1], [2]
in the late 70's, known as Dempster-Shafer evidence theory
(DST), on the possibility of distinguishing "unknown" from
"imprecision" and of fusing different pieces of evidence with
the associative and commutative Dempster's combination rule,
this area of research (now known as evidence reasoning) has
grown considerably, as indicated by the notable increase of
technical papers in peer-reviewed journals, conferences and
special sessions. However, the computational complexity of
reasoning with DST is one of the major points of criticism this
formalism has to face.

To overcome this difficulty, various approximation methods
have been suggested that aim at reducing the number of focal
elements in the frame of discernment (FoD) so as to keep the
computation tractable. One common strategy is to simplify the
FoD by removing and/or aggregating focal elements in order to
approximate the original belief function [3]. Among these
methods, probabilistic transformations (PTs) seem particularly
desirable for reducing this computational complexity by
assigning the mass of non-singleton elements to each singleton
[4], [5]. The research on this probabilistic measure has
received a lot of attention, and accordingly many efficient PTs
have been proposed in recent years. Among them, a classical
transformation, denoted BetP [4], offers a good compromise
between the maximum of credibility (Bel) and the maximum of
plausibility (Pl) for decision making. Unfortunately, BetP does
not provide the highest probabilistic information content (PIC)
[7]. Sudano [8] also proposed a series of alternatives built on
principles similar to BetP, called PrPl, PrBel and PrHyb.
CuzzP [9], proposed by Cuzzolin in the framework of DST in
2009, showed its ability for probabilistic transformation.
Another novel transformation, called DSmP [7], was proposed
by Dezert and Smarandache in the framework of DSmT
(free DSm model, hybrid DSm model or Shafer's model),
and comprehensive comparisons have been made in [7] to
demonstrate the capabilities of DSmP in probabilistic
transformation.

However, most of the aforementioned PTs concentrate mainly on
two crucial issues: (1) How to implement this operation (or
assignment)? (2) How to evaluate the quality of this
transformation? In this paper, we suggest a novel PT method
based on an evolutionary algorithm, namely the
evolutionary-based probabilistic transformation (EPT), which
addresses the above two difficulties together through
optimization with a reasonable criterion. A similar idea was
proposed by Han et al. [10]; the difference lies in the
optimization approaches and objective functions. In the EPT
method, a global search replaces the assigning operator used
in the classical PTs, and the evaluation criterion is embedded
into EPT to guide the search procedure. Specifically, the
masses of the singletons are randomly generated in an
evolutionary-based framework and need to satisfy the
constraints of probability distributions in evidence
reasoning. Also, a selection operator is used to assess the
best individual in all populations with a special objective
function (the desired evaluation criterion). Following previous
works [7], the PIC is used in this paper as the objective
function of EPT to select the best¹ solution. Simulation
results on 4D frame test cases show that the proposed EPT is
able, in these problems, to outperform other PTs that pay
special attention to some ratio created from the available
information (i.e. Bel or Pl). Moreover, we suggest a simple
constraint-handling strategy with EPT that is well suited to two
target type tracking (TTT) problems. These first appealing
results of the EPT method encourage its use for more complex and
real-world decision making problems.

¹ Based on the highest PIC value.

The rest of this paper is organized as follows. In Section II
we briefly summarize the basis of DST and several classical
PT formulas. A novel EPT approach is presented in detail
in Section III. In Section IV several cases and comprehensive
comparisons borrowed from previous papers are carried out
to demonstrate the superiority of the proposed method. The
target type tracking problem and the pertinent analysis of
EPT in TTT are also described in detail in that section.
Moreover, the limitations of EPT are discussed in Section V.
The conclusion is drawn in Section VI.


II. BASIS OF BELIEF FUNCTIONS 


In this section, we introduce the belief functions terminol- 
ogy of DST and the notations used in the sequel of this paper. 


A. DST basis 


In DST [2], the elements $\theta_i$ (i = 1, ..., N) of the frame
of discernment (FoD) $\Theta = \{\theta_1, \ldots, \theta_N\}$ must be mutually
exclusive and exhaustive. The power set of the FoD is denoted
$2^\Theta$, and a basic belief assignment (BBA), also called a mass
function, is defined by the mapping $m(\cdot): 2^\Theta \to [0,1]$, which
satisfies $m(\emptyset) = 0$ and

$$\sum_{A \in 2^\Theta} m(A) = 1, \tag{1}$$

where m(A) is defined as the BBA of A. The element A is
called a focal element of m(.) if m(A) > 0. The belief and
plausibility functions, which are in one-to-one mapping with
the BBA m(.), are defined for all $A \subseteq \Theta$ by

$$\mathrm{Bel}(A) = \sum_{B \subseteq A} m(B), \tag{2}$$

$$\mathrm{Pl}(A) = 1 - \mathrm{Bel}(\bar{A}) = \sum_{A \cap B \neq \emptyset} m(B), \quad \forall A \subseteq \Theta, \tag{3}$$

where $\bar{A} = \Theta \setminus A$ is the complement of A in $\Theta$. The belief
interval [Bel(A), Pl(A)] represents the uncertainty committed
to A, and the bounds of this interval are usually interpreted as
lower and upper bounds of the unknown (possibly subjective)
probability of A. This interval plays an important role in the
implementation of EPT, as shown in detail in Section III.


B. DSmT basis 


In the framework of Dezert-Smarandache Theory (DSmT)
[5], the FoD $\Theta$ is considered as a finite set of N exhaustive
elements only (without the requirement of exclusivity of the
elements). The BBA m(.) is then defined on the hyper-power
set of the FoD (i.e. the free Dedekind's lattice $D^\Theta$), possibly
taking into account some integrity constraints (if any). The
main differences between the DST and DSmT frameworks are: (1)
the model one works with, and (2) the combination rule. In the
sequel, we will work with BBAs defined only on the classical
power set for simplicity. Instead of distributing the total
conflicting mass onto elements of $2^\Theta$ through the
normalization step as in Dempster's rule, or transferring
the partial conflicts onto partial uncertainties as in the DSmH
rule [4], we use the Proportional Conflict Redistribution rules
(PCRs) [5], which transfer the conflicting masses (total
or partial) proportionally to the non-empty sets involved in
the model according to all integrity constraints. In DSmT, the
most effective rule is the PCR6 rule, which is defined² for the
fusion of two BBAs $m_1(\cdot)$ and $m_2(\cdot)$ as $m_{PCR6}(\emptyset) = 0$ and,
$\forall A \in 2^\Theta \setminus \{\emptyset\}$,

$$m_{PCR6}(A) = m_{12}(A) + \sum_{B \in 2^\Theta \setminus \{A\} \mid A \cap B = \emptyset} \left[\frac{m_1(A)^2\, m_2(B)}{m_1(A) + m_2(B)} + \frac{m_2(A)^2\, m_1(B)}{m_2(A) + m_1(B)}\right], \tag{4}$$

where $m_{12}(A)$ is the conjunctive operator, and each element
A and B is expressed in its disjunctive normal form.
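
As a quick numerical check of (4), the following worked example uses illustrative masses that are not taken from the paper: a frame $\Theta = \{\theta_1, \theta_2\}$ with exclusive elements and two Bayesian BBAs.

```latex
% Illustrative BBAs (assumed values), with \theta_1 \cap \theta_2 = \emptyset
m_1(\theta_1) = 0.6,\quad m_1(\theta_2) = 0.4,\qquad
m_2(\theta_1) = 0.2,\quad m_2(\theta_2) = 0.8

% Conjunctive consensus
m_{12}(\theta_1) = 0.6 \times 0.2 = 0.12,\qquad
m_{12}(\theta_2) = 0.4 \times 0.8 = 0.32

% Proportional redistribution of the two partial conflicts
m_{PCR6}(\theta_1) = 0.12 + \frac{0.6^2 \times 0.8}{0.6 + 0.8}
                          + \frac{0.2^2 \times 0.4}{0.2 + 0.4}
                   \approx 0.12 + 0.2057 + 0.0267 = 0.3524

m_{PCR6}(\theta_2) = 0.32 + \frac{0.4^2 \times 0.2}{0.4 + 0.2}
                          + \frac{0.8^2 \times 0.6}{0.8 + 0.6}
                   \approx 0.32 + 0.0533 + 0.2743 = 0.6476
```

The total conflicting mass, 0.6 × 0.8 + 0.4 × 0.2 = 0.56, is entirely redistributed to the two elements involved in the conflict, so the fused masses still sum to one.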


C. Classical Probabilistic Transformations 


The efficiency of probabilistic transformations (PTs) in the
field of decision making has been analyzed in depth by Smets
[4]. Various PTs have been proposed in the open literature, and
the main transformations are briefly recalled in this section.

1) BetP: Smets [4], [6] first proposed the pignistic
probability for decision making, which aims to transfer the
mass of belief of each non-specific element onto the singletons.
The classical pignistic probability is defined as
$BetP(\emptyset) = 0$ and, summing over all $A \in 2^\Theta \setminus \{\emptyset\}$:

$$BetP(\theta_i) = \sum_{A \in 2^\Theta,\, A \neq \emptyset} \frac{|\theta_i \cap A|}{|A|}\, \frac{m(A)}{1 - m(\emptyset)}. \tag{5}$$

Because in Shafer's framework $m(\emptyset) = 0$, formula (5) can
simply be rewritten for any singleton $\theta_i \in \Theta$ as

$$BetP(\theta_i) = \sum_{B \in 2^\Theta,\, \theta_i \subseteq B} \frac{1}{|B|}\, m(B). \tag{6}$$
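
The simplified form of BetP for singletons, in which each focal element shares its mass equally among its members, can be sketched as follows. The function name `betp`, the `frozenset` coding of focal elements and the toy BBA are illustrative assumptions; the sketch also assumes $m(\emptyset) = 0$.

```python
def betp(m, frame):
    """Pignistic probability of each singleton of the frame.

    m: dict mapping frozenset focal elements to masses, with m(empty) = 0.
    """
    return {t: sum(v / len(B) for B, v in m.items() if t in B)
            for t in frame}


# Toy BBA on a three-element frame (assumed values).
frame = ["a", "b", "c"]
m = {frozenset("a"): 0.5, frozenset("ab"): 0.3, frozenset("abc"): 0.2}
p = betp(m, frame)
```

Here the mass 0.3 of {a, b} is split into two equal parts and the mass 0.2 of the whole frame into three equal parts, whatever the masses already committed to the singletons.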


2) CuzzP: An intersection probability, denoted CuzzP
[9], was proposed using the proportional repartition of the
total non-specific mass, $TNSM = \sum_{A \in 2^\Theta,\, |A| > 1} m(A)$, for each
contribution of the non-specific masses involved. CuzzP is
defined by $CuzzP(\emptyset) = 0$ and, for any singleton $\theta_i \in \Theta$, by

$$CuzzP(\theta_i) = m(\theta_i) + \frac{Pl(\theta_i) - m(\theta_i)}{\sum_{j=1}^{N} \big(Pl(\theta_j) - m(\theta_j)\big)} \cdot TNSM. \tag{7}$$

3) DSmP: In 2008, Dezert and Smarandache [7] proposed a new
generalized pignistic transformation defined by
$DSmP_\epsilon(\emptyset) = 0$ and, for any singleton $\theta_i \in \Theta$, by

$$DSmP_\epsilon(\theta_i) = m(\theta_i) + (m(\theta_i) + \epsilon) \sum_{A \in 2^\Theta,\, \theta_i \subset A,\, |A| \geq 2} \frac{m(A)}{\sum_{B \in 2^\Theta,\, B \subset A,\, |B| = 1} m(B) + \epsilon \cdot |A|}. \tag{8}$$

As shown in [7], DSmP makes a remarkable improvement
compared with BetP and CuzzP, since DSmP adopts a more
judicious redistribution of the ignorance masses to the
singletons.

² The PCR6 rule coincides with the PCR5 rule when combining only
two BBAs [5].
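
The difference between the two redistribution schemes, (7) and (8), is easy to see on a small example. The sketch below is an illustration only: focal elements are coded as `frozenset` objects, the function names and the toy BBA are assumptions, and the Bayesian special case of CuzzP (no non-specific mass) is handled by returning the BBA unchanged.

```python
def singleton_mass(m, t):
    return m.get(frozenset([t]), 0.0)


def cuzzp(m, frame):
    """Intersection probability (7)."""
    tnsm = sum(v for A, v in m.items() if len(A) > 1)
    # Pl(theta) - m(theta): mass of the non-specific sets containing theta.
    gaps = {t: sum(v for A, v in m.items() if t in A) - singleton_mass(m, t)
            for t in frame}
    total_gap = sum(gaps.values())
    if total_gap == 0.0:  # already Bayesian, nothing to redistribute
        return {t: singleton_mass(m, t) for t in frame}
    return {t: singleton_mass(m, t) + gaps[t] / total_gap * tnsm
            for t in frame}


def dsmp(m, frame, eps=0.001):
    """DSmP (8) for singletons; eps > 0 avoids a zero denominator."""
    out = {}
    for t in frame:
        spread = 0.0
        for A, v in m.items():
            if len(A) >= 2 and t in A:
                den = sum(singleton_mass(m, u) for u in A) + eps * len(A)
                spread += v / den
        out[t] = singleton_mass(m, t) + (singleton_mass(m, t) + eps) * spread
    return out


# Toy BBA on a three-element frame (assumed values).
frame = ["a", "b", "c"]
m = {frozenset("a"): 0.5, frozenset("ab"): 0.3, frozenset("abc"): 0.2}
p_cuzz, p_dsm = cuzzp(m, frame), dsmp(m, frame)
```

With this BBA only the singleton a carries a specific mass, so DSmP with a small ε sends almost all of the ignorance mass to a, whereas CuzzP still gives b and c a noticeable share. This is the kind of behaviour that leads to a higher PIC for DSmP.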

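4) PrBP1 and PrBP2: Two novel pignistic probabilistic
transformations were proposed by Pan in [11]. The first one
assumes that the BBA is distributed proportionally to the
product of $Bel(\theta_i)$ and $Pl(\theta_i)$ among the singleton elements
$\theta_i$ of $A \subseteq \Theta$:

$$PrBP1(\theta_i) = \sum_{A,\, \theta_i \subseteq A} \frac{Bel(\theta_i)\, Pl(\theta_i)}{\sum_{j,\, \theta_j \subseteq A} Bel(\theta_j)\, Pl(\theta_j)} \cdot m(A). \tag{9}$$

Also, Pan et al. assume that the masses are distributed
proportionally to some given parameters
$s_i = Bel(\theta_i) / (1 - Pl(\theta_i))$:

$$PrBP2(\theta_i) = \sum_{A,\, \theta_i \subseteq A} \frac{s_i}{\sum_{j,\, \theta_j \subseteq A} s_j} \cdot m(A). \tag{10}$$

As we can see, a Bayesian mass function, which has only
singleton focal elements, can be obtained by any of these PTs.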


D. Probabilistic Information Content (PIC) 


The PIC criterion [12] is classically adopted to evaluate
the performance of a probabilistic transformation of a BBA.
If m(.) is a Bayesian BBA defined on a discrete finite FoD
$\Theta = \{\theta_1, \theta_2, \ldots, \theta_N\}$, its PIC value is defined as³

$$PIC(m) = 1 + \frac{1}{\log_2 N} \sum_{i=1}^{N} m(\theta_i) \log_2 m(\theta_i). \tag{11}$$

The PIC metric measures the information content of a
(probabilistic) source characterized by a Bayesian BBA m(.),
and its value always belongs to [0; 1]. It corresponds to the
(normalized) dual of Shannon's entropy measure. When the
Bayesian BBA is uniform over the FoD $\Theta$, one has
$m(\theta_i) = 1/N$ for i = 1, 2, ..., N and the PIC metric is
minimum, i.e. $PIC(m) = PIC_{min} = 0$. The PIC metric is
maximum, i.e. $PIC(m) = PIC_{max} = 1$, if the Bayesian BBA
m(.) is deterministic, that is, if there exists an element
$\theta_i$ of $\Theta$ such that $m(\theta_i) = 1$. While simple, appealing and
generally adopted by the community, the PIC criterion is however
not always sufficient to evaluate the efficiency of a PT, as
discussed in [14]. This point will be addressed in Section V.

³ where $0 \log_2(0) = 0$ by convention.
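
The PIC of equation (11) takes only a few lines to compute. The sketch below represents a Bayesian BBA as a plain list of singleton probabilities; the function name `pic` is an illustrative choice.

```python
import math


def pic(p):
    """Probabilistic information content (11) of a probability vector.

    Zero entries are skipped, which implements the 0*log2(0) = 0 convention.
    """
    n = len(p)
    if n < 2:
        raise ValueError("the frame must contain at least two elements")
    return 1.0 + sum(x * math.log2(x) for x in p if x > 0.0) / math.log2(n)
```

On a four-element frame, the uniform vector `[0.25, 0.25, 0.25, 0.25]` gives the minimum value 0 and a deterministic vector such as `[1, 0, 0, 0]` gives the maximum value 1; any other distribution falls in between.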


III. EVOLUTIONARY-BASED PROBABILISTIC 
TRANSFORMATION (EPT) 


The idea of approximating any BBA by a Bayesian BBA
(i.e. a subjective probability measure) through the
minimization of the Shannon entropy under compatibility
constraints has been proposed recently by Han et al. in [10],
[14] using "off-the-shelf" optimization techniques. In this
paper, we present in detail a new optimization method to achieve
this PT, based on a random evolutionary algorithm that maximizes
the PIC value. That is why we call it the Evolutionary-based
Probabilistic Transformation (EPT) method.

Let us assume that the FoD of the original BBA m(.) to be
approximated by a Bayesian BBA is $\Theta \triangleq \{\theta_1, \theta_2, \ldots, \theta_N\}$.
The EPT method consists of the following steps:

• Step 0 (setting parameters): $t_{max}$ is the maximum number of
iterations; $n_{max}$ is the population size in each iteration;
$P_c$ is the crossover probability, and $P_m$ is the mutation
probability.

• Step 1 (population generation): A set $P_t$ of
$j = 1, 2, \ldots, n_{max}$ random probability values
$P_t^j = \{P_t^j(\theta_1), \ldots, P_t^j(\theta_N)\}$ is generated such that the
constraints (12)-(14) are satisfied for $j = 1, 2, \ldots, n_{max}$,
in order to make each random set of probabilities $P_t^j$
compatible with the original BBA m(.):

$$P_t^j(\theta_i) \in [0; 1], \quad i = 1, 2, \ldots, N \tag{12}$$

$$\sum_{i=1}^{N} P_t^j(\theta_i) = 1 \tag{13}$$

$$Bel(\theta_i) \leq P_t^j(\theta_i) \leq Pl(\theta_i), \quad i = 1, 2, \ldots, N \tag{14}$$

• Step 2 (fitness assignment): For each probability set $P_t^j$
($j = 1, 2, \ldots, n_{max}$), we compute its PIC value and use it
as its fitness factor F. More precisely, one takes
$F(P_t^j) = PIC(P_t^j)$.

• Step 3 (best approximation of m(.)): The best set of
probabilities $P_t^{j_{best}}$, with the highest PIC value, is sought;
this set and its associated index $j_{best}$ are stored respectively
in "Best-Individual" and "Index-of-BestIndividual".

• Step 4 (selection, crossover and mutation): The tournament
selection, crossover and mutation operators drawn from the
evolutionary theory framework [13] are applied to create the
associated offspring population $P'_t$ from the parent population
$P_t$. If $F(P_t^{j_{best}}) > F(P_t'^{\,j_{best}})$, then "Best-Individual"
remains unchanged; otherwise, Best-Individual = $P_t'^{\,j_{best}}$.

• Step 5 (stopping EPT): Steps 1-4 describe the t-th
iteration of the EPT method. If $t > t_{max}$, the EPT method is
completed; otherwise another iteration must be done by
taking t + 1 → t and going back to step 1.

The scheme of the EPT method is shown in Fig. 1 and its
pseudo-code is given in Algorithm 1.
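
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the convex-combination crossover and the mass-transfer mutation are choices made for this sketch because they keep every offspring inside the constraints (12)-(14); the offspring population simply replaces the parent population at each iteration; and all function names, the default population size and the seed are hypothetical.

```python
import math
import random


def pic(p):
    """PIC of a probability vector, with the 0*log2(0) = 0 convention."""
    return 1.0 + sum(x * math.log2(x) for x in p if x > 0) / math.log2(len(p))


def singleton_bounds(m, frame):
    """Bel and Pl of every singleton: the bounds of constraint (14)."""
    lo = [m.get(frozenset([t]), 0.0) for t in frame]
    hi = [sum(v for A, v in m.items() if t in A) for t in frame]
    return lo, hi


def random_individual(lo, hi, rng):
    """Random probability vector satisfying constraints (12)-(14)."""
    p, rest = list(lo), 1.0 - sum(lo)
    order = list(range(len(lo)))
    rng.shuffle(order)
    for i in order:  # random partial fill above the lower bounds
        step = rng.uniform(0.0, max(0.0, min(hi[i] - p[i], rest)))
        p[i] += step
        rest -= step
    for i in order:  # greedy pass that hands out whatever is left
        step = max(0.0, min(hi[i] - p[i], rest))
        p[i] += step
        rest -= step
    return p


def crossover(p, q, rng):
    """Convex combination of two parents (stays inside the constraints)."""
    a = rng.random()
    return [a * x + (1.0 - a) * y for x, y in zip(p, q)]


def mutate(p, lo, hi, rng):
    """Move a random feasible amount of probability between two singletons."""
    q = list(p)
    i, k = rng.sample(range(len(q)), 2)
    delta = rng.uniform(0.0, max(0.0, min(q[i] - lo[i], hi[k] - q[k])))
    q[i] -= delta
    q[k] += delta
    return q


def tournament(pop, rng):
    """Binary tournament selection on the PIC fitness."""
    a, b = rng.sample(pop, 2)
    return a if pic(a) >= pic(b) else b


def ept(m, frame, t_max=50, n_max=200, p_c=0.9, p_m=0.1, seed=0):
    rng = random.Random(seed)
    lo, hi = singleton_bounds(m, frame)
    pop = [random_individual(lo, hi, rng) for _ in range(n_max)]
    best = max(pop, key=pic)
    history = [pic(best)]  # best fitness after each iteration
    for _ in range(t_max):
        children = []
        while len(children) < n_max:
            p, q = tournament(pop, rng), tournament(pop, rng)
            child = crossover(p, q, rng) if rng.random() < p_c else list(p)
            if rng.random() < p_m:
                child = mutate(child, lo, hi, rng)
            children.append(child)
        pop = children
        candidate = max(pop, key=pic)
        if pic(candidate) > pic(best):  # otherwise Best-Individual is kept
            best = candidate
        history.append(pic(best))
    return dict(zip(frame, best)), history
```

Because the best individual is replaced only by a strictly fitter one, the recorded best PIC can never decrease from one iteration to the next. How close the result gets to the true maximum depends on the operators and parameters, which are only placeholders in this sketch.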


IV. SIMULATION RESULTS 


In this section we compare EPT's results to those of the other PTs mentioned above. In particular, we show the significant gain in PIC that can be obtained, and the capability of EPT to improve target type tracking.

Figure 1: Scheme of EPT algorithm (flowchart: Start; Initial Population $\mathcal{P}_t$; fitness $F(P_t^j) = PIC(P_t^j)$; Best-Individual; test on $t$; if NO: Selection, Crossover, Mutation, Offspring Population $\mathcal{P}'_t$, $t = t + 1$; if YES: Output).

Algorithm 1 Evolutionary-Based PT (EPT)

1: Define the stopping criterion ($t \le t_{\max}$); the population size $n_{\max}$ for each iteration; the crossover probability $P_c$ and the mutation probability $P_m$.
2: Generate an initial random population $\mathcal{P}_t$ of probabilities $P_t^j$ consistent with $m(\cdot)$.
3: For each individual $P_t^j$ in $\mathcal{P}_t$ do
4:    Calculate the fitness $F(P_t^j) = PIC(P_t^j)$ of $P_t^j$
5:    Store the best individual $P_t^{j_{best}}$
6: End
7: Repeat:
8:    Selection: select 2 individuals based on fitness
9:    Crossover: exchange parts of the 2 individuals with probability $P_c$
10:   Mutation: mutate the child individuals with probability $P_m$
11:   After these three sub-steps, the updated population $\mathcal{P}'_t$ is obtained
12:   Calculate the fitness of the individuals of $\mathcal{P}'_t$ and store the best individual $P_t'^{\,j_{best}}$
13:   If $F(P_t^{j_{best}}) > F(P_t'^{\,j_{best}})$
14:      Best-Individual remains unchanged
15:   else
16:      Best-Individual = $P_t'^{\,j_{best}}$
17:   If $t > t_{\max}$ then stop, otherwise $t \leftarrow t + 1$ and go back to line 7
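The loop of Algorithm 1 can be illustrated by the following minimal Python sketch. It is not the authors' implementation: the sampler, the arithmetic crossover and the mass-transfer mutation are assumptions chosen because they keep every individual inside the constraints (12)-(14), and the PIC definition is the usual entropy-based one assumed for Eq. (11). All function names are hypothetical.

```python
import math
import random

def pic(p):
    """Assumed PIC definition: 1 - H(p)/log2(N), with 0*log2(0) = 0."""
    entropy = -sum(x * math.log2(x) for x in p if x > 0.0)
    return 1.0 - entropy / math.log2(len(p))

def random_individual(bel, pl, rng, max_tries=100_000):
    """Rejection sampling of a vector satisfying constraints (12)-(14)."""
    free = 1.0 - sum(bel)                  # mass not committed to singletons
    for _ in range(max_tries):
        w = [rng.expovariate(1.0) for _ in bel]
        s = sum(w)
        p = [b + free * wi / s for b, wi in zip(bel, w)]   # p >= Bel, sum = 1
        if all(x <= u + 1e-12 for x, u in zip(p, pl)):     # p <= Pl
            return p
    raise RuntimeError("no compatible individual found")

def tournament(pop, fit, rng):
    """Binary tournament: the fitter of two randomly drawn individuals."""
    i, j = rng.sample(range(len(pop)), 2)
    return pop[i] if fit[i] >= fit[j] else pop[j]

def ept(bel, pl, t_max=50, n_max=100, p_c=0.9, p_m=0.1, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(bel, pl, rng) for _ in range(n_max)]
    best = max(pop, key=pic)
    history = [pic(best)]
    for _ in range(t_max):
        fit = [pic(p) for p in pop]
        children = []
        while len(children) < n_max:
            a, b = tournament(pop, fit, rng), tournament(pop, fit, rng)
            child = list(a)
            if rng.random() < p_c:
                # Arithmetic crossover (assumption): a convex combination of
                # two compatible vectors stays compatible, since (12)-(14)
                # are linear constraints.
                alpha = rng.random()
                child = [alpha * x + (1.0 - alpha) * y for x, y in zip(a, b)]
            if rng.random() < p_m:
                # Mutation (assumption): move mass from theta_i to theta_j,
                # clipped so that the Bel and Pl bounds still hold.
                i, j = rng.sample(range(len(child)), 2)
                room = max(0.0, min(child[i] - bel[i], pl[j] - child[j]))
                step = rng.random() * room
                child[i] -= step
                child[j] += step
            children.append(child)
        pop = children
        candidate = max(pop, key=pic)
        if pic(candidate) > pic(best):     # otherwise Best-Individual is kept
            best = candidate
        history.append(pic(best))
    return best, history

# Singleton Bel and Pl values derived from the BBA of Example 1.
bel = [0.16, 0.14, 0.01, 0.02]
pl = [0.73, 0.64, 0.39, 0.26]
best, history = ept(bel, pl, t_max=30, n_max=60, seed=1)
print([round(x, 4) for x in best], round(history[-1], 4))
```

Because the best individual is only replaced by a strictly fitter one, the recorded best PIC never decreases from one iteration to the next.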


A. Examples and comparisons 


In order to compare different PTs with EPT, two cases borrowed from [11] and [12] are considered, where PIC is used for evaluation. In all the following cases, the parameters $t_{\max}$, $n_{\max}$, $P_c$ and $P_m$ required by the EPT method have been set to $t_{\max} = 50$, $n_{\max} = 1000$, $P_c = 0.9$ and $P_m = 0.1$, respectively.


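Example 1: $\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4\}$


The BBA $m(\cdot)$ to approximate by a Bayesian BBA (probability measure) is

$m(\theta_1) = 0.16$, $m(\theta_2) = 0.14$, $m(\theta_3) = 0.01$, $m(\theta_4) = 0.02$
$m(\theta_1 \cup \theta_2) = 0.20$, $m(\theta_1 \cup \theta_3) = 0.09$, $m(\theta_1 \cup \theta_4) = 0.04$
$m(\theta_2 \cup \theta_3) = 0.04$, $m(\theta_2 \cup \theta_4) = 0.02$, $m(\theta_3 \cup \theta_4) = 0.01$
$m(\theta_1 \cup \theta_2 \cup \theta_3) = 0.10$, $m(\theta_1 \cup \theta_2 \cup \theta_4) = 0.03$
$m(\theta_1 \cup \theta_3 \cup \theta_4) = 0.03$, $m(\theta_2 \cup \theta_3 \cup \theta_4) = 0.03$
$m(\Theta) = 0.08$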


The Bayesian BBAs obtained by the classical PTs (5)-(10) and by EPT, with their corresponding PIC values calculated by (11), are given in Table I. As expected, EPT provides the maximum PIC.


Table I: Results of Different PTs in Example 1.

PT | $\theta_1$ | $\theta_2$ | $\theta_3$ | $\theta_4$ | PIC
CuzzP | 0.3860 | 0.3382 | 0.1607 | 0.1151 | 0.0790
BetP | 0.3983 | 0.3433 | 0.1533 | 0.1051 | 0.0926
$DSmP_{0}$ | 0.5176 | 0.4051 | 0.0303 | 0.0470 | 0.3100
$DSmP_{0.001}$ | 0.5162 | 0.4043 | 0.0319 | 0.0476 | 0.3058
PrBP1 | 0.5419 | 0.3998 | 0.0243 | 0.0340 | 0.3480
PrBP2 | 0.5578 | 0.3842 | 0.0226 | 0.0354 | 0.3529
EPT | 0.7246 | 0.2218 | 0.0266 | 0.0270 | 0.4508


Example 2: $\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4\}$


In this case, we randomly generate BBAs and compare EPT 
with classical PTs (CuzzP, BetP, DSmP, PrBP1 and PrBP2 
given by (5)-(10)). The original BBAs to approximate are 
generated according to Algorithm 2 of [15]. 


Algorithm 2 Random generation of BBA

Input: Frame of discernment $\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4\}$
       $N_{max}$: maximum number of focal elements
Output: BBA $m$

1: Generate $\mathcal{P}(\Theta)$, which is the power set of $\Theta$
2: Generate a random permutation of $\mathcal{P}(\Theta)$ → $\mathcal{R}(\Theta)$
3: Generate an integer between 1 and $N_{max}$ → $l$
4: For each of the first $l$ elements of $\mathcal{R}(\Theta)$ do
5:    Generate a value within $[0,1]$ → $m_i$, $i = 1, \ldots, l$
6: End
7: Normalize the vector $m = [m_1, m_2, \ldots, m_l]$ → $m'$
8: Assign $m'_i$ as the mass of the $i$-th selected element of $\mathcal{R}(\Theta)$

In our test, we have set the cardinality of the FoD to 4 and fixed the number of focal elements to $l = N_{max} = 15$. We randomly generate $L = 30$ BBAs. Six PT methods are tested, and PIC is used to evaluate the quality of their results, which are shown in Fig. 2. As we can see, EPT significantly outperforms the other methods with respect to the maximum-PIC criterion, which is to be expected since EPT optimizes PIC directly.


B. Evaluation of EPT in Target Type Tracking problem 


Target Type Tracking (TTT) problem can be briefly de- 
scribed below [16]: 
Figure 2: PIC values obtained with EPT and classical PTs.


1) Target Type Tracking Problem (TTT):

1. Let $\zeta = 1, 2, \ldots, \zeta_{max}$ be the time index, and consider $N$ possible target types $Tar_i \in \Theta = \{\theta_1, \theta_2, \ldots, \theta_N\}$ in the surveillance area. For instance, in a normal air target surveillance system the FoD could be $\Theta = \{Fighter, Cargo\}$, that is, $Tar_1 = \theta_1 \triangleq Fighter$ and $Tar_2 = \theta_2 \triangleq Cargo$. Similarly, the FoD in a ground target surveillance system could be $\Theta_{ground} = \{Tank, Truck, Car, Bus\}$. In this paper, we consider only the air target surveillance system to demonstrate the practicability of EPT.

2. At every time $\zeta$, the true type of the target $Tar(\zeta) \in \Theta$ is immediately observed by an attribute-sensor (here, we assume a possible target probability).

3. A defined classifier is applied to process the attribute measurement of the sensor, which provides the probability $Tar_d(\zeta)$ on the type of the observed target at each instant $\zeta$.

4. The sensor is in general not totally reliable and is characterized by an $N \times N$ confusion matrix:

$$M = [M_{ij} = P(Tar_d = Tar_j \mid TrueType = Tar_i)] \tag{15}$$

where $1 \le i \le N$ and $1 \le j \le N$.
Here, we briefly summarize the main steps of TTT using EPT.

1. Initialization. Determine the target type frame $\Theta = \{\theta_1, \theta_2, \ldots, \theta_N\}$ and set the initial BBA $m^{initial}(\theta_1 \cup \theta_2 \cup \ldots \cup \theta_N) = 1$, since there is no information about the first target type that will be observed;

2. Updating BBA. An observed BBA $m_{obs}(\cdot)$ on the types of the unknown observed target is defined from the current target type declaration and the confusion matrix $M$;

3. Combination. We combine the current BBA $m_{obs}(\cdot)$ with the initial BBA $m^{initial}(\cdot)$ according to the PCR6 combination rule: $m_{PCR6}(\cdot) = m_{obs}(\cdot) \oplus m^{initial}(\cdot)$;

4. Approximation. Use $EPT(\cdot)$ to approximate $m_{PCR6}(\cdot)$ by a Bayesian BBA;

5. Decision Making. Take a final decision about the type of the target at the current observation time based on the obtained Bayesian BBA;

6. Updating BBA. Set $m^{initial}(\cdot) = m_{PCR6}(\cdot)$, increase the time index $\zeta = \zeta + 1$ and go back to step 2.

2) Raw Dataset of TTT: We have tested our EPT-based TTT on a very simple scenario for a 2D TTT, namely $\Theta = \{Fighter, Cargo\}$, for two types of classifiers. The matrix $M_1$ corresponds to the confusion matrix of the good classifier, and $M_2$ corresponds to the confusion matrix of the poor classifier.

$$M_1 = \begin{bmatrix} 0.95 & 0.05 \\ 0.05 & 0.95 \end{bmatrix}, \qquad M_2 = \begin{bmatrix} 0.75 & 0.25 \\ 0.25 & 0.75 \end{bmatrix} \tag{16}$$

In our scenario, a true Target Type sequence over 120 scans is generated according to Fig. 3. We can observe clearly from Fig. 3 that Cargo (denoted as Type 2) appears first in the sequence, and that the observed Target Type then switches three times to the Fighter Type (Type 1) for different durations (namely, 20s, 10s, 5s).

Figure 3: Raw Sequence of True Target Type.


A pathological case for TTT: Our analysis has shown that EPT can nevertheless run into trouble when tracking two target types, as shown by this simple particular example (when $0 \le m(\theta_1 \cup \theta_2) \le 0.1$). Let's consider the following BBA

$$m_{target}(\cdot) = [\theta_1, \theta_2, \theta_1 \cup \theta_2] = [0, 1, 0]$$

According to the compatibility constraints (12)-(14), the population $\mathcal{P}'_t$ is obtained from $\mathcal{P}_t$ through a selection procedure. Next, an individual $P_t'^{\,j}$ in $\mathcal{P}'_t$, denoted $P_t'^{\,j} = [m'(\theta_1), m'(\theta_2)]$, is subject to the initial constraint (1) and to (17):

$$\begin{aligned}
m'(\theta_1) &\ge (Bel(\theta_1) = m(\theta_1) = 0) \\
m'(\theta_1) &\le (Pl(\theta_1) = m(\theta_1) + m(\theta_1 \cup \theta_2) = 0 + 0 = 0) \\
m'(\theta_2) &\ge (Bel(\theta_2) = m(\theta_2) = 1) \\
m'(\theta_2) &\le (Pl(\theta_2) = m(\theta_2) + m(\theta_2 \cup \theta_1) = 1 + 0 = 1)
\end{aligned} \tag{17}$$
From the above inequalities, one sees that only one probability measure $P_t^S = [m(\theta_1), m(\theta_2)] = [0, 1]$ (where the superscript index $S$ means Single) satisfies this constraint. However, because of the mechanism of EPT and of the real-coded genetic algorithm (RCGA), the probabilities $P_t^j$ in population $\mathcal{P}_t$, which are randomly generated in the interval $[0,1]$, have very little chance of being equal to the suitable measure $[0, 1]$ satisfying the constraints. That is why EPT becomes inefficient in this case, which occurs with a probability of $1/n_{\max}$, where $n_{\max}$ denotes the size of population⁴ $\mathcal{P}_t$. Unfortunately, in TTT decision-making problems, such a case cannot be avoided because it can really happen.


To circumvent this problem and make the EPT approach work in all circumstances, we need to modify the EPT method slightly so that it generates enough individuals for an efficient selection step when the bounds of the belief interval $[Bel, Pl]$ take their min and max values ($[0.9, 0.05, 0.05]$, $[0.05, 0.9, 0.05]$). To achieve this, we propose to enlarge the interval through a parameter $\lambda$, while maintaining the property of the original interval to some degree. More precisely, the modified belief interval, denoted $[Bel', Pl']$, is heuristically computed by a simple thresholding technique as follows:

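First, we assume that the original BBA we consider here for the FoD $\Theta = \{\theta_1, \theta_2\}$ is $[\theta_1, \theta_2, \theta_1 \cup \theta_2] = [a, b, c]$, with $a + b + c = 1$ and $0 \le c \le 0.1$.

Step 1: Let $m'(\theta_1 \cup \theta_2) = c + \lambda$;

Step 2: if $a > b$
$$m'(\theta_1) = a - \lambda;\quad m'(\theta_2) = b;\quad m'(\theta_1 \cup \theta_2) = c + \lambda; \tag{18}$$

Step 3: if $a < b$
$$m'(\theta_1) = a;\quad m'(\theta_2) = b - \lambda;\quad m'(\theta_1 \cup \theta_2) = c + \lambda; \tag{19}$$

So the values of $[Bel'(\theta_1), Pl'(\theta_1)]$ and $[Bel'(\theta_2), Pl'(\theta_2)]$ can be calculated from Eq. (18) and Eq. (19) as follows.

When $a > b$:
$$\begin{aligned}
Pl'(\theta_1) &= m'(\theta_1) + m'(\theta_1 \cup \theta_2) = a - \lambda + c + \lambda = a + c; \\
Bel'(\theta_1) &= 1 - Pl'(\theta_2) = 1 - (b + c + \lambda) = a - \lambda.
\end{aligned} \tag{20}$$
$$\begin{aligned}
Pl'(\theta_2) &= m'(\theta_2) + m'(\theta_1 \cup \theta_2) = b + c + \lambda; \\
Bel'(\theta_2) &= 1 - Pl'(\theta_1) = 1 - (a - \lambda + c + \lambda) = 1 - (a + c) = b.
\end{aligned} \tag{21}$$

When $a < b$:
$$\begin{aligned}
Pl'(\theta_1) &= m'(\theta_1) + m'(\theta_1 \cup \theta_2) = a + c + \lambda; \\
Bel'(\theta_1) &= 1 - Pl'(\theta_2) = 1 - (b - \lambda + c + \lambda) = 1 - (b + c) = a.
\end{aligned} \tag{22}$$
$$\begin{aligned}
Pl'(\theta_2) &= m'(\theta_2) + m'(\theta_1 \cup \theta_2) = b - \lambda + c + \lambda = b + c; \\
Bel'(\theta_2) &= 1 - Pl'(\theta_1) = 1 - (a + c + \lambda) = b - \lambda.
\end{aligned} \tag{23}$$

⁴In our simulation, we took $n_{\max} = 1000$.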


Explanation: Through step 1, one computes the total singleton mass present in the entire BBA, and the threshold value 0.9 makes it possible to evaluate whether the percentage of singleton mass is large enough or not. Here, we consider not only the unique extreme case $m_{target}(\cdot) = [\theta_1, \theta_2, \theta_1 \cup \theta_2] = [0, 1, 0]$, but also other possible cases such as $m_{target}(\cdot) = [\theta_1, \theta_2, \theta_1 \cup \theta_2] = [0.0001, 0.9998, 0.0001]$. Why do we consider the concept of percentage? The higher the percentage of singleton mass, the smaller the interval for $P_t^j$; in other words, the higher the value of $m(\theta_1 \cup \theta_2)$, the larger the interval for $P_t^j$, as can be seen from Eq. (17). Steps 2 and 3 give the way of calculating the updated bounds of the belief interval $[Bel', Pl']$, and Eq. (20)-Eq. (23) show that the parameter $\lambda$ determines the range of the interval. Next, we give two examples to show how the above method works:


The pathological case 1 for TTT (using modified EPT)
$$m_{target}(\cdot) = [\theta_1, \theta_2, \theta_1 \cup \theta_2] = [0.0001, 0.9998, 0.0001].$$

Here, the parameter $\lambda$ is arbitrarily⁵ set to 0.4. Since $a < b$, one computes in step 3 the modified belief interval bounds $Bel'(\theta_1) = 0.0001$, $Pl'(\theta_1) = 0.0001 + 0.0001 + \lambda = 0.4002$ and $Bel'(\theta_2) = 0.9998 - 0.4 = 0.5998$, $Pl'(\theta_2) = 0.9999$. So we get $[Bel'(\theta_1), Pl'(\theta_1)] = [0.0001, 0.4002]$ and $[Bel'(\theta_2), Pl'(\theta_2)] = [0.5998, 0.9999]$.

Consequently, any Bayesian BBA $P_t^j = [m'(\theta_1), m'(\theta_2)]$ must be generated according to the (modified) compatibility constraints

$$m'(\theta_1) \in [Bel'(\theta_1), Pl'(\theta_1)] = [0.0001, 0.4002]$$
$$m'(\theta_2) \in [Bel'(\theta_2), Pl'(\theta_2)] = [0.5998, 0.9999]$$

The pathological case 2 for TTT (using modified EPT)
$$m_{target}(\cdot) = [\theta_1, \theta_2, \theta_1 \cup \theta_2] = [0.45, 0.48, 0.07].$$

Here, the parameter $\lambda$ is set to 0.2. Then any Bayesian BBA $P_t^j = [m'(\theta_1), m'(\theta_2)]$ must be generated according to the (modified) compatibility constraints

$$m'(\theta_1) \in [Bel'(\theta_1), Pl'(\theta_1)] = [0.45, 0.72]$$
$$m'(\theta_2) \in [Bel'(\theta_2), Pl'(\theta_2)] = [0.28, 0.55]$$
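The interval enlargement of Eqs. (18)-(23) can be checked numerically with a short Python sketch. The function name `modified_intervals` is hypothetical, and the tie $a = b$, which the text does not cover, is treated like the case $a < b$ purely as an assumption.

```python
def modified_intervals(a, b, c, lam):
    """Return ((Bel'(t1), Pl'(t1)), (Bel'(t2), Pl'(t2))) for [t1, t2, t1 U t2] = [a, b, c]."""
    if abs(a + b + c - 1.0) > 1e-9:
        raise ValueError("masses must sum to 1")
    if a > b:
        m1, m2 = a - lam, b        # Eq. (18)
    else:
        m1, m2 = a, b - lam        # Eq. (19); a == b handled here by assumption
    m12 = c + lam                  # Step 1
    pl1 = m1 + m12                 # Pl'(theta_1)
    pl2 = m2 + m12                 # Pl'(theta_2)
    # Bel'(theta_1) = 1 - Pl'(theta_2) and Bel'(theta_2) = 1 - Pl'(theta_1)
    return (1.0 - pl2, pl1), (1.0 - pl1, pl2)

case1 = modified_intervals(0.0001, 0.9998, 0.0001, lam=0.4)
case2 = modified_intervals(0.45, 0.48, 0.07, lam=0.2)
print(case1)
print(case2)
```

Up to floating-point rounding, the two calls give the intervals $[0.0001, 0.4002]$, $[0.5998, 0.9999]$ and $[0.45, 0.72]$, $[0.28, 0.55]$ of the two pathological cases.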
In order to evaluate the influence of the parameter $\lambda$, we have reexamined all the pathological cases based on the following procedure:
1) The parameter $\lambda$ takes six possible values: 0, 0.1, 0.2, 0.3, 0.4, 0.5;
2) We randomly generate the initial population $\mathcal{P}_t$ based on $\lambda$, which is also subject to the constraints (12)-(14).


⁵The value of the parameter $\lambda$ can be chosen as any value in $[0, 1]$ by the designer, for his/her own reasons, to make the alternative interval effective in the modified EPT version.
With this simulation, we can observe in Fig. 4 the impact of the value of $\lambda$ on the number of $P_t^j$ in $\mathcal{P}_t$. When $\lambda = 0$⁶, there exists no suitable $P_t^j$ for case one, which demonstrates the necessity of circumventing the pathological case problem. The number of $P_t^j$ increases with the value of $\lambda$, which shows the advantage of using the modified EPT approach to make the selection step of the evolutionary algorithm more efficient. One point we need to clarify is that the intervals $[Bel'(\theta_1), Pl'(\theta_1)]$ and $[Bel'(\theta_2), Pl'(\theta_2)]$ induced by the parameter $\lambda$ above only aim at guaranteeing a sufficient number of $P_t^j$ in $\mathcal{P}_t$ in the implementation of EPT.


Figure 4: Impact of $\lambda$ (x-axis) on the number of individuals in $\mathcal{P}_t$ (y-axis).


3) Simulation Results of TTT Based on Modified EPT: Our simulation consists of 100 Monte-Carlo runs, and we show in the sequel the averaged performances of EPT and DSmP. Figures 5-8 illustrate the Bayesian BBAs obtained by DSmP [7] (part a) and by our new EPT method (part b) based on TTT using the PCR6 fusion rule. One sees that, for both the good classifier $M_1$ and the poor classifier $M_2$, EPT is able to track properly the quick changes of target type.


V. LIMITATION OF EPT 


As pointed out by Han et al. in [14], it is in general neither sufficient nor comprehensive to evaluate the quality of a probabilistic transformation of a BBA from the PIC criterion alone, even if the chosen PT provides the highest PIC value by optimization. Our EPT approach is of course not exempt from this problem, as we can see in the simple example below, where no optimization technique provides a useful (robust) solution.


Let's consider the FoD $\Theta = \{\theta_1, \theta_2\}$ with the BBA to approximate chosen as follows:


$$m(\theta_1) = 0.10001, \quad m(\theta_2) = 0.10000, \quad m(\theta_1 \cup \theta_2) = 0.79999$$


Based on PIC value optimization using EPT (or any other efficient optimization technique), we obtain the Bayesian BBA $[m(\theta_1), m(\theta_2)] = [0.0001605, 0.9998394]$ with $PIC = 0.9977$. This simple example shows that, although $m(\theta_1)$ is almost the same as $m(\theta_2)$ in the original BBA, so that there is no solid reason to get a very high probability for $\theta_2$ and a small one for $\theta_1$ in the Bayesian BBA, this is exactly what the highest PIC yields. An exaggeratedly high PIC is not always preferable (it can be unreasonable or directly lead to wrong decisions), as can be seen in Fig. 6 and Fig. 8, although the PIC should be as high as possible for decision-making problems. Therefore, a reasonable compromise must be found between the PIC level and the fidelity of the transformation to the original BBA, which is a challenging open theoretical problem left for further research.

⁶In which case the original EPT is actually applied.

Figure 5: Belief Mass for Cargo Type for $M_1$ (panels (a) and (b): estimation of belief assignment for Cargo Type).

Figure 6: Belief Mass for Cargo Type for $M_2$ (panels (a) and (b): estimation of belief assignment for Cargo Type, poor classifier).

Figure 7: Belief Mass for Fighter Type for $M_1$ (panels (a) and (b): estimation of belief assignment for Fighter Type).

Figure 8: Belief Mass for Fighter Type for $M_2$ (panels (a) and (b): estimation of belief assignment for Fighter Type, poor classifier).


VI. CONCLUSION 


An evolutionary algorithm for probabilistic transformation (EPT) has been proposed in this paper. It uses a genetic algorithm to obtain the Bayesian belief function with the highest PIC value. The utility of EPT was verified on a set of three probabilistic transformation cases borrowed from the literature. On these cases, the performance of EPT has been compared to that of other existing probabilistic transformations. Our results indicate that EPT performs better than the others on all problems from the standpoint of PIC increase. The shortcomings of the original EPT version have been clearly identified on two type tracking problems, and they have been overcome thanks to a modification of the belief interval constraints. In future work, we would like to establish more appropriate evaluation criteria and to compare the performance of this EPT approach more extensively with other recently proposed evolutionary algorithms. We would also like to investigate EPT further in order to extend it to more than two targets.
Abstract: In many applications involving epistemic uncertainties usually modeled by belief functions, it is often necessary to approximate general (non-Bayesian) basic belief assignments (BBAs) to subjective probabilities (called Bayesian BBAs). This necessity occurs if one needs to embed the fusion result in a system based on the probabilistic framework and Bayesian inference (e.g. tracking systems), or if one wants to use classical decision theory to make a decision. There already exist several methods (probabilistic transforms) to approximate any general BBA to a Bayesian BBA. From a fusion standpoint, two approaches are usually adopted: 1) one can approximate at first each BBA in subjective probabilities and use Bayes fusion rule to get the final Bayesian BBA, or 2) one can fuse all the BBAs with a fusion rule, typically Dempster-Shafer's or PCR6 rules (which is very costly in computations), and convert the combined BBA in a subjective probability measure. The former method is the simplest method but it generates a high loss of information included in the original BBAs, whereas the latter is intractable for high dimension problems. This paper presents a new method to achieve this task based on hierarchical decomposition (coarsening) of the frame of discernment, which can be seen as an intermediary approach between the two aforementioned methods. After the presentation of this new method, we show through simulations how it performs with respect to other methods.

Keywords: Information fusion, belief functions, DST, DSmT, PCR6 rule, coarsening.


I. INTRODUCTION 


The theory of belief functions, known as Dempster-Shafer Theory (DST), was developed by Shafer [1] in 1976 from Dempster's works [2]. Belief functions make it possible to model epistemic uncertainty, and they have been used in many applications since the 1990s [3], mainly those related to expert systems, decision-making support and information fusion. To palliate some limitations of DST, Dezert and Smarandache have proposed an extended mathematical framework of belief functions with new efficient quantitative and qualitative rules of combination, which is called DSmT (Dezert and Smarandache Theory) in the literature [4], [5], with applications listed in [6]. One of the major drawbacks of DST and DSmT is their high computational complexity as soon as the fusion space (i.e. the frame of discernment, FoD) and the number of sources to combine are large¹.


¹DSmT is more complex than DST, and the Proportional Conflict Redistribution rule #6 (PCR6 rule) becomes computationally intractable in the worst case as soon as the cardinality of the Frame of Discernment (FoD) is greater than six.
To reduce the computational cost of operations with belief functions when the number of focal elements is very large, several approaches have been proposed by different authors. Basically, the existing approaches rely either on efficient implementations of computations, as proposed for instance in [7], [8], or on approximation techniques of the original Basic Belief Assignments (BBAs) to combine [9]-[12], or both. In many applications involving epistemic uncertainties usually modeled by belief functions, it is often necessary to approximate general (non-Bayesian) basic belief assignments (BBAs) to subjective probabilities (called Bayesian BBAs). This necessity occurs if one needs to embed the fusion result in a system based on the probabilistic framework and Bayesian inference (e.g. tracking systems), or if one wants to use classical decision theory to make a decision. From a fusion standpoint, two approaches are usually adopted: 1) one can approximate at first each BBA in subjective probabilities and use Bayes fusion rule to get the final Bayesian BBA, or 2) one can fuse all the BBAs with a fusion rule, typically Dempster-Shafer's or PCR6 rules (which is very costly in computations), and convert the combined BBA in a subjective probability measure. The former method is the simplest method but it generates a high loss of information included in the original BBAs, whereas the latter direct method is intractable for high dimension problems. This paper presents a new method to achieve this task based on hierarchical decomposition (coarsening) of the frame of discernment, which can be seen as an intermediary approach between the two aforementioned methods.

This paper presents a new approach to fuse BBAs into a Bayesian BBA in order to reduce the computational burden and keep the fusion tractable even for large dimension problems. This method is based on a hierarchical decomposition (coarsening) framework which keeps as much information of the original BBAs as possible while preserving a lower complexity. The main contributions of this paper are:


1) the presentation of the FoD bintree decomposition on which the BBA approximations are done;

2) the presentation of the fusion of approximate BBAs from the bintree representation.


This hierarchical structure encompasses the bintree decomposition and the BBA approximations on it to obtain the final approximate fused Bayesian BBA.
This paper is organized as follows. In section II, we recall some basics of DST and DSmT that are relevant to the new method presented in this paper. More details with examples can easily be found in [1], [5]. We will also briefly recall our preliminary works about hierarchical coarsening of the FoD. Section III presents the novel hierarchical flexible (adaptive) coarsening method, which can be regarded as the extension of our previous works. Two simple examples are given in section IV to illustrate the detailed calculation steps. Simulation experiments are presented in section V to show the rationality of this new approach. Finally, section VI concludes the paper with perspectives for future work.


II. MATHEMATICAL BACKGROUND


This section provides a brief reminder of the basics of DST and DSmT, and of the original hierarchical coarsening method, which are necessary for the presentation and the understanding of the more general flexible coarsening approximate method of section III.


A. Basics of DST and DSmT 


In DST framework, the frame of discernment² $\Theta \triangleq \{\theta_1, \ldots, \theta_n\}$ ($n \ge 2$) is a set of exhaustive and exclusive elements (hypotheses) which represent the possible solutions of the problem under consideration, and thus Shafer's model assumes $\theta_i \cap \theta_j = \emptyset$ for $i \ne j$ in $\{1, \ldots, n\}$. A basic belief assignment (BBA) $m(\cdot)$ is defined by the mapping $2^\Theta \mapsto [0, 1]$, verifying $m(\emptyset) = 0$ and $\sum_{A \in 2^\Theta} m(A) = 1$. In DSmT, one can abandon Shafer's model (if Shafer's model doesn't fit with the problem) and refute the principle of the third excluded middle³. Instead of defining the BBAs on the power set $2^\Theta \triangleq (\Theta, \cup)$ of the FoD, the BBAs are defined on the so-called hyper-power set (or Dedekind's lattice) denoted $D^\Theta \triangleq (\Theta, \cup, \cap)$, whose cardinality follows Dedekind's numbers sequence, see [5], Vol. 1 for details and examples. A (generalized) BBA, called a mass function, $m(\cdot)$ is defined by the mapping $D^\Theta \mapsto [0, 1]$, verifying $m(\emptyset) = 0$ and $\sum_{A \in D^\Theta} m(A) = 1$. DSmT framework encompasses DST framework because $2^\Theta \subset D^\Theta$. In DSmT we can also take into account a set of integrity constraints on the FoD (if known), by specifying all the pairs of elements which are really disjoint. Stated otherwise, Shafer's model is a specific DSm model where all elements are known to be disjoint. $A \in D^\Theta$ is called a focal element of $m(\cdot)$ if $m(A) > 0$. A BBA is called a Bayesian BBA if all of its focal elements are singletons and Shafer's model is assumed; otherwise it is called non-Bayesian [1]. A full ignorance source is represented by the vacuous BBA $m_v(\Theta) = 1$. The belief (or credibility) and plausibility functions are respectively defined by $Bel(X) \triangleq \sum_{Y \in D^\Theta | Y \subseteq X} m(Y)$ and $Pl(X) \triangleq \sum_{Y \in D^\Theta | Y \cap X \ne \emptyset} m(Y)$. $BI(X) \triangleq [Bel(X), Pl(X)]$ is called the belief interval of $X$. Its length $U(X) \triangleq Pl(X) - Bel(X)$ measures the degree of uncertainty of $X$.


²We use the symbol $\triangleq$ to mean equals by definition.
³The third excluded middle principle assumes the existence of the complement for any elements/propositions belonging to the power set $2^\Theta$.
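As a small illustration of the definitions of $Bel$, $Pl$ and the belief interval, the following Python sketch represents a BBA as a dictionary from `frozenset` focal elements to masses. It only covers Shafer's model (focal elements in the power set $2^\Theta$), not the hyper-power set $D^\Theta$; the function names and the example BBA are hypothetical.

```python
def bel(m, x):
    """Belief of x: total mass of the focal elements included in x."""
    return sum(v for a, v in m.items() if a <= x)

def pl(m, x):
    """Plausibility of x: total mass of the focal elements intersecting x."""
    return sum(v for a, v in m.items() if a & x)

def belief_interval(m, x):
    """Return (Bel(x), Pl(x), U(x)) with U(x) = Pl(x) - Bel(x)."""
    b, p = bel(m, x), pl(m, x)
    return b, p, p - b

# Hypothetical BBA on the frame {t1, t2, t3}.
t1, t2, t3 = frozenset({"t1"}), frozenset({"t2"}), frozenset({"t3"})
m = {t1: 0.5, t1 | t2: 0.3, t1 | t2 | t3: 0.2}

print(belief_interval(m, t1))        # singleton t1
print(belief_interval(m, t1 | t2))   # disjunction t1 U t2
```

Since the empty set is never a focal element, the subset test `a <= x` cannot pick up spurious mass, and `a & x` is truthy exactly when the intersection is non-empty.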


In 1976, Shafer proposed Dempster's rule⁴ to combine BBAs in DST framework. The DS rule is defined by $m_{DS}(\emptyset) = 0$ and, $\forall A \in 2^\Theta \setminus \{\emptyset\}$,

$$m_{DS}(A) = \frac{\sum_{B, C \in 2^\Theta | B \cap C = A} m_1(B) m_2(C)}{1 - \sum_{B, C \in 2^\Theta | B \cap C = \emptyset} m_1(B) m_2(C)} \tag{1}$$

The DS rule formula is commutative and associative and can be easily extended to the fusion of $S > 2$ BBAs. Unfortunately, the DS rule has been highly disputed during the last decades by many authors because of its counter-intuitive behavior in high or even low conflict situations, and that is why many rules of combination have been proposed in the literature to combine BBAs [13]. To palliate the drawbacks of the DS rule, the PCR6 rule (Proportional Conflict Redistribution rule #6) has been proposed in DSmT, and it is usually adopted⁵ in recent applications of DSmT. The fusion of two BBAs $m_1(\cdot)$ and $m_2(\cdot)$ by the PCR6 rule is obtained by $m_{PCR6}(\emptyset) = 0$ and, $\forall A \in D^\Theta \setminus \{\emptyset\}$,

$$m_{PCR6}(A) = m_{12}(A) + \sum_{B \in D^\Theta \setminus \{A\} | A \cap B = \emptyset} \left[ \frac{m_1(A)^2 m_2(B)}{m_1(A) + m_2(B)} + \frac{m_2(A)^2 m_1(B)}{m_2(A) + m_1(B)} \right] \tag{2}$$

where $m_{12}(A) = \sum_{B, C \in D^\Theta | B \cap C = A} m_1(B) m_2(C)$ is the conjunctive operator, and each element $A$ and $B$ is expressed in its disjunctive normal form. If the denominator involved in a fraction is zero, then this fraction is discarded. The general PCR6 formula for combining more than two BBAs altogether is given in [5], Vol. 3. We adopt the generic notation $m^{PCR6}_{1,2}(\cdot) = PCR6(m_1(\cdot), m_2(\cdot))$ to denote the fusion of $m_1(\cdot)$ and $m_2(\cdot)$ by the PCR6 rule. PCR6 is not associative, and the PCR6 rule can also be applied in DST framework (with Shafer's model of the FoD) by replacing $D^\Theta$ by $2^\Theta$ in Eq. (2).
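Formulas (1) and (2) can be sketched for two sources in Python, under Shafer's model, with focal elements stored as `frozenset` keys. This is an illustrative sketch rather than a reference implementation: the function names are hypothetical, and only the two-source case is covered. The check values are the layer-1 coarsened BBAs of Example 1, fused sequentially.

```python
from collections import defaultdict

def conjunctive(m1, m2):
    """Conjunctive operator m12; the key frozenset() collects the conflict."""
    out = defaultdict(float)
    for b, vb in m1.items():
        for c, vc in m2.items():
            out[b & c] += vb * vc
    return out

def dempster(m1, m2):
    """Dempster's rule, Eq. (1): conjunctive masses normalised by 1 - conflict."""
    m12 = conjunctive(m1, m2)
    k = m12.pop(frozenset(), 0.0)
    if k >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {a: v / (1.0 - k) for a, v in m12.items()}

def pcr6(m1, m2):
    """PCR6 for two BBAs, Eq. (2), under Shafer's model."""
    out = conjunctive(m1, m2)
    out.pop(frozenset(), None)             # conflict is redistributed below
    for a, va in m1.items():
        for b, vb in m2.items():
            # Each conflicting product m1(a) * m2(b) is shared between a and b
            # proportionally to the masses involved; zero denominators are skipped.
            if not (a & b) and va + vb > 0.0:
                out[a] += va * va * vb / (va + vb)
                out[b] += vb * vb * va / (va + vb)
    return dict(out)

w1, w2 = frozenset({"w1"}), frozenset({"w2"})
m1 = {w1: 0.6, w2: 0.4}
m2 = {w1: 0.5, w2: 0.5}
m3 = {w1: 0.6, w2: 0.4}

fused = pcr6(pcr6(m1, m2), m3)             # sequential fusion of three sources
print(round(fused[w1], 4), round(fused[w2], 4))
```

Sequential fusion is used here only because the sketch handles two sources at a time; since PCR6 is not associative, the result depends on the fusion order.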


B. Hierarchical coarsening for fusion of Bayesian BBAs 


Here, we briefly recall the principle of hierarchical coarsening of the FoD to reduce the computational complexity of the PCR6 combination of original Bayesian BBAs. The fusion of original non-Bayesian BBAs will be presented in the next section.

This principle was called rigid grouping in our previous works [17]-[19]. The goal of this coarsening is to replace the original (refined) Frame of Discernment (FoD) $\Theta$ by a set of coarsened ones to make the computation of the PCR6 rule tractable. Because we consider here only Bayesian BBAs to combine, their focal elements are only singletons of the FoD $\Theta \triangleq \{\theta_1, \ldots, \theta_n\}$, with $n \ge 2$, and we assume Shafer's model of the FoD $\Theta$.

A coarsening of the FoD $\Theta$ means to replace it with another, less specific FoD of smaller dimension $\Omega = \{\omega_1, \ldots, \omega_k\}$ with $k < n$, built from the elements of $\Theta$. This can be done in many ways depending on the problem under consideration. Generally, the elements of $\Omega$ are singletons of $\Theta$ and disjunctions of elements of $\Theta$. For example, if $\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4\}$, then the possible coarsened frames built from $\Theta$ could be, for instance, $\Omega = \{\omega_1 = \theta_1, \omega_2 = \theta_2, \omega_3 = \theta_3 \cup \theta_4\}$, or $\Omega = \{\omega_1 = \theta_1 \cup \theta_2, \omega_2 = \theta_3 \cup \theta_4\}$, etc. When dealing with Bayesian BBAs, the projection⁶ $m^\Omega(\cdot)$ of the original BBA $m^\Theta(\cdot)$ is simply obtained by taking

$$m^\Omega(\omega_i) = \sum_{\theta_j \subseteq \omega_i} m^\Theta(\theta_j) \tag{3}$$

⁴We use the DS index to refer to Dempster-Shafer's rule (DS rule) because Shafer did really promote Dempster's rule in his milestone book [1].
⁵The PCR6 rule coincides with PCR5 when combining only two BBAs [5].
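The hierarchical coarsening process (or rigid grouping) is a simple dichotomous approach of coarsening obtained as follows:

• If $n = |\Theta|$ is an even number:
The disjunction of the $n/2$ first elements $\theta_1$ to $\theta_{n/2}$ of $\Theta$ defines the element $\omega_1$ of $\Omega$, and the last $n/2$ elements $\theta_{n/2+1}$ to $\theta_n$ of $\Theta$ define the element $\omega_2$ of $\Omega$, that is

$$\Omega \triangleq \{\omega_1 \triangleq \theta_1 \cup \ldots \cup \theta_{n/2},\ \omega_2 \triangleq \theta_{n/2+1} \cup \ldots \cup \theta_n\}$$

and based on (3), one has

$$m^\Omega(\omega_1) = \sum_{j=1,\ldots,n/2} m^\Theta(\theta_j) \tag{4}$$
$$m^\Omega(\omega_2) = \sum_{j=n/2+1,\ldots,n} m^\Theta(\theta_j) \tag{5}$$

For example, if $\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4\}$, and one considers the Bayesian BBA $m^\Theta(\theta_1) = 0.1$, $m^\Theta(\theta_2) = 0.2$, $m^\Theta(\theta_3) = 0.3$ and $m^\Theta(\theta_4) = 0.4$, then $\Omega = \{\omega_1 = \theta_1 \cup \theta_2, \omega_2 = \theta_3 \cup \theta_4\}$ and $m^\Omega(\omega_1) = 0.1 + 0.2 = 0.3$ and $m^\Omega(\omega_2) = 0.3 + 0.4 = 0.7$.

• If $n = |\Theta|$ is an odd number:
In this case, the element $\omega_1$ of the coarsened frame $\Omega$ is the disjunction of the $[n/2 + 1]$⁷ first elements of $\Theta$, and the element $\omega_2$ is the disjunction of the other elements of $\Theta$. That is

$$\Omega \triangleq \{\omega_1 \triangleq \theta_1 \cup \ldots \cup \theta_{[n/2+1]},\ \omega_2 \triangleq \theta_{[n/2+1]+1} \cup \ldots \cup \theta_n\}$$

and based on (3), one has

$$m^\Omega(\omega_1) = \sum_{j=1,\ldots,[n/2+1]} m^\Theta(\theta_j) \tag{6}$$
$$m^\Omega(\omega_2) = \sum_{j=[n/2+1]+1,\ldots,n} m^\Theta(\theta_j) \tag{7}$$

For example, if $\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4, \theta_5\}$, and one considers the Bayesian BBA $m^\Theta(\theta_1) = 0.1$, $m^\Theta(\theta_2) = 0.2$, $m^\Theta(\theta_3) = 0.3$, $m^\Theta(\theta_4) = 0.3$ and $m^\Theta(\theta_5) = 0.1$, then $\Omega = \{\omega_1 = \theta_1 \cup \theta_2 \cup \theta_3, \omega_2 = \theta_4 \cup \theta_5\}$ and $m^\Omega(\omega_1) = 0.1 + 0.2 + 0.3 = 0.6$ and $m^\Omega(\omega_2) = 0.3 + 0.1 = 0.4$.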
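The dichotomous split of Eqs. (4)-(7) reduces to a single index computation, as the following Python sketch shows. The function name `coarsen_once` is hypothetical, and the Bayesian BBA is assumed to be given as a list of singleton masses ordered as $\theta_1, \ldots, \theta_n$.

```python
def coarsen_once(masses):
    """Project an ordered Bayesian BBA onto the two-element frame {w1, w2}.

    The split index (n + 1) // 2 equals n/2 when n is even and the
    integer part of n/2 + 1 when n is odd, as in Eqs. (4)-(7).
    """
    n = len(masses)
    split = (n + 1) // 2
    return sum(masses[:split]), sum(masses[split:])

print(coarsen_once([0.1, 0.2, 0.3, 0.4]))        # even case
print(coarsen_once([0.1, 0.2, 0.3, 0.3, 0.1]))   # odd case
```

Up to floating-point rounding, the two calls give the coarsened masses $(0.3, 0.7)$ and $(0.6, 0.4)$ of the even and odd examples.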


Of course, the same coarsening applies to all original BBAs $m_s^\Theta(\cdot)$, $s = 1, \ldots, S$, of the $S > 1$ sources of evidence, to work with less specific BBAs $m_s^\Omega(\cdot)$, $s = 1, \ldots, S$. The less specific BBAs (called coarsened BBAs by abuse of language) can then be combined with the PCR6 rule of combination according to formula (2). This dichotomous coarsening method is repeated iteratively $l$ times, as schematically represented by a bintree⁸. The last step of this hierarchical process is to calculate the combined (Bayesian) BBA of all focal elements according to the connection weights of the bintree structure, where the number of iterations (or layers) $l$ of the tree depends on the cardinality $|\Theta|$ of the original FoD $\Theta$. Specifically, the assignment of each focal element is updated according to the connection weights of the link paths from the root to the terminal nodes. This principle is illustrated in detail in the following example.

⁶For clarity and convenience, we put explicitly as upper index the FoD to which the belief mass refers.
⁷The notation $[x]$ means the integer part of $x$.


Example 1: Let's consider $\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4, \theta_5\}$, and the following three Bayesian BBAs


Focal elem. | $m_1^\Theta(\cdot)$ | $m_2^\Theta(\cdot)$ | $m_3^\Theta(\cdot)$
$\theta_1$ | 0.1 | 0.4 | 0
$\theta_2$ | 0.2 | 0 | 0.1
$\theta_3$ | 0.3 | 0.1 | 0.5
$\theta_4$ | 0.3 | 0.1 | 0.4
$\theta_5$ | 0.1 | 0.4 | 0


The hierarchical coarsening and fusion of BBAs is obtained from the following steps:

Step 1: We define the bintree structure based on iterative half split of the FoD as shown in Fig. 1.


Figure 1: Fusion of Bayesian BBAs using bintree coarsening for Example 1.


The connecting weights are denoted as $\lambda_1, \ldots, \lambda_8$. The elements of the frames $\Omega_l$ are defined as follows:
• At layer $l = 1$: $\Omega_1 = \{\omega_1 \triangleq \theta_1 \cup \theta_2 \cup \theta_3,\ \omega_2 \triangleq \theta_4 \cup \theta_5\}$
• At layer $l = 2$: $\Omega_2 = \{\omega_{11} \triangleq \theta_1 \cup \theta_2,\ \omega_{12} \triangleq \theta_3,\ \omega_{21} \triangleq \theta_4,\ \omega_{22} \triangleq \theta_5\}$
• At layer $l = 3$: $\Omega_3 = \{\omega_{111} \triangleq \theta_1,\ \omega_{112} \triangleq \theta_2\}$


⁸Here we consider a bintree only for simplicity, which means that the coarsened frame $\Omega$ consists of two elements only. Of course a similar method can be used with a tri-tree, quad-tree, etc.
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Step 2: The BBAs of elements of the (sub-)frames Q; are 
obtained as follows: 
e At layer / = 1, we use (6)-(7) because |O| = 5 is an odd 
number. Therefore, we get 


Focal elem. | m2 (.) | ms (.) | ms" (.) 
Wy & 0, U A U A3 0.6 0.5 0.6 
ws = 64U 6s 0.4 0.5 0.4 


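• At layer $l = 2$: We work with the two subframes $\Omega_{21} \triangleq \{\omega_{11}, \omega_{12}\}$
and $\Omega_{22} \triangleq \{\omega_{21}, \omega_{22}\}$ of $\Omega_2$ with the BBAs:


Focal elem. | $m_1^{\Omega_{21}}(\cdot)$ | $m_2^{\Omega_{21}}(\cdot)$ | $m_3^{\Omega_{21}}(\cdot)$
$\omega_{11} \triangleq \theta_1 \cup \theta_2$ | 1/2 | 4/5 | 1/6
$\omega_{12} \triangleq \theta_3$ | 1/2 | 1/5 | 5/6

Focal elem. | $m_1^{\Omega_{22}}(\cdot)$ | $m_2^{\Omega_{22}}(\cdot)$ | $m_3^{\Omega_{22}}(\cdot)$
$\omega_{21} \triangleq \theta_4$ | 3/4 | 1/5 | 1
$\omega_{22} \triangleq \theta_5$ | 1/4 | 4/5 | 0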


These mass values are obtained by the proportional
redistribution of the mass of each focal element with
respect to the mass of its parent focal element in the
bintree. For example, the value $m_2^{\Omega_{21}}(\omega_{11}) = 4/5$ is derived
by taking

$$m_2^{\Omega_{21}}(\omega_{11}) = \frac{m_2^{\Theta}(\theta_1) + m_2^{\Theta}(\theta_2)}{m_2^{\Theta}(\theta_1) + m_2^{\Theta}(\theta_2) + m_2^{\Theta}(\theta_3)} = \frac{0.4}{0.5} = \frac{4}{5}$$

Other mass values are computed similarly using this
proportional redistribution method.

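• At layer $l = 3$: We use again the proportional redistribution
method, which gives us


Focal elem. | $m_1^{\Omega_3}(\cdot)$ | $m_2^{\Omega_3}(\cdot)$ | $m_3^{\Omega_3}(\cdot)$
$\omega_{111} \triangleq \theta_1$ | 1/3 | 1 | 0
$\omega_{112} \triangleq \theta_2$ | 2/3 | 0 | 1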
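
The proportional redistribution used in Step 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `redistribute` and the index-list encoding of tree nodes are hypothetical, exact fractions are used to avoid rounding, and a parent with zero mass is simply rejected.

```python
from fractions import Fraction


def redistribute(masses, parent, child):
    """Mass of `child` relative to the mass of its `parent` node.

    `masses` holds the Bayesian masses of theta_1..theta_n (index 0..n-1);
    `parent` and `child` are lists of singleton indices (hypothetical encoding).
    """
    parent_mass = sum(masses[i] for i in parent)
    if parent_mass == 0:
        raise ValueError("parent mass is zero; the ratio is undefined")
    return sum(masses[i] for i in child) / parent_mass


# Source 2 of Example 1: masses of theta_1..theta_5.
m2 = [Fraction(s) for s in ("0.4", "0", "0.1", "0.1", "0.4")]
w1 = [0, 1, 2]   # omega_1  = theta_1 U theta_2 U theta_3
w11 = [0, 1]     # omega_11 = theta_1 U theta_2
w12 = [2]        # omega_12 = theta_3

print(redistribute(m2, w1, w11))  # 4/5
print(redistribute(m2, w1, w12))  # 1/5
```

The two printed ratios are the entries of the second source in the layer-2 table for the subframe built from $\omega_{11}$ and $\omega_{12}$.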


Step 3: The connection weights $\lambda_i$ are computed
from the assignments of coarsening elements. In each
layer $l$, we fuse sequentially^9 the three BBAs using
PCR6 formula (2). More precisely, we first compute
$m_{12}^{PCR6,\Omega_l}(\cdot) = PCR6(m_1^{\Omega_l}(\cdot), m_2^{\Omega_l}(\cdot))$ and then
$m_{(12)3}^{PCR6,\Omega_l}(\cdot) = PCR6(m_{12}^{PCR6,\Omega_l}(\cdot), m_3^{\Omega_l}(\cdot))$. Hence, we
obtain the following connecting weights in the bintree:


• At layer $l = 1$:
$\lambda_1 = m_{(12)3}^{\Omega_1}(\omega_1) = 0.6297$
$\lambda_2 = m_{(12)3}^{\Omega_1}(\omega_2) = 0.3703$


• At layer $l = 2$:
$\lambda_3 = m_{(12)3}^{\Omega_{21}}(\omega_{11}) = 0.4137$
$\lambda_4 = m_{(12)3}^{\Omega_{21}}(\omega_{12}) = 0.5863$
$\lambda_5 = m_{(12)3}^{\Omega_{22}}(\omega_{21}) = 0.8121$
$\lambda_6 = m_{(12)3}^{\Omega_{22}}(\omega_{22}) = 0.1879$


• At layer $l = 3$:
$\lambda_7 = m_{(12)3}^{\Omega_3}(\omega_{111}) = 0.3103$
$\lambda_8 = m_{(12)3}^{\Omega_3}(\omega_{112}) = 0.6897$


9 Because PCR6 fusion is not associative, we should apply the general
PCR6 formula to get the best results. Here we use sequential fusion to reduce the
computational complexity, even if the fusion result is approximate.
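
The sequential fusion of Step 3 can be reproduced with the following minimal Python sketch. It assumes Bayesian BBAs on a two-element frame (so the two elements are mutually exclusive) and applies the two-source PCR6 formula (2) source after source; the function names `pcr6_pair` and `sequential_pcr6` are hypothetical.

```python
def pcr6_pair(a, b):
    """Two-source PCR6 (same as PCR5) for Bayesian BBAs on a two-element frame.

    `a` and `b` are [mass of omega_1, mass of omega_2].
    """
    fused = []
    for i in (0, 1):
        j = 1 - i
        mass = a[i] * b[i]  # conjunctive consensus on omega_i
        # Proportional redistribution of the two partial conflicts;
        # a fraction with a zero denominator is discarded.
        if a[i] + b[j] > 0:
            mass += a[i] ** 2 * b[j] / (a[i] + b[j])
        if b[i] + a[j] > 0:
            mass += b[i] ** 2 * a[j] / (b[i] + a[j])
        fused.append(mass)
    return fused


def sequential_pcr6(bbas):
    """Fuse the BBAs one after the other: ((m1, m2), m3), ..."""
    fused = bbas[0]
    for m in bbas[1:]:
        fused = pcr6_pair(fused, m)
    return fused


# Layer 1 of Example 1: coarsened BBAs of the three sources on {omega_1, omega_2}.
layer1 = [[0.6, 0.4], [0.5, 0.5], [0.6, 0.4]]
print([round(x, 4) for x in sequential_pcr6(layer1)])  # [0.6297, 0.3703]
```

Running the same function on the layer-2 and layer-3 tables gives the remaining weights. Because PCR6 is not associative, a different fusion order may give slightly different values.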


Step 4: The final assignments of belief mass to the elements
of the original FoD $\Theta$ are calculated using the product of the
connection weights of the link paths from the root (top) node to
the terminal nodes (leaves). We finally get the following resulting
combined and normalized Bayesian BBA


$m^{\Theta}(\theta_1) = \lambda_1 \cdot \lambda_3 \cdot \lambda_7 = 0.6297 \cdot 0.4137 \cdot 0.3103 = 0.0808$
$m^{\Theta}(\theta_2) = \lambda_1 \cdot \lambda_3 \cdot \lambda_8 = 0.6297 \cdot 0.4137 \cdot 0.6897 = 0.1797$
$m^{\Theta}(\theta_3) = \lambda_1 \cdot \lambda_4 = 0.6297 \cdot 0.5863 = 0.3692$
$m^{\Theta}(\theta_4) = \lambda_2 \cdot \lambda_5 = 0.3703 \cdot 0.8121 = 0.3007$
$m^{\Theta}(\theta_5) = \lambda_2 \cdot \lambda_6 = 0.3703 \cdot 0.1879 = 0.0696$
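
Step 4 amounts to a tree traversal that multiplies the link weights along each root-to-leaf path. The sketch below is one possible encoding, assuming a nested dictionary whose layout and node names (`w1`, `w11`, ...) are hypothetical; the weights are the rounded values of Step 3.

```python
# Bintree of Example 1. Each internal node maps a child name to
# (link weight, subtree); a leaf is the name of a singleton of the FoD.
tree = {
    "w1": (0.6297, {
        "w11": (0.4137, {
            "w111": (0.3103, "theta1"),
            "w112": (0.6897, "theta2"),
        }),
        "w12": (0.5863, "theta3"),
    }),
    "w2": (0.3703, {
        "w21": (0.8121, "theta4"),
        "w22": (0.1879, "theta5"),
    }),
}


def leaf_masses(node, path_weight=1.0):
    """Return {leaf name: product of the link weights from the root to that leaf}."""
    if isinstance(node, str):  # terminal node
        return {node: path_weight}
    masses = {}
    for weight, child in node.values():
        masses.update(leaf_masses(child, path_weight * weight))
    return masses


bba = leaf_masses(tree)
for name in sorted(bba):
    print(name, round(bba[name], 4))
```

Because the two weights leaving every node sum to one, the leaf masses sum to one without any further normalization.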


III. NEW HIERARCHICAL FLEXIBLE COARSENING METHOD


Contrary to the (rigid) hierarchical coarsening method presented
in section II, in our new flexible coarsening approach
the elements $\theta_i$, $i = 1, \ldots, n$ of the FoD $\Theta$ are not half
split to build the coarsening focal elements $\omega_j$, $j = 1, \ldots, k$ of
the FoD $\Omega_l$. In the hierarchical flexible (adaptive) coarsening
method, the elements $\theta_i$ chosen to belong to the same group
are determined using the consensus information drawn from
the BBAs provided by the sources. Specifically, the degrees
of disagreement between the sources on the decisions
$(\theta_1, \theta_2, \ldots, \theta_n)$ are first calculated using the belief-interval
based distance $d_{BI}$ [16], [20] to obtain the disagreement vector.
Then, the k-means algorithm is applied to cluster the elements
$\theta_i$, $i = 1, \ldots, n$ based on their corresponding values in the disagreement
vector. It is worth noting that the values of disagreement reflect the
preferences of independent sources of evidence for the same
focal element. If they are small, all sources have
a consistent opinion and these elements should be clustered in
the same group. Conversely, if the disagreement values are large,
the sources strongly disagree on these
focal elements, which need to be clustered
in another group.


A. Calculating the disagreement vector 


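Let us consider several BBAs $m_s^{\Theta}(\cdot)$, $(s = 1, \ldots, S)$ defined
on the same FoD $\Theta$ of cardinality $|\Theta| = n$. The specific BBAs
$m_{\theta_i}(\cdot)$, $i = 1, \ldots, n$ entirely focused on $\theta_i$ are defined by
$m_{\theta_i}(\theta_i) = 1$, and $m_{\theta_i}(X) = 0$ for $X \neq \theta_i$. The disagreement
of opinions of two sources about $\theta_i$ is defined as the $L_1$-distance
between the $d_{BI}$ distances of the BBAs $m_s^{\Theta}(\cdot)$, $s = 1, 2$
to $m_{\theta_i}(\cdot)$, which is expressed by

$$D_{1-2}(\theta_i) \triangleq \left| d_{BI}(m_1^{\Theta}(\cdot), m_{\theta_i}(\cdot)) - d_{BI}(m_2^{\Theta}(\cdot), m_{\theta_i}(\cdot)) \right| \quad (8)$$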


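The disagreement of opinions of $S \geq 3$ sources about $\theta_i$ is
defined as the sum of the pairwise disagreements

$$D_{1-S}(\theta_i) \triangleq \sum_{1 \leq s < t \leq S} \left| d_{BI}(m_s^{\Theta}(\cdot), m_{\theta_i}(\cdot)) - d_{BI}(m_t^{\Theta}(\cdot), m_{\theta_i}(\cdot)) \right| \quad (9)$$

where the $d_{BI}$ distance is defined by^10 [20]

$$d_{BI}(m_1, m_2) \triangleq \sqrt{n_c \cdot \sum_{i=1}^{2^n - 1} \left[ d^I(BI_1(\theta_i), BI_2(\theta_i)) \right]^2} \quad (10)$$

Here, $n_c = 1/2^{n-1}$ is the normalization constant and
$d^I([a, b], [c, d])$ is the Wasserstein's distance defined by

$$d^I([a, b], [c, d]) = \sqrt{\left[ \frac{a+b}{2} - \frac{c+d}{2} \right]^2 + \frac{1}{3} \left[ \frac{b-a}{2} - \frac{d-c}{2} \right]^2}$$

and $BI(\theta_i) = [Bel(\theta_i), Pl(\theta_i)]$.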
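
The belief-interval distance and the disagreement measure can be sketched as follows. Several points are assumptions rather than the authors' implementation: BBAs are encoded as dictionaries from `frozenset` focal elements to masses, Shafer's model is assumed so the sum runs over the $2^n - 1$ non-empty subsets, and for more than two sources the sketch sums the pairwise differences as in the worked derivation of $D_{1-3}(\theta_1)$ in Section IV. All function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def bayesian(p):
    """Encode a probability vector as a BBA {frozenset({i}): mass}."""
    return {frozenset([i]): v for i, v in enumerate(p) if v > 0}


def bel_pl(m, a):
    """Belief interval [Bel(a), Pl(a)] of subset `a` under BBA `m`."""
    bel = sum(v for f, v in m.items() if f <= a)  # focal elements included in a
    pl = sum(v for f, v in m.items() if f & a)    # focal elements intersecting a
    return bel, pl


def d_interval(i1, i2):
    """Wasserstein distance between the intervals [a, b] and [c, d]."""
    (a, b), (c, d) = i1, i2
    return sqrt(((a + b) / 2 - (c + d) / 2) ** 2
                + ((b - a) / 2 - (d - c) / 2) ** 2 / 3)


def d_bi(m1, m2, n):
    """Belief-interval distance with n_c = 1 / 2**(n - 1)."""
    subsets = (frozenset(c) for r in range(1, n + 1)
               for c in combinations(range(n), r))
    total = sum(d_interval(bel_pl(m1, a), bel_pl(m2, a)) ** 2 for a in subsets)
    return sqrt(total / 2 ** (n - 1))


def disagreement(bbas, i, n):
    """Sum of pairwise |d_BI differences| to the BBA focused on theta_i."""
    target = {frozenset([i]): 1.0}
    d = [d_bi(m, target, n) for m in bbas]
    return sum(abs(d[s] - d[t])
               for s in range(len(d)) for t in range(s + 1, len(d)))


m1 = bayesian([0.1, 0.2, 0.3, 0.3, 0.1])
m2 = bayesian([0.4, 0.0, 0.1, 0.1, 0.4])
vector = [disagreement([m1, m2], i, 5) for i in range(5)]
print([round(v, 4) for v in vector])
```

The absolute values depend on the normalization constant $n_c$, whereas the grouping of the elements depends only on how the entries of the vector compare with each other.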
The disagreement vector $D_{1-S}$ is defined by

$$D_{1-S} \triangleq [D_{1-S}(\theta_1), \ldots, D_{1-S}(\theta_n)] \quad (11)$$


B. Clustering focal elements


Once $D_{1-S}$ is derived, a clustering algorithm is used to
coarsen focal elements according to their corresponding values
in $D_{1-S}$. In this paper, we have used the k-means algorithm^11
to cluster focal elements. For each source $s = 1, \ldots, S$, the
mass assignments of focal elements in two^12 different clusters
are added up according to formulas (12)-(13).

$$m_s^{\Omega_l}(\omega_1) = \sum_{\theta_i \in \omega_1} m_s^{\Theta}(\theta_i) \quad (12)$$

$$m_s^{\Omega_l}(\omega_2) = \sum_{\theta_i \in \omega_2} m_s^{\Theta}(\theta_i) \quad (13)$$
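
Formulas (12)-(13) simply add up the singleton masses of each cluster. A minimal sketch, assuming the clusters are already given as lists of singleton indices (the function name `coarsen` and this encoding are hypothetical):

```python
def coarsen(bba, clusters):
    """Sum the masses of a Bayesian BBA over each cluster of singleton indices."""
    return [sum(bba[i] for i in cluster) for cluster in clusters]


# Three Bayesian BBAs on theta_1..theta_5 and the grouping
# omega_1 = {theta_3}, omega_2 = {theta_1, theta_2, theta_4, theta_5}.
sources = [
    [0.1, 0.2, 0.3, 0.3, 0.1],
    [0.4, 0.0, 0.1, 0.1, 0.4],
    [0.0, 0.1, 0.5, 0.4, 0.0],
]
clusters = [[2], [0, 1, 3, 4]]

for s, bba in enumerate(sources, start=1):
    print(s, [round(x, 4) for x in coarsen(bba, clusters)])
```

Each coarsened BBA still sums to one because the clusters partition the frame.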


C. Combination of the BBAs 


Based on the disagreement vector and the k-means algorithm, a
new adaptive bintree structure based on this flexible coarsening
decomposition is obtained (see the example in the next section),
and the elements of the FoD $\Theta$ are grouped more reasonably
in each layer of the decomposition. Once the adaptive bintree
structure is derived, the remaining steps (multiplications of link
weights) are identical to those of the hierarchical
(rigid) coarsening method presented in section II and yield the
final combined Bayesian BBA.


D. Summary of the method 


The fusion method of BBAs to get a combined Bayesian
BBA based on hierarchical flexible decomposition of the FoD
consists of the five steps below, illustrated in Fig. 2.

• Step 1 (pre-processing): At first, all input BBAs to
combine are approximated by Bayesian BBAs with the DSmP
transform.

• Step 2 (disagreement vector): $D_{1-S}(\cdot)$ is calculated using
$d_{BI}$ distances to estimate the degree of disagreement
of the BBAs $m_1^{\Theta}, \ldots, m_S^{\Theta}$ on the potential decisions $\theta_1, \ldots, \theta_n$.

• Step 3 (adaptive bintree): The adaptive bintree decomposition
of the FoD $\Theta$ is obtained using the k-means
algorithm to get the elements of the subframes $\Omega_l$.

• Step 4 (assignments and connection weights): For
each source $m_s^{\Theta}(\cdot)$ to combine, the mass assignment of
each element of the subframe $\Omega_l$ is computed by (12)-(13).
The weights of the links between two layers of the bintree
decomposition are obtained with the PCR6 rule^13.

• Step 5 (fusion): The final result (combined Bayesian
BBA) is computed by the product of the weights of the link paths
from the root to the terminal nodes.


10 For simplicity, we assume Shafer's model so that $|2^{\Theta}| = 2^n$; otherwise
the number of elements in the summation of (10) should be $|D^{\Theta}| - 1$ with
another normalization constant $n_c$.

11 which is implemented in Matlab™

12 because we use here the bisection decomposition.


[Flowchart: Input BBAs $m_1^{\Theta}(\cdot), \ldots, m_S^{\Theta}(\cdot)$; Flexible grouping using K-Means; Product of path link weights; Final Combined Bayesian BBA]

Figure 2: Hierarchical flexible decomposition of FoD for
fusion.


IV. TWO SIMPLE EXAMPLES
A. Example 1 (fusion of Bayesian BBAs)


Let us revisit Example 1 presented in section II-B. Applying
formula (9), one can verify that the disagreement vector
$D_{1-3}$ for this example is equal to

$$D_{1-3} = [0.4085, 0.2156, 0.3753, 0.2507, 0.4086]$$

The derivation of $D_{1-3}(\theta_1)$ is given below for convenience.

$$D_{1-3}(\theta_1) = \left| d_{BI}(m_1^{\Theta}(\cdot), m_{\theta_1}(\cdot)) - d_{BI}(m_2^{\Theta}(\cdot), m_{\theta_1}(\cdot)) \right| + \left| d_{BI}(m_2^{\Theta}(\cdot), m_{\theta_1}(\cdot)) - d_{BI}(m_3^{\Theta}(\cdot), m_{\theta_1}(\cdot)) \right| + \left| d_{BI}(m_1^{\Theta}(\cdot), m_{\theta_1}(\cdot)) - d_{BI}(m_3^{\Theta}(\cdot), m_{\theta_1}(\cdot)) \right| = 0.4085.$$

Based on the disagreement vector and the k-means algorithm, a
new adaptive bintree structure is obtained and shown in Fig. 3.
Compared to Fig. 1, the elements of the FoD $\Theta$ are grouped more
reasonably. In the vector $D_{1-3}$, $\theta_1$ and $\theta_5$ have similar degrees of
disagreement, so they are put in the same group. The same holds
for $\theta_2$ and $\theta_4$. Element $\theta_3$, however, fits neither group and is
therefore put alone at the first layer of the flexible coarsening. Once this
adaptive bintree decomposition is obtained, the remaining steps
are identical to those of the hierarchical coarsening
method of section II and yield the final combined BBA.

The flexible coarsening and fusion of BBAs is obtained from 
the following steps: 


13 General formula preferred, or applied sequentially to reduce complexity.


Figure 3: Example 1: Flexible bintree decomposition of FoD.


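Step 1: According to Fig. 3, the elements of the frames $\Omega_l$
are defined as follows:


• At layer $l = 1$: $\Omega_1 = \{\omega_1 \triangleq \theta_3,\ \omega_2 \triangleq \theta_1 \cup \theta_2 \cup \theta_4 \cup \theta_5\}$
• At layer $l = 2$: $\Omega_2 = \{\omega_{21} \triangleq \theta_1 \cup \theta_5,\ \omega_{22} \triangleq \theta_2 \cup \theta_4\}$
• At layer $l = 3$: $\Omega_3 = \{\omega_{211} \triangleq \theta_1,\ \omega_{212} \triangleq \theta_5,\ \omega_{221} \triangleq \theta_2,\ \omega_{222} \triangleq \theta_4\}$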
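Step 2: The BBAs of elements of the (sub-)frames $\Omega_l$ are
obtained as follows:
• At layer $l = 1$, we use (12)-(13) and we get


Focal elem. | $m_1^{\Omega_1}(\cdot)$ | $m_2^{\Omega_1}(\cdot)$ | $m_3^{\Omega_1}(\cdot)$
$\omega_1 \triangleq \theta_3$ | 0.3 | 0.1 | 0.5
$\omega_2 \triangleq \theta_1 \cup \theta_2 \cup \theta_4 \cup \theta_5$ | 0.7 | 0.9 | 0.5


• At layer $l = 2$: We use again the proportional redistribution
method, which gives us:

Focal elem. | $m_1^{\Omega_2}(\cdot)$ | $m_2^{\Omega_2}(\cdot)$ | $m_3^{\Omega_2}(\cdot)$
$\omega_{21} \triangleq \theta_1 \cup \theta_5$ | 2/7 | 8/9 | 0
$\omega_{22} \triangleq \theta_2 \cup \theta_4$ | 5/7 | 1/9 | 1

• At layer $l = 3$: We work with the two subframes $\Omega_{31} \triangleq \{\omega_{211}, \omega_{212}\}$
and $\Omega_{32} \triangleq \{\omega_{221}, \omega_{222}\}$ of $\Omega_3$ with the BBAs

Focal elem. | $m_1^{\Omega_{31}}(\cdot)$ | $m_2^{\Omega_{31}}(\cdot)$ | $m_3^{\Omega_{31}}(\cdot)$
$\omega_{211} \triangleq \theta_1$ | 1/2 | 1/2 | 1/2
$\omega_{212} \triangleq \theta_5$ | 1/2 | 1/2 | 1/2

Focal elem. | $m_1^{\Omega_{32}}(\cdot)$ | $m_2^{\Omega_{32}}(\cdot)$ | $m_3^{\Omega_{32}}(\cdot)$
$\omega_{221} \triangleq \theta_2$ | 2/5 | 0 | 1/5
$\omega_{222} \triangleq \theta_4$ | 3/5 | 1 | 4/5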

Step 3: The connection weights $\lambda_i$ are computed from the
assignments of coarsening elements. Hence, we obtain the
following connecting weights in the bintree:


• At layer $l = 1$: $\lambda_1 = 0.2226$; $\lambda_2 = 0.7774$.
• At layer $l = 2$: $\lambda_3 = 0.2200$; $\lambda_4 = 0.7800$.
• At layer $l = 3$: $\lambda_5 = 0.5$; $\lambda_6 = 0.5$; $\lambda_7 = 0.0669$; $\lambda_8 = 0.9331$.


Step 4: We finally get the following resulting combined and
normalized Bayesian BBA


$$m^{\Theta}(\cdot) = \{0.0855, 0.0406, 0.2226, 0.5658, 0.0855\}.$$
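
For convenience, the path products behind this result can be written out explicitly. The assignment of each weight to a link is an assumption inferred from the order in which the frames and weights are listed ($\lambda_3, \lambda_4$ for the groups $\theta_1 \cup \theta_5$ and $\theta_2 \cup \theta_4$; $\lambda_5, \lambda_6$ for $\theta_1, \theta_5$; $\lambda_7, \lambda_8$ for $\theta_2, \theta_4$); it reproduces the stated values.

```latex
\begin{align*}
m^{\Theta}(\theta_1) &= \lambda_2 \cdot \lambda_3 \cdot \lambda_5 = 0.7774 \cdot 0.2200 \cdot 0.5 = 0.0855 \\
m^{\Theta}(\theta_2) &= \lambda_2 \cdot \lambda_4 \cdot \lambda_7 = 0.7774 \cdot 0.7800 \cdot 0.0669 = 0.0406 \\
m^{\Theta}(\theta_3) &= \lambda_1 = 0.2226 \\
m^{\Theta}(\theta_4) &= \lambda_2 \cdot \lambda_4 \cdot \lambda_8 = 0.7774 \cdot 0.7800 \cdot 0.9331 = 0.5658 \\
m^{\Theta}(\theta_5) &= \lambda_2 \cdot \lambda_3 \cdot \lambda_6 = 0.7774 \cdot 0.2200 \cdot 0.5 = 0.0855
\end{align*}
```

The five values sum to one, as expected for a normalized Bayesian BBA.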


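B. Example 2 (with non-Bayesian BBAs)


Example 1bis: Let's consider $\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4, \theta_5\}$, and the
following BBAs given by


Focal elem. | $m_1^{\Theta}(\cdot)$ | $m_2^{\Theta}(\cdot)$ | $m_3^{\Theta}(\cdot)$
$\theta_1$ | 0.1 | 0.4 | 0
$\theta_2$ | 0.2 | 0 | 0
$\theta_3$ | 0.3 | 0.05 | 0
$\theta_4$ | 0.03 | 0.05 | 0
$\theta_5$ | 0.1 | 0.04 | 0
$\theta_1 \cup \theta_2$ | 0.1 | 0.04 | 0
$\theta_2 \cup \theta_3 \cup \theta_5$ | 0 | 0.02 | 0.1
$\theta_3 \cup \theta_4$ | 0.02 | 0.1 | 0.2
$\theta_1 \cup \theta_5$ | 0.1 | 0.3 | 0.2
$\Theta$ | 0.05 | 0 | 0.5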

Step 1 (Pre-Processing): All three BBAs are transformed
into Bayesian BBAs with the DSmP transform, and the
generated BBAs are


Focal elem. | $m_1^{\Theta}(\cdot)$ | $m_2^{\Theta}(\cdot)$ | $m_3^{\Theta}(\cdot)$
$\theta_1$ | 0.1908 | 0.7127 | 0.2000
$\theta_2$ | 0.2804 | 0 | 0.1334
$\theta_3$ | 0.3387 | 0.1111 | 0.2333
$\theta_4$ | 0.0339 | 0.1 | 0.2000
$\theta_5$ | 0.1562 | 0.0761 | 0.2333


Applying formula (9), one can verify that the disagreement
vector $D_{1-3}$ for this example is equal to

$$D_{1-3} = [0.5385, 0.3632, 0.3453, 0.2305, 0.2827]$$


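Step 2: According to the clustering algorithm, the elements
of the frames $\Omega_l$ are defined as follows:


• At layer $l = 1$: $\Omega_1 = \{\omega_1 \triangleq \theta_1,\ \omega_2 \triangleq \theta_2 \cup \theta_3 \cup \theta_4 \cup \theta_5\}$
• At layer $l = 2$: $\Omega_2 = \{\omega_{21} \triangleq \theta_2 \cup \theta_3,\ \omega_{22} \triangleq \theta_4 \cup \theta_5\}$
• At layer $l = 3$: $\Omega_3 = \{\omega_{211} \triangleq \theta_2,\ \omega_{212} \triangleq \theta_3,\ \omega_{221} \triangleq \theta_4,\ \omega_{222} \triangleq \theta_5\}$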
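Step 3: The BBAs of elements of the (sub-)frames $\Omega_l$ are
obtained as follows:


• At layer $l = 1$, we use (12)-(13) and we get


Focal elem. | $m_1^{\Omega_1}(\cdot)$ | $m_2^{\Omega_1}(\cdot)$ | $m_3^{\Omega_1}(\cdot)$
$\omega_1 \triangleq \theta_1$ | 0.1908 | 0.7127 | 0.2000
$\omega_2 \triangleq \theta_2 \cup \theta_3 \cup \theta_4 \cup \theta_5$ | 0.8092 | 0.2873 | 0.8000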


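• At layer $l = 2$: We use again the proportional redistribution
method, which gives us:


Focal elem. | $m_1^{\Omega_2}(\cdot)$ | $m_2^{\Omega_2}(\cdot)$ | $m_3^{\Omega_2}(\cdot)$
$\omega_{21} \triangleq \theta_2 \cup \theta_3$ | 0.7651 | 0.3867 | 0.4584
$\omega_{22} \triangleq \theta_4 \cup \theta_5$ | 0.2349 | 0.6133 | 0.5416

• At layer $l = 3$: We work with the two subframes $\Omega_{31} \triangleq \{\omega_{211}, \omega_{212}\}$
and $\Omega_{32} \triangleq \{\omega_{221}, \omega_{222}\}$ of $\Omega_3$ with the BBAs:


Focal elem. | $m_1^{\Omega_{31}}(\cdot)$ | $m_2^{\Omega_{31}}(\cdot)$ | $m_3^{\Omega_{31}}(\cdot)$
$\omega_{211} \triangleq \theta_2$ | 0.4529 | 0 | 0.3638
$\omega_{212} \triangleq \theta_3$ | 0.5471 | 1 | 0.6362

Focal elem. | $m_1^{\Omega_{32}}(\cdot)$ | $m_2^{\Omega_{32}}(\cdot)$ | $m_3^{\Omega_{32}}(\cdot)$
$\omega_{221} \triangleq \theta_4$ | 0.1783 | 0.5679 | 0.4616
$\omega_{222} \triangleq \theta_5$ | 0.8217 | 0.4321 | 0.5384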


Step 4: The connection weights $\lambda_i$ are computed from the
assignments of coarsening elements. Hence, we obtain the
following connecting weights in the bintree:


• At layer $l = 1$: $\lambda_1 = 0.2345$; $\lambda_2 = 0.7655$.
• At layer $l = 2$: $\lambda_3 = 0.5533$; $\lambda_4 = 0.4467$.
• At layer $l = 3$: $\lambda_5 = 0.1606$; $\lambda_6 = 0.8394$; $\lambda_7 = 0.3349$; $\lambda_8 = 0.6651$.


Step 5: We finally get the following resulting combined and
normalized Bayesian BBA


$$m^{\Theta}(\cdot) = \{0.2345, 0.0681, 0.3555, 0.1145, 0.2274\}.$$


V. SIMULATION RESULTS AND PERFORMANCES 


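A. Flexible Grouping of Singletons

1) Similarity^14: Assume that $\Theta = \{\theta_1, \theta_2, \ldots, \theta_{15}\}$. First, we randomly
generate two BBAs, denoted $m_1^{\Theta}(\cdot)$ and $m_2^{\Theta}(\cdot)$, which are
listed in Table I.


Table I: BBAs for Two Sources $m_1^{\Theta}(\cdot)$ and $m_2^{\Theta}(\cdot)$


 | $\theta_1$ | $\theta_2$ | $\theta_3$ | $\theta_4$ | $\theta_5$
$m_1^{\Theta}(\cdot)$ | 0.1331 | 0.0766 | 0.0175 | 0.0448 | 0.0229
$m_2^{\Theta}(\cdot)$ | 0.1020 | 0.0497 | 0.1094 | 0.0612 | 0.0612

 | $\theta_6$ | $\theta_7$ | $\theta_8$ | $\theta_9$ | $\theta_{10}$
$m_1^{\Theta}(\cdot)$ | 0.1142 | 0.0023 | 0.2254 | 0.1583 | 3.4959e-04
$m_2^{\Theta}(\cdot)$ | 0.0069 | 0.0070 | 0.0128 | 0.0833 | 0.0338

 | $\theta_{11}$ | $\theta_{12}$ | $\theta_{13}$ | $\theta_{14}$ | $\theta_{15}$
$m_1^{\Theta}(\cdot)$ | 0.0075 | 0.0514 | 0.1121 | 0.0314 | 0.0021
$m_2^{\Theta}(\cdot)$ | 0.1180 | 0.1202 | 0.1351 | 0.0686 | 0.0309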


In order to fully verify the similarity between the hierarchical
flexible coarsening method and PCR6 in DSmT, we use the new strict
distance metric between two BBAs, denoted $d_{BI}^{E}$, that was recently
proposed in [20], [16]. In this paper, we regard $d_{BI}^{E}$ as one criterion for evaluating
the degree of similarity between the fusion results obtained
from flexible coarsening and PCR6.


14 Similarity represents the degree of approximation between the fusion results obtained with
flexible coarsening and with PCR6.


Figure 4: Structure of Hierarchical Flexible Coarsening.

Based on (8) and (10), the disagreement vector $D(\cdot)$ is
obtained:


$D(\cdot)$ = (0.0032, 0.0020, 0.0290, 0.0092, 0.0147, 0.0228,
0.0059, 0.0537, 0.0154, 0.0131, 0.0338, 0.0235,
0.0118, 0.0145, 0.0120).


The resulting bintree structure of hierarchical flexible coarsening is
illustrated in Fig. 4. The similarity between the fusion results of
hierarchical flexible coarsening and PCR6 is 0.9783, whereas the
similarity between the hierarchical coarsening method and PCR6
is 0.9120. In particular, the terminal nodes (the small red boxes
in Fig. 4) of the flexible grouping do not follow the
original order $\theta_1, \theta_2, \ldots, \theta_{15}$, which is quite different from
the original hierarchical coarsening method.

From a statistical point of view, 100 BBAs are randomly
generated and fused with three methods: hierarchical flexible
coarsening, hierarchical coarsening, and PCR6. The comparisons
in Fig. 5 show the superiority of the
new approach proposed in this paper (the average similarity of the new
method is 97%, against 93.5% for the old method).


B. Flexible Grouping of Conflicting Focal Elements 


Assume that there are five sources of evidence
$m_1^{\Theta}(\cdot), m_2^{\Theta}(\cdot), m_3^{\Theta}(\cdot), m_4^{\Theta}(\cdot), m_5^{\Theta}(\cdot)$, and the restricted
hyper-power set $D^{\Theta} = \{\theta_1, \theta_2, \theta_3, \theta_4, \theta_5, \theta_6, \theta_7, \theta_8, \theta_9, \theta_{10}, \theta_1 \cap \theta_2, \theta_5 \cap \theta_6 \cap \theta_7, \theta_1 \cap \theta_5 \cap \theta_9 \cap \theta_{10}\}$.
We then randomly
generate 1000 BBAs for each source and calculate the similarity
using (10). Fig. 6 shows that the hierarchical flexible
coarsening method also maintains a high degree of similarity
and performs better than hierarchical coarsening.

[Plot: Similarity Comparison Between Hierarchical Flexible Coarsening (HFC) And Hierarchical Coarsening (HC).]

Figure 5: Comparisons Between HFC and HC (Only Singletons).


[Plot: Similarity Comparisons When Conflicts Exist. Vertical axis: Similarity (0.86 to 0.98); horizontal axis: 0 to 1000; curves: Hierarchical Coarsening, Hierarchical Flexible Coarsening.]

Figure 6: Comparisons Between HFC and HC (Singletons and
Conflicting Focal Elements).

C. Flexible Grouping of Uncertain and Hybrid Focal Elements


We can also deal with uncertain and hybrid focal elements.
Assume that there are again five sources of
evidence $m_1^{\Theta}(\cdot), m_2^{\Theta}(\cdot), m_3^{\Theta}(\cdot), m_4^{\Theta}(\cdot), m_5^{\Theta}(\cdot)$ and
$D_1^{\Theta} = \{\theta_1, \theta_2, \theta_3, \theta_4, \theta_5, \theta_6, \theta_7, \theta_8, \theta_9, \theta_{10}, \theta_1 \cup \theta_2, \theta_5 \cup \theta_6 \cup \theta_7, \theta_1 \cup \theta_5 \cup \theta_9 \cup \theta_{10}\}$;
$D_2^{\Theta} = \{\theta_1, \theta_2, \theta_3, \theta_4, \theta_5, \theta_6, \theta_7, \theta_8, \theta_9, \theta_{10}, \theta_2 \cap \theta_4 \cup \theta_6, \theta_1 \cup \theta_3 \cap \theta_5 \cup \theta_7 \cap \theta_9\}$^15.
We then randomly generate 1000 BBAs for each of these two cases $D_1^{\Theta}$
and $D_2^{\Theta}$. Finally, we calculate the average similarity degree of
HFC and HC with PCR6 (Table II), which shows that HFC
performs better than the old method. However, HFC incurs an extra
time cost compared to HC because of the clustering steps
in the coarsening process.


Table II: Similarity Comparisons


 | Hierarchical Flexible Coarsening | Hierarchical Coarsening
$D_1^{\Theta}$ | 98% | 91%
$D_2^{\Theta}$ | 97% | 93%

15 In this case, $D_1^{\Theta}$ represents uncertain focal elements and $D_2^{\Theta}$ represents
hybrid focal elements.


VI. CONCLUSION AND PERSPECTIVES
A novel hierarchical flexible approximate method in DSmT
is proposed here. Compared to the original hierarchical coarsening,
the flexible strategy achieves a higher similarity with the PCR6
rule in the fusion process. Moreover, the new method works well
whether the focal elements in the hyper-power set are singletons,
conflicting focal elements, uncertain focal elements or even hybrid
focal elements. In future work, we will focus on the general
framework of hierarchical coarsening, which could generate
final non-Bayesian BBAs in order to avoid loss of information.
Furthermore, other advantages or disadvantages of our
proposed methods, such as computational efficiency and time
consumption, need to be further investigated.

Abstract: In many applications involving epistemic uncertainties
usually modeled by belief functions, it is often necessary
to approximate general (non-Bayesian) basic belief assignments
(BBAs) by subjective probabilities (called Bayesian BBAs). This
necessity occurs if one needs to embed the fusion result in
a system based on the probabilistic framework and Bayesian
inference (e.g. tracking systems), or if one needs to make a
decision in decision-making problems. In this paper, we
present a new fast combination method, called modified rigid
coarsening (MRC), to obtain the final Bayesian BBAs based on a
hierarchical decomposition (coarsening) of the frame of discernment.
In this method, focal elements with probabilities
are coarsened efficiently to reduce the computational complexity of
the combination process by using a disagreement vector and a
simple dichotomous approach. To demonstrate the practicality
of our approach, it is applied to combine users'
soft preferences in recommender systems (RSs). Additionally, in
order to make a comprehensive performance comparison, the
proportional conflict redistribution rule #6 (PCR6) is regarded
as a baseline in a range of experiments. According to the
experimental results, MRC is more effective in accuracy of
recommendations than the original Rigid Coarsening (RC)
method and comparable in computational time.


Keywords: Recommender system, DSmT, PCR6. 


I. INTRODUCTION 


The theory of belief functions, known as Dempster-Shafer 
Theory (DST) was developed by Shafer [1] in 1976 from 
Dempster’s works [2]. Belief functions allow one to model 
epistemic uncertainty [3] and they have been already used in 
many applications since the 1990’s [4], mainly those relevant 
to expert systems, decision-making support and information 
fusion. To palliate some limitations (such as high computa- 
tional complexity) of DST, Dezert and Smarandache proposed 
an extended mathematical framework of belief functions with 
new efficient quantitative and qualitative rules of combina- 
tions, which was called DSmT (Dezert and Smarandache 
Theory) in literature [5], [6] with applications listed in [7]. 
One of the major drawbacks of DST and DSmT is their high
computational complexity when the fusion space
(i.e. the frame of discernment, FoD) and the number of sources
to combine are large. DSmT is more complex than DST, and
the Proportional Conflict Redistribution rule #6 (PCR6 rule)
becomes computationally intractable in the worst case as soon
as the cardinality of the Frame of Discernment (FoD) is greater
than six.


To reduce the computational cost of operations with belief 
functions when the number of focal elements is very large, 
several approaches have been proposed by different authors. 
Basically, the existing approaches rely either on efficient im- 
plementations of computations as proposed for instance in [8], 
[9], or on approximation techniques of original Basic Belief 
Assignment (BBA) to combine [10]-[14], or both. From a 
fusion standpoint, two approaches are usually adopted: 1) one
can first approximate each BBA by subjective probabilities
and use the Bayes fusion rule to get the final Bayesian BBA [11],
[12], or 2) one can fuse all the BBAs with a fusion rule, typically
Dempster-Shafer's rule or the proportional conflict redistribution
rule #6 (PCR6), which is very costly in computations,
and convert the combined BBA into a subjective probability
measure [10], [14]. The former method is the simplest,
but it generates a high loss of the information included in the
original BBAs, whereas the latter method is intractable for
high-dimensional problems.


This paper presents a new combination method, called
modified rigid coarsening (MRC), to get the final Bayesian
BBAs based on a hierarchical decomposition (coarsening) of the
frame of discernment, which can be seen as an intermediary
approach between the two aforementioned methods. This hierarchical
structure combines a bintree decomposition of the FoD
with the masses of the coarsened FoD assigned to it. To prove the practicality
of our proposed method, MRC is applied to combine users'
preferences so as to provide suitable recommendations in
RSs. This paper is an extended version of our preliminary work
on the original rigid coarsening (RC) published in [15]. In this
paper, more detailed analyses of this new combination method
are provided. More importantly, this method is also
applied to a real application.


The main contributions of this paper are:


1) the presentation of the FoD bintree decomposition on
which the BBA approximations are carried out;

2) the modeling of user preferences in Recommender Systems (RSs)
by a DSmT-Modeling Function.

In order to measure the efficiency and effectiveness of
MRC, it is integrated in RSs based on DSmT and compared
to traditional methods in the experiments. The results show
that, regarding the accuracy of recommendations, MRC is
extremely close to classical PCR6, and that the computational time
of MRC is clearly better than that of PCR6.

The remainder of this paper is organized as follows. In 
section II, we review relevant prior work on DST and DSmT 
first. In section III, MRC is presented. In section IV, a 
recommendation system based on DSmT, that employs MRC 
to combine users’ preferences, is shown. In section V, we 
evaluate our proposed algorithm based on two public datasets: 
Movielens and Flixster. Finally, we conclude and discuss 
future work. 


II. MATHEMATICAL BACKGROUND 


This section provides a brief reminder of the basics of
DST and DSmT, which is necessary for the presentation and
understanding of the more general MRC of Section III.

In the DST framework, the frame of discernment^1 $\Theta \triangleq \{\theta_1, \ldots, \theta_n\}$
($n \geq 2$) is a set of exhaustive and exclusive
elements (hypotheses) which represent the possible solutions
of the problem under consideration, and thus Shafer's model
assumes $\theta_i \cap \theta_j = \emptyset$ for $i \neq j$ in $\{1, \ldots, n\}$. A basic belief
assignment (BBA) $m(\cdot)$ is defined by the mapping $2^{\Theta} \mapsto [0, 1]$,
verifying $m(\emptyset) = 0$ and $\sum_{A \in 2^{\Theta}} m(A) = 1$. In DSmT,
one can abandon Shafer's model (if Shafer's model doesn't
fit with the problem) and refute the principle of the third
excluded middle. The third excluded middle principle assumes
the existence of the complement for any elements/propositions
belonging to the power set $2^{\Theta}$. Instead of defining the BBAs
on the power set $2^{\Theta} \triangleq (\Theta, \cup)$ of the FoD, the BBAs
are defined on the so-called hyper-power set (or Dedekind's
lattice) denoted $D^{\Theta} \triangleq (\Theta, \cup, \cap)$, whose cardinalities follow
Dedekind's numbers sequence; see [6], Vol. 1 for details and
examples. A (generalized) BBA, called a mass function, $m(\cdot)$
is defined by the mapping $D^{\Theta} \mapsto [0, 1]$, verifying $m(\emptyset) = 0$
and $\sum_{A \in D^{\Theta}} m(A) = 1$. The DSmT framework encompasses
the DST framework because $2^{\Theta} \subset D^{\Theta}$. In DSmT, we can also take
into account a set of integrity constraints on the FoD
(if known), by specifying all the pairs of elements which
are really disjoint. Stated otherwise, Shafer's model is a
specific DSm model where all elements are deemed to be
disjoint. $A \in D^{\Theta}$ is called a focal element of $m(\cdot)$ if
$m(A) > 0$. A BBA is called a Bayesian BBA if all of its
focal elements are singletons and Shafer's model is assumed;
otherwise it is called non-Bayesian [1]. A full ignorance
source is represented by the vacuous BBA $m_v(\Theta) = 1$. The
belief (or credibility) and plausibility functions are respectively
defined by

$$Bel(X) \triangleq \sum_{Y \in D^{\Theta} | Y \subseteq X} m(Y) \quad \text{and} \quad Pl(X) \triangleq \sum_{Y \in D^{\Theta} | Y \cap X \neq \emptyset} m(Y).$$

$BI(X) \triangleq [Bel(X), Pl(X)]$ is called
the belief interval of $X$. Its length $U(X) \triangleq Pl(X) - Bel(X)$
measures the degree of uncertainty of $X$.


1 Here, we use the symbol $\triangleq$ to mean equals by definition.


In 1976, Shafer proposed Dempster's rule to combine BBAs in the DST
framework (we use the DS index to refer to Dempster-Shafer's rule (DS rule) because
Shafer did really promote Dempster's rule in his milestone
book [1]). The DS rule is
defined by $m_{DS}(\emptyset) = 0$ and $\forall A \in 2^{\Theta} \setminus \{\emptyset\}$,

$$m_{DS}(A) = \frac{\sum_{B, C \in 2^{\Theta} | B \cap C = A} m_1(B) m_2(C)}{1 - \sum_{B, C \in 2^{\Theta} | B \cap C = \emptyset} m_1(B) m_2(C)} \quad (1)$$


The DS rule formula is commutative and associative and
can be easily extended to the fusion of $S > 2$ BBAs. Unfortunately,
the DS rule has been highly disputed during the
last decades by many authors because of its counter-intuitive
behavior in high or even low conflict situations, and that is
why many rules of combination were proposed in the literature to
combine BBAs [16]. To palliate the drawbacks of the DS rule, the
PCR6 rule was proposed in DSmT, and it is usually
adopted (the PCR6 rule coincides with PCR5 when combining
only two BBAs [6]) in recent applications of DSmT. The
fusion of two BBAs $m_1(\cdot)$ and $m_2(\cdot)$ by the PCR6 rule is
obtained by $m_{PCR6}(\emptyset) = 0$ and $\forall A \in D^{\Theta} \setminus \{\emptyset\}$

$$m_{PCR6}(A) = m_{12}(A) + \sum_{B \in D^{\Theta} \setminus \{A\} | A \cap B = \emptyset} \left[ \frac{m_1(A)^2 m_2(B)}{m_1(A) + m_2(B)} + \frac{m_2(A)^2 m_1(B)}{m_2(A) + m_1(B)} \right] \quad (2)$$


where $m_{12}(A) = \sum_{B, C \in D^{\Theta} | B \cap C = A} m_1(B) m_2(C)$ is the
conjunctive operator, and the elements $A$ and $B$ are expressed
in their disjunctive normal form. If the denominator involved
in a fraction is zero, then this fraction is discarded. The
general PCR6 formula for combining more than two BBAs
altogether is given in [6], Vol. 3. We adopt the generic notation
$m_{12}^{PCR6}(\cdot) = PCR6(m_1(\cdot), m_2(\cdot))$ to denote the fusion of
$m_1(\cdot)$ and $m_2(\cdot)$ by the PCR6 rule. PCR6 is not associative.
The PCR6 rule can also be applied in the DST framework (with
Shafer's model of the FoD) by replacing $D^{\Theta}$ by $2^{\Theta}$ in (2).
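
As a small worked instance of formula (2), consider two Bayesian BBAs on a two-element frame $\Omega = \{\omega_1, \omega_2\}$ with $\omega_1 \cap \omega_2 = \emptyset$. The values $m_1 = (0.6, 0.4)$ and $m_2 = (0.5, 0.5)$ are chosen here for illustration only.

```latex
\begin{align*}
m_{12}(\omega_1) &= 0.6 \cdot 0.5 = 0.30, \qquad m_{12}(\omega_2) = 0.4 \cdot 0.5 = 0.20 \\
m_{PCR6}(\omega_1) &= 0.30 + \frac{0.6^2 \cdot 0.5}{0.6 + 0.5} + \frac{0.5^2 \cdot 0.4}{0.5 + 0.4}
                    \approx 0.30 + 0.1636 + 0.1111 = 0.5747 \\
m_{PCR6}(\omega_2) &= 0.20 + \frac{0.4^2 \cdot 0.5}{0.4 + 0.5} + \frac{0.5^2 \cdot 0.6}{0.5 + 0.6}
                    \approx 0.20 + 0.0889 + 0.1364 = 0.4253
\end{align*}
```

The total conflicting mass $0.6 \cdot 0.5 + 0.5 \cdot 0.4 = 0.5$ is fully redistributed to $\omega_1$ and $\omega_2$ in proportion to the masses involved in each partial conflict, so the result remains normalized.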


III. MODIFIED RIGID COARSENING FOR FUSION OF 
BAYESIAN BBAS 


Here, we introduce the principle of MRC of the FoD to reduce
the computational complexity of the PCR6 combination of original
Bayesian BBAs. Non-Bayesian BBAs must first be decoupled,
that is, the masses of all their non-singleton focal elements
must be redistributed to singletons by using DSmP in advance,
as will be explained in Section IV.


A. Rigid coarsening 


This proposal was initially called rigid coarsening (RC) in
our previous works [17]-[19] and was improved in our
recent work [15]. The goal of this coarsening is to replace
the original (refined) FoD $\Theta$ by a set of coarsened ones to
make the computation of the PCR6 rule tractable. Because we
consider here only Bayesian BBAs to combine, their focal
elements are only singletons of the FoD $\Theta \triangleq \{\theta_1, \ldots, \theta_n\}$,
with $n \geq 2$, and we assume Shafer's model of the FoD $\Theta$.
A coarsening of the FoD $\Theta$ means to replace it with another,
less specific FoD of smaller dimension $\Omega = \{\omega_1, \ldots, \omega_k\}$ with
$k < n$, built from the elements of $\Theta$. This can be done in many
ways depending on the problem under consideration. Generally,
the elements of $\Omega$ are singletons of $\Theta$ and disjunctions
of elements of $\Theta$. For example, if $\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4\}$,
then a possible coarsened frame built from $\Theta$ could
be, for instance, $\Omega = \{\omega_1 = \theta_1, \omega_2 = \theta_2, \omega_3 = \theta_3 \cup \theta_4\}$, or
$\Omega = \{\omega_1 = \theta_1 \cup \theta_2, \omega_2 = \theta_3 \cup \theta_4\}$, etc.


Definition 1: When dealing with Bayesian BBAs, the projection^2
$m^{\Omega}(\cdot)$ of the original BBA $m^{\Theta}(\cdot)$ is simply obtained
by taking

$$m^{\Omega}(\omega_i) = \sum_{\theta_j \in \omega_i} m^{\Theta}(\theta_j). \quad (3)$$


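The rigid coarsening process is a simple dichotomous
approach of coarsening obtained as follows:
• If $n = |\Theta|$ is an even number:
The disjunction of the $n/2$ first elements $\theta_1$ to $\theta_{n/2}$ of $\Theta$
defines the element $\omega_1$ of $\Omega$, and the last $n/2$ elements
$\theta_{n/2+1}$ to $\theta_n$ of $\Theta$ define the element $\omega_2$ of $\Omega$, that is

$$\Omega \triangleq \{\omega_1 = \theta_1 \cup \ldots \cup \theta_{n/2},\ \omega_2 = \theta_{n/2+1} \cup \ldots \cup \theta_n\}$$

and based on (3), one has

$$m^{\Omega}(\omega_1) = \sum_{j=1,\ldots,\frac{n}{2}} m^{\Theta}(\theta_j), \quad (4)$$

$$m^{\Omega}(\omega_2) = \sum_{j=\frac{n}{2}+1,\ldots,n} m^{\Theta}(\theta_j). \quad (5)$$

For example, if $\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4\}$, and one
considers the Bayesian BBA $m^{\Theta}(\theta_1) = 0.1$,
$m^{\Theta}(\theta_2) = 0.2$, $m^{\Theta}(\theta_3) = 0.3$ and $m^{\Theta}(\theta_4) = 0.4$,
then $\Omega = \{\omega_1 = \theta_1 \cup \theta_2, \omega_2 = \theta_3 \cup \theta_4\}$ and
$m^{\Omega}(\omega_1) = 0.1 + 0.2 = 0.3$ and $m^{\Omega}(\omega_2) = 0.3 + 0.4 = 0.7$.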

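• If $n = |\Theta|$ is an odd number:
In this case, the element $\omega_1$ of the coarsened frame $\Omega$ is
the disjunction of the^3 $[n/2 + 1]$ first elements of $\Theta$, and
the element $\omega_2$ is the disjunction of the other elements of $\Theta$.
That is

$$\Omega \triangleq \{\omega_1 = \theta_1 \cup \ldots \cup \theta_{[\frac{n}{2}+1]},\ \omega_2 = \theta_{[\frac{n}{2}+1]+1} \cup \ldots \cup \theta_n\}$$

and based on (3), one has

$$m^{\Omega}(\omega_1) = \sum_{j=1,\ldots,[\frac{n}{2}+1]} m^{\Theta}(\theta_j), \quad (6)$$

$$m^{\Omega}(\omega_2) = \sum_{j=[\frac{n}{2}+1]+1,\ldots,n} m^{\Theta}(\theta_j). \quad (7)$$

For example, if $\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4, \theta_5\}$, and one considers
the Bayesian BBA $m^{\Theta}(\theta_1) = 0.1$, $m^{\Theta}(\theta_2) = 0.2$,
$m^{\Theta}(\theta_3) = 0.3$, $m^{\Theta}(\theta_4) = 0.3$ and $m^{\Theta}(\theta_5) = 0.1$, then
$\Omega = \{\omega_1 = \theta_1 \cup \theta_2 \cup \theta_3, \omega_2 = \theta_4 \cup \theta_5\}$ and $m^{\Omega}(\omega_1) =
0.1 + 0.2 + 0.3 = 0.6$ and $m^{\Omega}(\omega_2) = 0.3 + 0.1 = 0.4$.


2 For clarity and convenience, we put explicitly as upper index the FoD to
which the belief mass refers.
3 The notation [x] means the integer part of x.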
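
The half split of formulas (4)-(7) can be sketched as follows. This is an illustration only: the function names are hypothetical, and the Bayesian BBA is assumed to be given as a list of the masses of $\theta_1, \ldots, \theta_n$ in their original order.

```python
def rigid_split(n):
    """Number of leading elements of the FoD that form omega_1."""
    if n % 2 == 0:
        return n // 2
    return int(n / 2 + 1)  # integer part of n/2 + 1 for an odd cardinality


def rigid_coarsen(bba):
    """Project a Bayesian BBA onto the two-element coarsened frame."""
    k = rigid_split(len(bba))
    return [sum(bba[:k]), sum(bba[k:])]


# The even and odd examples of the text.
print([round(x, 4) for x in rigid_coarsen([0.1, 0.2, 0.3, 0.4])])       # [0.3, 0.7]
print([round(x, 4) for x in rigid_coarsen([0.1, 0.2, 0.3, 0.3, 0.1])])  # [0.6, 0.4]
```

Applying `rigid_coarsen` again to each half (after renormalizing by the parent mass) gives the next layer of the bintree.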

Of course, the same coarsening strategy applies to all
original BBAs $m_s^{\Theta}(\cdot)$, $s = 1, \ldots, S$ of the $S > 1$ sources
of evidence to work with less specific BBAs $m_s^{\Omega}(\cdot)$,
$s = 1, \ldots, S$. The less specific BBAs (called coarsened BBAs
by abuse of language) can then be combined with the PCR6
rule of combination according to formula (2). This dichotomous
coarsening method is repeated iteratively $l$ times, as
schematically represented by a bintree. Here, we consider a bintree
only for simplicity, which means that the coarsened frame
$\Omega$ consists of two elements only. Of course a similar method
can be used with a tri-tree, quad-tree, etc. The last step of this
hierarchical process is to calculate the combined (Bayesian)
BBA of all focal elements according to the connection weights
of the bintree structure, where the number of layers $l$ of
the tree depends on the cardinality $|\Theta|$ of the original FoD
$\Theta$. Specifically, the mass of each focal element is updated
depending on the connection weights of the link paths from the root
to the terminal nodes. This principle is illustrated in detail in the
following example.


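Example 1: Let's consider $\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4, \theta_5\}$, and the
following three Bayesian BBAs, given in Table I:


Table I
THREE BAYESIAN BBAS FOR EXAMPLE 1.


Focal elem. | $m_1^{\Theta}(\cdot)$ | $m_2^{\Theta}(\cdot)$ | $m_3^{\Theta}(\cdot)$
$\theta_1$ | 0.1 | 0.4 | 0
$\theta_2$ | 0.2 | 0 | 0.1
$\theta_3$ | 0.3 | 0.1 | 0.5
$\theta_4$ | 0.3 | 0.1 | 0.4
$\theta_5$ | 0.1 | 0.4 | 0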


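The rigid coarsening and fusion of BBAs is deduced from
the following steps:

Step 1: We define the bintree structure based on the iterative
half split of the FoD, as shown in Fig. 1.

The connecting weights are denoted as $\lambda_1, \ldots, \lambda_8$. The
elements of the frames $\Omega_l$ are defined as follows:


• At layer $l = 1$: $\Omega_1 = \{\omega_1 \triangleq \theta_1 \cup \theta_2 \cup \theta_3,\ \omega_2 \triangleq \theta_4 \cup \theta_5\}$
• At layer $l = 2$: $\Omega_2 = \{\omega_{11} \triangleq \theta_1 \cup \theta_2,\ \omega_{12} \triangleq \theta_3,\ \omega_{21} \triangleq \theta_4,\ \omega_{22} \triangleq \theta_5\}$
• At layer $l = 3$: $\Omega_3 = \{\omega_{111} \triangleq \theta_1,\ \omega_{112} \triangleq \theta_2\}$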

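Figure 1. Fusion of Bayesian BBAs using bintree coarsening for Example 1.

Step 2: The BBAs of elements of the (sub-)frames $\Omega_l$ are obtained as follows:

• At layer $l = 1$, we use Eqs (6) and (7) because $|\Theta| = 5$ is an odd number. Therefore, we get the BBAs in Table II.

Table II
THE BBAS OF ELEMENTS OF THE SUB-FRAMES $\Omega_1$ FOR EXAMPLE 1.

Focal elements                                              $m_1^{\Omega_1}(\cdot)$    $m_2^{\Omega_1}(\cdot)$    $m_3^{\Omega_1}(\cdot)$
$\omega_1 \triangleq \theta_1 \cup \theta_2 \cup \theta_3$  0.6      0.5      0.6
$\omega_2 \triangleq \theta_4 \cup \theta_5$                0.4      0.5      0.4

• At layer $l = 2$: We work with the two subframes $\Omega_{21} \triangleq \{\omega_{11}, \omega_{12}\}$ and $\Omega_{22} \triangleq \{\omega_{21}, \omega_{22}\}$ of $\Omega_2$, with the BBAs in Tables III and IV.

Table III
THE BBAS OF ELEMENTS OF THE SUB-FRAMES $\Omega_{21}$ FOR EXAMPLE 1.

Focal elements                                   $m_1^{\Omega_{21}}(\cdot)$    $m_2^{\Omega_{21}}(\cdot)$    $m_3^{\Omega_{21}}(\cdot)$
$\omega_{11} \triangleq \theta_1 \cup \theta_2$  1/2      4/5      1/6
$\omega_{12} \triangleq \theta_3$                1/2      1/5      5/6

Table IV
THE BBAS OF ELEMENTS OF THE SUB-FRAMES $\Omega_{22}$ FOR EXAMPLE 1.

Focal elements                     $m_1^{\Omega_{22}}(\cdot)$    $m_2^{\Omega_{22}}(\cdot)$    $m_3^{\Omega_{22}}(\cdot)$
$\omega_{21} \triangleq \theta_4$  3/4      1/5      1
$\omega_{22} \triangleq \theta_5$  1/4      4/5      0

These mass values are obtained by the proportional redistribution of the mass of each focal element with respect to the mass of its parent focal element in the bintree. For example, $m_2^{\Omega_{21}}(\omega_{11}) = 4/5$ is derived by taking

$$m_2^{\Omega_{21}}(\omega_{11}) = \frac{m_2^{\Theta}(\theta_1) + m_2^{\Theta}(\theta_2)}{m_2^{\Theta}(\theta_1) + m_2^{\Theta}(\theta_2) + m_2^{\Theta}(\theta_3)} = \frac{0.4}{0.5} = \frac{4}{5}.$$

Other masses of coarsening focal elements are computed similarly using this proportional redistribution method.

• At layer $l = 3$: We use again the proportional redistribution method, which gives us the BBAs of the sub-frame $\Omega_3$ in Table V.

Table V
THE BBAS OF ELEMENTS OF THE SUB-FRAMES $\Omega_3$ FOR EXAMPLE 1.

Focal elements                      $m_1^{\Omega_3}(\cdot)$    $m_2^{\Omega_3}(\cdot)$    $m_3^{\Omega_3}(\cdot)$
$\omega_{111} \triangleq \theta_1$  1/3      1        0
$\omega_{112} \triangleq \theta_2$  2/3      0        1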

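Step 3: The connection weights $\lambda_i$ are computed from the assignments of the coarsening elements. In each layer $l$, we fuse sequentially the three BBAs using the PCR6 formula (2). Because PCR6 fusion is not associative, the general PCR6 formula should be applied to all sources at once to get the best results. Here we use sequential fusion to reduce the computational complexity, even if the fusion result is approximate. More precisely, we first compute $m_{12}^{PCR6,\Omega_l}(\cdot) = PCR6(m_1^{\Omega_l}(\cdot), m_2^{\Omega_l}(\cdot))$ and then $m_{(12)3}^{PCR6,\Omega_l}(\cdot) = PCR6(m_{12}^{PCR6,\Omega_l}(\cdot), m_3^{\Omega_l}(\cdot))$. Hence, we obtain the following connecting weights in the bintree:

• At layer $l = 1$:
$\lambda_1 = m_{(12)3}^{PCR6,\Omega_1}(\omega_1) = 0.6297$
$\lambda_2 = m_{(12)3}^{PCR6,\Omega_1}(\omega_2) = 0.3703$

• At layer $l = 2$:
$\lambda_3 = m_{(12)3}^{PCR6,\Omega_{21}}(\omega_{11}) = 0.4137$
$\lambda_4 = m_{(12)3}^{PCR6,\Omega_{21}}(\omega_{12}) = 0.5863$
$\lambda_5 = m_{(12)3}^{PCR6,\Omega_{22}}(\omega_{21}) = 0.8121$
$\lambda_6 = m_{(12)3}^{PCR6,\Omega_{22}}(\omega_{22}) = 0.1879$

• At layer $l = 3$:
$\lambda_7 = m_{(12)3}^{PCR6,\Omega_3}(\omega_{111}) = 0.3103$
$\lambda_8 = m_{(12)3}^{PCR6,\Omega_3}(\omega_{112}) = 0.6897$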
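The layer-1 weights can be reproduced with a small Python sketch. It is an illustration under stated assumptions, not the authors' Matlab code: the BBAs are Bayesian and stored as lists of masses over mutually exclusive elements, the helper names `pcr6_pair` and `sequential_pcr6` are hypothetical, and the two-source rule is written in its PCR5 form, with which PCR6 coincides for two sources.

```python
from functools import reduce


def pcr6_pair(m1, m2):
    """Two-source PCR6 (same as PCR5 for two sources) for Bayesian BBAs
    given as lists of masses over the same mutually exclusive elements."""
    n = len(m1)
    fused = [m1[i] * m2[i] for i in range(n)]  # conjunctive consensus
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # The partial conflict m1(i) * m2(j) is redistributed to
            # elements i and j proportionally to the masses involved.
            denom = m1[i] + m2[j]
            if denom > 0:
                fused[i] += m1[i] ** 2 * m2[j] / denom
                fused[j] += m2[j] ** 2 * m1[i] / denom
    return fused


def sequential_pcr6(bbas):
    """Sequential fusion ((m1, m2), m3), ...: an approximation, since
    PCR6 is not associative."""
    return reduce(pcr6_pair, bbas)


# Coarsened BBAs of layer 1 in Example 1, ordered as (omega_1, omega_2).
layer1 = [[0.6, 0.4], [0.5, 0.5], [0.6, 0.4]]
lambda_1, lambda_2 = sequential_pcr6(layer1)
print(round(lambda_1, 4), round(lambda_2, 4))  # 0.6297 0.3703
```

The same two functions applied to the BBAs of the deeper sub-frames give the remaining weights.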


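Step 4: The final assignments of elements in the original FoD $\Theta$ are calculated using the product of the connection weights of the link paths from the root (top) node to the terminal nodes (leaves). We eventually get the combined and normalized Bayesian BBA:

$m^{\Theta}(\theta_1) = \lambda_1 \cdot \lambda_3 \cdot \lambda_7 = 0.6297 \cdot 0.4137 \cdot 0.3103 = 0.0808$
$m^{\Theta}(\theta_2) = \lambda_1 \cdot \lambda_3 \cdot \lambda_8 = 0.6297 \cdot 0.4137 \cdot 0.6897 = 0.1797$
$m^{\Theta}(\theta_3) = \lambda_1 \cdot \lambda_4 = 0.6297 \cdot 0.5863 = 0.3692$
$m^{\Theta}(\theta_4) = \lambda_2 \cdot \lambda_5 = 0.3703 \cdot 0.8121 = 0.3007$
$m^{\Theta}(\theta_5) = \lambda_2 \cdot \lambda_6 = 0.3703 \cdot 0.1879 = 0.0696$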
B. Modified rigid coarsening


One of the issues with RC described in the previous section is that no extra self-information of focal elements is embedded into the coarsening process. In this paper, the elements $\theta_i$ selected to belong to the same group are determined using the consensus information drawn from the BBAs provided by the sources. Specifically, the degrees of disagreement between the sources on the decisions $(\theta_1, \theta_2, \ldots, \theta_n)$ are first calculated using the belief-interval based distance $d_{BI}$ [20] to obtain the disagreement vector. Then all focal elements in the FoD are sorted in ascending order of disagreement. Finally, the simple dichotomous approach is used to hierarchically coarsen the re-sorted focal elements.


Calculating the disagreement vector. Let us consider several BBAs $m_s^{\Theta}(\cdot)$, $(s = 1, \ldots, S)$ defined on the same FoD $\Theta$ of cardinality $|\Theta| = n$. The specific BBAs $m_{\theta_i}(\cdot)$, $i = 1, \ldots, n$ entirely focused on $\theta_i$ are defined by $m_{\theta_i}(\theta_i) = 1$, and $m_{\theta_i}(X) = 0$ for $X \neq \theta_i$.

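Definition 2: The disagreement of opinions of two sources about $\theta_i$ is defined as the $L_1$-distance between the $d_{BI}$ distances of the BBAs $m_s^{\Theta}(\cdot)$, $s = 1, 2$ to $m_{\theta_i}(\cdot)$, which is expressed by

$$D_{1-2}(\theta_i) \triangleq |d_{BI}(m_1^{\Theta}(\cdot), m_{\theta_i}(\cdot)) - d_{BI}(m_2^{\Theta}(\cdot), m_{\theta_i}(\cdot))|. \qquad (8)$$

Definition 3: The disagreement of opinions of $S \geq 3$ sources about $\theta_i$ is defined as

$$D_{1-S}(\theta_i) \triangleq \frac{1}{2} \sum_{s=1}^{S} \sum_{t=1}^{S} |d_{BI}(m_s^{\Theta}(\cdot), m_{\theta_i}(\cdot)) - d_{BI}(m_t^{\Theta}(\cdot), m_{\theta_i}(\cdot))|, \qquad (9)$$

where the $d_{BI}$ distance is defined in [20]. For simplicity, we assume Shafer's model so that $|2^{\Theta}| = 2^n$; otherwise the number of elements in the summation of (10) should be $|D^{\Theta}| - 1$, with another normalization constant $n_c$.

$$d_{BI}(m_1, m_2) = \sqrt{n_c \cdot \sum_{i=1}^{2^n - 1} [d^I(BI_1(\theta_i), BI_2(\theta_i))]^2}. \qquad (10)$$

Here, $n_c = 1/2^{n-1}$ is the normalization constant and $d^I([a,b],[c,d])$ is the Wasserstein distance defined by

$$d^I([a,b],[c,d]) = \sqrt{\left[\frac{a+b}{2} - \frac{c+d}{2}\right]^2 + \frac{1}{3}\left[\frac{b-a}{2} - \frac{d-c}{2}\right]^2},$$

and $BI(\theta_i) = [Bel(\theta_i), Pl(\theta_i)]$.

The disagreement vector $D_{1-S}$ is defined by

$$D_{1-S} \triangleq [D_{1-S}(\theta_1), \ldots, D_{1-S}(\theta_n)]. \qquad (11)$$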
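A minimal Python sketch of Eqs (9)-(11) follows, restricted to Bayesian BBAs so that $Bel(A) = Pl(A)$ for every subset $A$. The function names are hypothetical and the three BBAs are those of Example 1. The absolute values returned depend on the normalization convention and are not claimed to match figures quoted elsewhere in the text; only the resulting ranking is used by the coarsening.

```python
from itertools import combinations
from math import sqrt


def interval_distance(a, b, c, d):
    """Wasserstein distance d^I between the intervals [a, b] and [c, d]."""
    mid = (a + b) / 2 - (c + d) / 2
    rad = (b - a) / 2 - (d - c) / 2
    return sqrt(mid ** 2 + rad ** 2 / 3)


def d_bi(p1, p2):
    """Belief-interval distance (10) between two Bayesian BBAs
    (lists of singleton masses), summed over all non-empty subsets."""
    n = len(p1)
    total = 0.0
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            # Bayesian case: the belief interval of A degenerates to a point.
            b1 = sum(p1[i] for i in subset)
            b2 = sum(p2[i] for i in subset)
            total += interval_distance(b1, b1, b2, b2) ** 2
    return sqrt(total / 2 ** (n - 1))


def disagreement_vector(bbas):
    """D_{1-S}(theta_i) of Eq (9) for every singleton theta_i."""
    n = len(bbas[0])
    result = []
    for i in range(n):
        focused = [1.0 if j == i else 0.0 for j in range(n)]
        dists = [d_bi(m, focused) for m in bbas]
        result.append(0.5 * sum(abs(x - y) for x in dists for y in dists))
    return result


bbas = [
    [0.1, 0.2, 0.3, 0.3, 0.1],
    [0.4, 0.0, 0.1, 0.1, 0.4],
    [0.0, 0.1, 0.5, 0.4, 0.0],
]
D = disagreement_vector(bbas)
order = sorted(range(len(D)), key=lambda i: D[i])  # ascending disagreement
print(["theta_%d" % (i + 1) for i in order])
```

For these inputs the three least contested elements come out as $\theta_2$, $\theta_4$, $\theta_3$, while $\theta_1$ and $\theta_5$ have (numerically almost) equal disagreement and end up together at the tail of the ranking.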


Modified rigid coarsening by using the disagreement vector. Once $D_{1-S}$ is derived, all focal elements $\{\theta_1, \theta_2, \ldots, \theta_n\}$ are sorted according to their corresponding values in $D_{1-S}$. Let us revisit Example 1 presented in the previous subsection. It can be verified by applying formula (9) that the disagreement vector $D_{1-3}$ for this example is equal to

$$D_{1-3} = [0.4085, 0.2156, 0.3753, 0.2507, 0.4086].$$

The derivation of $D_{1-3}(\theta_1)$ is given below for convenience:

$$D_{1-3}(\theta_1) = |d_{BI}(m_1^{\Theta}(\cdot), m_{\theta_1}(\cdot)) - d_{BI}(m_2^{\Theta}(\cdot), m_{\theta_1}(\cdot))|$$
$$\quad + |d_{BI}(m_2^{\Theta}(\cdot), m_{\theta_1}(\cdot)) - d_{BI}(m_3^{\Theta}(\cdot), m_{\theta_1}(\cdot))|$$
$$\quad + |d_{BI}(m_1^{\Theta}(\cdot), m_{\theta_1}(\cdot)) - d_{BI}(m_3^{\Theta}(\cdot), m_{\theta_1}(\cdot))|$$
$$= 0.4085.$$


Based on the disagreement vector, a new bintree structure 
is obtained and shown in Fig 2. 


Figure 2. Fusion of Bayesian BBAs using MRC for Example 1.


Compared with Fig 1, the elements in the FoD $\Theta$ are grouped more reasonably. In vector $D_{1-3}$, $\theta_1$ and $\theta_5$ have similar degrees of disagreement, so they are put in the same group. The same holds for $\theta_2$ and $\theta_4$. Element $\theta_3$, however, stands apart: it is left alone in the coarsening process. Once this new bintree decomposition is obtained, the other steps, which are identical to those of rigid coarsening in the previous subsection, can be carried out to get the final combined BBA.

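Step 1: According to Fig 2, the elements of the frames $\Omega_l$ are defined as follows:

• At layer $l = 1$: $\Omega_1 = \{\omega_1 \triangleq \theta_2 \cup \theta_4 \cup \theta_3,\ \omega_2 \triangleq \theta_1 \cup \theta_5\}$.
• At layer $l = 2$: $\Omega_2 = \{\omega_{11} \triangleq \theta_2 \cup \theta_4,\ \omega_{12} \triangleq \theta_3,\ \omega_{21} \triangleq \theta_1,\ \omega_{22} \triangleq \theta_5\}$.
• At layer $l = 3$: $\Omega_3 = \{\omega_{111} \triangleq \theta_2,\ \omega_{112} \triangleq \theta_4\}$.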


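Step 2: The BBAs of elements of the (sub-)frames $\Omega_l$ are obtained as follows:

• At layer $l = 1$, we use (6) and (7) and we get Table VI.

Table VI
THE BBAS OF ELEMENTS OF THE SUB-FRAMES $\Omega_1$ USING MRC FOR EXAMPLE 1.

Focal elements                                              $m_1^{\Omega_1}(\cdot)$    $m_2^{\Omega_1}(\cdot)$    $m_3^{\Omega_1}(\cdot)$
$\omega_1 \triangleq \theta_2 \cup \theta_4 \cup \theta_3$  0.8      0.2      1.0
$\omega_2 \triangleq \theta_1 \cup \theta_5$                0.2      0.8      0.0

• At layer $l = 2$, we use again the proportional redistribution method, which gives us Tables VII and VIII. Here, the masses of $\omega_{21}, \omega_{22}$ in $m_3^{\Omega_{22}}(\cdot)$ are not considered because the mass of their parent focal element ($m_3^{\Omega_1}(\omega_2)$) in the bintree is 0.

Table VII
THE BBAS OF ELEMENTS OF THE SUB-FRAMES $\Omega_{21}$ USING MRC FOR EXAMPLE 1.

Focal elements                                   $m_1^{\Omega_{21}}(\cdot)$    $m_2^{\Omega_{21}}(\cdot)$    $m_3^{\Omega_{21}}(\cdot)$
$\omega_{11} \triangleq \theta_2 \cup \theta_4$  5/8      1/2      1/2
$\omega_{12} \triangleq \theta_3$                3/8      1/2      1/2

Table VIII
THE BBAS OF ELEMENTS OF THE SUB-FRAMES $\Omega_{22}$ USING MRC FOR EXAMPLE 1.

Focal elements                     $m_1^{\Omega_{22}}(\cdot)$    $m_2^{\Omega_{22}}(\cdot)$    $m_3^{\Omega_{22}}(\cdot)$
$\omega_{21} \triangleq \theta_1$  1/2      1/2      not considered
$\omega_{22} \triangleq \theta_5$  1/2      1/2      not considered

• At layer $l = 3$, we work with the sub-frame $\Omega_3 \triangleq \{\omega_{111}, \omega_{112}\}$, with the BBAs in Table IX.

Table IX
THE BBAS OF ELEMENTS OF THE SUB-FRAMES $\Omega_3$ USING MRC FOR EXAMPLE 1.

Focal elements                      $m_1^{\Omega_3}(\cdot)$    $m_2^{\Omega_3}(\cdot)$    $m_3^{\Omega_3}(\cdot)$
$\omega_{111} \triangleq \theta_2$  2/5      0.0      1/5
$\omega_{112} \triangleq \theta_4$  3/5      1.0      4/5

Step 3: The connection weights $\lambda_i$ are computed from the assignments of the coarsening elements. Hence, we obtain the following connecting weights in the bintree:

• At layer $l = 1$: $\lambda_1 = 0.8333$, and $\lambda_2 = 0.1667$.
• At layer $l = 2$: $\lambda_3 = 0.5697$, $\lambda_4 = 0.4303$, $\lambda_5 = 0.5000$, and $\lambda_6 = 0.5000$.
• At layer $l = 3$: $\lambda_7 = 0.0669$, and $\lambda_8 = 0.9331$.

Step 4: We finally get the following combined and normalized Bayesian BBA

$$m^{\Theta}(\cdot) = \{0.0833, 0.0318, 0.3586, 0.4430, 0.0834\}.$$


C. Summary of the proposed method


The fusion method of BBAs to get a combined Bayesian BBA based on hierarchical decomposition of the FoD consists of several steps (Algorithm 1) illustrated in Fig 3.

Algorithm 1: Modified Rigid Coarsening Method

Input : All original BBAs $m_1^{\Theta}(\cdot), \ldots, m_S^{\Theta}(\cdot)$
Output: The final combined BBA $m^{\Theta}(\cdot)$
 1 if compound focal elements exist in $\Theta$ ($\theta_i \cup \theta_j \neq \emptyset$ or $\theta_i \cap \theta_j \neq \emptyset$) then
 2     Probabilistic transformation: $DSmP(m_1^{\Theta}(\cdot)), DSmP(m_2^{\Theta}(\cdot)), \ldots, DSmP(m_S^{\Theta}(\cdot))$
 3 end
 4 for $i \leq n$ do
 5     for $s \leq S$ do
 6         Calculate $D_{1-S}(\theta_i) = \frac{1}{2} \sum_{s=1}^{S} \sum_{t=1}^{S} |d_{BI}(m_s^{\Theta}(\cdot), m_{\theta_i}(\cdot)) - d_{BI}(m_t^{\Theta}(\cdot), m_{\theta_i}(\cdot))|$
 7     end
 8 end
 9 for $i \leq n$ do
10     Sort $D_{1-S}(\theta_i)$ in ascending order.
11 end
12 while $|\Theta| \geq 2$ do
13     if $n$ is an even number then
14         $m^{\Omega}(\omega_1) = \sum_{j=1,\ldots,\frac{n}{2}} m^{\Theta}(\theta_j)$;
15         $m^{\Omega}(\omega_2) = \sum_{j=\frac{n}{2}+1,\ldots,n} m^{\Theta}(\theta_j)$;
16     else
17         $m^{\Omega}(\omega_1) = \sum_{j=1,\ldots,[\frac{n}{2}+1]} m^{\Theta}(\theta_j)$;
18         $m^{\Omega}(\omega_2) = \sum_{j=[\frac{n}{2}+1]+1,\ldots,n} m^{\Theta}(\theta_j)$;
19     end
20     Then the connection weights $\lambda$ are calculated: $\lambda = PCR6(m_1^{\Omega}(\omega_l), \ldots, m_S^{\Omega}(\omega_l))$
21 end
22 foreach focal element $\theta_i$, $i \in 1, \ldots, n$ do
23     $m^{\Theta}(\theta_i)$ equals the product of path link weights from root to terminal nodes
24 end

Figure 3. Modified rigid coarsening of FoD for fusion. [Flowchart blocks: Input BBAs; DSmP (when the BBAs are not Bayesian); Sorting focal elements; Product of path link weights; Final Combined Bayesian BBA.]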
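The whole procedure can be sketched end to end in Python for Bayesian inputs. This is a minimal sketch, not the authors' implementation: the function names are hypothetical, the sorted order of the elements is passed in as an argument (identity order for rigid coarsening, ascending disagreement for MRC), the fusion at each node is the sequential two-source PCR6 used in Example 1, and a source whose parent mass is zero is simply skipped at that node.

```python
from functools import reduce


def pcr6_pair(m1, m2):
    """Two-source PCR6 (PCR5 form) for Bayesian BBAs given as lists."""
    n = len(m1)
    fused = [m1[i] * m2[i] for i in range(n)]  # conjunctive consensus
    for i in range(n):
        for j in range(n):
            denom = m1[i] + m2[j]
            if i != j and denom > 0:
                # proportional redistribution of the conflict m1(i)*m2(j)
                fused[i] += m1[i] ** 2 * m2[j] / denom
                fused[j] += m2[j] ** 2 * m1[i] / denom
    return fused


def split(indices):
    """Half split: n/2 elements on the left if n is even, [n/2+1] if odd."""
    n = len(indices)
    k = n // 2 if n % 2 == 0 else n // 2 + 1
    return indices[:k], indices[k:]


def hierarchical_fusion(bbas, order):
    """Combined Bayesian BBA obtained by bintree coarsening of the
    elements listed in `order` (indices into each BBA)."""
    result = [0.0] * len(order)

    def descend(indices, weight):
        if len(indices) == 1:
            result[indices[0]] = weight  # product of the path weights
            return
        left, right = split(indices)
        coarse = []
        for m in bbas:
            parent = sum(m[i] for i in indices)
            if parent > 0:  # sources with zero parent mass are skipped
                coarse.append([sum(m[i] for i in left) / parent,
                               sum(m[i] for i in right) / parent])
        if not coarse:
            return  # no source supports this branch; masses stay at 0
        lam_left, lam_right = reduce(pcr6_pair, coarse)
        descend(left, weight * lam_left)
        descend(right, weight * lam_right)

    descend(list(order), 1.0)
    return result


bbas = [
    [0.1, 0.2, 0.3, 0.3, 0.1],
    [0.4, 0.0, 0.1, 0.1, 0.4],
    [0.0, 0.1, 0.5, 0.4, 0.0],
]
rc = hierarchical_fusion(bbas, [0, 1, 2, 3, 4])   # rigid coarsening
mrc = hierarchical_fusion(bbas, [1, 3, 2, 0, 4])  # theta_2, theta_4, theta_3, theta_1, theta_5
print([round(x, 4) for x in rc])
print([round(x, 4) for x in mrc])
```

With the identity order the sketch follows the bintree of Fig 1, and with the sorted order that of Fig 2; in both cases the output agrees with the combined BBAs of Example 1 up to rounding in the last digit.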
It is worth noting that when the given BBAs are not Bayesian, the first step is to use an existing Probabilistic Transformation (PT) to transform them into Bayesian BBAs. In order to use the proposed combination method in the RSs, modified rigid coarsening is mathematically denoted as $\oplus$ in the following sections.


D. Simulation considering accuracy and computational efficiency


• Accuracy:
Assuming that the FoD is

$$\Theta = \{\theta_1, \theta_2, \ldots, \theta_{20}\},$$

1000 BBAs are randomly generated and fused with three methods: modified rigid coarsening, rigid coarsening and PCR6. The distances between fusion results are then computed using $d_{BI}$ for two pairs: modified rigid coarsening and PCR6; rigid coarsening and PCR6. Comparisons are made in Fig 4, which shows the superiority of the new approach proposed in this paper (the average value of the approximation is 97.5% for modified rigid coarsening and 94.5% for original rigid coarsening). Here, similarity represents the degree of approximation between the fusion results of the hierarchical approximate methods (both rigid and modified rigid coarsening) and PCR6.

Figure 4. Accuracy comparisons between MRC and PCR6 (Only Singletons). [Plot: similarity for rigid coarsening and modified rigid coarsening.]

• Computational efficiency:
As we mentioned before, another advantage of the hierarchical combination method is its computational efficiency. Here, two experiments are conducted (all experiments are implemented on a PC with I3 CPU, integrated graphics chipsets and 4G DDR): 1) the number of singletons is unchanged while the number of BBAs to be fused increases; 2) the number of BBAs is unchanged while the number of singletons in the FoD increases. The results are illustrated in Fig 5 and 6. In experiment 1, all three methods (classical PCR6, rigid coarsening and modified rigid coarsening) compute quickly (less than 1.2s) even when the number of BBAs increases from 100 to 1000. However, the situation deteriorates when the number of focal elements increases. In Fig 6, when the number of focal elements increases to 500, the time consumption of the three combinations is: PCR6: 20.6857s; modified rigid coarsening: 7.3320s; rigid coarsening: 5.9748s. This supports the idea that it is reasonable to map the original FoD to the coarsened FoD, with the aim of reducing the number of focal elements at the time of fusion. In any case, the computing efficiency of rigid coarsening or modified rigid coarsening is still better than that of PCR6. On the other hand, modified rigid coarsening makes a significant improvement in accuracy at the expense of part of the computational efficiency.

Figure 5. Efficiency comparisons between MRC, RC and PCR6 (With the number of BBAs increasing). [Plot: computational time in seconds vs. number of BBAs, for PCR6, modified rigid coarsening and rigid coarsening.]

Figure 6. Efficiency comparisons between MRC, RC and PCR6 (With the number of focal elements increasing). [Plot: computational time in seconds vs. number of focal elements, for PCR6, modified rigid coarsening and rigid coarsening.]
IV. A RECOMMENDER SYSTEM INTEGRATING WITH
HIERARCHICAL COARSENING COMBINATION METHOD


In today's e-commerce, online providers often recommend suitable goods or services to each consumer based on their personal opinions or preferences [21], [22]. However, providing appropriate recommendations is a difficult task. One difficulty is that users' preferences are usually uncertain, imprecise or incomplete [23], [24], and so cannot be used directly in RSs. Besides, the more information about user preferences is available, the more accurate the predictions of RSs will be [25], [26]. The question is then which method to adopt to integrate multi-source uncertain information.

As a general framework for information fusion, DST can not only model uncertain information, but also provides an efficient way to combine multi-source information. These features give the theory a wide range of applications [27]-[29], especially in RSs [23], [25], [30]-[32]. According to DST, users' comments on products in RSs are described using mass functions, and rules of combination are frequently used in order to provide appropriate recommendations.

As mentioned in previous sections, combination rules in both DST and DSmT suffer from computational complexity, which is ignored in [23], [25]. Thus, in this paper, the modified rigid coarsening method is applied to combine the imprecise users' preferences in RSs. We first introduce the relevant background on RSs. Almost all characteristics of RSs have been introduced in [23], [25], [30]-[32].

First, we give the mathematical notation used in RSs based on DSmT. RSs usually contain two kinds of objects: users and items. A set of $M$ users and a set of $N$ items are respectively denoted by $U = \{U_1, U_2, \ldots, U_M\}$ and $I = \{I_1, I_2, \ldots, I_N\}$. Besides, we assume that users can rate the items on $L$ rating levels ($\Theta = \{\theta_1, \theta_2, \ldots, \theta_L\}$). Here, $L$ preference levels means multi-level evaluation results. For example, four levels of user evaluation of a product are Excellent, Good, Fair, Poor. $r_{i,k}$ denotes the rating of user $U_i$ on item $I_k$, and a rating matrix $R = \{r_{i,k}\}$ comprises all the ratings of users on items. It should be noted that $r_{i,k}$ is modeled as a mass function $m_{i,k} : D^{\Theta} \to [0,1]$. Additionally, let $I_i^R$ and $U_k^R$ denote the set of items rated by user $U_i$ and the set of users having rated item $I_k$, respectively.

Contextual information can often be summarized into several genres that significantly affect a user's rating of items. Normally, we represent contextual information by a set containing $P$ genres, denoted by $S = \{S_1, S_2, \ldots, S_P\}$. Each genre $S_p$, with $1 \leq p \leq P$, contains at most $Q$ groups, denoted by $S_p = \{g_{p,1}, g_{p,2}, \ldots, g_{p,q}, \ldots, g_{p,Q}\}$, $1 \leq q \leq Q$. For a genre $S_p \in S$, a user $U_i \in U$ can be interested in several groups, and an item $I_k \in I$ can belong to one or several groups of this genre, as can be seen in Fig 7.

Figure 7. Contextual information. [Diagram legend: solid arrows, "Belong to"; dashed arrows, "Interested in".]


Definition 4: In order to facilitate such expression, two functions $\kappa_p(\cdot)$ and $\varphi_p(\cdot)$ are defined to determine the groups in which user $U_i$ is interested and the groups to which item $I_k$ belongs, respectively:

$$\kappa_p : U_i \mapsto \kappa_p(U_i) \subseteq S_p \qquad (12)$$

$$\varphi_p : I_k \mapsto \varphi_p(I_k) \subseteq S_p \qquad (13)$$


Generally, the main steps of a recommendation system are illustrated in Fig 8.

Figure 8. General process of recommendations. [Flowchart blocks: User's Ratings and Contextual Information; Rating Matrix R; DSmT Modeling Function M; Predicting unrated items in R; Computing user-user similarity S; Selecting neighborhoods using threshold (Target User); Prediction fusion and Rate prediction.]


The functional blocks of Fig. 8 are as follows: 

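1) DSmT-Modeling Function
Regarding the DS-partial probability models proposed in [23], the existing ratings $r_{i,k}$ of user $U_i$ on item $I_k$ are modeled by the DSmT-modeling function $M(\cdot)$ in order to transform such hard ratings into the corresponding soft ratings represented as $m_{i,k}$, as below:

Definition 5:

$$m_{i,k}(A) = \begin{cases} \alpha_{i,k}(1 - \sigma_{i,k}), & \text{for } A = \theta_l; \\ \frac{1}{2}\alpha_{i,k}\sigma_{i,k}, & \text{for } A = B; \\ \frac{1}{2}\alpha_{i,k}\sigma_{i,k}, & \text{for } A = C; \\ 1 - \alpha_{i,k}, & \text{for } A = \Theta; \\ 0, & \text{otherwise.} \end{cases} \qquad (14)$$

with

$$B = \begin{cases} \theta_1 \cup \theta_2, & \text{if } l = 1; \\ \theta_{L-1} \cup \theta_L, & \text{if } l = L; \\ \theta_{l-1} \cup \theta_l \cup \theta_{l+1}, & \text{otherwise.} \end{cases}$$

$$C = \begin{cases} \theta_1 \cap \theta_2, & \text{if } l = 1; \\ \theta_{L-1} \cap \theta_L, & \text{if } l = L; \\ (\theta_{l-1} \cap \theta_l) \cup (\theta_l \cap \theta_{l+1}), & \text{otherwise.} \end{cases}$$

where $\alpha_{i,k} \in [0,1]$ and $\sigma_{i,k}$ are a trust factor and a dispersion factor, respectively [23].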


Referring to the partial probability model analysis in [23], we also give the corresponding user profiles, which can be seen in Fig 9.

Figure 9. DSmT modeling function. [Panels: Original Movielens Hard-Ratings; Transformed "True" Soft-Ratings.]


Compared to [23], the difference is that we consider not only the union (black and gray rectangles) but also the intersection (red rectangle) of the hard ratings, which is also the distinction between DS theory and DSmT.

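Lemma 1: Referring to Definition 5, we can also generate the corresponding refined BBA in the framework of DS theory:

$$m_{i,k}^{Refined}(A) = \begin{cases} \alpha_{i,k}(1 - \sigma_{i,k}), & \text{for } A = \theta_l; \\ \alpha_{i,k}\sigma_{i,k}, & \text{for } A = B; \\ 1 - \alpha_{i,k}, & \text{for } A = \Theta; \\ 0, & \text{otherwise.} \end{cases} \qquad (15)$$

with

$$B = \begin{cases} \theta_1 \cup \theta_2, & \text{if } l = 1; \\ \theta_{L-1} \cup \theta_L, & \text{if } l = L; \\ \theta_{l-1} \cup \theta_l \cup \theta_{l+1}, & \text{otherwise.} \end{cases}$$

where $\alpha_{i,k} \in [0,1]$ and $\sigma_{i,k}$ are a trust factor and a dispersion factor, respectively [23].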
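As an illustration of Lemma 1, the refined soft rating can be built as follows. This is a sketch under assumptions: rating levels are represented by the integers 1 to $L$, focal elements by Python `frozenset`s, the function name `refined_soft_rating` is hypothetical, and the default values of `alpha` and `sigma` are the ones used later in the experiments.

```python
def refined_soft_rating(l, L, alpha=0.9, sigma=2 / 3):
    """Refined BBA (15) for a hard rating at level l (1-based) out of L levels.
    Returns a dict mapping focal elements (frozensets of levels) to masses."""
    theta = frozenset(range(1, L + 1))
    if l == 1:
        B = frozenset({1, 2})
    elif l == L:
        B = frozenset({L - 1, L})
    else:
        B = frozenset({l - 1, l, l + 1})
    bba = {}
    # Masses are accumulated so that coinciding focal elements
    # (e.g. B equal to the whole frame when L is small) are merged.
    for focal, mass in [(frozenset({l}), alpha * (1 - sigma)),
                        (B, alpha * sigma),
                        (theta, 1 - alpha)]:
        bba[focal] = bba.get(focal, 0.0) + mass
    return bba


soft = refined_soft_rating(3, 5)
for focal, mass in soft.items():
    print(sorted(focal), round(mass, 4))
```

For a hard rating of 3 on a five-level scale, this assigns mass 0.3 to $\{3\}$, 0.6 to $\{2, 3, 4\}$ and 0.1 to the whole frame.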


After the soft ratings are generated, DSmP [33] is applied to transform the non-Bayesian $m_{i,k}$ into Bayesian BBAs, since the hierarchical fusion algorithm is currently only available for Bayesian BBAs.

Definition 6: DSmP is a generalized pignistic transformation defined by $DSmP_{\epsilon}(\emptyset) = 0$ and, for any singleton $\theta_i \in \Theta$, by

$$DSmP_{\epsilon}(\theta_i) \triangleq m(\theta_i) + (m(\theta_i) + \epsilon) \times \sum_{\substack{A \in 2^{\Theta} \\ \theta_i \subset A,\ |A| \geq 2}} \frac{m(A)}{\sum_{\substack{B \in 2^{\Theta} \\ B \subset A,\ |B| = 1}} m(B) + \epsilon \cdot |A|} \qquad (16)$$

As shown in [33], DSmP makes a remarkable improvement compared with BetP and CuzzP, since DSmP adopts a more judicious redistribution of the ignorance masses to the singletons. $\epsilon$ is a small positive number, typically $\epsilon = 0.001$.
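A compact Python sketch of formula (16) is given below, assuming Shafer's model and a BBA stored as a dict from `frozenset` focal elements to masses; the function name `dsmp` and the toy BBA are hypothetical.

```python
def dsmp(bba, frame, eps=0.001):
    """DSmP_eps of every singleton of `frame` for a BBA given as a dict
    {frozenset: mass} (Shafer's model). eps must be positive if a compound
    focal element contains only singletons of zero mass."""
    result = {}
    for x in frame:
        single = bba.get(frozenset({x}), 0.0)
        total = single
        for focal, mass in bba.items():
            if len(focal) >= 2 and x in focal:
                # Sum of the singleton masses inside the focal element,
                # plus eps times its cardinality.
                denom = sum(bba.get(frozenset({y}), 0.0) for y in focal)
                denom += eps * len(focal)
                total += (single + eps) * mass / denom
        result[x] = total
    return result


bba = {frozenset({1}): 0.2, frozenset({2}): 0.3, frozenset({1, 2}): 0.5}
p = dsmp(bba, frame=[1, 2, 3])
print({x: round(v, 4) for x, v in p.items()})  # {1: 0.4002, 2: 0.5998, 3: 0.0}
```

The mass 0.5 of the compound element $\{1, 2\}$ is shared between its two singletons almost proportionally to their own masses, which is the redistribution behaviour described above.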


2) Predicting unrated items:
We assume that users who are keen on similar groups tend to have common preferences. In this RS, it is necessary to predict the unrated items first. Considering a group $g_{p,q} \in S_p$ with $g_{p,q} \in \varphi_p(I_k)$, every soft rating $m_{i,k}$ of a user $U_i$ who is keen on group $g_{p,q}$, on item $I_k$, is regarded as a block of common preference for group $g_{p,q}$. Thus, ${}^{G}m_{p,q,k} : D^{\Theta} \to [0,1]$, which represents all users' group preferences on item $I_k$ regarding group $g_{p,q}$, is computed as follows

$${}^{G}m_{p,q,k} = \bigoplus_{\{j \,|\, I_k \in I_j^R,\ g_{p,q} \in \kappa_p(U_j),\ g_{p,q} \in \varphi_p(I_k)\}} m_{j,k}. \qquad (17)$$

Supposing that item $I_k$ has not been rated by user $U_i$, three steps are used to generate the unprovided rating $r_{i,k}$ of user $U_i$, as shown below:

• Step one: Considering a concept $S_p$, for each group $g_{p,q} \in \kappa_p(U_i) \cap \varphi_p(I_k)$, it is assumed that all users' group preferences on item $I_k$ regarding group $g_{p,q}$ imply the common preference of $U_i$ on $I_k$ regarding group $g_{p,q}$. Furthermore, this group preference is regarded as a piece of user $U_i$'s concept preference on item $I_k$ regarding concept $S_p$. Therefore, the concept preference of user $U_i$ on item $I_k$ regarding concept $S_p$, denoted by the mass function ${}^{S}m_{i,p,k} : D^{\Theta} \to [0,1]$, can be computed as below

$${}^{S}m_{i,p,k} = \bigoplus_{\{q \,|\, g_{p,q} \in \kappa_p(U_i),\ g_{p,q} \in \varphi_p(I_k)\}} {}^{G}m_{p,q,k}. \qquad (18)$$

• Step two: If there exists at least one common group in concept $S_p$ to which item $I_k$ belongs and in which user $U_i$ is interested, then $U_i$'s concept preference on item $I_k$ regarding concept $S_p$ is regarded as a piece of context preference. Therefore, this user's contextual preference on item $I_k$, denoted by the mass function ${}^{S}m_{i,k} : D^{\Theta} \to [0,1]$, is obtained as follows

$${}^{S}m_{i,k} = \bigoplus_{p = 1, \ldots, P} {}^{S}m_{i,p,k}. \qquad (19)$$

• Step three: The context preference of $U_i$ on item $I_k$ is assigned to the unprovided rating $r_{i,k}$ as below

$$m_{i,k} = {}^{S}m_{i,k}. \qquad (20)$$

At this point, all unprovided ratings have been predicted in this RS. Subsequently, user-user similarities are computed from both the provided and the predicted ratings in the following steps.

3) Computing user-user similarities:
Here, we use the distance measure proposed in [34] to calculate the distance between two users $U_i$ and $U_j$ with $i \neq j$, which is defined as below

$$D(U_i, U_j) = \sum_{k=1}^{N} \left( \ln \max_{\theta \in \Theta} \frac{m_{j,k}(\theta)}{m_{i,k}(\theta)} - \ln \min_{\theta \in \Theta} \frac{m_{j,k}(\theta)}{m_{i,k}(\theta)} \right) \qquad (21)$$

where $m_{i,k}$ and $m_{j,k}$ are the soft ratings of user $U_i$ and user $U_j$ on item $I_k$, respectively. Afterwards, the degree of similarity between $U_i$ and $U_j$, denoted by $s_{i,j}$, is calculated as follows

$$s_{i,j} = e^{-\gamma \times D(U_i, U_j)}, \quad \text{where } \gamma \in (0, \infty). \qquad (22)$$

A high value of $s_{i,j}$ means that user $U_i$ and user $U_j$ are very close, and vice versa. Eventually, a matrix $S = \{s_{i,j} \,|\, U_i, U_j \in U, i \neq j\}$ is employed to represent the similarities among all users.
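A short Python sketch of the distance and similarity computation, as read from (21) and (22), is shown below. It assumes that each soft rating has already been turned into a Bayesian BBA with strictly positive singleton masses (a zero mass would make a ratio undefined); the function names and the toy ratings are hypothetical.

```python
from math import exp, log


def user_distance(ratings_i, ratings_j):
    """Distance between two users: ratings_* are lists (one entry per item)
    of Bayesian soft ratings, each a list of strictly positive masses."""
    total = 0.0
    for m_i, m_j in zip(ratings_i, ratings_j):
        ratios = [b / a for a, b in zip(m_i, m_j)]
        total += log(max(ratios)) - log(min(ratios))
    return total


def similarity(ratings_i, ratings_j, gamma=1e-4):
    """Similarity s_ij = exp(-gamma * D(U_i, U_j))."""
    return exp(-gamma * user_distance(ratings_i, ratings_j))


u1 = [[0.5, 0.5], [0.2, 0.8]]
u2 = [[0.25, 0.75], [0.2, 0.8]]
print(user_distance(u1, u2))   # ln(1.5) - ln(0.5) = ln 3 for the first item, 0 for the second
print(similarity(u1, u1))      # identical users give similarity 1.0
```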


4) Selecting neighbors based on user-user similarities:
Taking into account an active user $U_i$, for each item $I_k$ unrated by user $U_i$, a set containing the $K$ nearest neighbors, denoted by $\mathcal{R}_{i,k}$, is chosen by using the method proposed in [35]. The two simple steps of this method are shown below:
• Step one: the selection depends on two criteria: 1) the users have rated $I_k$, and 2) the corresponding user-user similarities with user $U_i$ are equal to or greater than the threshold $\tau$. $R_{i,k}$ denotes the selected set, which is obtained as follows:

$$R_{i,k} = \{U_j \,|\, I_k \in I_j^R,\ s_{i,j} \geq \tau\}. \qquad (23)$$

• Step two: all members of $R_{i,k}$ are sorted in descending order of $s_{i,j}$, and the top $K$ members are selected as the neighborhood set $\mathcal{R}_{i,k}$.
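The two selection steps can be sketched as follows. The data layout is an assumption made for the example: similarities are stored in a dict keyed by user pairs, and `rated_by` maps each item to the set of users who rated it; the function name `select_neighbors` is hypothetical.

```python
def select_neighbors(i, k, similarity, rated_by, tau, K):
    """Return the (at most) K most similar users to user i among those
    who rated item k and whose similarity with i is at least tau."""
    # Step one: thresholded candidate set.
    candidates = [j for j in rated_by[k]
                  if j != i and similarity[(i, j)] >= tau]
    # Step two: sort by decreasing similarity and keep the top K.
    candidates.sort(key=lambda j: similarity[(i, j)], reverse=True)
    return candidates[:K]


similarity = {("u1", "u2"): 0.9, ("u1", "u3"): 0.4,
              ("u1", "u4"): 0.7, ("u1", "u5"): 0.8}
rated_by = {"item": {"u2", "u3", "u4"}}
print(select_neighbors("u1", "item", similarity, rated_by, tau=0.5, K=2))
# ['u2', 'u4']
```

User `u5` is excluded because it has not rated the item, and `u3` because its similarity is below the threshold.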


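5) Estimating ratings according to neighborhoods:
Suppose that item $I_k$ has not been rated by user $U_i$. The predicted rating of $U_i$ on item $I_k$ is denoted as $\hat{m}_{i,k}$, and it is calculated according to the ratings of user $U_i$'s nearest users. Mathematically, $\hat{m}_{i,k}$ is given as below

$$\hat{m}_{i,k} = m_{i,k} \oplus \tilde{m}_{i,k}, \qquad (24)$$

where $\tilde{m}_{i,k}$ is the mass regarding the whole preference of the neighborhood selected from the set of Eq (23) on item $I_k$. Consider a user $U_j \in \mathcal{R}_{i,k}$, and suppose that $s_{i,j}$ is the similarity between user $U_i$ and user $U_j$. We use a discount rate $1 - s_{i,j}$ to discount the rating of user $U_j$ on item $I_k$.

Therefore, $\tilde{m}_{i,k}$ is:

$$\tilde{m}_{i,k} = \bigoplus_{\{j \,|\, U_j \in \mathcal{R}_{i,k}\}} \dot{m}_{j,k}^{s_{i,j}}, \qquad (25)$$

with

$$\dot{m}_{j,k}^{s_{i,j}}(A) = \begin{cases} s_{i,j} \times m_{j,k}(A), & \text{for } A \subset \Theta; \\ s_{i,j} \times m_{j,k}(\Theta) + (1 - s_{i,j}), & \text{for } A = \Theta. \end{cases}$$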

6) Generating recommendations:
In order to generate appropriate recommendations for the candidate user $U_i$, the predicted ratings of $U_i$ on all unprovided items are sorted, and the appropriate recommendations are then generated from the sorted list.


V. EXPERIMENTS 


To evaluate the performance of modified rigid coarsening in terms of recommendation precision and computational time, the original rigid coarsening method and the classical PCR6 combination method are used as baselines. Besides, we use DS-MAE [23] to measure the precision of recommendations.

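Definition 7: DS-MAE is mathematically given as follows

$$DS\text{-}MAE(\theta_j) = \frac{1}{|D_j|} \sum_{(i,k) \in D_j,\ \theta_l \in \Theta} |\hat{m}_{i,k}(\theta_l) - m_{i,k}(\theta_l)|, \qquad (26)$$

where $D_j$ is the testing set identifying the user-item pairs whose true rating is $\theta_j \in \Theta$.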
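A minimal Python sketch of the DS-MAE of one test set $D_j$, as read from (26), is given below. It assumes that predicted and true soft ratings are available as lists of singleton masses over the same rating levels; the function name `ds_mae` and the toy pairs are hypothetical.

```python
def ds_mae(pairs):
    """DS-MAE over one test set: `pairs` is a list of (predicted, true)
    soft ratings, each a list of masses over the rating levels."""
    total = sum(abs(p - t)
                for predicted, true in pairs
                for p, t in zip(predicted, true))
    return total / len(pairs)


test_pairs = [
    ([0.2, 0.8], [0.0, 1.0]),  # absolute errors 0.2 + 0.2
    ([0.5, 0.5], [0.5, 0.5]),  # perfect prediction
]
print(round(ds_mae(test_pairs), 4))  # 0.2
```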

The genres in which specific users are interested are unknown. Thus, we adopt the rule that if a user has rated an item, then this user is interested in all genres to which the item belongs.


1) Experiment One:

Movielens⁴ is a movie recommendation dataset widely used for benchmarking. It contains nearly 100,000 hard ratings on 19 different types of movies (Action, Comedy, and so on). The rating domain in Movielens includes 5 levels, denoted as $\Theta = \{1, 2, 3, 4, 5\}$. Each user is required to rate at least 20 movies, so as to ensure adequate rating information. The relevant parameters used in the RS are set to $\gamma = 10^{-4}$ and $\forall (i,k)\ \{\alpha_{i,k}, \sigma_{i,k}\} = \{0.9, 2/3\}$. However, setting the parameter $\tau$ to a fixed value is unreasonable, because the similarity between two users differs considerably depending on the combination method used. Hence, in this paper, the value of $\tau$ is not set in advance. Instead, it is determined from the similarities in matrix $S$. Specifically, the highest value of the top 30% in $S$ is selected for $\tau$.

⁴http://grouplens.org/datasets/movielens


Additionally, we adopt 10-fold cross validation, a robust strategy widely applied in experimental verification. The specific steps are as follows: the original ratings in Movielens are first randomly divided into 10 folds, and the experiments are then carried out 10 times. In each sub-experiment, nine tenths of the ratings are chosen as training data and the remaining ratings are used as testing data. All results reported in the following experiments are averages over the 10 runs.


Figure 10 shows the values of the overall DS-MAE as the neighborhood size $K$ varies. Smaller values of DS-MAE indicate better performance. As can be seen in Fig 10, for $K < 70$ the performance of the three methods improves sharply and is nearly identical. For $K > 70$, the performance of the methods becomes stable. In particular, the performance of the modified rigid coarsening method is very close to that of the classical PCR6 rule, whereas original rigid coarsening is slightly worse than the other two algorithms.

Figure 10. Overall DS-MAE between three combination methods. (Movielens). [Plot: overall DS-MAE vs. K, for PCR6, modified rigid coarsening and rigid coarsening.]
Figure 11 depicts the computational time as the neighborhood size $K$ varies. In this figure, the hierarchical coarsening combination methods (both rigid coarsening and modified rigid coarsening) are considerably faster than classical PCR6. Besides, modified rigid coarsening is somewhat slower than original rigid coarsening. These results illustrate that the modified rigid coarsening method sacrifices some computational efficiency in exchange for better approximation accuracy.

Figure 11. Overall computational time between three combination methods. (Movielens). [Plot: overall computational time vs. K, for PCR6, modified rigid coarsening and rigid coarsening.]


2) Experiment Two:

Flixster⁵ is a classical recommendation dataset which contains nearly 535013 hard ratings on 19 different types of movies (Drama, Comedy, and so on). The rating domain in Flixster includes 10 levels, denoted as $\Theta = \{0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0\}$. Each user is required to rate at least 15 movies, so as to ensure adequate rating information. The relevant parameters used in the RS are set to $\gamma = 10^{-4}$ and $\forall (i,k)\ \{\alpha_{i,k}, \sigma_{i,k}\} = \{0.9, 2/3\}$. As in Experiment One, the value of the parameter $\tau$ is not set in advance, because the similarity between two users differs considerably depending on the combination method used. Instead, it is determined from the similarities in matrix $S$. Specifically, the highest value of the top 50% in $S$ is selected for $\tau$.

⁵http://datasets.syr.edu/datasets/Flixster.html

Figure 12 shows the values of the overall DS-MAE as the neighborhood size $K$ varies. Smaller values of DS-MAE indicate better performance. As can be seen in Fig 12, we get a result similar to that of the previous data set (Movielens). In particular, the performance of the modified rigid coarsening method lies between those of the other two methods, while original rigid coarsening is worse than the other two algorithms.

Figure 12. Overall DS-MAE between three combination methods. (Flixster). [Plot: overall DS-MAE vs. K, for PCR6, modified rigid coarsening and rigid coarsening.]


Figure 13 depicts the computational time as the neighborhood size $K$ varies. From this figure, we reach the same conclusion: the hierarchical coarsening combination methods (both rigid coarsening and modified rigid coarsening) are considerably faster than classical PCR6.

Figure 13. Overall computational time between three combination methods. (Flixster). [Plot: overall computational time in seconds vs. K, for PCR6, modified rigid coarsening and rigid coarsening.]


VI. CONCLUSION 


In this paper, we propose a new combination method, called the modified rigid coarsening method. This new method maps the original refined FoD to a new coarsened FoD in the process of combination. Compared to the traditional PCR6 fusion method in DSmT, this approach not only reduces computational complexity, but also ensures high approximation accuracy. Besides, in order to verify the practicality of our approach, we apply it to fuse soft ratings in RSs. To be specific, user preferences are first transformed by the DSmT-partial probability model to accurately represent uncertain information. Then, information about user preferences from different sources can be easily combined. In future work, more helpful information will be mined to discern the focal elements in the FoD so as to improve the accuracy of approximation, and more data sets will be used.
Abstract: To prevent the disastrous consequences of levee breakage, assessment methodologies have to be improved. Geophysical and geotechnical investigation methods are usually used to make such assessments. However, the effective combination of these two specific types of data remains a challenge. We propose the fusion of geophysical and geotechnical data by means of Belief Functions. Here we demonstrate our approach on a synthetic case study including geophysical (electrical resistivity) and geotechnical (cone-bearing) data and by implementing Smets and PCR5 normalization rules. This new data combination approach allows the characterization of horizontal interfaces and of a geological structure initially hidden by the effects of a highly conductive body.


Keywords: data fusion, belief functions, geophysical data, 
geotechnical data, experimental test bench, electrical resistivity 
tomography. 


I. INTRODUCTION 


Fluvial levees are elevated partitions between channels and
floodplains [1], built for flood protection. These structures are
considered hazardous and may fail, leading to disastrous
consequences such as human and material losses and economic
disasters. Accepted levee assessment methodologies usually
include geotechnical and geophysical investigation methods [2].
Geotechnical investigation methods are intrusive, but they
provide fairly accurate, point-wise information.
Conversely, geophysical methods are non-intrusive and provide
physical information on large volumes of subsoil at a
high acquisition rate (depending on the chosen method and acquisition
mode), but with potentially significant uncertainties. These
uncertainties can be attributed in particular to the indirect
and integrating nature of the methods, as well as to the
non-uniqueness of the solutions of inverse problems. One of the important
issues when assessing levees is the combination of geotechnical
and geophysical data [3], which must take into
consideration their respective imprecisions, uncertainties and
contrasting spatial distributions. We suggest the use of Belief
Functions (BFs) and combination rules to merge geotechnical
(cone bearing) and geophysical (electrical resistivity) data. Our
data fusion methodology is being optimized and tested on both
real and synthetic data. In this paper, we aim to demonstrate
the potential of this methodology by means of a synthetic
study that exemplifies a simplified levee case. We
show the ability of combined geotechnical (cone bearing) and
geophysical (electrical resistivity) data to discriminate three
types of geological materials in the presence of a conductive
anomaly, which has a significant bias effect on the geophysical
data. The reader can refer to the theoretical basis of BFs,
introduced by Shafer [4]. The use of BFs requires: (1) selecting
a common frame of discernment (FoD) for the considered
problem, (2) determining the masses of belief or Basic Belief
Assignments (BBAs) from the available data (geotechnical and
geophysical), and (3) choosing a rule of combination.


II. FOD AND BBAS CONSTRUCTION 


For the addressed levee assessment issue, we consider three
classes of distinct materials θ1, θ2 and θ3. Since the FoD, Θ,
must consist of a set of exclusive and exhaustive hypotheses,
we use a fourth class θ4 to cover the physical
characteristics of materials not included in the first three classes.
Thus we use Θ = {θ1, θ2, θ3, θ4}. The construction of the
BBAs for each data source consists in mapping each data
type (geophysical and geotechnical) onto Θ.
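
As a minimal illustration of the frame described above, the following sketch enumerates the events of the power set 2^Θ on which the BBAs are defined. The class labels `theta1` to `theta4` and the helper name `power_set` are hypothetical names chosen for this example; they do not come from the paper.

```python
from itertools import combinations

# Hypothetical labels for the four exclusive and exhaustive classes of the FoD.
THETA = ("theta1", "theta2", "theta3", "theta4")

def power_set(frame):
    """Return every subset of `frame` as a frozenset, from the empty set to the full frame."""
    return [frozenset(c) for r in range(len(frame) + 1) for c in combinations(frame, r)]

events = power_set(THETA)
print(len(events))  # 2**4 events, including the empty set and the full frame
```

A BBA can then be stored as a dictionary mapping some of these frozensets to masses that sum to 1.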


III. CONSTRUCTION FROM GEOPHYSICAL DATA - 
ELECTRICAL RESISTIVITY VALUES 


Since electrical resistivity (ER) tomography is one of
the most widely used methods for levee investigation, we
propose the use of ER as geophysical data. As a framework
to exemplify a fluvial levee problem, we consider
two soil layers: an upper resistive layer (10³ Ω.m) standing
for sands [5] and an underlying, more conductive
one (10 Ω.m) standing for a clayey layer starting at 6 m
depth. An inhomogeneity (10² Ω.m) standing for a silty
lens of about 1.3 m high and 40.5 m wide is positioned
at 7 m depth (Figure 1.a). We then associate ER classes
of specific soils (split into ranges of ER in Ω.m) to Θ
so that: θ1 = [5, 20], θ2 = [50, 2·10²], θ3 = [5·10², 2·10³]
and θ4 = [0.2, 5[ ∪ ]20, 50[ ∪ ]2·10², 5·10²[ ∪ ]2·10³, 5·10³].
We finally place a very small (0.5 × 0.5 m) and highly conductive
body (10~° Ω.m) centered at 1.35 m depth (represented
in green in Figure 1.a), that could be considered as a metallic
pipe. Even though this body is small in the XY plane, it might
have an elongated shape in the Z direction. We use the free
Res2Dmod software [6] to simulate noiseless data acquisition from
the chosen resistivity model (Figure 1.a) and then use the
Res2Dinv software (ver. 3.71.118) [7] to get the inverted ER
section as one would obtain from the processing of survey
data (Figure 1.b).

The discrimination between sands and clays is obvious
while the distinction of the silty lens is not. Indeed, the
local conductive body generates resistive artifacts close to its
position and a very large conductive artifact deeper in
the subsoil, hiding the presence of the silty lens. The image
given by the inverted ER (Figure 1.b) strongly departs from
the true model (Figure 1.a). We finally use the Res2Dinv
discretization grid for the BBA m1(·), which assigns a mass to each
event of 2^Θ (each event of Θ plus their possible unions and
the empty set). The values of the masses are set using the
Wasserstein distances [8] between (a) an inverted ER value ±
its uncertainty, issued from the Res2Dinv result, and (b) the
interval corresponding to each event, so that each cell of the
grid gets a normalized BBA.


IV. CONSTRUCTION FROM GEOTECHNICAL DATA - 
CONE-BEARING VALUES 


We use artificial cone bearing values (expressed in MPa)
as geotechnical data. These physical values could have
been obtained from a cone penetrometer test (CPT)
investigation campaign. Indeed, CPT campaigns are widely
used to investigate embankment levees [2]. We simulate
a data acquisition from four boreholes with a spacing
of 20 m, drilled to 17 m depth with a vertical
acquisition every 50 cm (dashed lines in Figure 1.a). Two
of the boreholes happen to go through the silty lens. We
assign intervals of cone bearing values (in MPa) to Θ
so that: θ1 = [2, 8], θ2 = [20, 80], θ3 = [2·10², 8·10²] and
θ4 = [0.1, 2[ ∪ ]8, 20[ ∪ ]80, 2·10²[ ∪ ]8·10², 10³]. These
intervals can be associated with specific soil types [9] such as clays
for low values, silty soils for intermediate values and sands
for higher ones. We assume a mass of belief equal to 1 at
the borehole and impose an exponential lateral decrease of
the trust in the data (following the mean horizontal scale of
fluctuation of about 50 m proposed by Phoon and Kulhawy
[10]). The geotechnical grid depends on the distance between
the boreholes and the vertical sampling. Thus, for each cell, a
second BBA m2(·) is proposed and enters the fusion process.
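
The lateral decrease of trust can be sketched as follows. This is only an illustration under stated assumptions: the text gives neither the exact decay law nor where the remaining mass goes, so the sketch assumes a trust factor exp(-d / 50 m) and transfers the remaining mass to the full frame Θ (total ignorance), as in classical discounting. The function name `geotechnical_bba` and the class labels are hypothetical.

```python
import math

# Hypothetical labels for the four classes of the FoD.
THETA = ("theta1", "theta2", "theta3", "theta4")
SCALE_OF_FLUCTUATION_M = 50.0  # horizontal scale of fluctuation quoted in the text

def geotechnical_bba(observed_class, distance_m, frame=THETA, scale_m=SCALE_OF_FLUCTUATION_M):
    """Build a BBA for a cell located `distance_m` metres away from a borehole.

    Assumption (not from the paper): trust = exp(-d / scale), and the mass
    that is not trusted is assigned to the full frame (ignorance).
    """
    trust = math.exp(-abs(distance_m) / scale_m)
    bba = {frozenset([observed_class]): trust}
    if trust < 1.0:
        bba[frozenset(frame)] = 1.0 - trust
    return bba

at_borehole = geotechnical_bba("theta2", 0.0)
ten_metres_away = geotechnical_bba("theta2", 10.0)
print(at_borehole)
print(ten_metres_away)
```

At the borehole the whole mass stays on the observed class; away from it, part of the mass moves to Θ, so the geotechnical source weighs less in the fusion.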


V. COMBINATION OF BBAS AND PRELIMINARY RESULTS 


We suggest the use of a fusion mesh containing all the
meshes from both the geophysical and geotechnical grids so
that we avoid data interpolation that might lead to unnecessary
data alteration. The data merging consists in combining m1(·)
and m2(·) assigned to each cell of the grid. While many rules
of BBA combination have been proposed, in this work we
focus on two of them: Smets’ rule [11] and the Proportional
Conflict Redistribution rule no. 5 (PCR5) [12]. Smets’ rule
(conjunctive rule under an open-world assumption) allows the
quantification of the conflict level between our two information
sources (geotechnical and geophysical), given by
(Eq. 1):


\[ m_{12}(\emptyset) = \sum_{X_1, X_2 \subseteq \Theta \,|\, X_1 \cap X_2 = \emptyset} m_1(X_1)\, m_2(X_2) \qquad (1) \]

Thanks to the latter rule, we are able to point out the
conflicting zones in the vicinity of the horizontal interfaces, the
silty lens, the local highly conductive body and the resistive and
conductive artifacts (in red, Figure 2.a). The fusion following
Dempster-Shafer’s rule (Eq. 2) (closed-world assumption) [4]
with the PCR5 normalization [12] (Figure 2.c) is fairly close
to the true model we used (Figure 1.a). This normalization
spreads the conflicting mass m12(∅) over the other
events of 2^Θ.


\[ m_{12}(A) = \sum_{X_1, X_2 \subseteq \Theta \,|\, X_1 \cap X_2 = A} m_1(X_1)\, m_2(X_2) \qquad (2) \]


It exhibits a quite clear view of the interface between sands
and clays and allows the visualization of the silty lens despite
the blind zone generated by the conductive anomaly (Figure
1.b). As a decision-making support, we propose to display the
events having the highest belief masses (Figure 2.a and 2.c)
and their associated degrees of belief (Figure 2.b and 2.d).
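
The two combinations used in this section can be sketched for a single cell as follows. This is a minimal illustration, not the authors' implementation: BBAs are represented as dictionaries keyed by frozensets, the function names are hypothetical, and the PCR5 part uses the standard two-source formula in which each partial conflict m1(X)m2(Y), with X ∩ Y = ∅, is sent back to X and Y proportionally to m1(X) and m2(Y). The numerical masses are made up for the example.

```python
from itertools import product

def conjunctive(m1, m2):
    """Unnormalized conjunctive rule; the mass on frozenset() is the conflict of Eq. (1)."""
    out = {}
    for (x1, a), (x2, b) in product(m1.items(), m2.items()):
        inter = x1 & x2
        out[inter] = out.get(inter, 0.0) + a * b
    return out

def pcr5(m1, m2):
    """Two-source PCR5: conjunctive masses plus proportional redistribution of conflicts."""
    out = {}
    for (x, a), (y, b) in product(m1.items(), m2.items()):
        if a == 0.0 or b == 0.0:
            continue  # nothing to combine or redistribute for this pair
        inter = x & y
        if inter:
            out[inter] = out.get(inter, 0.0) + a * b
        else:
            # Partial conflict a*b goes back to x and y, proportionally to a and b.
            out[x] = out.get(x, 0.0) + a * a * b / (a + b)
            out[y] = out.get(y, 0.0) + b * b * a / (a + b)
    return out

# Made-up masses for one cell (illustrative values only).
T1, T2 = frozenset(["theta1"]), frozenset(["theta2"])
T12 = T1 | T2
m1 = {T1: 0.6, T12: 0.4}
m2 = {T2: 0.3, T12: 0.7}

smets = conjunctive(m1, m2)
fused = pcr5(m1, m2)
print(smets[frozenset()])           # conflict level between the two sources
print(fused[T1], fused[T2], fused[T12])
```

In this toy case the conflict 0.6 × 0.3 = 0.18 is kept on the empty set by the conjunctive rule, whereas PCR5 returns 0.12 of it to θ1 and 0.06 to θ2, so the fused masses again sum to 1.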
Via our procedure, even though the highly conductive
anomaly (that can be associated with a metallic pipe) is not
clearly detected and characterized, Figure 2.a still points out
a conflicting zone around the position of that anomaly.
Unfortunately, the material type is still incorrectly determined
over the first 19 and the last 15 horizontal meters
around the sand/clay interface because of the wrong ER values
proposed by the inverted geophysical model (Figure 1.b). In
the future, this kind of under-determination may be reduced
by reconsidering the way the lateral trust in the
geotechnical data decreases.


VI. CONCLUSION 


The use of BFs for the fusion of geophysical and geotechnical
investigation data is promising. Indeed, it makes it possible to
highlight the presence of an interface between two geological
media much more precisely than the geophysical method
alone, using Res2Dinv. Furthermore, it enables the reliable
estimation of the complete extent of a lens with intermediate
ER and cone bearing values, even though the effects of
a local and highly conductive body (that can be associated with
a transversal metallic pipe) hide the geological lens. Without
normalization, Smets’ combination rule easily spotlights the
conflicting zones. Such information could also be valuable
during an investigation campaign, indicating zones where
the survey has to be strengthened.
This will be at the heart of further studies in order to be
able to propose the most pertinent positions for geotechnical
boreholes thanks to belief functions and combination rules,
and thereby to improve fluvial levee assessment. This algorithm
will also be applied to real data acquired on a scale
model as well as on a levee in order to propose a 3D model.


Fig. 1. 2D section of subsoil displaying a) true ER with borehole positions in dashed lines and associated cone bearing values in white and b) inverted ER
model displayed in model data blocks with RMS error = 1.11%.


Fig. 2. Data merging with Smets (a, b) and PCR5 normalization rule (c, d). (b) and (d) represent the BBAs associated to the most plausible events respectively
presented in (a) and (c). The black lines stand for the interfaces and the inhomogeneities fixed in the model (Figure 1.a) while the dashed lines stand for the
borehole positions.


Abstract: This paper presents a study on human perception
of heading based on the integration of motion and form visual
cues. The authors examine how human age influences this
process. Because the visual stimuli are in general uncertain, or
in some cases even conflicting, the combination process is
modelled with the well-known Normalized Conjunctive
Consensus fusion rule, as well as with the more
efficient Dezert-Smarandache Theory (DSmT) of plausible and
paradoxical reasoning, and more precisely with the probabilistic
Proportional Conflict Redistribution rule no.5 defined within it.
The main goal is to assess how well these fusion rules
provide consistent and adequate predictions about the behavior of
both individuals and age-contingent groups of individuals.


Keywords: vision, heading perception, form cue, motion cue, 
cues combination, DSmT, probabilistic proportional redistri- 
bution rule no.5, normalized conjunctive rule. 


I. INTRODUCTION 


Form and motion information are closely linked and
continuously interact in the human visual system,
which uses both of these visual
characteristics (so-called cues) to make decisions about
heading [1], expressed via the corresponding
rapid eye movements (so-called saccades) towards the position
of the object of interest. The cooperation between the form
and motion cues becomes very useful and even necessary
when: (i) each cue (motion, form) alone does not supply
sufficient information to estimate the heading properly and
accurately, and/or (ii) the uncertainty associated with the
visual cues and the possible conflicts between them
negatively influence the decision-making process. The last
case relates closely to the effect of age-related changes
throughout the life cycle and to the deterioration of cognitive
processes, and consequently of visual information processing,
due to a variety of factors such as cell death, cognitive
de-differentiation, and an increase of internal noise in the visual system.
As a result, contrast sensitivity, self-motion perception,
as well as eye movement characteristics deteriorate in
the elderly [2], [3], [4]. To overcome all these difficulties
one needs to combine and use both cues effectively
in order to achieve inferences that are more informative
and potentially more accurate than those obtained
from a single cue. Integration of information from
multiple sources (cues) in a single modality increases the
precision of perceptual performance. This claim has recently
been supported by a number of neurobiological studies, such as
[5], [6], [7], and neurophysiological findings also exist about
neurons responding to both form and motion in some cortical
sites (including early visual areas and extrastriate areas) [8],
[9].


Inspired by these important biological findings on
the effectiveness of cue combination, the aim of this paper is to
investigate how humans integrate motion and form information
in the process of decision making about heading direction.
The authors focus on how human age influences this
process, and also on whether the human visual system is able
to adapt during the life cycle in order to exploit all available
information, providing a sensible and meaningful decision
about the problem under concern. In our study we simulate
only the directional flow occurring during the forward motion
of the observer and not the accompanying changes in the speed or size of
the moving objects. The research
team compares human cue combination performance
with modelled combination performance, based on particular
fusion rules. In the presented study the authors apply
and compare the performances of the following fusion rules:
the Normalized Conjunctive Consensus (NCC), and the very
recent probabilistic Proportional Conflict Redistribution rule
no.5 (pPCR5) defined within DSmT. The novelty of our study
consists in applying this pPCR5 fusion rule to
model the human process of form and motion cue integration.


This paper is organized as follows. In Section II we briefly
present the form and motion combination process, and the
principles of the fusion rules applied to model the human
cue integration. Section III is devoted to the experimental
strategy, methods, procedures, stimulus, apparatus, and also
subjects participating in the experiments. The results obtained
are described and analysed in Section IV. Conclusions are
made in Section V.


II. FUSION RULES FOR MODELLING VISUAL CUE 
COMBINATION 


Various fusion rules exist in the literature to deal with
uncertain or even conflicting evidence. They rely on different
mathematical models and on different methods for transferring
the conflicting mass onto the sensible hypotheses about the
problem under consideration. The classical one is Bayesian
inference [10], [11], which deals with probabilistic information.
The main idea of Bayesian inference is to obtain the most
reliable estimate of the state of the world by combining
independent cues, i.e. the estimate in which the
variance of the resulting combined cue is minimized. But being
very sensitive to the sources with the bigger means, it could
neglect part of the available information, which is not an adequate
and reliable behavior when conflicting visual cues are
combined. Bayesian inference is also difficult to apply,
because it requires measurement statistics and
knowledge of the a priori information. Dempster-Shafer
Theory (DST) [12], [13] was the first theory for combining
uncertain information expressed as basic belief assignments
with Dempster’s rule. Although appealing for modelling
epistemic uncertainty, this theory gives very questionable and
controversial results when the sources of evidence are highly
(or even weakly) conflicting [14], [15], [16], [17].

To overcome all these limitations of DST, Dezert- 
Smarandache Theory of Plausible and Paradoxical Reasoning 
was developed [18]. 

DSmT works for any model, which fits adequately with 
the true nature of the fusion problem under consideration. 
It is a general mathematical framework for managing and 
solving problems of uncertain, highly conflicting, imprecise 
knowledge representation and fusion, and decision making 
procedures, based on vague, imprecise models for a wide class 
of static or dynamic fusion problems. 


A. Normalized Conjunctive Consensus rule 


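The Normalized Conjunctive Consensus (NCC) rule is used
to combine simultaneously visual cues that are assumed independent.
In the case considered in our paper, the information obtained
from the available form and motion cues is characterized by
Gaussian likelihood functions with given means μ_i, i = 1, 2, ...
and standard deviations σ_i, i = 1, 2, ..., defining the
uncertainty encountered in the data. In the case of two independent
cues with one-dimensional Gaussian distributions

\[ p_1(x) = \frac{1}{\sigma_1\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x-\mu_1}{\sigma_1}\right)^2\right) \quad \text{and} \quad p_2(x) = \frac{1}{\sigma_2\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x-\mu_2}{\sigma_2}\right)^2\right), \]

the combined distribution based on the NCC rule becomes:

\[ p_{NCC}(x) = \frac{1}{\sigma_{NCC}\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x-\mu_{NCC}}{\sigma_{NCC}}\right)^2\right), \qquad (1) \]

where \( \frac{1}{\sigma_{NCC}^2} = \frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2} \) and \( \mu_{NCC} = \sigma_{NCC}^2 \left( \frac{\mu_1}{\sigma_1^2} + \frac{\mu_2}{\sigma_2^2} \right) \).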


It is characterized by a mean biased toward the function
with the bigger of the two means, similarly to the Bayesian
estimator. It is optimal, i.e. it minimizes the variance of the
estimation error, when the original distributions have close mean
values. When both cues are in conflict, however (characterized
by distant distributions), the NCC rule leads to neglecting part of
the available information, because the source with the bigger
mean is weighted more heavily. In this case it is reasonable to
keep the original distributions in the fused probability density
function until it is possible to make a reliable decision. This is
what the pPCR5 fusion rule defined in DSmT does.
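
The parameters of the NCC-fused Gaussian can be computed directly from the standard closed form for the normalized product of two Gaussian densities. The sketch below is only an illustration; the function name `ncc_fuse` and the numerical values are hypothetical.

```python
def ncc_fuse(mu1, sigma1, mu2, sigma2):
    """Return (mu_ncc, sigma_ncc) of the normalized product of two Gaussian densities."""
    w1 = 1.0 / sigma1 ** 2   # precision of cue 1
    w2 = 1.0 / sigma2 ** 2   # precision of cue 2
    var_ncc = 1.0 / (w1 + w2)
    mu_ncc = var_ncc * (w1 * mu1 + w2 * mu2)
    return mu_ncc, var_ncc ** 0.5

# Two made-up cues with equal standard deviations and different means.
mu_ncc, sigma_ncc = ncc_fuse(0.0, 1.0, 2.0, 1.0)
print(mu_ncc, sigma_ncc)
```

Whatever the distance between the two means, the result is always a single unimodal Gaussian whose standard deviation is smaller than both input standard deviations.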


B. Probabilistic Proportional Conflict Redistribution rule no.5 


The general principle of all Proportional Conflict
Redistribution rules [18], Vol.3 is to: 1) calculate the
conjunctive consensus between sources of evidence (different visual
cues); 2) calculate the total or partial conflicting masses; 3)
redistribute the conflicting mass (total or partial) proportionally
on non-empty sets involved in the model according to
all integrity constraints. The recently proposed non-Bayesian
probabilistic Proportional Conflict Redistribution rule no.5
(pPCR5) [18] is based on the discrete Proportional Conflict
Redistribution rule no.5 [18], Vol.3, for combining discrete
basic belief assignments. For completeness, we briefly discuss
the main idea behind the discrete PCR5. It comes
from the necessity to deal with both uncertain and conflicting
information, transferring partial or total conflicting masses
proportionally only to non-empty sets involved in the particular
conflict and proportionally to their individual masses. A basic
belief assignment (bba) represents the knowledge provided
by a particular source of information about its belief in the true
state of the problem under consideration. Given a frame of
hypotheses Θ = {θ1, ..., θn}, and the so-called power set
2^Θ = {∅, θ1, ..., θn, θ1 ∪ θ2, ..., θ1 ∪ θ2 ∪ ... ∪ θn}, on which the
combination is defined, the general basic belief assignment is
defined as a mapping m_s(·) : 2^Θ → [0, 1], associated with
the given source of information s, such that m_s(∅) = 0
and Σ_{X ∈ 2^Θ} m_s(X) = 1. The quantity m_s(X) represents the
mass of belief exactly committed to X. Under Shafer’s model
assumption of the frame Θ (requiring all the hypotheses to
be exclusive and exhaustive), the PCR5 combination rule for
only two sources of information is defined as: m_PCR5(∅) = 0
and ∀X ∈ 2^Θ \ {∅}


\[ m_{PCR5}(X) = m_{12}(X) + \sum_{\substack{Y \in 2^\Theta \setminus \{X\} \\ X \cap Y = \emptyset}} \left[ \frac{m_1(X)^2\, m_2(Y)}{m_1(X) + m_2(Y)} + \frac{m_2(X)^2\, m_1(Y)}{m_2(X) + m_1(Y)} \right] \]


All sets involved in the formula are in canonical form. The
quantity m_12(X) corresponds to the conjunctive consensus,
i.e.:

\[ m_{12}(X) = \sum_{\substack{X_1, X_2 \in 2^\Theta \\ X_1 \cap X_2 = X}} m_1(X_1)\, m_2(X_2). \]

All denominators are different from zero. If a denominator is zero, that
fraction is discarded. No matter how big or small the conflicting
mass is, PCR5 mathematically does a proper redistribution
of the conflicting mass. It is because PCR5 goes backwards
on the tracks of the conjunctive rule and redistributes the
partial conflicting masses only to the sets involved in the
conflict and proportionally to their masses put in the conflict,
considering the conjunctive normal form of the partial conflict.
PCR5 is quasi-associative and preserves the neutral impact
of the vacuous belief assignment. The probabilistic PCR5
(pPCR5) is an extension of the discrete PCR5 rule to its
continuous probabilistic counterpart. The basic belief assignments
involved in the discrete PCR5 rule are extended to probability
densities of random variables. For two independent sources
of information with given Gaussian distributions p1(x) and
p2(x), the obtained combined result becomes [18]:


\[ p_{pPCR5}(x) = p_1(x) \int \frac{p_1(x)\, p_2(y)}{p_1(x) + p_2(y)}\, dy + p_2(x) \int \frac{p_2(x)\, p_1(y)}{p_2(x) + p_1(y)}\, dy \qquad (2) \]


The behavior of the pPCR5 fusion rule in comparison to the NCC
rule (1) can be characterized by the two cases below:

Case 1: both densities p1(x) and p2(x) are close (Fig. 1, case 1).
The combined density acts as an amplifier of the
information by reducing the variance. Here pPCR5 acts like the
NCC fusion rule.

Case 2: the densities p1(x) and p2(x) are distant (Fig. 1, case 2).
Then the combined density keeps both original densities
(instead of merging them into a single unimodal Gaussian
density as the NCC rule does), which avoids neglecting a part of the
available information.


[Figure 1 plot: pdf / cumulated probability for Case 1 and Case 2; curves: cue1, cue2, (cue1+cue2)-NCC, (cue1+cue2)-PCR5, CumulProba-PCR5, CumulProba-NCC.]


Figure 1. Performance of pPCR5 fusion rule vs. NCC rule.


This new (from a theoretical point of view) property is very
interesting and presents advantages for practical applications,
as will be shown in our particular research. Application of
the pPCR5 fusion rule ensures robustness to potential errors
and allows more reliable and adequate decisions to be taken in the
process of integrating different cues in visual perception.
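
The behavior of pPCR5 for distant cues can be checked numerically with a short sketch. It evaluates the two-source pPCR5 density, p1(x) ∫ p1(x)p2(y)/(p1(x)+p2(y)) dy + p2(x) ∫ p2(x)p1(y)/(p2(x)+p1(y)) dy, on a uniform grid with a simple Riemann sum. The grid, the cue parameters and the function names are assumptions made for this illustration only.

```python
import math

def gauss(x, mu, sigma):
    """One-dimensional Gaussian density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def ppcr5_density(xs, p1, p2):
    """Evaluate the two-source pPCR5 density on the uniform grid `xs` (Riemann sums over y)."""
    dx = xs[1] - xs[0]
    a = [p1(x) for x in xs]  # p1 sampled on the grid
    b = [p2(x) for x in xs]  # p2 sampled on the grid
    fused = []
    for ax, bx in zip(a, b):
        int1 = sum(ax * by / (ax + by) for by in b if ax + by > 0.0) * dx
        int2 = sum(bx * ay / (bx + ay) for ay in a if bx + ay > 0.0) * dx
        fused.append(ax * int1 + bx * int2)
    return fused

# Two made-up distant cues: means 0 and 6, unit standard deviations.
xs = [-6.0 + 0.05 * i for i in range(361)]
fused = ppcr5_density(xs, lambda x: gauss(x, 0.0, 1.0), lambda x: gauss(x, 6.0, 1.0))
total = sum(fused) * (xs[1] - xs[0])
print(total)                              # close to 1: the fused function is a density
print(fused[120], fused[180], fused[240])  # values near x = 0, x = 3 and x = 6
```

With these distant cues the fused density stays high near each original mean and is much lower halfway between them, which is the bimodal behavior described in Case 2.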


III. EXPERIMENTS


A. Stimuli


The stimuli consisted of 50 dots. The dot patterns
occupied an area of 15 angular degrees. The stimuli were
generated beforehand and contained 100 frames (except the
static condition). Each frame lasted 33 msec. The lifetime
of the dots was 3 frames, thus on every frame one-third of
the dots were randomly re-positioned. For the motion and the
combined conditions the velocity of the dots was 4 degrees of
arc/sec. The stimuli were radial patterns with a focus (center)
positioned eccentrically to the middle of the screen. The center
of the patterns defined by the orientation of the pairs or the
trajectories of the dots could take 7 values to the left or to
the right of the midpoint of the screen: 0.67 to 4.67 degrees
of arc in steps of 0.67 degrees of arc. Ten different exemplars
of patterns for each center and condition were generated. The
dots subtended 0.2 degrees of arc.


B. Experimental conditions 


Four different experimental conditions were performed: 


• Static (form) condition: The experimental stimuli (Fig.2)
  consist of dot pairs separated by 2 degrees of arc. The
  virtual lines connecting the dots in
  18 pairs intersected at a common point considered the
  center of the pattern, while the remaining 7 pairs had random
  orientation.

• Motion condition: In this experiment (Fig.3) 36 dots
  had trajectories that intersected at a common point, while
  the remaining 14 dots had random trajectories.

• Flicker condition: In this condition (Fig.4) a sequence
  of random static patterns was presented. As in the static
  condition, the orientation of 18 pairs of dots, separated
  by 2 degrees of arc, pointed to a common center while
  the remaining 7 pairs had different orientations. The sequential
  presentation of the static patterns created illusory motion,
  but the trajectories of the apparent motion were random.

• Combined condition: In this experiment (Fig.5) 18 pairs
  of dots moved along trajectories towards a common
  center. The orientation of these pairs was along the motion
  trajectory. The remaining 7 pairs had random trajectories, but
  again, the orientation defined by the pairs was along the
  trajectory of motion.


Figures 2-5 each show a single frame from one of the four
experimental conditions. The four conditions of the experiment
differ in the relative contribution and the order of temporal
and spatial integration. In the static condition the observers
needed to find the correspondence of the dots to a pair and to
globally integrate this information in order to find the focus
of the radial pattern. In the flicker condition, on every frame
the observers had to integrate the spatial information from the
pairs of dots, but they could benefit from temporal integration
of the sequential patterns, which would be equivalent to the
presence of a larger number of dot pairs. In the motion condition
the observers had to temporally integrate the displacement
of dots in the sequential frames in order to determine their
trajectory of motion and to integrate this information in space
to determine the focus of the radial pattern. In the combined
condition the observers had redundant information, as both the
trajectory of dot motion and the orientation of the dot pairs
provided similar information.


Figure 2. Static Condition.


Figure 3. Motion Condition. 


C. Experimental Procedure 


The subject sat at 57 cm from the monitor screen. The
stimuli were presented on a gray screen with mean luminance
50 cd/m². Each stimulus presentation was preceded by a
warning signal. A red fixation point with a size of 0.8 degrees
of arc appeared in the center of the screen for 500 msec. The
stimuli were presented simultaneously with the disappearance
of the fixation point. The subjects performed a single-stimulus
two-alternative forced-choice task. They had to continue looking
at the position where the fixation point was presented until
making a decision about where the center of the pattern was (left or
right relative to the fixation point). At this moment the subject
had to move his/her eyes towards the position of the perceived
center and to press the left or the right mouse button depending
on whether the perceived center appeared to the left or to the
right of the fixation point. If the subject could not make a
decision during the 3.3 sec of the stimulus presentation (100
frames), the stimulus disappeared and the screen remained
gray until the subject made a response.


D. Method 


The method of constant stimuli was used. Each condition
was presented in a separate block consisting of 10 presentations
for each position of the pattern center (a total of 140
presentations: 7 positions for a center shifted to the left and
7 positions for a center shifted to the right). The order of
stimulus presentation was random. Each subject took part
in at least two experiments with 4 blocks for each of the
4 experimental conditions. All conditions were presented in
a random order in a single day. The duration of each block
depended on the subject’s performance, but the experiment did
not exceed 1 hour. The eye movements of the subjects were
registered with the Jazz-novo multisensor measurement system
(Ober Consulting Sp. z o.o.) [20].


Figure 4. Flicker Condition.


Figure 5. Combined Condition.


E. Apparatus 


The stimuli were presented on a 20.1 inch NEC MultiSync
LCD monitor with an Nvidia Quadro 900XGL graphics board at
a refresh rate of 60 Hz and a screen resolution of 1280 × 1024
pixels. The experiments were controlled by a custom program
developed under Visual C++ and OpenGL.


F. Subjects 


The subjects participating in the experiments were divided into
three age groups: young (aged from 20 to 34 years), middle
(aged 35 to 55 years) and elderly (aged 57 to 84 years). They
did not have a full training session, but they were given
examples of stimuli to check whether they understood the task
and to get an idea of the stimuli in a given condition.


IV. PERFORMANCE EVALUATION OF AGE-RELATED
OBSERVERS GROUPS 


The experimental goal of our study is to
characterize human heading perception as influenced by:
(i) form information only; (ii) motion information only;
(iii) flicker information, i.e. temporal integration of form
information; (iv) combined form and motion information.
The question is whether people base their responses on
a single source of information or on a combination of sources, and
also which type of information is more informative
in the decision process. The participants belong to three
age groups: Young, Middle aged, and Old. Hence, the
influence of human age on the assessment of heading
perception is also evaluated. The evaluation is based on
experimental psychometric functions, obtained for
all experimental conditions and for each subject in
all age-contingent groups. The psychometric function reflects
the dependence between a given physical quantity (in this
case, the pattern shift from the middle of the screen) and the
proportion of the subjects’ responses of a given type, in our
case the proportion of responses “the pattern center is to
the right”.
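
An empirical psychometric function of this kind can be tabulated from raw trials as in the following sketch. The data layout (pairs of signed shift and a boolean "right" response), the function name and the toy responses are assumptions made for illustration; they are not the experimental data of this study.

```python
from collections import defaultdict

def psychometric_function(trials):
    """Map each pattern shift to the proportion of 'center is to the right' responses.

    `trials` is an iterable of (shift_deg, answered_right) pairs, where a
    negative shift means the pattern center lies left of the screen midpoint.
    """
    counts = defaultdict(lambda: [0, 0])  # shift -> [right responses, presentations]
    for shift, answered_right in trials:
        counts[shift][0] += int(bool(answered_right))
        counts[shift][1] += 1
    return {shift: right / n for shift, (right, n) in sorted(counts.items())}

# Made-up toy responses (illustrative only).
toy_trials = [(-0.67, False), (-0.67, False), (0.67, True), (0.67, False)]
curve = psychometric_function(toy_trials)
print(curve)
```

With the method of constant stimuli described above, each of the 14 shifts would contribute 10 presentations per block to such a table.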


• Evaluation of heading perception in Young observers
group


The comparison of the performance in the static, motion
and flicker conditions shows that in the Young group
only 2 out of 10 observers have their best performance in
the static condition, 4 observers effectively utilized the
motion information, showing their best performance in this
case, and 4 out of 10 observers show their best performance
in the flicker condition. For 4 out of 10 observers the
null hypothesis of equal psychometric functions for
motion and flicker information could not be rejected,
i.e. they could be considered equivalent. These results
suggest that the young observers effectively integrate the
available information in time. The contribution of the
information available in each of these three conditions to
the performance in the combined condition differs. Only
1 out of 10 subjects relies mainly on motion and 1 on
the information available in the flicker condition, while 7
out of 10 effectively combined the independent sources of
information available in the static and motion conditions.
The performance of the averaged young subject (averaged over
the 10 subjects in the group) is shown in Fig.6.

For the averaged young subject the psychometric curves
associated with static, motion, and flicker information
are not distant and the null hypothesis that they do not
differ could not be rejected.


Figure 6. Psychometric Curves of Averaged Young Subject.


• Evaluation of heading perception in Middle aged
observers group


In this age-related group only 1 out of 6 subjects shows
better performance in the static condition and 1 out
of 6 observers in the flicker condition. For 1 out of
6 observers the null hypothesis of equal psychometric
functions for motion and static information could not
be rejected. For 4 out of 6 observers the null hypothesis
of equal psychometric functions for the motion and flicker
conditions could not be rejected either. In general, the
results suggest a small effect of the static information.
For 4 out of 6 observers the
results in the combined condition could be successfully
predicted from the performance in the static and
motion conditions. The performance of the averaged middle
aged subject is shown in Fig. 7.

Figure 7. Psychometric Curves of Averaged Middle aged Subject.


The averaged middle aged observer does not rely mainly
on the static information. For this observer the combined and
flicker conditions do not differ significantly.


• Evaluation of heading perception in Old observers
group


The results obtained in the Old-age group show that 3 out
of 10 observers show their best performance in the static
condition, 3 out of 10 in the motion condition, and 4 out of 10
in the flicker condition. The null hypothesis of equal psychometric
functions holds for 1 out of 10 observers for the motion and static
conditions, and for 2 out of 10 for the motion and flicker
conditions. Six out of 10 subjects utilize combined static
and motion information to make their final decision in
the combined condition. The performance of the averaged
old subject is shown in Fig. 8. For the averaged old subject
the null hypothesis that the static and flicker cases do not
differ holds. The averaged old observer relies more on
motion information.

Figure 8. Psychometric Curves of Averaged Old Subject.


V. PPCR5 AND NCC RULES PERFORMANCE FOR 
PREDICTING HUMAN’S WAY OF FORM AND MOTION 
COMBINATION 


The main question here is which fusion rule, pPCR5 or
NCC, used to combine the available static and motion information
predicts human cue integration more adequately. In order
to answer this question we need to compare the
experimentally obtained and the predicted (via the pPCR5 and
NCC rules) psychometric functions for the combined condition
(static and motion), for the three age-contingent groups. This
comparison is based on the goodness-of-fit test
[19], one important application of the chi-squared criterion:

$$\chi^2 = \sum_{j=1}^{J} \frac{(O_j - E_j)^2}{E_j},$$

where $\chi^2$ is an index of the agreement between
the observed (O)/experimental and the expected (E)/predicted
(via a particular fusion rule) sample values of the psychometric
function. In our case $J = 14$ is the number of pattern
shifts from the middle of the screen. The critical value of the
test for $\nu = J - 1 = 13$ degrees of freedom at the assumed $p = 0.1$
is $\chi^2 = 19.81$ [19]. The respective results are given in Table
I for the young group, in Table II for the middle aged group, and
in Table III for the old persons' group.
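
The goodness-of-fit comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample values of the psychometric functions are hypothetical, and the critical value 19.81 is simply the one quoted in the text for 13 degrees of freedom at p = 0.1.

```python
def chi_squared(observed, expected):
    """Pearson statistic: sum over j of (O_j - E_j)^2 / E_j."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Critical value quoted in the text (13 degrees of freedom, p = 0.1).
CRITICAL_VALUE = 19.81


def prediction_is_adequate(observed, expected, critical=CRITICAL_VALUE):
    """True when the predicted curve does not differ significantly from the data."""
    return chi_squared(observed, expected) < critical


# Hypothetical samples of a psychometric function at J = 14 pattern shifts
# (proportion of "to the right" responses); not data from the paper.
expected = [0.02, 0.04, 0.07, 0.12, 0.20, 0.31, 0.44,
            0.56, 0.69, 0.80, 0.88, 0.93, 0.96, 0.98]
observed = [0.03, 0.05, 0.06, 0.13, 0.22, 0.30, 0.45,
            0.55, 0.70, 0.78, 0.89, 0.92, 0.97, 0.97]

print(chi_squared(observed, expected), prediction_is_adequate(observed, expected))
```

Note that the statistic requires every expected value to be non-zero, which is why the hypothetical predicted curve above stays strictly inside (0, 1).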

In general, the results show that the pPCR5 fusion rule
predicts human performance more adequately than the NCC rule
for the three age groups.

For young and for middle aged persons (Tables I and
II) both fusion rules predict psychometric functions that do
not differ significantly from the experimental ones, but the
differences in the fits are smaller for the pPCR5 rule than
for the NCC rule. The same findings are valid
for old people (Table III), but in this group the NCC rule shows


Table I 
CHI-SQUARED VALUES FOR YOUNG SUBJECTS. 


(Form and Motion) pPCR5 | (Form and Motion) NCC
0. 1.8482 


8587 
0.4801 


0.3045 
0.1509 
0.1655 


0.3342 
0.0912 
0.5103 
0.1943 
0.0913

Table II
CHI-SQUARED VALUES FOR MIDDLE AGED SUBJECTS.


(Form and Motion) pPCR5 | (Form and Motion) NCC 
0.3698 0.9854 


worse performance for subject no. 4 (put in bold in Table III),
exceeding the critical value of $\chi^2 = 19.81$. This
result arises in situations where the experimentally
obtained psychometric functions associated with the single static
and single motion conditions are characterized by distant
underlying Gaussian distributions. In this case pPCR5 makes
a prediction which models human combination behavior more
correctly and adequately. With the NCC rule, however, part of the
available information is neglected, because the cue with the
bigger mean is weighted more heavily than the cue with the
smaller one (as described in Section I).


VI. COMMON TRENDS OF AGE RELATED OBSERVER 
GROUPS 


The goal here is to find the common trend, concerning the 
performance of the three groups. In order to achieve it, we 
consider each group as a set of different sources of evidence, 
associated with each person in the group. That way young 
group consists of 10 (middle aged of 6, old aged of 10) sources 
(subjects) of evidence, which should be combined all together 
via pPCR5 and NCC fusion rules. 

The combined individual behaviors in a particular group are
estimated, revealing the intrinsic behavior of the group as a whole
and reducing the uncertainties associated with individual performances. All the
tested subjects in the age groups are considered as independent
and equally reliable sources of information, because each
subject provides his/her own psychometric function associated
with the static and motion conditions, and these should be taken into
account with equal weights to derive the trends.

Our goal is to find out which combination rule (pPCR5 or
NCC) is able to model such human age-contingent group
trends in the process of decision making correctly and adequately.
The results obtained for the experimental and the estimated (via the
fusion rules) trends concerning the groups' cue combination
performance are presented in Figures 9, 10, and 11.


Table III
CHI-SQUARED VALUES FOR OLD SUBJECTS.


(Form and Motion) pPCR5 | (Form and Motion) NCC


Figure 9. Trends of Young Subjects Group.

Figure 10. Trends of Middle aged Subjects Group.


Figure 11. Trends of Old Subjects Group.


In order to compare the performance of both fusion rules
in predicting the common trends, the city-block errors
between the experimental form and motion combination of each
group (young/middle/old) and the corresponding combination
estimated via pPCR5 and NCC are given in Table IV. The results show
that the trends based on the pPCR5
fusion rule are closer to the experimentally obtained ones: for the
three age-contingent groups the pPCR5 errors are less than half
of those obtained via the NCC
fusion rule. The pPCR5 fusion rule predicts the
human model of decision making more correctly than the NCC rule, utilizing all
the available information (Form and Motion), even in case of
conflict. NCC based trends are very sensitive to the sources
(different subjects' psychometric functions) with the bigger
means, thereby neglecting part of the available information
and acting as an amplifier of the information by reducing the
variances.


Table IV
CITY BLOCK ERRORS BETWEEN EXPERIMENTAL AND PREDICTED TRENDS.

          | pPCR5 | NCC
FM Young  | 0.03  | 0.10
FM Middle | 0.06  | 0.13
FM Old    | 0.04  | 0.12
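
As a reminder of what a city-block (L1) error measures, here is a minimal sketch. It assumes that the experimental and predicted trends are sampled at the same pattern shifts. The text does not state whether the error is summed or averaged over the samples, so both variants are shown; the function names and the toy curves are hypothetical.

```python
def city_block_error(experimental, predicted):
    """Sum of absolute differences between two equally sampled curves."""
    if len(experimental) != len(predicted):
        raise ValueError("curves must be sampled at the same points")
    return sum(abs(e - p) for e, p in zip(experimental, predicted))


def mean_city_block_error(experimental, predicted):
    """City-block error divided by the number of samples (an assumed normalisation)."""
    return city_block_error(experimental, predicted) / len(experimental)


# Toy curves, not data from the paper.
experimental = [0.10, 0.30, 0.50, 0.70, 0.90]
rule_a = [0.12, 0.28, 0.50, 0.71, 0.88]
rule_b = [0.20, 0.40, 0.55, 0.60, 0.80]

print(mean_city_block_error(experimental, rule_a),
      mean_city_block_error(experimental, rule_b))
```

The rule with the smaller error follows the experimental trend more closely, which is how the columns of Table IV are read.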


VII. CONCLUSIONS 


This paper presented a study on human heading perception
based on the integration of motion and form visual cues.
The influence of human age on this process was
evaluated. The results obtained show age-related differences
in the performance of the subjects in estimating the heading
direction based on the combined static (form) and motion
information.

Our experimentally obtained data for young observers suggest
a smaller effect of the static information and provide
indirect evidence that their performance is based more on the
temporal integration of information in the motion and flicker
conditions. The experimental results for the Middle-aged group
suggest less effect of the static information and an effect of
the order of temporal and spatial integration. The old subjects
tend to rely more on the motion information. All age-related
groups rely on combined (motion and form) information to
make their final decisions for heading perception.

A comparison between the experimentally obtained and the predicted
(via the pPCR5 and NCC rules) psychometric functions
for the combined condition (static and motion), for the three
age-contingent groups, was made on the basis of the
goodness-of-fit test, one important application of the chi-squared
criterion. The results show that pPCR5 makes a prediction which
models human combination behavior more correctly and
adequately than NCC, especially in cases of conflict between
the different visual cues.

The combined individual behaviors (the trends) in particular
age groups were estimated, revealing the intrinsic behavior of each group
as a whole and reducing the uncertainties associated with individual
observers' performance. The results show that the pPCR5
fusion rule, utilizing all the available information, static
(form) and motion, even in case of conflict, predicts
the human model of decision making more correctly than the NCC rule.
In this way the pPCR5 fusion rule preserves the richness
of cue data in the process of visual stimuli combination and
improves decision accuracy. pPCR5 describes
the characteristics of the different age groups in decision
making based on the motion and form information in heading
perception better than NCC.
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Abstract—The extraction of building changes from very high 
resolution satellite images is an important but challenging task 
in remote sensing. Digital surface models (DSMs) generated from 
stereo imagery have proved to be valuable additional data sources 
for this task. In order to efficiently use the change information 
from the DSMs and spectral images, belief functions have been 
introduced. In this article, two-step building change detection 
fusion models based on both Dempster-Shafer theory (DST) and
Dezert-Smarandache Theory (DSmT) frameworks are proposed. 
In the first step, basic belief assignments (BBAs) of the change 
indicators from images and DSMs are calculated by using a 
refined sigmoidal BBA model. Then these BBAs are employed 
for the new proposed building change detection decision fusion 
approach. In order to cover the missed detections introduced by
the wrong height values of the DSMs and incomplete information
from images, disparity maps from the DSM generation procedure
and shadow maps from the multispectral channels are adopted
to generate reliability maps, which are further integrated into
the fusion models. In the last step, building change masks
are generated based on four decision-making criteria. In the 
experimental part of this work, we evaluate the performance of 
this new building change detection method on real satellite images 
thanks to a building change reference mask representing the 
ground truth. Substantial accuracy improvements are achieved 
when comparing the new results with those obtained from 
classical 3D change detection approaches. 


Keywords: change detection, belief functions, DSmT, DST, 
DSM. 


I. INTRODUCTION 


Efficient and accurate detection of building changes using
remote sensing data is of great importance for urban monitoring
and disaster monitoring. It is one of the fundamental tasks
in remote sensing and is attracting more interest due to the
high and accelerating rate of urban growth and the more frequent
natural disasters associated with climate change.

In the last decades, 2D change detection methods for large
scale land cover monitoring have been extensively studied
and applied to satellite images [1], [2], [3]. Many
excellent approaches are available which can extract landcover
changes from multi-temporal images [4], [5]. However, highlighting
only building changes in urban areas remains difficult
due to the mixture of other background changes, for instance
the changes introduced by different illumination conditions or
human activities. The influence of these changes grows as
higher resolution images show more details of the landcover
objects. In addition, even with very high resolution data it is
sometimes impossible to distinguish buildings from roads using
simple spectral change.

Therefore, height information derived from Digital Surface
Models (DSMs) opens new possibilities for building
change detection. Benefiting from improved data quality and
advanced computer vision techniques, the accuracy of the
DSMs from satellite stereo imagery has been largely improved,
which enables building change detection over larger regions and
with high frequency. However, the DSMs may exhibit some
inaccurate height values resulting from failed matching and
occlusions within the stereo and multiple views. Thus
fusing the changes from multispectral images and DSMs would
be an effective solution for building change detection. The
comparison of DSMs can locate the changes of high-level
objects efficiently and robustly, and the spectral images have
rich spectral and texture features which can highlight more
changes among the multi-temporal datasets. On the other hand,
as the DSMs have been generated from the multispectral
data, there is no time difference between them. The 2D and
3D information can be combined through post-refinement,
region-based approaches or decision fusion [6]. In more recent
research, DSMs from multiple sensors and time-series data
were involved [7], [8].

Regarding feature fusion, due to the diverse building
characteristics and background information, urban building
monitoring approaches may perform differently for different test
regions. Thus some recent studies try to combine
different change features and change classification methods,
and fuse the results with a decision model. For instance, [9]
proposed a probabilistic framework to fuse the results
from four local feature vectors for building detection. Based
on an adaptive network-based fuzzy inference system, [10]
fused the change detection results from different feature
combinations. Besides fusing the detection results, decision
fusion can also be directly used for classification and change
detection [11]-[15].

However, until now there is no decision fusion model that
directly takes the change indices from images and height
maps for building change detection. In our previous research
work [16], belief functions performed very well for 3D
building change detection. As mentioned above, the accuracy of
2D change detection of specific objects is limited due to the
misdetections caused by irrelevant changes. These irrelevant
changes have a larger effect on very high resolution (VHR)
images than on low and moderately high resolution images,
since the finer details in VHR images are more sensitive
to viewing and solar angle differences. DSMs generated from
satellite stereo imagery can largely help to solve this problem.
Unfortunately, the fusion model proposed in [16] is rather basic,
and it is not robust in dealing with high conflict situations.
Therefore, the belief functions have been further investigated
and improved in this article. Besides Dempster-Shafer Theory
(DST) [17], [18], the extended Dezert-Smarandache Theory
(DSmT) [19] is adopted in this article to generate the
building change detection models. One of the difficulties of
using Dempster-Shafer theory is the definition of uncertainty
and the calculation of the basic belief assignments (BBAs).
In [16], one sigmoid function was used to map the values of one
change feature to BBAs ranging from 0 to 1. The symmetry
point, which indicates a certainty of 50%, was automatically
calculated with a thresholding method. However, the accuracy
and robustness of the thresholding approach directly
influence the correctness of the obtained BBAs. Thus, as well
as the fusion models, the BBA construction approach should
be updated to further improve the change detection result.
These problems have been addressed in our modified
approach [20]. In addition, the uncertainty of the change indicators
was measured in order to improve the accuracy of the BBAs.
Due to the space limitations of the conference paper format,
the methodology was only briefly described in [20]
and only small patches were tested in the experimental
part. A better description of this methodology, with more
experiments, is presented in this article, together with
an improvement of the reliability discounting approach.


Focusing on building change detection by fusing spectral 
and height information extracted from satellite stereo imagery, 
this article is organised as follows. First, the belief functions 
of DST and DSmT are briefly reviewed. Then, the building 
change models are proposed for these theoretical frameworks. 
The belief functions are used in both BBAs preparation 
and change detection procedure. Two sigmoid functions are 
simulated for each change feature to obtain the BBAs. In order
to further improve the BBA values, reliability discounting
techniques are presented. We use the unfilled disparity map
and shadow maps to generate the reliability map of the changes 
from the height and 2D images, respectively. The reliability 
maps are then used in the fusion process to refine the initial 
BBAs. We generate four sets of global BBAs. With four 
decision criteria the final change detection masks can be 
generated. In the end, these refined fusion models are tested 
on four sets of real satellite images, and a comprehensive 
comparison is included to validate the new approaches. 


II. BELIEF FUNCTIONS, DST AND DSMT
A. Basics of belief functions 


The details of DST and DSmT have been presented in
[18], [19] and [21]. Let $\Theta$ be a frame of discernment of a
problem under consideration. $\Theta = \{\theta_1, \theta_2, \ldots, \theta_N\}$ consists
of a list of $N$ exhaustive and mutually exclusive elements $\theta_i$,
$i = 1, 2, \ldots, N$. Each $\theta_i$ represents a possible state related to
the problem we want to solve. The assumption of exhaustivity
and mutual exclusivity of the elements of $\Theta$ is classically referred
to as Shafer's model of the frame $\Theta$. A BBA, also called a
belief mass function (or just a mass for short), is a mapping
$m(\cdot) : 2^\Theta \to [0, 1]$ from the power set¹ of $\Theta$ (denoted $2^\Theta$) to
$[0, 1]$ that verifies [18]:

$$m(\emptyset) = 0, \quad \text{and} \quad \sum_{X \in 2^\Theta} m(X) = 1. \tag{1}$$


$m(X)$ represents the mass of belief exactly committed to $X$.
An element $X \in 2^\Theta$ is called a focal element if and only
if $m(X) > 0$. The belief and plausibility functions in
DST are defined respectively as:

$$\mathrm{Bel}(A) = \sum_{B \in 2^\Theta,\, B \subseteq A} m(B), \tag{2}$$

$$\mathrm{Pl}(A) = \sum_{B \in 2^\Theta,\, B \cap A \neq \emptyset} m(B). \tag{3}$$
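
Eqs. (2) and (3) translate directly into code once a BBA is stored as a dictionary. The sketch below assumes a representation that is not part of the original work: focal elements are Python `frozenset` objects of (hypothetical) class labels, and the masses are plain floats.

```python
def belief(m, A):
    """Bel(A): total mass of the non-empty focal elements B included in A (Eq. 2)."""
    return sum(mass for B, mass in m.items() if B and B <= A)


def plausibility(m, A):
    """Pl(A): total mass of the focal elements B that intersect A (Eq. 3)."""
    return sum(mass for B, mass in m.items() if B & A)


# Hypothetical three-class frame and BBA, for illustration only.
T1, T2, T3 = "theta1", "theta2", "theta3"
THETA = frozenset({T1, T2, T3})
m = {
    frozenset({T1}): 0.5,      # mass committed to theta1
    frozenset({T2, T3}): 0.2,  # mass committed to theta2 U theta3
    THETA: 0.3,                # mass left on the full ignorance
}

A = frozenset({T1})
print(belief(m, A), plausibility(m, A))
```

For any set A the interval [Bel(A), Pl(A)] brackets the support given to A; here the mass on the full ignorance counts towards the plausibility of theta1 but not towards its belief.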


In DST, the combination (fusion) of several independent
sources of evidence is done with the Dempster-Shafer² (DS) rule,
assuming that the sources are not in total conflict³. The DS combination
of two independent BBAs $m_1(\cdot)$ and $m_2(\cdot)$, denoted
symbolically by $DS(m_1, m_2)$, is defined by $m^{DS}(\emptyset) = 0$, and
for all $X \in 2^\Theta \setminus \{\emptyset\}$ by:

$$m^{DS}(X) = \frac{1}{1 - K^{DS}} \sum_{\substack{X_1, X_2 \in 2^\Theta \\ X_1 \cap X_2 = X}} m_1(X_1) m_2(X_2), \tag{4}$$

where the total degree of conflict $K^{DS}$ is given by

$$K^{DS} \triangleq \sum_{\substack{X_1, X_2 \in 2^\Theta \\ X_1 \cap X_2 = \emptyset}} m_1(X_1) m_2(X_2). \tag{5}$$
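
A minimal sketch of Eqs. (4) and (5) for two sources follows. It assumes BBAs stored as dictionaries keyed by non-empty `frozenset` focal elements (an illustrative representation), and the function name `dempster_shafer` as well as the example masses are hypothetical.

```python
from itertools import product


def dempster_shafer(m1, m2):
    """Return (fused BBA, total conflict K) for two BBAs, following Eqs. (4)-(5)."""
    conjunctive = {}
    conflict = 0.0
    for (x1, a), (x2, b) in product(m1.items(), m2.items()):
        inter = x1 & x2
        if inter:
            conjunctive[inter] = conjunctive.get(inter, 0.0) + a * b
        else:
            conflict += a * b  # product of masses of disjoint focal elements
    if 1.0 - conflict <= 1e-12:
        raise ValueError("sources are in total conflict: DS rule is undefined")
    fused = {x: v / (1.0 - conflict) for x, v in conjunctive.items()}
    return fused, conflict


# Hypothetical example: one source supports theta1, the other theta2 U theta3.
THETA = frozenset({"theta1", "theta2", "theta3"})
m1 = {frozenset({"theta1"}): 0.6, THETA: 0.4}
m2 = {frozenset({"theta2", "theta3"}): 0.3, THETA: 0.7}

fused, K = dempster_shafer(m1, m2)
print(K, fused)
```

The explicit check on `1 - K` mirrors the remark that the rule is not defined when the sources are in total conflict.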


¹ The power set is the set of all subsets of $\Theta$, empty set included.

² Although the rule was originally proposed by Dempster, we call it the
Dempster-Shafer rule as it has been widely promoted by Shafer in DST [18].

³ Otherwise the DS rule is mathematically not defined because of the 0/0
indeterminacy.


A discussion on the validity of the DS rule and its incompatibility
with the Bayes fusion rule for combining Bayesian BBAs can
be found in the literature [21], [22], [23]. To circumvent the
problems of the DS rule, Smarandache and Dezert ([19], Vol. 2,
Chap. 1), then Martin and Osswald ([19], Vol. 2, Chap. 2), have
developed in DSmT [19] two fusion rules called PCR5 and
PCR6, based on the proportional conflict redistribution (PCR)
principle, which consists of the following steps:

1) apply the conjunctive rule

2) calculate the total or partial conflicting masses

3) then redistribute the (total or partial) conflicting mass
proportionally on non-empty sets involved in the conflict
according to the integrity constraints one has for the
frame $\Theta$.

This PCR principle transfers the conflicting mass only to the
elements involved in the conflict and proportionally to their
individual masses, so that the specificity of the information
is not degraded. Because the proportional transfer can be
done in different ways, this has led to several different
fusion rules. It has been proved in [24] that only the PCR6
rule is compatible with frequentist probability estimation, and
that is why we recommend its use in applications. The PCR5
and PCR6 rules simplify greatly and their formulas coincide
for the combination of two sources. In this case, the PCR6
combination is obtained by taking $m^{PCR6}(\emptyset) = 0$, and for all
$X \neq \emptyset$ in $2^\Theta$ by

$$m^{PCR6}(X) = \sum_{\substack{X_1, X_2 \in 2^\Theta \\ X_1 \cap X_2 = X}} m_1(X_1) m_2(X_2) + \sum_{\substack{Y \in 2^\Theta \\ X \cap Y = \emptyset}} \left[ \frac{m_1(X)^2 m_2(Y)}{m_1(X) + m_2(Y)} + \frac{m_2(X)^2 m_1(Y)}{m_2(X) + m_1(Y)} \right], \tag{6}$$

where all denominators in Eq. (6) are different from zero. If
a denominator is zero, that fraction is discarded.

If a denominator, e.g., $m_1(X) + m_2(Y)$, tends towards 0,
then the transferable conflicting mass $m_1(X) m_2(Y)$
also tends to zero, because $m_1(X)$ and $m_2(Y)$ both tend to zero
(since they are non-negative); therefore, the redistributed mass
also tends to zero. This reflects the continuity of PCR6.
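
The two-source case of Eq. (6) can be sketched as follows. The sketch assumes BBAs stored as dictionaries keyed by non-empty `frozenset` focal elements (an illustrative representation); the function name `pcr6_two_sources` and the example masses are hypothetical and this is not the authors' implementation.

```python
from itertools import product


def pcr6_two_sources(m1, m2):
    """Two-source PCR5/PCR6 fusion following Eq. (6)."""
    fused = {}
    for (x1, a), (x2, b) in product(m1.items(), m2.items()):
        inter = x1 & x2
        if inter:
            # conjunctive part: the product goes to the intersection
            fused[inter] = fused.get(inter, 0.0) + a * b
        elif a + b > 0.0:
            # conflicting product a*b is sent back to x1 and x2,
            # proportionally to the masses a and b involved in the conflict
            fused[x1] = fused.get(x1, 0.0) + a * a * b / (a + b)
            fused[x2] = fused.get(x2, 0.0) + b * b * a / (a + b)
        # when a + b == 0 the fraction is discarded, as stated after Eq. (6)
    return fused


# Hypothetical example: one source supports theta1, the other theta2 U theta3.
THETA = frozenset({"theta1", "theta2", "theta3"})
m1 = {frozenset({"theta1"}): 0.6, THETA: 0.4}
m2 = {frozenset({"theta2", "theta3"}): 0.3, THETA: 0.7}

fused = pcr6_two_sources(m1, m2)
print(fused)
```

Unlike the DS rule, no normalisation is applied: the conflicting mass 0.6 x 0.3 = 0.18 is split into 0.12 for theta1 and 0.06 for theta2 U theta3, so the fused masses still sum to one.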


B. Reliability discounting 


Reliability discounting has been described and discussed
in [25], [26]. Briefly, if additional knowledge
about the reliability $\alpha$ of a certain source of evidence is
available, it can be adopted to refine the initial BBAs. For
instance, the height change and image change indicators may
not perform well in some situations. Such situations can be
measured and used as reliability factors. Each factor $\alpha$ is
a value ranging from 0 to 1, where $\alpha = 1$ means that the indicator
is fully reliable, while $\alpha = 0$ means that it is totally unreliable.
All the remaining discounted mass is transferred to the
full ignorance $\Theta$. Based on Shafer's discounting model [18],
the reliability discounting factor $\alpha$ is introduced to discount
any BBA $m(\cdot)$ defined on the power set $2^\Theta$ as follows,
$\forall X \in 2^\Theta$:

$$\begin{cases} m_\alpha(X) = \alpha \cdot m(X), & \text{for } X \neq \Theta, \\ m_\alpha(\Theta) = \alpha \cdot m(\Theta) + (1 - \alpha). \end{cases} \tag{7}$$

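
Shafer's discounting of Eq. (7) is a one-line transformation of the BBA. The sketch below assumes a BBA stored as a dictionary keyed by `frozenset` focal elements; the function name `discount` and the example values are hypothetical.

```python
def discount(m, alpha, theta):
    """Apply Eq. (7): scale every mass by alpha and move the rest to theta."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    discounted = {x: alpha * mass for x, mass in m.items() if x != theta}
    discounted[theta] = alpha * m.get(theta, 0.0) + (1.0 - alpha)
    return discounted


# Hypothetical BBA of a half-reliable source.
THETA = frozenset({"theta1", "theta2", "theta3"})
m = {frozenset({"theta1"}): 0.6, THETA: 0.4}

print(discount(m, 0.5, THETA))
```

With alpha = 1 the BBA is left unchanged, and with alpha = 0 all the mass ends up on the full ignorance, which matches the two extreme cases described above.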
III. BUILDING CHANGE DETECTION FUSION MODEL 
A. Choice of the frame of discernment 


Focusing on change detection, as a data preparation step, 
DSMs are calculated from satellite stereo imagery based on 


semi-global matching approach [27], [28]. It follows two main 
steps. First, the epipolar image pair is generated through a 
pyramidal local least squares matching. Then the matching is 
cast into dynamic programming to minimise the cost function. 
We use the census feature to measure the similarity between two
pixels [27]. The challenges and opportunities of DSM-assisted
building change detection have been described
in [16]. The geo-information is employed to co-register these
data, which enables sub-pixel accuracy. Focusing on building
change detection, two change indicators, one from images
and one from DSMs, are extracted. Changes from spectral
images are highlighted by using the Iteratively Reweighted
Multivariate Alteration Detection (IRMAD) [5]. Height
changes from DSMs are obtained by robust height
differencing [29], [16]. We suppose that new, demolished or
rebuilt buildings may exhibit both height and spectral changes.
But spectral changes can also be introduced by seasonal
changes and other irrelevant changes. Changed pixels that do
not belong to building changes are named
here OtherChange. Therefore, three classes are considered
to define the frame of discernment satisfying Shafer's model
(i.e. the elements of the frame of discernment are disjoint):


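$$\Theta = \{\theta_1 = \text{Pixel} \in \text{BuildingChange},\ \theta_2 = \text{Pixel} \in \text{OtherChange},\ \theta_3 = \text{Pixel} \in \text{NoChange}\}, \tag{8}$$

and

$$\theta_1 \cap \theta_2 \cap \theta_3 = \emptyset. \tag{9}$$


In the image domain, each pixel represents a single sample, thus
in Eq. (8) we have directly used the word 'Pixel'. Based on
the three exclusive classes, the set of potential focal elements
$FE$ that enter in our application is:


$$FE = \{\theta_1, \theta_2, \theta_3, \theta_1 \cup \theta_2, \theta_1 \cup \theta_3, \theta_2 \cup \theta_3, \theta_1 \cup \theta_2 \cup \theta_3\}. \tag{10}$$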


It is worth noting that even if we work with Shafer’s model 
of the frame of discernment for this application (which is the 
basis of DST), we can also use PCR6 rule developed in DSmT 
because PCR6 works also with Shafer’s model as shown in 
[19]. 

The whole procedure of the proposed building change
detection model is shown in Fig. 1. After the changes from
DSMs and images are extracted, they are reprojected using
the sigmoid function to calculate the concordance index $a$ and
the discordance index $b$. Then the decision fusion rules are
applied to generate the BBAs for height change and image
change, respectively. After that, global BBAs are calculated
by using both DST and DSmT fusion rules. Finally, the change
mask is obtained with various decision-making criteria.


B. BBAs construction for building change detection 


In [30] a sigmoidal model for both concordance and discordance
indexes has been briefly presented. The details and
advantages of this approach are described in [31]. The concordance
index measures the concordance of the change indicator
with the BBA in the assertion, while the discordance index measures the
opposition of the change indicator to the BBA in the assertion.

Fig. 1. Workflow of the proposed method.

In our previous work [16], the BBAs were built based on
sigmoid curves related to the concordance index only. As
explained in [16], the original sigmoid curve is defined as

$$f_{(\tau, T)}(z) = \frac{0.99}{1 + e^{-\frac{z - T}{\tau}}}, \tag{11}$$

where $z$ is the original value of each indicator ($\Delta H$, $\Delta Img$),
where $\Delta H$ means the change in height and $\Delta Img$ means
the change between two spectral images at a given pixel
location. The two parameters $T$ and $\tau$ control the
symmetry point and the slope of the sigmoid function. The
symmetry point indicates a certainty of 50%. In this article,
we improve our model by constructing the BBAs with sigmoidal
models for both concordance and discordance indexes,
following the idea proposed in [31]. The concordance index
is similar to the indicator of our previous research. The green
line in Fig. 2 shows an example of the concordance index
from height changes. A higher height change indicator leads
to a higher probability of building change. The discordance
index is defined as an indication for the opposite argument.
The discordance index is shown in red in Fig. 2, which
means that a higher height change reflects a lower probability
of not being a building change. The blue curve shows the conflict
between the concordance and the discordance index. Both
the concordance index and the discordance index are projected onto
sigmoid curves characterised by the parameters $T$ and
$\tau$.


Fig. 2. Concordance and discordance index.


In [31] the two parameters $T$ and $\tau$ were manually selected.
Here, as an improvement, the multi-level Otsu
thresholding method [32], [33] is used to automatically obtain the
symmetry points for both the concordance index and the discordance
index. Otsu's algorithm assumes that an image is composed of
objects and background. A discriminant analysis is performed
by minimising the intra-class variance. When three classes are
of interest, two thresholds $T_1$ and $T_2$ are expected, and Otsu's
method can be extended to


$$\sigma_w^2(T_1, T_2) = w_1 \sigma_1^2(T_1, T_2) + w_2 \sigma_2^2(T_1, T_2) + w_3 \sigma_3^2(T_1, T_2). \tag{12}$$


The weights $w_i$ are the probabilities, obtained from the image
histogram, of the classes separated by the thresholds $T_1$ and $T_2$. $\sigma_i$
is the standard deviation of the $i$-th class, for $i = 1, 2, 3$. $T_1$
and $T_2$ can be used as the symmetry points of the discordance
and concordance index, respectively. Thus, using the height
change index as an example, the BBAs for the concordance
and discordance height change index are functions of the values
$a_{\Delta H}$ and $b_{\Delta H}$ defined by

$$a_{\Delta H} = f_{(\tau, T_2)}(\Delta H), \quad \text{and} \quad b_{\Delta H} = f_{(-\tau, T_1)}(\Delta H). \tag{13}$$

The discordance index can be considered as a reflection of
the concordance index along the mirror line. Therefore, they
share the same $\tau$. Here, the factor $\tau$ is calculated from a
sample value ($\Delta H = 1$, $a_{\Delta H} = 0.1$), which means that a 1 m height
change indicates a 10% probability of building change. The
BBAs for the discordance and concordance image change index
are built similarly. Differences appearing in 2D images give
a concordance indication for all changes, which include
building changes and other changes ($\theta_1 \cup \theta_2$). In this article,
the changes from images are named $\Delta Img$.
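
The chain described above (two Otsu thresholds, calibration of the slope from one sample point, then the two sigmoid indexes) can be sketched as follows. This is an illustration under several assumptions: the thresholds are searched by brute force over a small list of raw values instead of an image histogram, the sigmoid follows the form of Eq. (11) as reconstructed here, and all function names and data are hypothetical.

```python
import math


def sigmoid(z, tau, T):
    """Eq. (11): 0.99 / (1 + exp(-(z - T) / tau))."""
    return 0.99 / (1.0 + math.exp(-(z - T) / tau))


def two_level_otsu(values):
    """Brute-force (T1, T2) minimising the weighted intra-class variance, Eq. (12)."""
    data = sorted(values)
    n = len(data)
    candidates = sorted(set(data))

    def weighted_variance(chunk):
        # w_i * sigma_i^2 = (len/n) * (sum of squared deviations / len)
        mean = sum(chunk) / len(chunk)
        return sum((v - mean) ** 2 for v in chunk) / n

    best = None
    for i, t1 in enumerate(candidates):
        for t2 in candidates[i + 1:]:
            low = [v for v in data if v <= t1]
            mid = [v for v in data if t1 < v <= t2]
            high = [v for v in data if v > t2]
            if not (low and mid and high):
                continue
            cost = sum(weighted_variance(c) for c in (low, mid, high))
            if best is None or cost < best[0]:
                best = (cost, t1, t2)
    if best is None:
        raise ValueError("need at least three distinct values")
    return best[1], best[2]


def calibrate_tau(T2, sample_z=1.0, sample_a=0.1):
    """Solve sample_a = sigmoid(sample_z, tau, T2) for tau (requires T2 > sample_z)."""
    return (T2 - sample_z) / math.log(0.99 / sample_a - 1.0)


# Hypothetical height changes in metres (three well separated groups).
delta_h = [0.1, 0.2, 0.3, 0.2, 2.8, 3.0, 3.2, 9.5, 10.0, 10.5]
T1, T2 = two_level_otsu(delta_h)
tau = calibrate_tau(T2)

for dh in (0.0, 1.0, 5.0, 10.0):
    a = sigmoid(dh, tau, T2)    # concordance index, Eq. (13)
    b = sigmoid(dh, -tau, T1)   # discordance index, mirrored slope
    print(dh, round(a, 3), round(b, 3))
```

For real images a histogram-based multi-level Otsu implementation would replace the brute-force search, which is only practical for a handful of values.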

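In Tables I and II, we present the two ways of constructing
the BBAs from the sources of evidence, based either
on the DS or on the PCR6 rule of combination, for the height change
indicator (i.e. the first source of evidence) and the image
change indicator (i.e. the second source of evidence). It has
to be noted that $\theta_1 \cup \theta_3$ is not mentioned in the fusion model,
as these two classes do not share similar characteristics within the used feature
space. In Table I, $m_1(\cdot)$ and $m_1'(\cdot)$ represent the concordance
and discordance BBAs from $\Delta H$, whereas in Table II $m_2(\cdot)$
and $m_2'(\cdot)$ represent the concordance and discordance BBAs
from images. $K_{\Delta H}$ is the total conflicting mass value between
$m_1(\cdot)$ and $m_1'(\cdot)$, and $K_{\Delta Img}$ in Table II is the total conflicting
mass value between $m_2(\cdot)$ and $m_2'(\cdot)$.

TABLE I
BBA CONSTRUCTION FOR HEIGHT CHANGE INDICATOR $\Delta H$. [CONFLICT: $K_{\Delta H} = a_{\Delta H} b_{\Delta H}$]

Focal Elem. | $m_1(\cdot)$ | $m_1'(\cdot)$ | $m^{DS}(\cdot)$ | $m^{PCR6}(\cdot)$
$\theta_1$ | $a_{\Delta H}$ | 0 | $\frac{a_{\Delta H}(1 - b_{\Delta H})}{1 - K_{\Delta H}}$ | $a_{\Delta H}(1 - b_{\Delta H}) + \frac{a_{\Delta H}^2 b_{\Delta H}}{a_{\Delta H} + b_{\Delta H}}$
$\theta_2$ | 0 | 0 | 0 | 0
$\theta_3$ | 0 | 0 | 0 | 0
$\theta_1 \cup \theta_2$ | 0 | 0 | 0 | 0
$\theta_2 \cup \theta_3$ | 0 | $b_{\Delta H}$ | $\frac{(1 - a_{\Delta H}) b_{\Delta H}}{1 - K_{\Delta H}}$ | $(1 - a_{\Delta H}) b_{\Delta H} + \frac{a_{\Delta H} b_{\Delta H}^2}{a_{\Delta H} + b_{\Delta H}}$
$\theta_1 \cup \theta_2 \cup \theta_3$ | $1 - a_{\Delta H}$ | $1 - b_{\Delta H}$ | $\frac{(1 - a_{\Delta H})(1 - b_{\Delta H})}{1 - K_{\Delta H}}$ | $(1 - a_{\Delta H})(1 - b_{\Delta H})$

TABLE II
BBA CONSTRUCTION FOR IMAGE CHANGE INDICATOR $\Delta Img$. [CONFLICT: $K_{\Delta Img} = a_{\Delta Img} b_{\Delta Img}$]

Focal Elem. | $m_2(\cdot)$ | $m_2'(\cdot)$ | $m^{DS}(\cdot)$ | $m^{PCR6}(\cdot)$
$\theta_1$ | 0 | 0 | 0 | 0
$\theta_2$ | 0 | 0 | 0 | 0
$\theta_3$ | 0 | $b_{\Delta Img}$ | $\frac{(1 - a_{\Delta Img}) b_{\Delta Img}}{1 - K_{\Delta Img}}$ | $(1 - a_{\Delta Img}) b_{\Delta Img} + \frac{a_{\Delta Img} b_{\Delta Img}^2}{a_{\Delta Img} + b_{\Delta Img}}$
$\theta_1 \cup \theta_2$ | $a_{\Delta Img}$ | 0 | $\frac{a_{\Delta Img}(1 - b_{\Delta Img})}{1 - K_{\Delta Img}}$ | $a_{\Delta Img}(1 - b_{\Delta Img}) + \frac{a_{\Delta Img}^2 b_{\Delta Img}}{a_{\Delta Img} + b_{\Delta Img}}$
$\theta_2 \cup \theta_3$ | 0 | 0 | 0 | 0
$\theta_1 \cup \theta_2 \cup \theta_3$ | $1 - a_{\Delta Img}$ | $1 - b_{\Delta Img}$ | $\frac{(1 - a_{\Delta Img})(1 - b_{\Delta Img})}{1 - K_{\Delta Img}}$ | $(1 - a_{\Delta Img})(1 - b_{\Delta Img})$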
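
The closed forms of Tables I and II can be evaluated directly from a pair of concordance and discordance values. The sketch below is illustrative only: the function name `bba_from_indices` and the dictionary keys are hypothetical. The key "concordant" stands for the focal element supported by the concordance index (theta1 for the height indicator, theta1 U theta2 for the image indicator), "discordant" for the one supported by the discordance index, and "ignorance" for the whole frame.

```python
def bba_from_indices(a, b, rule="PCR6"):
    """Masses of the three non-zero focal elements for indexes a and b in [0, 1]."""
    conflict = a * b  # K = a * b, as in the table captions
    concordant = a * (1.0 - b)
    discordant = (1.0 - a) * b
    ignorance = (1.0 - a) * (1.0 - b)
    if rule == "DS":
        norm = 1.0 - conflict
        if norm <= 1e-12:
            raise ValueError("total conflict: DS rule is undefined")
        return {"concordant": concordant / norm,
                "discordant": discordant / norm,
                "ignorance": ignorance / norm}
    if rule == "PCR6":
        if a + b > 0.0:
            # the conflicting mass a*b is split proportionally to a and b
            concordant += a * a * b / (a + b)
            discordant += a * b * b / (a + b)
        return {"concordant": concordant,
                "discordant": discordant,
                "ignorance": ignorance}
    raise ValueError("rule must be 'DS' or 'PCR6'")


# Hypothetical index values for one pixel.
print(bba_from_indices(0.6, 0.3, "PCR6"))
print(bba_from_indices(0.6, 0.3, "DS"))
```

In both cases the three masses sum to one; the rules differ only in how the conflicting mass $ab$ is handled (normalised away by DS, redistributed by PCR6).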
C. Reliability discounting

In DSM-assisted building change detection, false alarms
arise if wrong heights are present in the DSM for large
regions [16]. These wrong heights are mostly introduced
not in the stereoscopic image matching procedure, but in
the gap filling step. In the last step of the DSM generation
procedure, the height of unmatched pixels is interpolated
using the height values of neighbouring pixels. Normally
a reliable height value can be achieved for small gaps. But
when large gaps appear in the disparity map, for example
for a whole building roof, the height of that building cannot
be correctly interpolated. Thus, the percentage of
successfully matched pixels inside a predefined neighbourhood
region can be used to generate the height reliability. Fig. 3
shows an example of the generated reliability map. Fig. 3a
is the gap mask. The gap regions of the disparity map are
represented in black. Pixels with proper elevation
values are displayed in white. It can be observed
that, with our approach, pixels in the centre of a gap
get lower reliability factor values than pixels next to the gap
boundary (see Fig. 3b).

In the building change detection procedure, the reliability
maps of the two DSMs ($\alpha_{DSM_1}$ and $\alpha_{DSM_2}$) are
calculated separately. They are then fused to generate the final
reliability map $\alpha_{\Delta H}$ for the height change mass:

$$\alpha_{\Delta H} = \alpha_{DSM_1} \cdot \alpha_{DSM_2}. \tag{14}$$
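
As an illustration of the gap-based reliability and of Eq. (14), the sketch below computes, for every pixel, the fraction of successfully matched pixels in a square window and multiplies the maps of the two dates. It is a minimal NumPy sketch under stated assumptions: the function name `gap_reliability`, the edge padding at the image border and the clipping to a lower bound are choices of this example (the 9 x 9 window and the floor of 0.1 mirror the settings reported in the experiments).

```python
import numpy as np


def gap_reliability(valid_mask, window=9, floor=0.1):
    """Fraction of matched (non-gap) pixels in a window x window neighbourhood.

    valid_mask: 2-D array, 1/True where the disparity map has a matched
    pixel and 0/False inside gaps. `window` must be odd.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    valid = np.asarray(valid_mask, dtype=float)
    h, w = valid.shape
    pad = window // 2
    padded = np.pad(valid, pad, mode="edge")  # assumption: replicate borders
    # Integral image with a leading row and column of zeros.
    integral = np.pad(padded.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))
    # Box sum over each window via four integral-image lookups.
    box = (
        integral[window:window + h, window:window + w]
        - integral[:h, window:window + w]
        - integral[window:window + h, :w]
        + integral[:h, :w]
    )
    return np.clip(box / float(window * window), floor, 1.0)


# Toy example: a 7 x 7 tile with a 3 x 3 gap in the middle (date 1 only).
mask1 = np.ones((7, 7))
mask1[2:5, 2:5] = 0
mask2 = np.ones((7, 7))
rel1 = gap_reliability(mask1, window=3)
alpha_dh = rel1 * gap_reliability(mask2, window=3)  # Eq. (14)
```

In this toy tile the gap centre falls to the floor value, while pixels just outside the gap keep most of their reliability, which matches the behaviour described for Fig. 3b.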


Fig. 3. Reliability map (b) generated from the gaps mask (a).


Shadow plays an important role when analysing very
high resolution images in urban regions. Both changes
of shadow and the coverage by shadow produce false alarms in
change detection. Therefore, the 2D changes that are detected
in shadow regions are less reliable than those in non-shadow
regions. Exploiting this property, we adopt the shadow map as the
reliability map of the BBA from the image change indicator. For
this purpose, a shadow map is first generated by calculating
the average brightness of the multispectral image, as a dark
colour normally indicates the presence of shadow. We use the
simple and fast shadow detection approach of Eq. (15)
to highlight the shadow class. It is a pixel-based approach;
$B_k$ in Eq. (15) represents the intensity value at
one pixel location in the $k$-th multispectral band,
and $n$ is the number of multispectral bands. The shadow map
detected from brightness is sufficient for our purpose. In
this shadow map, a smaller value indicates a higher probability
of shadow; thus, the 2D changes detected in these regions
are less reliable.

$$Brightness = \frac{1}{n} \sum_{k=1}^{n} B_k. \tag{15}$$

A further processing step is proposed to obtain a valid
reliability map from the shadow map. First, the shadow map is
mapped onto a sigmoid curve. The lower threshold value of the
two-level Otsu threshold is used as the symmetry point of the
sigmoid curve. The obtained probability map is denoted as
ShadowMap. In order to limit the influence of the ShadowMap,
only the values less than 0.5 are used for discounting:

$$\alpha_{Img} = \begin{cases} 0.5 + I_{ShadowMap}, & \text{if } I_{ShadowMap} < 0.5, \\ 1, & \text{otherwise,} \end{cases} \tag{16}$$

where $I_{ShadowMap}$ is the pixel intensity of the shadow map
in [0, 1]. The reliability map generated from the shadow maps
is then recorded as $\alpha_{\Delta Img}$; it is the combination
of the shadow-based reliability maps of the two dates:

$$\alpha_{\Delta Img} = \alpha_{Img_1} \cdot \alpha_{Img_2}. \tag{17}$$

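
Eqs. (15) to (17) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation: the function names, the `steepness` of the sigmoid and the way the symmetry point is passed in as `midpoint` are assumptions of this example (the text takes the symmetry point from the lower two-level Otsu threshold, which is not computed here).

```python
import numpy as np


def brightness(bands):
    """Eq. (15): per-pixel mean over the n multispectral bands.

    bands: array of shape (n, rows, cols).
    """
    return np.mean(np.asarray(bands, dtype=float), axis=0)


def shadow_map(bright, midpoint, steepness=10.0):
    """Sigmoid projection of brightness; low values mean 'likely shadow'.

    midpoint is the symmetry point of the sigmoid (assumed to be given);
    steepness is a hypothetical slope parameter.
    """
    return 1.0 / (1.0 + np.exp(-steepness * (np.asarray(bright) - midpoint)))


def shadow_reliability(i_shadow):
    """Eq. (16): 0.5 + I where I < 0.5, and 1 elsewhere."""
    i_shadow = np.asarray(i_shadow, dtype=float)
    return np.where(i_shadow < 0.5, 0.5 + i_shadow, 1.0)


# Two dates, two bands each, one row of three pixels (toy values in [0, 1]).
bands_1 = np.array([[[0.05, 0.30, 0.80]], [[0.15, 0.30, 0.60]]])
bands_2 = np.array([[[0.70, 0.30, 0.90]], [[0.90, 0.30, 0.70]]])
alpha_1 = shadow_reliability(shadow_map(brightness(bands_1), midpoint=0.3))
alpha_2 = shadow_reliability(shadow_map(brightness(bands_2), midpoint=0.3))
alpha_dimg = alpha_1 * alpha_2  # Eq. (17)
```

Because Eq. (16) never goes below 0.5, a single date can at most halve the image-change masses; only pixels that are dark on both dates are discounted down to 0.25.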
D. Global BBAs 


The BBAs related to the concordance and discordance
indexes are combined to obtain the global BBA for each
source of evidence. These global BBAs are then combined
to solve the change detection problem. From the previous
BBA modelling step, each pixel has two sets of BBAs,
taken from Tables I and II. More precisely, we have to combine
either $\{m_1^{DS}(.), m_2^{DS}(.)\}$ if the DS rule is preferred for the
BBA modelling, or $\{m_1^{PCR6}(.), m_2^{PCR6}(.)\}$ if the PCR6 rule is
adopted. The non-zero masses from Tables I and II are denoted
$a_1, a_2, a_3$ and $b_1, b_2, b_3$. In this article, the mass values $a_1$,
$a_2$ and $a_3$ are further discounted by the generated reliability
map $\alpha_{\Delta H}$ and denoted respectively as $A_1$, $A_2$ and $A_3$. The
mass values from the image change indicator $b_1$, $b_2$ and $b_3$ are
discounted by the vegetation and shadow indicators $\alpha_{\Delta Img}$
obtained in Eq. (17), giving $B_1$, $B_2$ and $B_3$.

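More precisely, one computes

$$\begin{aligned} A_1 &= \alpha_{\Delta H} \cdot a_1, \\ A_2 &= \alpha_{\Delta H} \cdot a_2, \\ A_3 &= \alpha_{\Delta H} \cdot a_3 + (1 - \alpha_{\Delta H}), \end{aligned} \tag{18}$$

$$\begin{aligned} B_1 &= \alpha_{\Delta Img} \cdot b_1, \\ B_2 &= \alpha_{\Delta Img} \cdot b_2, \\ B_3 &= \alpha_{\Delta Img} \cdot b_3 + (1 - \alpha_{\Delta Img}). \end{aligned} \tag{19}$$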
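
Eqs. (18) and (19) are the classical reliability discounting: every mass is scaled by the reliability factor and the removed mass is moved to the full ignorance. A minimal Python sketch, assuming a BBA is stored as a dictionary whose key "Theta" stands for $\theta_1 \cup \theta_2 \cup \theta_3$ (the function name and the keys are assumptions of this example):

```python
def discount(masses, alpha, frame_key="Theta"):
    """Scale every mass by alpha and move the remainder to total ignorance."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    out = {key: alpha * value for key, value in masses.items()}
    out[frame_key] = out.get(frame_key, 0.0) + (1.0 - alpha)
    return out


# a1, a2, a3 on theta1, theta2 U theta3 and Theta (illustrative values).
m1 = {"theta1": 0.57, "theta2|theta3": 0.11, "Theta": 0.32}
m1_discounted = discount(m1, alpha=0.5)  # A1, A2, A3 of Eq. (18)
```

With $\alpha_{\Delta H} = 1$ the BBA is unchanged, and with $\alpha_{\Delta H} = 0$ it becomes the vacuous BBA, so an unreliable height change no longer influences the fusion.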


Table III and Table IV describe the final building change
detection models based either on DS or on PCR6 rules. Here,
the discounted height change indicator is denoted as
$m_{1\alpha_{\Delta H}}(.)$, and the discounted image change indicator
is denoted as $m_{2\alpha_{\Delta Img}}(.)$.



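TABLE III
DS FUSION MODEL FOR BUILDING CHANGE DETECTION.

Focal Elem.                             $m_{1\alpha_{\Delta H}}(.)$  $m_{2\alpha_{\Delta Img}}(.)$  $m^{DS}(.)$
$\theta_1$                              $A_1$                        0                              $\frac{A_1(B_1+B_3)}{1-A_1B_2}$
$\theta_2$                              0                            0                              $\frac{A_2B_1}{1-A_1B_2}$
$\theta_3$                              0                            $B_2$                          $\frac{(A_2+A_3)B_2}{1-A_1B_2}$
$\theta_1 \cup \theta_2$                0                            $B_1$                          $\frac{A_3B_1}{1-A_1B_2}$
$\theta_2 \cup \theta_3$                $A_2$                        0                              $\frac{A_2B_3}{1-A_1B_2}$
$\theta_1 \cup \theta_2 \cup \theta_3$  $A_3$                        $B_3$                          $\frac{A_3B_3}{1-A_1B_2}$
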
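TABLE IV
PCR6 FUSION MODEL FOR BUILDING CHANGE DETECTION.

Focal Elem.                             $m_{1\alpha_{\Delta H}}(.)$  $m_{2\alpha_{\Delta Img}}(.)$  $m^{PCR6}(.)$
$\theta_1$                              $A_1$                        0                              $A_1(B_1+B_3) + \frac{A_1^2B_2}{A_1+B_2}$
$\theta_2$                              0                            0                              $A_2B_1$
$\theta_3$                              0                            $B_2$                          $(A_2+A_3)B_2 + \frac{A_1B_2^2}{A_1+B_2}$
$\theta_1 \cup \theta_2$                0                            $B_1$                          $A_3B_1$
$\theta_2 \cup \theta_3$                $A_2$                        0                              $A_2B_3$
$\theta_1 \cup \theta_2 \cup \theta_3$  $A_3$                        $B_3$                          $A_3B_3$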
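
The entries of Tables III and IV can be checked numerically with a generic two-source combination over sets. The Python sketch below is an illustration under stated assumptions: focal elements are stored as `frozenset` objects, the names `conjunctive`, `ds_rule` and `pcr6_rule` are hypothetical, and the PCR6 code covers only the two-source case (where it coincides with PCR5).

```python
from itertools import product

T1, T2, T3 = "theta1", "theta2", "theta3"
THETA = frozenset({T1, T2, T3})


def conjunctive(m1, m2):
    """Return the conjunctive consensus and the list of conflicting pairs."""
    consensus, conflicts = {}, []
    for (x, mx), (y, my) in product(m1.items(), m2.items()):
        inter = x & y
        if inter:
            consensus[inter] = consensus.get(inter, 0.0) + mx * my
        elif mx * my > 0.0:
            conflicts.append((x, mx, y, my))
    return consensus, conflicts


def ds_rule(m1, m2):
    """Dempster's rule: normalise the consensus by 1 - K."""
    consensus, conflicts = conjunctive(m1, m2)
    k = sum(mx * my for _, mx, _, my in conflicts)
    if k >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {s: v / (1.0 - k) for s, v in consensus.items()}


def pcr6_rule(m1, m2):
    """Two-source PCR6: each partial conflict mx*my goes back to x and y,
    proportionally to mx and my."""
    fused, conflicts = conjunctive(m1, m2)
    for x, mx, y, my in conflicts:
        fused[x] = fused.get(x, 0.0) + mx * mx * my / (mx + my)
        fused[y] = fused.get(y, 0.0) + my * my * mx / (mx + my)
    return fused


# Discounted BBAs with illustrative values (A1, A2, A3) and (B1, B2, B3).
A1, A2, A3 = 0.5, 0.2, 0.3
B1, B2, B3 = 0.4, 0.3, 0.3
m_h = {frozenset({T1}): A1, frozenset({T2, T3}): A2, THETA: A3}
m_img = {frozenset({T1, T2}): B1, frozenset({T3}): B2, THETA: B3}
g_ds = ds_rule(m_h, m_img)
g_pcr6 = pcr6_rule(m_h, m_img)
```

The only conflicting pair is $\theta_1$ against $\theta_3$, with mass $A_1 B_2$, which is why the two tables differ only in the rows $\theta_1$ and $\theta_3$ (apart from the DS normalisation).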


$m_{1\alpha_{\Delta H}}(.)$ can be obtained by discounting either of the
fusion results presented in Table I. The two variants are denoted
respectively as $m^{DS}_{1\alpha_{\Delta H}}(.)$ and $m^{PCR6}_{1\alpha_{\Delta H}}(.)$.
These discounted height change indicators are fused in the second
step with the image change indicator $m_{2\alpha_{\Delta Img}}(.)$ to
generate the final global BBAs. From Tables III and IV, four sets
of global BBAs can be computed based on different BBAs and fusion
models. The flow diagram in Fig. 4 summarises the different
fusion schemes tested in our application.

As one sees, if both the BBA modelling procedure and the
global BBAs are constructed with the DS fusion rule, the
generated global BBA is recorded as $G_1$. If the global BBAs
are instead constructed with the PCR6 fusion rule, they are
recorded as $G_2$. The basic BBAs can also be calculated with the
PCR6 fusion rule, as shown in Table II. Based on these BBAs, the
global BBAs can again be constructed using the DS rule ($G_3$)
or the PCR6 rule ($G_4$). It has to be mentioned that these four
fusion schemes have different computational costs: $G_1$ is
the simplest one and $G_4$ is the most expensive one in terms
of computational burden.


[Fig. 4 flow diagram: the two discounted BBAs, built with either the DS or the PCR6 rule, are fused again with either rule:
$G_1 = DS\{m^{DS}_{1\alpha_{\Delta H}}(.), m^{DS}_{2\alpha_{\Delta Img}}(.)\}$,
$G_2 = PCR6\{m^{DS}_{1\alpha_{\Delta H}}(.), m^{DS}_{2\alpha_{\Delta Img}}(.)\}$,
$G_3 = DS\{m^{PCR6}_{1\alpha_{\Delta H}}(.), m^{PCR6}_{2\alpha_{\Delta Img}}(.)\}$,
$G_4 = PCR6\{m^{PCR6}_{1\alpha_{\Delta H}}(.), m^{PCR6}_{2\alpha_{\Delta Img}}(.)\}$.]

Fig. 4. Four fusion schemes based on DS and PCR6 rules.


E. Change mask generation


The final building change mask is produced by the
decision-making procedure. After the second fusion step, each
pixel in the images has a degree of belief for every focal element.
The value of the global BBA on $\theta_1$ gives a direct building
change probability map. A decision criterion is required to generate
the final building change detection masks. A change mask
can be generated by applying a threshold value [16]. However,
the BBAs on the partial ignorance and full ignorance sets should
also be considered in the decision-making procedure, since the
building change probability map is only a part of the global
BBAs. DST and DSmT propose different approaches to make
the final decision, and several decision criteria are available.
In this article, four decision criteria are tested: 1) maximum
of belief (Max_Bel), 2) maximum of plausibility (Max_Pl),
3) maximum of betting probabilities (Max_BetP) and 4)
maximum of DSmP (Max_DSmP) [18], [19] (Vol. 3, Chap. 3).

1) Maximum of belief (Max_Bel): This criterion is valid for all
strategies of BBA modelling and fusion shown in Fig. 4.
More precisely, for a strategy generating a combined
mass $G \in \{G_1, G_2, G_3, G_4\}$, the label (decision) is obtained by
comparing the final global mass values obtained from Tables
III and IV:

$$Label = \arg\max\{G(\theta_1), G(\theta_2), G(\theta_3)\}. \tag{20}$$


2) Maximum of plausibility (Max_Pl): Plausibility is defined
in Eq. (3). Max_Pl compares the plausibility of each
class:

$$Label = \arg\max\{Pl(\theta_1), Pl(\theta_2), Pl(\theta_3)\}. \tag{21}$$


3) Maximum of betting probabilities (Max_BetP): The pignistic
probability, denoted as BetP, supports decisions at
the pignistic level. In the betting probabilities, the global
masses of joint focal elements are redistributed equally among
the classes they contain:

$$BetP(A) = \sum_{B \in 2^{\Theta}} \frac{|A \cap B|}{|B|}\, m(B), \quad A \in \Theta. \tag{22}$$

$$Label = \arg\max\{BetP(\theta_1), BetP(\theta_2), BetP(\theta_3)\}. \tag{23}$$

4) Maximum of DSmP (Max_DSmP): The DSmP probabilistic
transformation is an important alternative to the pignistic
transformation [34]. The basic idea of DSmP is to redistribute
the mass of (partial and total) ignorances proportionally to the
masses of the singletons involved in the ignorances:

$$DSmP_{\epsilon}(A) = \sum_{B \in 2^{\Theta}} \frac{\sum_{Z \subseteq A \cap B,\ |Z|=1} m(Z) + \epsilon\,|A \cap B|}{\sum_{Z \subseteq B,\ |Z|=1} m(Z) + \epsilon\,|B|}\, m(B), \tag{24}$$

where $\epsilon > 0$ is a small positive number (typically 0.001) that
avoids numerical indeterminacies in very degenerate cases
occurring if the mass in the denominator of Eq. (24) is zero.
More detailed information about DSmP is given in [34]-[35].

$$Label = \arg\max\{DSmP(\theta_1), DSmP(\theta_2), DSmP(\theta_3)\}. \tag{25}$$

Among the four decision-making rules, the maximum of belief and
the maximum of plausibility have the advantage of being very simple
to calculate, but they represent two extreme decisional attitudes,
pessimistic and optimistic respectively. The choice of one
of these extreme attitudes depends on the consequences of a
decision error that we are ready to accept, which are conditioned by
the type of application concerned. Moreover, it has been
shown in [34] that the more sophisticated DSmP transformation
outperforms the BetP transformation at the price of a much higher
computational complexity, which can be a bottleneck in some
real-time image processing applications.
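
The four criteria of Eqs. (20) to (25) can be compared on a single global BBA with a few lines of Python. This is a minimal sketch, assuming the BBA is stored as a dictionary from `frozenset` focal elements to masses; the function names and the numerical values are illustrative assumptions.

```python
FRAME = ("theta1", "theta2", "theta3")


def bel(m, a):
    """Belief: total mass of the focal elements included in a."""
    return sum(v for s, v in m.items() if s <= a)


def pl(m, a):
    """Plausibility: total mass of the focal elements intersecting a."""
    return sum(v for s, v in m.items() if s & a)


def betp(m, a):
    """Pignistic probability, Eq. (22)."""
    return sum(v * len(s & a) / len(s) for s, v in m.items())


def dsmp(m, a, eps=1e-3):
    """DSmP transformation, Eq. (24)."""
    def singleton_mass(subset):
        return sum(m.get(frozenset({z}), 0.0) for z in subset)

    total = 0.0
    for s, v in m.items():
        num = singleton_mass(s & a) + eps * len(s & a)
        den = singleton_mass(s) + eps * len(s)
        total += v * num / den
    return total


def decide(m, score):
    """Label of the singleton class that maximises the given score."""
    return max(FRAME, key=lambda t: score(m, frozenset({t})))


# Illustrative global BBA on theta1, theta2, theta3 and their unions.
g = {
    frozenset({"theta1"}): 0.44375,
    frozenset({"theta2"}): 0.08,
    frozenset({"theta3"}): 0.20625,
    frozenset({"theta1", "theta2"}): 0.12,
    frozenset({"theta2", "theta3"}): 0.06,
    frozenset(FRAME): 0.09,
}
labels = {
    "Max_Bel": decide(g, bel),
    "Max_Pl": decide(g, pl),
    "Max_BetP": decide(g, betp),
    "Max_DSmP": decide(g, dsmp),
}
```

For a singleton class, Bel reduces to the mass itself, which is why Eq. (20) compares the global masses directly; BetP and DSmP always lie between Bel and Pl.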


IV. EXPERIMENTS 
A. Datasets 


The belief function-based building change detection models
have been tested on four pairs of satellite images. Each of
the first three experimental datasets consists of two pairs of
IKONOS stereo imagery captured in February 2006 and May
2011 over an industrial region in Dong-an, North Korea. These
three sub-test regions are shown in Fig. 5, Fig. 6 and Fig. 7,
respectively. The original IKONOS stereo imagery has 1 m
pixel size in the panchromatic band and 4 m pixel size in the
multispectral bands. The fourth experimental dataset (shown
in Fig. 8) was captured over the centre of Munich, Germany,
which is a typical European urban region. The two pairs of
stereo data of this dataset were captured by IKONOS on July
15, 2005 and by WorldView-2 on July 12, 2010, respectively. In
Fig. 5 to Fig. 8, (a) and (b) are the panchromatic
images before and after the change, and (c) and (d) are the
generated DSMs. They have been generated with the method
explained in [27]. The elevation values from low to high are
represented with colours from dark blue to dark red, as
described in the colour bar. These images are co-registered
through camera model parameter corrections before the DSM
generation procedure, with block adjustment among all datasets
[27], [36]. A sub-pixel accuracy in planimetry and 1 to 2 m
in height can be achieved. The Gram-Schmidt pan-sharpening
method, which is widely used and implemented in the ENVI
software, is applied to the multispectral channels of all three
test regions [37]. In the first three subsets the generated DSMs
have been re-sampled to 1 m resolution. As the IKONOS and
WorldView-2 data for the Munich test region have different
resolutions, the IKONOS images are up-sampled to 0.5 m
resolution to match the WorldView-2 data. Instead of
down-scaling [38], up-scaling is selected here to keep
the sharp boundaries in the WorldView-2 data. The resulting
DSMs also have a resolution of 0.5 m.

Fig. 5 and Fig. 6 show normal building change examples
with highly accurate DSMs. The sizes of these two test regions
are 450 x 700 m$^2$ and 1000 x 400 m$^2$, respectively. In Fig. 5
some seasonal changes are visible. The generated DSMs are
displayed in Fig. 5c and 5d. The second test region (Fig. 6)
shows much larger buildings, which are
well separated from each other.

The third test region consists of two images with a size
of 160 x 340 pixels. This region is characterised by small
buildings (Fig. 7). It has to be mentioned that the largest
building, with a dark roof, does not have the correct
height in the first DSM, as shown in Fig. 7c. This test
region is specifically selected to test the robustness of our
fusion models. The image size of the fourth test region
is 1600 x 1600 pixels, which corresponds to 640,000 m$^2$. It has
mainly large buildings with complex roof shapes. From 2005
to 2010, besides newly constructed buildings, there are also
rebuilt and demolished buildings. In particular, many roofs have
been renovated with another material. Without height information,
it is very difficult to separate the newly constructed buildings
from other kinds of changes.


B. Results 


The proposed DS fusion model and PCR6 fusion model
have been applied to all datasets. In the first step, the four sets
of global BBAs for all three focal elements and joint elements
are generated with the various fusion rules and fusion rule
combinations. In the second step, building change masks are
generated by using the four decision criteria. All three classes
(BuildingChange, OtherChange and NoChange) are
generated, but this article focuses on the newly constructed
buildings; thus only the BuildingChange results are analysed
and evaluated. The proposed models have two novel properties.
The first one is the improved fusion model, and the second
one is the reliability discounting. In the experiments, the
minimal value of the reliability map generated from DSM gaps
is manually set to 0.1 to avoid excessively small values. In the
height change reliability map generation procedure, a window
size of 9 x 9 is selected.

To demonstrate the advantages of the proposed method, the
best building change detection results are first displayed together
with the original height change map. The results of the four
test regions are displayed in Fig. 9, Fig. 10, Fig. 11 and Fig. 12,
respectively. In Figs. 9-12(a), different colours represent
different height changes. Figs. 9-12(b) are
the generated building change masks. To show the quality of
these building change masks, they have been overlaid
with the change reference data, which have been manually
extracted for all four test regions. In Figs. 9-12(b) the green
colour represents the correctly detected building changes. The
false alarms, i.e. pixels that are wrongly detected
as building changes, are shown in red. The blue
objects are the missed changed buildings, which
are called false negatives in this article.
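
The colour coding used in Figs. 9-12(b) can be reproduced from a binary change mask and a binary reference mask. A minimal NumPy sketch, assuming both masks are boolean arrays of equal shape (the function name `overlay` and the exact RGB triplets are assumptions of this example):

```python
import numpy as np


def overlay(change_mask, reference_mask):
    """RGB overlay: green = correct detection, red = false alarm,
    blue = false negative, black = correctly unchanged."""
    pred = np.asarray(change_mask, dtype=bool)
    ref = np.asarray(reference_mask, dtype=bool)
    rgb = np.zeros(pred.shape + (3,), dtype=np.uint8)
    rgb[pred & ref] = (0, 255, 0)
    rgb[pred & ~ref] = (255, 0, 0)
    rgb[~pred & ref] = (0, 0, 255)
    return rgb


pred = np.array([[True, True], [False, False]])
ref = np.array([[True, False], [True, False]])
image = overlay(pred, ref)
```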

Generally speaking, the proposed models are able to extract
the newly constructed buildings with high accuracy. Noise effects
from the height change map have been largely reduced in the
final change results. The four selected test regions present
four different situations. In the first test region most of the
buildings are relatively low and well separated
from each other. The second test region has much higher
and larger buildings, which produce large regions of shadow.
The third test region is a special case. As we observe, in the
first DSM of test region 3 the height of one big building is
not correctly extracted. The same building was
detected as a false alarm and discussed in
[16], where it was explained that, due to the large region size
and height change values, the false alarm could not be avoided.
The fourth test region is much more complicated than the
others, exhibiting very high building density, complex roof
shapes and various building change types.

Benefiting from the improved fusion models and the
reliability discounting procedure, some false alarms can be
successfully avoided, especially for the building in the
bottom-left corner of test region 1 and the big building in
test region 3. In both situations, the first DSM does not
contain the correct height values. With the traditional feature
fusion approach or our initial fusion model [16], such
buildings would very probably be detected as BuildingChange.
However, as we observe in the presented change detection
results, these buildings are correctly detected as NoChange. It
has to be noted that vegetation change is not considered in
this model. Thus, in the centre of the first region, the two
large regions of false alarms, which are newly planted trees
according to visual interpretation, cannot be avoided. Another
region that is difficult to detect is a building under construction.
Half of the building has been finished in the after-change data;
thus, this region has both height and spectral changes. As it
cannot be called a finished building yet, we did not include it as
BuildingChange in our reference data.

Many false negatives (blue regions/pixels) are visible in
Fig. 10(b). Most of these false negatives can be explained by the
quality of the DSMs. A subset of the gaps mask of test region
2 at date2 is displayed in Fig. 13. It shows that all four
missed buildings (shown in blue) are actually gaps in
the unfilled DSM. After gap filling, they do not receive
correct height values, as shown in Fig. 6(d). Thus,
these four buildings only feature spectral changes and
are therefore falsely identified as OtherChange.
Fig. 5. Datasets of the test region 1: (a) panchromatic image from date1; (b) panchromatic image from date2; (c) DSM from date1; (d) DSM from date2.

Fig. 6. Datasets of the test region 2: (a) panchromatic image from date1; (b) panchromatic image from date2; (c) DSM from date1; (d) DSM from date2.

Fig. 7. Datasets of the test region 3: (a) panchromatic image from date1; (b) panchromatic image from date2; (c) DSM from date1; (d) DSM from date2.

Fig. 8. Datasets of the test region 4: (a) panchromatic image from date1; (b) panchromatic image from date2; (c) DSM from date1; (d) DSM from date2.

Fig. 9. Change detection results of test region 1: (a) original height change map; (b) building change result Max_Pl(G4) overlaid with change reference data.

Fig. 10. Change detection results of test region 2: (a) original height change map; (b) building change result Max_Pl(G4) overlaid with change reference data.

Fig. 11. Change detection results of test region 3: (a) original height change map; (b) building change result Max_DSmT (G2) overlaid with change reference data.

Fig. 12. Change detection results of test region 4: (a) original height change map; (b) building change result Max_DSmT (G2) overlaid with change reference data.

Fig. 13. DSMs gaps of part of test region 2 (black holes).

C. Results evaluation


To further understand the quality of these results and the
advantages of the method, a more detailed evaluation and analysis
are presented. First, the building change masks extracted from
the four global BBA sets are compared and evaluated. Each
global BBA set yields four building change masks based
on the four decision criteria. The building change masks are
compared with the masks from [30]. The accuracy of these
results has been evaluated by comparing them with ground
truth images, which have been manually prepared by visually
comparing the pre- and post-event images and referring to
additional Google Earth history data [39]. The similarity between
the obtained result and the ground truth is measured in terms
of Kappa Accuracy (KA) [40]. The evaluation results of test
regions 1, 2, 3 and 4 are shown in Tables V, VI, VII and VIII,
respectively. Because of the limited reference data, only the
BuildingChange class is evaluated.
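
For a single evaluated class, KA and the Overall Accuracy (OA) can be computed from the binary change mask and the binary ground truth. The sketch below assumes that KA denotes Cohen's kappa coefficient of the two-class confusion matrix; the function name `kappa_and_overall` is hypothetical.

```python
import numpy as np


def kappa_and_overall(change_mask, reference_mask):
    """Return (kappa, overall accuracy) for two binary masks."""
    pred = np.asarray(change_mask, dtype=bool).ravel()
    ref = np.asarray(reference_mask, dtype=bool).ravel()
    n = pred.size
    tp = int(np.sum(pred & ref))    # correctly detected changes
    tn = int(np.sum(~pred & ~ref))  # correctly detected non-changes
    fp = int(np.sum(pred & ~ref))   # false alarms
    fn = int(np.sum(~pred & ref))   # false negatives
    observed = (tp + tn) / n
    # Agreement expected by chance from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    if expected == 1.0:
        return float("nan"), observed  # kappa is undefined in this case
    return (observed - expected) / (1.0 - expected), observed


ka, oa = kappa_and_overall([1, 1, 0, 0], [1, 0, 0, 0])
```

Because changed buildings cover only a small part of each scene, OA is dominated by the unchanged pixels, whereas kappa corrects for that chance agreement. This is consistent with the OA values in Table IX staying above 0.93 while the KA values vary widely.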


1) Comparison of the fusion and decision rules: Tables V
to VIII mainly aim to describe and compare the performance
of the DS and DSmT fusion rules and of the four decision
criteria. Unfortunately, the differences among the four global
BBA sets are almost indistinguishable in all four test regions.
Our quantitative evaluation results allow the different
fusion and decision-making strategies for building change
mask construction to be compared in different types of regions.
As we have observed, there is no unique best fusion and
decision strategy working for all types of regions, which is
an interesting result to be aware of. In addition, the different
fusion methods (with a chosen decision strategy) generally perform
better with our refined approach than with the previous (original)
works, which is the main contribution of this work for the types
of regions tested.


2) Validation of the reliability discounting: The results of
the global BBAs obtained with and without reliability discounting
are listed under the names Refined and Original in Tables V to
VIII. Original refers to the approach presented in [30], in
which the reliability discounting is not involved.


In the first test region, the advantage of the reliability
discounting is not obvious. With the Max_Pl decision rule,
the refined models perform slightly better than the
original models. However, the original models obtain higher KA
values with the Max_Bel, Max_BetP and Max_DSmP decision rules
(see Table V). This can be partly explained by the shadow
detection results, as one dark building roof (middle left in the
test region) gets a higher probability of being shadow and thus
a lower probability of being BuildingChange.


The second test region is characterised mainly by large and
high buildings; thus, the influence of shadows is stronger
than in the first test region. The refined models with reliability
discounting generally achieve better accuracy than the original
fusion models. Here, we compare the Max_DSmP results of $G_4$
for this test region, as it shows the highest difference among
these four decision criteria in Table VI. Fig. 14 shows the building
change masks of the top left part of test region 2. Fig. 14(a)
and (b) display the change masks obtained from the
original model and the refined model, respectively, overlaid with
the change reference mask. As in Fig. 10(b), the
green colour represents the correct detections, the red colour
shows the false alarms, and the blue pixels are the false
negatives. It shows that, with the refined model, the building
boundary regions of the change mask contain fewer false alarms
than the results of the original fusion model.


The advantage of the improved decision fusion models is
clearly shown by Table VII. The first DSM of this test
region contains a large region of pixels with incorrect heights
introduced by stereo image matching failures. The improved
models can solve this problem by adopting the reliability map
of the height change. Therefore, the increase of the KA value in
this region is much higher than in the other test regions.
More precisely, under all fusion rules the KAs have improved
from around 0.30 to around 0.50. For a better understanding of this
improvement, the global BBAs of BuildingChange without and
with reliability discounting are displayed in Fig. 15(a) and (b),
respectively. We display here only $Prob(\theta_1)$ of $G_1$. Both
probability maps are less noisy than the original height change
map, which is displayed in Fig. 11. By observing the original
panchromatic images in Fig. 7, it is easy to see that
this building exists in the panchromatic images of both dates.
This is the same building that has been mentioned in [16],
for which only the post-change DSM contains the correct
height values. In Fig. 7(c), this building cannot be recognised
as an elevated object. A higher value in $m_1(.)$ leads to a
larger global BBA in the class BuildingChange. Thus, this
building would be incorrectly detected as BuildingChange if
no reliability discounting were applied (Fig. 15(a)). Fig. 16 shows
the generated height change reliability map. As can be seen,
that building region gets very low reliability values, which means
that the height changes of this region cannot be trusted. Therefore,
the proposed model is able to remove this kind of error and
correctly recognise this region as NoChange (Fig. 11).


The Munich test region is much larger and includes
several kinds of building changes. The proposed method is
able to fuse the spectral and height information efficiently and
thus to identify the newly constructed buildings. The main
false negatives are produced in the rebuilt buildings and
construction sites. As shown in Fig. 17, the four labelled
buildings represent four types of changes. Building A is
labelled as a newly constructed building in our reference data.
However, half of that building has a similar shape and height
to the original one, which brings false negatives into our result.
Buildings B, C and D are buildings in different construction
phases. Referring to [39], in the reference data only D is
identified as OtherChange, as it is almost completed in
Fig. 17(a). In the result we are able to correctly identify B as
a newly constructed building and D as OtherChange. But
building C is falsely labelled as OtherChange due to low height
change values.
TABLE V
CHANGE MASKS EVALUATION FROM FOUR GLOBAL BBAS OF TEST REGION 1 (KA).

            G1                  G2                  G3                  G4
            Original  Refined   Original  Refined   Original  Refined   Original  Refined
Max_Bel     0.7392    0.7150    0.7369    0.7138    0.7419    0.7144    0.7391    0.7130
Max_Pl      0.7619    0.7648    0.7607    0.7642    0.7623    0.7652    0.7609    0.7641
Max_BetP    0.7533    0.7442    0.7515    0.7423    0.7541    0.7428    0.7522    0.7412
Max_DSmP    0.7468    0.7200    0.7450    0.7189    0.7490    0.7190    0.7465    0.7181


TABLE VI
CHANGE MASKS EVALUATION FROM FOUR GLOBAL BBAS OF TEST REGION 2 (KA).

            G1                  G2                  G3                  G4
            Original  Refined   Original  Refined   Original  Refined   Original  Refined
Max_Bel     0.7401    0.7821    0.7399    0.7816    0.7401    0.7826    0.7401    0.7821
Max_Pl      0.7380    0.7800    0.7391    0.7818    0.7380    0.7812    0.7393    0.7831
Max_BetP    0.7413    0.7853    0.7409    0.7853    0.7412    0.7868    0.7409    0.7867
Max_DSmP    0.7402    0.7842    0.7403    0.7841    0.7405    0.7857    0.7403    0.7855


TABLE VII
CHANGE MASKS EVALUATION FROM FOUR GLOBAL BBAS OF TEST REGION 3 (KA).

            G1                  G2                  G3                  G4
            Original  Refined   Original  Refined   Original  Refined   Original  Refined
Max_Bel     0.3356    0.5432    0.3356    0.5418    0.3351    0.5415    0.3345    0.5419
Max_Pl      0.2396    0.3689    0.2416    0.3703    0.2391    0.3694    0.2409    0.3713
Max_BetP    0.2860    0.4726    0.2885    0.4756    0.2869    0.4761    0.2882    0.4786
Max_DSmP    0.3043    0.5082    0.3057    0.5094    0.3008    0.5072    0.3030    0.5066


TABLE VIII
CHANGE MASKS EVALUATION FROM FOUR GLOBAL BBAS OF TEST REGION 4 (KA).

            G1                  G2                  G3                  G4
            Original  Refined   Original  Refined   Original  Refined   Original  Refined
Max_Bel     0.5158    0.5217    0.5159    0.5219    0.5154    0.5193    0.5158    0.5195
Max_Pl      0.5122    0.5229    0.5125    0.5232    0.5120    0.5224    0.5128    0.5232
Max_BetP    0.5137    0.5268    0.5140    0.5267    0.5135    0.5258    0.5137    0.5258
Max_DSmP    0.5161    0.5285    0.5163    0.5284    0.5157    0.5274    0.5162    0.5275


Fig. 14. Building change masks from the original model (a) and refined model (b) of a subset of test region 2. (Legend: true detected, false alarm, false negative.)

Fig. 15. Comparison of building change global BBAs $Prob(\theta_1)$ of $G_1$ based on the fusion models without reliability discounting (a) and with reliability
discounting (b). (Colour bar: probability in %.)

Fig. 16. Reliability discounting map of the height changes of test region 3. (Colour bar: probability in %.)

Fig. 17. Example of the various building change types in test region 4.
A window size of 9 x 9 has been used to generate $\alpha_{\Delta H}$. In
order to test the sensitivity of our fusion model to the window
width, we have changed the width parameter from 3 to
13 in steps of 2. For each size, we generate the global BBA
$G_1$, so that four final building change masks based on the four
decision criteria can be regenerated. We provide the KA of
each mask in Fig. 18. For comparison, we also
provide the KAs obtained without using $\alpha_{\Delta H}$. This test shows
that the final results benefit largely from the reliability
discounting procedure, but that the KA does not change
significantly with the window size in any of the four test regions.


D. Comparison with existing methods 


In this section, the improved belief fusion models are
compared with the direct feature fusion method [41] and
the initial fusion model described in [16].

As a typical feature fusion approach, [41] adopted the kernel
Minimum Noise Fraction (KMNF) approach to fuse change
features from the DSMs and panchromatic images. Based on
the resulting KMNF components, a change mask was extracted
with iterated canonical discriminant analysis (ICDA). [30]
randomly selected the training data from the ground truth,
as the experiments were devoted to algorithm comparison.
However, this is not a practical procedure, because in real
situations the ground truth is unknown. Therefore, in this
article, in addition to the set of random pixels from the
ground truth, another set of training data is prepared for each
test region by manually selecting changed regions. All pixels
in these regions are then used as training samples.

The results generated from these two sets of training
data are reported as $KMNF_{random}$ and $KMNF_{manual}$,
respectively, in Table IX in terms of KA and Overall
Accuracy (OA). All training data in the first three test regions
contain around 200 pixels/samples. In the fourth test region,
500 pixels are used to account for the larger image size. If the
training data are selected from the ground truth, the newly
proposed approach delivers a slightly better result than the
approach in [41]. When using the manually selected training
data, the advantages of the newly developed approach are
obvious. As the ground truth is normally unknown in real
applications, we conclude that the proposed fusion method is
more robust for larger test regions with diverse
objects.

In addition, the approach proposed in [16] is tested on the
same test data, and the results are shown in the third and fourth
columns. In that approach, a shape-based refinement was
proposed after the fusion step to reach the final building
change mask. Thus, the resulting masks before and after the
refinement procedure are both calculated and evaluated. In the
North Korea test regions, we have used $T_{height}$ = 8 m, $T_{area}$ =
50 m$^2$ and $T_{convexity}$ = 0.55 as thresholds. In the Munich
test region, as the buildings are larger and have more complicated
roof shapes than in North Korea, we manually modified these
threshold values to $T_{area}$ = 100 m$^2$ and $T_{convexity}$ = 0.50
to improve the results. The accuracies are recorded in the
columns $[16]_{before}$ and $[16]_{after}$ in Table IX. The refinement
is not included in this article in order to avoid unnecessary
threshold parameters and thus to achieve an automatic and robust
workflow. By comparing the KAs with Tables V, VI and VII, one
can see that the shape-based refinement can further improve
the result accuracy. But the fusion model in [16] performs
rather weakly. All obtained KAs are lower than the values from
the proposed refined decision fusion approaches, especially for
test regions 2 and 3.


TABLE IX 
COMPARISON WITH EXISTING METHODS. 

            KMNF_random       KMNF_manual       [16]before        [16]after 
            KA      OA        KA      OA        KA      OA        KA      OA 
Region 1    0.7178  0.9799    0.5477  0.9803    0.5929  0.9628    0.6312  0.9683 
Region 2    0.6791  0.9822    0.2458  0.9688    0.6433  0.9681    0.6718  0.9718 
Region 3    0.2195  0.9794    0.2272  0.9799    0.3060  0.9375    0.3287  0.9447 
Region 4    0.2057  0.9878    0.1937  0.9876    0.4909  0.9912    0.5641  0.9941 
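
The gap between the KA and OA columns can be illustrated with a short sketch. It assumes that KA denotes the kappa accuracy (Cohen's kappa) and OA the overall accuracy, both computed from a binary change / no-change confusion matrix; the function name and the pixel counts are illustrative and are not taken from the article.

```python
def overall_accuracy_and_kappa(tp, fp, fn, tn):
    """Return (OA, kappa) for a binary change / no-change confusion matrix.

    tp, fp, fn, tn are pixel counts: true positives, false positives,
    false negatives and true negatives (hypothetical inputs).
    """
    n = tp + fp + fn + tn
    if n == 0:
        raise ValueError("empty confusion matrix")
    # Overall accuracy: share of pixels labelled correctly.
    oa = (tp + tn) / n
    # Agreement expected by chance, from predicted and reference marginals.
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    if pe == 1.0:
        return oa, float("nan")  # kappa is undefined in this degenerate case
    kappa = (oa - pe) / (1.0 - pe)
    return oa, kappa


# Illustrative counts only: changed pixels are rare, so OA stays high
# even though fewer than half of the changed pixels are detected.
oa, kappa = overall_accuracy_and_kappa(tp=20, fp=10, fn=30, tn=940)
print(round(oa, 4), round(kappa, 4))
```

Because unchanged pixels dominate, chance agreement is already high, which is why kappa separates the methods far more clearly than the overall accuracy does.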


It has to be mentioned that vegetation change is not counted 
as a false alarm in the improved decision fusion model, because 
vegetation change and building change can be easily separated 
by using a vegetation index. In [16], a vegetation index was adopted 
as a no-building-change indicator to highlight building changes. 
This step is not considered in this article, as few forest 
changes are present in the test regions. Moreover, if forest 
changes are of interest, this model can easily be modified 
to use a vegetation index to separate forest changes from building 
changes. 


V. CONCLUSIONS AND PERSPECTIVES 


Building change detection is a difficult topic. To handle the 
uncertain change information derived from images and DSMs, 
decision fusion methods have been introduced as a new concept 
and proved to be efficient and appropriate. The innovative 
contribution of this article is the improvement of the decision 
fusion models. DS as well as DSmT decision fusion models 
are further developed to solve the building change detection 
problem in this article. Another contribution lies in the BBA 
calculation procedure: the sigmoid distribution is further 
improved by taking into account both concordance and discordance 
situations. As a third contribution, the reliability of each indicator 
is introduced according to the change objects of interest. 

The proposed building change detection models yield 
improved results compared with the original fusion model 
and other change detection methods. A comparative analysis 
of the results shows that there is no large difference in 
performance between DS and DSmT fusion methods based 
on the best decisional strategy, so in practice we can use 
the simplest fusion method to reduce the computational burden 
without degrading the performance too much. Of course, the 
most critical question is how to select the decisional 
strategy beforehand, based on the type of region under analysis. 
For this, we need to define efficient indicators characterising each type 
of region, which will then help us to automatically select the 
best criterion to use. Our future research will address, 
and hopefully help to solve, this important question. 
Fig. 18. Effect of window size on KA for test region 1 (a), test region 2 (b), test region 3 (c) and test region 4 (d). 
Abstract: This paper investigates the use of the URREF 
ontology to characterize and track uncertainties arising within 
the modeling and formalization phases. Estimation of trust 
in reported information, a real-world problem of interest to 
practitioners in the field of security, was adopted for illustration 
purposes. A functional model of trust was developed to describe 
the analysis of reported information, and it was implemented with 
belief functions. When assessing trust in reported information, 
the uncertainty arises not only from the quality of sources or 
information content, but also due to the inability of models 
to capture the complex chain of interactions leading to the 
final outcome and to constraints imposed by the representa- 
tion formalism. A primary goal of this work is to separate 
known approximations, imperfections and inaccuracies from 
potential errors, while explicitly tracking the uncertainty from 
the modeling to the formalization phases. A secondary goal is to 
illustrate how criteria of the URREF ontology can offer a basis 
for analyzing performances of fusion systems at early stages, 
ahead of implementation. Ideally, since uncertainty analysis runs 
dynamically, it can use the existence or absence of observed states 
and processes inducing uncertainty to adjust the tradeoff between 
precision and performance of systems on-the-fly. 


Keywords: uncertainty, reported information, trust, belief 
functions, information fusion, DSmT, URREF ontology. 


I. INTRODUCTION 


A key element when designing information fusion systems 
is the way the system designer isolates and analyzes real world 
phenomena. A model is abstracted into a simpler representa- 
tion, in which components, modules, interactions, relationships 
and data flows are easier to express. Uncertainty tracking 
highlights approximations induced by model construction and 
its formalization, as well as providing a checklist to ensure 
that all uncertainty factors have been identified and considered 
ahead of system implementation. 

This paper illustrates the use of the uncertainty represen- 
tation and reasoning framework (URREF) ontology [1] to 
identify and assess uncertainties arising during the modeling 
and formalization phases of an information fusion system 
intended to estimate trust in reported information. 

Trust assessment is a real-world problem grounded in many 
applications relying on reported items, with different persons 
observing and then reporting on objects, individuals, actions 
or events. For such contexts, using inaccurate, incomplete or 
distorted items can result in unfortunate consequences and 
analysts need to ensure the consistency of reported information 
by collecting multiple items from several sources. 

From the perspective of an information analyst, trust can 
be analyzed along two dimensions: the subjective evaluation 
of items reported by the source itself, called self-confidence, 
and the evaluation of source by the analyst, called reliability. 
While self-confidence encompasses features of subjectivity, 
the reliability of a source is related to the quality of previously 
reported items, the competence of the source for specific 
topics, and the source’s capacity for misleading intentions. 
Trust estimation aims at capturing, in an aggregated value, 
the combined effects of self-confidence and reliability on the 
perceived quality of information. The model is represented 
with belief functions, a formalism which offers a sound math- 
ematical basis to implement fusion operators which estimate 
trust by combining self-confidence and reliability. 

The model developed for trust assessment focuses on the 
global characterization of information and provides a better 
understanding of how trust is to be estimated from various 
dimensions. The overall process has humans as a central 
element in both the production and the analysis of information. 

Trust in reported information offers a good illustration for 
tracking uncertainty: the phenomenon is complex, so any 
model adopted is generally a simplification of the real world 
interactions. Uncertainties can be made explicit not only for 
static elements of the model, such as sources or items, but 
also for the dynamic processes of combining items with one 
another. Moreover, adopting belief functions as representation 
formalism will have an impact on the way an information 
system could be implemented and on the accuracy of its 
results. 

The contribution of this paper is twofold: first, it presents 
a trust estimation model which combines the reliability of 
sources and self-confidence of reported items, and, second, 
the paper analyzes types of uncertainty occurring during 
modeling and formalization by relating elements of the model 
to uncertainty criteria defined by the URREF ontology. 

The remainder of this paper is divided into 8 sections. 
Section II discusses related approaches for trust modeling and 
uncertainty assessment. The problem tackled in this paper 
is presented in section III. Section IV describes the model 
developed for trust estimation, while its implementation with 
belief functions is presented in section V. The analysis of 
uncertainty is discussed in section VI, while examples and scenarios 
for trust assessment are presented in section VII. Strengths 
and limitations of belief-based formalization are discussed in 
section VIII and section IX concludes this paper. 


II. RELATED APPROACHES 


The work presented in this paper is related to approaches for 
trust modeling and assessment as well as solutions for uncer- 
tainty analysis for information fusion systems. Trust modeling 
is not a new research topic; it spans diverse areas such as 
agent systems [2] and logical modeling and argumentation [3]. 
The Internet and social media offer new application contexts 
for trust assessment; this topic is addressed in relation to 
service provision on the Internet [4], social networks analysis 
[5], and crowdsourcing applications [6]. Trust analysis is also 
of interest in the military field where techniques have been 
developed in order to identify clues of veracity in interview 
statements [7]. 

The concept of trust in these communities varies in how 
it is represented, computed and used. Although having an 
obvious social dimension, trust is not only understood with 
regard to other humans, but also towards information pieces 
[6], information sources [8], Internet sites [9], algorithms for 
data and knowledge fusion [10], intelligent agents [2], and 
services for the Internet of things [11]. 

While definitions of trust vary from one domain to another, 
there are some common elements. The first commonality for 
all research areas cited above is to consider trust as a user- 
centric notion that needs to be addressed in integrated human- 
machine environments which rely heavily on information col- 
lected by humans, even if further processing can be executed 
automatically. Moreover, all definitions associate some degree 
of uncertainty with trust, which is then captured by concepts 
such as subjective certainty [12] and subjective probability 
[13]. 

Trust goes hand in hand with the concepts of veracity [14] 
and deception. Veracity is addressed in [15] along the dimensions 
of truthfulness / deception, objectivity / subjectivity and 
credibility / implausibility. The authors developed a verac- 
ity index ranging from true/objective/credible to untrustwor- 
thy/subjective/implausible to characterize texts in the context 
of big data analysis. Deception is defined as a message 
knowingly transmitted with the intent to foster false beliefs or 
conclusions. The topic is addressed in studies from areas such 
as interpersonal psychology and communication [16], [17] and 
it is also considered in the field of natural language processing, 
as part of a larger research direction tackling subjectivity 
analysis and the identification of private states (emotions, 
speculations, sentiments, beliefs). These solutions stem from 
the idea that humans express various degrees of subjectivity 
[18] that are marked linguistically and can be identified with 
automatic procedures [19]. 

Contributions on trust estimation keep the distinction be- 
tween analyzing the source of information, the item reported 
and reasoning about trust. Approaches developed for trust 
in information sources consider that trust is not a general 
attribute of the source but rather related to certain properties: 
competence [20], sincerity and willingness to cooperate [3]. 
On this basis, it becomes possible to consider the competence 
of a source not in general but with respect to specific topics 
[21]. Trust can be also analyzed in relation to roles, categories 
or classes [22]. 


Research efforts on reasoning about trust analyze informa- 
tion sources from past behaviors rather than directly from their 
properties [23], or they infer trust from estimations already 
computed for a set of properties [24]. These approaches 
generally focus on building trust by using argumentation [25] 
or belief functions [26], or investigating the joint integration 
of those techniques [27]. Taking this work a step further, [28] 
identified several patterns for reasoning about trust and its 
provenance while the notion of conflict in handling trust is 
discussed in [29]. 


As shown by approaches above, trust is a multifaceted con- 
cept and, in practice, this complex notion can be decomposed 
into two components: communication or interaction trust, and 
data trust [30]. The model developed deals with data trust 
and keeps the distinction between sources and items provided 
by those sources, although several approaches consider these 
elements as a whole [26], estimating the trust of information 
sources [24], [29] rather than information items. The model 
does not require statistical data to infer the behavior of 
the source [23] and introduces reliability to characterize the 
source. More specifically, reliability encompasses not only 
competence [22], [20] and reputation [21] - two attributes 
already considered by previous approaches - but also intentions 
which constitute an original aspect of the model. Intention is 
of particular significance in the context of human-centered 
systems, including open sources, and its consideration supports the analysis of 
emerging phenomena such as on-line propaganda or disinformation. 
Another original aspect of the model is its consideration 
of the characterization of items by the source itself, thus 
overcoming a main limitation of the solution presented in 
[31]. Our approach can be considered as partially overlapping with 
solutions investigating trust propagation in direct and indirect 
reporting [28], [25], and the model enables a particular kind 
of trust estimation, based both on more or less complete 
characterizations of the source by the analyst, and more or 
less accurate characterizations of the items by the source. The 
model also addresses disagreement and the fusion of diverging 
opinions, not in a panel of experts as described in [27], 
but rather between items showing high levels of confidence 
according to the source and sources having low reliability 
according to the analyst. By ascribing characterizations to 
both information sources and reported items, the model allows 
analysts to make use of both prior experience and their own 
beliefs in order to assess various degrees of trust. 


From a different perspective, the evaluation of uncertainty 
regarding the inputs, reasoning and outputs of information 
fusion is the goal of the Evaluation Techniques for Uncertainty 
Representation Working Group (ETURWG). The 
group developed an ontology for this purpose [1]. The URREF 
ontology defines the main subjects under evaluation [32], 
such as uncertainty representation and reasoning components 
of fusion systems. Furthermore, the frame also introduces 
criteria for secondary evaluation subjects: sources and pieces 
of information, fusion methods and mathematical formalisms. 
URREF criteria have generic definitions and therefore can be 
instantiated for applications with coarser or finer granularity 
levels. This means evaluation metrics can be defined for 
data analysis in general [33], with increased particularity for specific 
data types [34], or for attributes such as reliability and credibility [35], 
self-confidence [36] or veracity [37]. 

In addition to allowing a continuous analysis of uncertainty 
representation, quantification and evaluation, as described in 
[38], URREF criteria are detailed enough to capture model- 
embedded uncertainties [39], imperfection of knowledge rep- 
resentations [40], and their propagation in the context of the 
decision loop [41]. The frame also offers a basis to compare 
different fusion approaches [42]. URREF criteria were used for 
uncertainty tracking and investigation in several applications: 
vessel identification for maritime surveillance [43], activity 
detection for rhino poaching [44] and imagery analysis for 
large area protection [45]. 

Beyond developing a model for trust estimation, this paper 
also fills a gap within the ETURWG community by illustrating 
how uncertainty analysis tracks imperfections occurring from 
problem definition to model abstraction and formalization. 


III. HUMAN SOURCES AND REPORTED INFORMATION 


Many applications rely on human sources which are used 
to continuously supply observations, hypotheses, subjective 
beliefs and opinions about what they sense or learn. In such 
applications, reports are often wrong, due to environment 
dynamics, simple error, or malicious acts or intentions [46]. From 
the analyst standpoint, decisions have to be made based on 
indirect reporting and trust relies upon the in-depth inves- 
tigation of items and sources, thus the analysis of reported 
items is a critical step. This analysis is a multilevel process, 
relying on the ability of analysts to understand the content 
of messages and assess their quality from additional clues. 
The use cases described below highlight levels of indirection 
occurring when collecting information and their impact 
on trust estimation. 


A. Assertions, opinions and reported information 


For illustration, let us consider X, the analyst receiving 
information provided by a human source Y. 


Case 1: direct reporting. X is an analyst collecting evidence 
in order to decide whether or not an individual is 
involved in terrorist activities. In particular, he takes into 
account reports submitted by Y, a human source. Those 
reports usually consist of a mixed set of assertions (e.g., 
descriptions of events or states observed by Y) and opinions 
(i.e., judgments, assessments, or beliefs) expressed by Y about 
those assertions, which give the analyst an insight into how strongly 
the source commits to each assertion, see fig. 1. 


[Figure: the sentence "It is very likely that John is a terrorist.", annotated with Opinion ("It is very likely that") and Primary information ("John is a terrorist").] 
Fig. 1. Assertions and opinions in human messages. 


In the statement contained in fig. 1, the source Y lets us 
know that she does not commit her full belief to the assertion 
that John is a terrorist, otherwise the reporter would have used 
phrasing such as I am completely convinced or it is without 
doubt or simply reported John is a terrorist as an unadorned 
statement. 

The information item is the sentence, which contains the 
assertion John is a terrorist and the uncertainty degree to be 
assigned because the analyst knows that Y is not completely 
certain about her own statements. The analyst must make a 
judgment about the veracity of John being a terrorist based 
upon factors such as previous experience with Y’s assessments 
in the past, or, perhaps, on the fact that other sources are 
relating the same information. 


Case 2: indirect reporting. Again, let X be an analyst 
collecting evidence in order to decide whether or not an 
individual is involved in terrorist activities. In this case, he 
takes into account reports submitted by Y, a human source 
who is herself relating information obtained from a secondary 
source named Mary, see fig. 2. 


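[Figure: the sentence "Mary told me it is very likely that John is a terrorist.", annotated with Secondary source ("Mary"), Reporting ("told me"), Opinion ("it is very likely that") and Primary information ("John is a terrorist").] 
Fig. 2. Hearsay, assertions and opinions in human messages. 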


The source Y does not report on her direct observations 
or her deductions or beliefs, but conveys information received 
from a second source, in this case Mary, in the statement in 
fig. 2. 

In this report the information item is again the sentence 
containing the assertive part John is a terrorist but this use 
case introduces more levels of complexity in uncertainty to 
deal with. The information that the assertion comes from Mary, 
who has added her own opinion, is a distancing mechanism 
on the part of the source Y as (unlike in fig. 1), she is neither 
claiming the opinion nor the assertion. 

This case introduces yet more layers of uncertainty. How 
sure can we be that the reporter Y has accurately repeated 
what Mary said? For example, did Mary really say it is likely 
or did the reporter insert this (intentionally or unintentionally) 
based upon the reporter’s assessment of the reliability of 
Mary as a source of information? Or perhaps, subtly, Y is 
expressing her own uncertainty by putting words in Mary’s 
mouth. Furthermore, it is possible Mary made this statement 
under circumstances which would strengthen or weaken this 
statement, but those conditions have not been passed on by 
the reporter. 

The goal of the analyst is to take this assertion into account, 
but also to encode his own belief about the quality of the 
source further in the analysis. All these different attitudes 
have to be evaluated by the analyst, who may have additional 
background information or prior evaluation of the source that 
have to be considered. 

In both cases discussed above, the outcome of the analyst’s work 
is the assertive part of the information item, augmented with a 
coefficient that helps to measure and track the different levels 
of trust for their future exploitation. For the purpose of this 
work, this quality is called trust in reported information. 


B. Concepts and notions for trust assessment 


This section introduces several notions that are relevant for 
trust analysis. 

Trustworthiness of information sources is considered, for the 
purpose of this work, as confidence in the ability and intention 
of an information source to deliver correct information, see 
[47]. Trustworthiness is an attribute of information sources 
who have the competences to report information, and who can 
be relied upon to share sincerely and clearly their beliefs on 
the uncertainty level of reported information. An item provided 
by such a source is then trusted by analysts. 

Self-confidence [36] captures the explicit uncertainty as- 
signed to reported assertions by the source. Statements may 
include the source’s judgments when lacking complete cer- 
tainty; these judgments are generally identified through the use 
of various lexical clues such as possibly, probably, might be, it 
is unlikely, undoubtedly, etc., all of which signal the source’s 
confidence (or lack thereof) in the veracity of the information 
being conveyed. It should be noted that self-confidence, in our 
usage understood as the linguistic dimension of the certainty 
degree that the source assigns to reported items, is an aspect 
exhibited by the source, but it will be considered from the 
analyst’s standpoint during trust analysis. 
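
A minimal sketch of how lexical clues could be turned into a self-confidence value is given below. The numeric scores are hypothetical, since the text lists the markers without assigning numbers to them, and treating an unmarked statement as full commitment is an assumption based on the discussion of unadorned statements in fig. 1.

```python
import re

# HYPOTHETICAL scores in [0, 1]; only the markers themselves come from the text.
MARKER_SCORES = {
    "undoubtedly": 1.0,
    "probably": 0.7,
    "possibly": 0.5,
    "might be": 0.4,
    "it is unlikely": 0.2,
}


def self_confidence(sentence, unmarked=1.0):
    """Return the score of the longest listed marker found in the sentence.

    `unmarked` is returned when no marker is present; the default of 1.0
    encodes the assumption that an unadorned statement signals full belief.
    """
    text = sentence.lower()
    # Try longer markers first so that multi-word clues take precedence.
    for marker in sorted(MARKER_SCORES, key=len, reverse=True):
        if re.search(r"\b" + re.escape(marker) + r"\b", text):
            return MARKER_SCORES[marker]
    return unmarked


print(self_confidence("It is unlikely that John was there."))
print(self_confidence("John is probably abroad."))
print(self_confidence("John is a terrorist."))
```

A lookup table of this kind only covers explicit markers; the analyst still has to interpret the resulting value from his own standpoint, as noted above.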

Reliability of sources indicates how strongly the analyst is 
willing to accept items from a given source at their face-value. 
As an overall characterization, reliability is used in this work 
to rate how much a source can be trusted with respect to their 
reputation, competence and supposed intentions. 

Reputation of sources [48] captures a commonly accepted 
opinion about how the source performs when reporting infor- 
mation, and is generally understood as the degree to which 
prior historical reports have been consistent with fact. For 
human sources, reputation is considered by the analyst for 
each source based on previous interactions with the source 
and on the source’s history of success and failure in delivering 
accurate information. Reputation relies, to a large extent, upon 
negative and positive experiences provided to the analyst by 
the source in the past. 

Competence of sources [20] is related to a source’s possession 
of the skills and knowledge needed to report on various 
topics. This aspect defines to what extent a human source can 
understand the events they report on, whether the source has 
the ability to accurately describe those events, and how capable 
the source is of following the logic of processes producing the 
information. 

Intentions correspond to specific attitudes toward the effect 
of one’s actions or conduct. Reporting information can become 
more a means to manipulate others than a means to inform 
them [49] and thus can be carried out with the express 
purpose of inducing changes in another person’s beliefs and 
understanding. Intentions are specific to human sources as 
only humans have the capacity to deliberately provide false 
or misleading information. Sensors may provide erroneous 
data due to a number of factors such as device failure or 
environmental conditions, but never due to intention. 

In addition to the above facets, credibility of information 
and reliability of sources are two notions introduced by the 
STANAG 2511 [50], which standardizes the terminology used 
in analysis of intelligence reports used by NATO Forces 
with distinct focus on sources and information provided. 
STANAG reliability is understood with respect to the quality 
of information that has been delivered by sources in the past. 
STANAG credibility relies on the intuition that a joint analysis 
of items in combination with each other will likely reveal 
inconsistencies, contradictions or redundancies. Reliability and 
credibility are independent criteria for evaluation. Definitions 
for both reliability and credibility are in natural language. 

Attributes of sources and information items adopted for the 
model of trust are related to the notions introduced by the 
STANAG 2511 but are addressed differently: reliability of 
sources is understood here in terms of source competence, 
reputation and intentions, while credibility is restricted to 
features of self-confidence as described above. 


IV. A FUNCTIONAL MODEL OF TRUST 


This section introduces the model developed to estimate 
trust in reported information by taking into account the re- 
liability of the source and the source’s own characterization 
of reported items. The advantage of this distinction is that it 
better dissociates the respective impacts, on the information provided, 
of the beliefs of sources and of the opinions of analysts about the source. 

Even if the primary function of a source is to provide 
information, we keep the distinction between the source and 
the information by considering separate dimensions for each 
element. The rationale behind this is the observation that even 
reliable sources can sometimes provide inaccurate or imprecise 
information from one report to another, which is even more 
plausible in the case of human sources. 

The model, illustrated in fig. 3, is composed of a source 
which provides an analyst with an information item augmented with a degree 
of uncertainty captured by self-confidence. Based 
upon his direct assessment of the reliability of the source, 
the analyst constructs his own estimation of trust in the item 
reported. 


[Figure: the Source passes Reported Information, together with its Self-Confidence, to the Analyst; the Analyst assigns a Reliability to the Source.] 
Fig. 3. Model for trust analysis. 


In the following section, the model is discussed using a 
granularity that is detailed enough to describe its elements, but 
still rough enough to avoid the adoption of a representation 
formalism. 


A. Elements of the trust model 


The model is composed of two elements: an information 
source and reported items from that source. The analyst is 
considered to be outside the model, although he has multiple 
interactions with its elements. 


Definition of information source: An information source 
is an agent who provides an information item along with 
a characterization of its level of uncertainty. “Source” is a 
relative notion, depending on the perspective of analysis. In 
general, information is propagated within a chain relating 
real world information to some decision maker, and agents 
along the path can be both trained observers, whose job is 
to provide such reports, as well as witnesses or lay observers 
who may add items, in spite of not being primarily considered 
as information sources, but rather as opportunistic ones. 

The notion of source is central in many information fusion 
applications and numerous research efforts have aimed at modeling 
the properties of those applications. A general analysis of 
sources is undertaken by [51], who identify three main classes: 
S-Space, composed of physical sensors, H-Space for human 
observers and I-Space for open and archived data on the 
Internet. In [52], a unified characterization of hard and soft 
sources is described, along with a detailed description of their 
qualities and processing capabilities. 

Processing hard sensor information is widely covered [53] in 
the research community, and can be considered quite mature, 
while the integration of human sources brings many new 
challenges. Our model addresses human sources, and reported 
items can refer to actions, events, persons or locations of 
interest. 

Information reported by humans is unstructured, vague, 
ambiguous and subjective, and thus is often contrasted with 
information coming from physical sensors, described as struc- 
tured, quantitative and objective. While humans can deliber- 
ately change the information or even lie, sensors are also prone 
to errors and therefore hard information items are not always 
accurate. 

For human agents, the source is part of the real world, 
(a community, a scene, an event) and can be either directly 
involved in the events reported, or just serving as a witness. 


Definition of reported information: Reported information 
is a couple $(I, \chi(I))$, where $I$ is an item of information and 
$\chi(I)$ the confidence level as assigned by the source. Items are 
information pieces that can be extracted from natural language 
sentences, although the extraction and separation from subjective 
content are out of the scope for the model developed. Each 
item $I$ has assertive $i_a$ and subjective $i_s$ components conveying 
factual and subjective contents respectively. 
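
As a minimal sketch, the couple formed by an item and its confidence level can be held in a small record type. The class name, the field names, the numeric [0, 1] confidence scale and the example value 0.8 are all assumptions made for illustration; the definition above does not prescribe a representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedInformation:
    """An item split into its two components, plus the source's confidence."""

    assertive: str     # factual content of the item
    subjective: str    # subjective content of the item
    confidence: float  # level assigned by the source; [0, 1] scale is assumed

    def __post_init__(self):
        # Reject values outside the assumed scale.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")


# Example from fig. 1; the value 0.8 is hypothetical.
report = ReportedInformation(
    assertive="John is a terrorist",
    subjective="it is very likely that",
    confidence=0.8,
)
print(report.assertive, report.confidence)
```

Keeping the assertive and subjective parts in separate fields mirrors the distinction the model draws between factual content and the source's attitude towards it.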

The analysis of reported information continues to be an 
open topic as the fusion of information from soft sources 
has received increasing attention in recent years. Although some 
authors have developed logic-based approaches for modelling 
distortions of items exchanged between agents who have both 
the intention and the ability to deceive [31], there are still 
more challenges arising when the information is analyzed in 
its textual form. 

Features of uncertainty, as expressed in natural language 
statements, are analyzed in [54] while [55] provides a broader 
discussion of pitfalls and challenges related to soft data 
integration for information fusion. 


B. Functions of the trust model 


The model introduces several functions estimating features 
of reliability, self-confidence and trust, as described hereafter. 


Definition of a reliability function: A reliability function 
is a mapping which assigns a real value to an information 
source. 

This real value is a quantitative characterization of the 
source, inferred with respect to the source’s previous failures, 
its reputation and the relevance of its skills for specific 
domains. For this model, the reliability of human sources 
combines three features: competence, reputation and intention. 
Competence captures the intuition that the quality of informa- 
tion reported by a source depends on the level of training 
and expertise, which may be designated as satisfactory or not, 
depending upon the task. Reputation is the overall quality of a 
source, estimated by examination of the history of its previous 
failures. Intentions refer to attitudes or purposes, often defined 
with respect to a hidden purpose or plan to achieve. 

Reliability is a complex concept and, from a practical 
standpoint, it is difficult to have complete information about 
the global reliability of a source. Thus, this model describes 
reliability along the three attributes (competence of a source, 
its reputation and its intentions) described above. In practical 
applications, this solution makes it possible to compensate for insufficient 
information on one or several aspects of reliability and 
to conduct, if necessary, the analysis of reliability based on 
just one attribute. 


Evaluation of reliability: Assessing reliability is of real 
interest when opportunistic sources are considered because 
the analyst has neither an indication of how the source might 
behave nor the ability to monitor or control either the human 
providing the information or the environment in which the 
source operates. Various methods can be developed to estimate 
competence, reputation and intentions of the source. For 
example, competence is closely related to the level of training 
of an observer or can be defined by domain knowledge. Values 
can be expressed either in a linguistic form (bad, good, fair, 
unknown) or by a number. Reputation is an attribute which can 
be constructed not just by examining previous failures of the 
source but also by considering its level of conflict with other 
sources; this too can be expressed by numeric or symbolic 
values. 

While reputation and competence can be, at least in some 
cases, estimated from prior knowledge, characterizing the 
intentions of a source is subject to human perception and anal- 
ysis. Judgment of human experts is needed not just because 
there usually is no a priori characterization of the source with 
respect to its intentions but also because it is important to 
assess those aspects from the subjective point of view of an 
expert in the form of binary values only. 

From a practical standpoint, it is suitable to provide an 
expert with a description of source competence, reputation and 
intentions as assessed independently. This way, experts can 
have the opportunity to develop different strategies of using 
reliability: they can decide to assign different importance to 
those attributes under different contexts or can use their own 
hierarchy of attributes. For instance, an expert may consider 
as irrelevant the information provided by a source whose 
competence is lower than a specific threshold or if he suspects 
the source of having malicious intentions. 


Definition of a self-confidence function: A self-confidence 
function is a mapping linking a real value and an information 
item. The real value is a measure of the information credibility 
as evaluated by the sensor itself and is of particular interest 
for human sources, as often such sources provide their own 
assessments of the information conveyed. Identifying features 
of self-confidence requires methods related to a research task 
of natural language processing: the identification of assertions 
and opinions in texts. In this field, the commonly adopted 
separation of those notions considers assertions as statements 
that can be proven true or false, while opinions are hypotheses, 
assumptions and theories based on someone’s thoughts and 
feelings and cannot be proven. 


Evaluation of self-confidence: The estimation of self- 
confidence aims at assigning a numerical value which cap- 
tures how strongly the author stands behind assertions in the 
statement, on the basis of lexical clues he has included in the 
utterance. More generally, markers of an author’s commitment 
are in the form of hedges, modal verbs and forms of passive/active 
language. A hedge is a mitigating word that modifies 
the commitment to the truth of propositions, e.g., certainly, 
possibly. Its impact can be magnified by a booster (highly 
likely) or weakened by a downtoner (rather certain). 


Modal verbs indicate if something is plausible, possible, 
or certain (John could be a terrorist, you might be wrong). 
Moreover, in some domains sentences making use of the pas- 
sive voice are considered as an indicator of uncertainty, in the 
sense that author seeks to distance himself from the assertions 
in the items reported through use of passive voice. Quantifying 
self-confidence is a topic of particular interest for intelligence 
analysis, and it was addressed early on by Kent in 1962 [56], who 
created a standardized list of words of estimative probability 
which were widely used by intelligence analysts. This list has 
continued to be a common basis to be used by analysts to 
produce uncertainty assessments. Kesselman describes in [57] 
a study conducted to analyze the way the list was used by 
analysts over the past, and identifies new trends to convey 
estimations and proposes a new list having the verb as a 
central element. Given the variety of linguistic markers for 
uncertainty, the estimation of a numerical value based on 
every possible combination seems unrealistic, as the same 
sentence often contains not just one but multiple expressions 
of uncertainty. Additionally, assigning numerical values to 
lexical expressions is not an intuitive task, and Rein shows 
that there are no universal values to be associated in a unique 
manner to hedges or other uncertainty markers, see [58]. As 
the author argues further, it is, however, possible to order those 
expressions and use this relative ordering as a more robust way 
to compare combinations of uncertainty expressions, and thus 
highlight different levels of uncertainty in natural language 
statements. 


Using the model for trust analysis: The model proposed in 
this work combines various attributes 
of the source (discussed previously under “reliability”) with 
“self-confidence” in order to capture trust of information as 
conveyed by the human. The model is source-centric, 
predominantly focused on the source’s ability to correct, alter or 
qualify the information reported. Although the rules for ranking, 
prioritizing and combining the attributes introduced by the 
model can be drafted empirically, the estimation of a trust 
value requires a formal representation of the model. 


A possible solution for estimating a unified value for 
trust is to consider reliability and self-confidence within the 
framework of an uncertainty theory and to rely on the set 
of combination rules the theory defines - for example, those 
developed in probability theory, in possibility theory, or in 
belief functions theory. All these theories provide various 
operators to combine reliability and self-confidence in order 
to estimate trust. 


In the following the model is represented by using belief 
functions and several scenarios are used to illustrate trust 
estimation. 


V. TRUST FORMALIZATION WITH BELIEF FUNCTIONS 


The aim of trust formalization is to provide a formal repre- 
sentation of the model, combining the capability to exploit the 
structure and relationship of elements of the model with the 
ability to express degrees of uncertainty about those elements. 
Of particular interest to this paper is the observation that the 
developed model introduces a cognitive view of trust as a com- 
plex structure of beliefs that are influenced by the individual’s 
opinions about certain features and elements, including their 
own stances. Such a structure of beliefs determines various 
degrees of trust, which are based on personal choices made 
by the analyst, on the one hand, and the source, on the other hand. 
Therefore, the formalization requires a formalism that is more 
general than probability measures or fuzzy category represen- 
tation, which are more suitable for applications considering 
trust in the context of interactions between agents. Moreover, 
the limitations of using subjective probabilities to formalize 
trust from this cognitive standpoint are clearly stated in [13]. 
As a result, the model was represented with belief functions, a 
formalism that is consistent with the cognitive perspective of 
trust adopted by the model. This belief-based representation 
provides the most direct correspondence with elements of the 
model and their underlying uncertainty, while being able to 
quantify subjective judgments. 

After introducing main concepts of belief functions, this 
section shows how the formalism is used to represent the trust 
model. 


A. Basic Belief Assignment 


Belief Functions (BF) were introduced by Shafer in 
his mathematical theory of evidence [59], also referred 
to as Dempster-Shafer Theory (DST), to model epistemic 
uncertainty. The frame of discernment (FoD) of the decision 
problem under consideration, denoted $\Theta$, is a finite set of 
exhaustive and mutually exclusive elements. The powerset of 
$\Theta$, denoted $2^\Theta$, is the set of all subsets of $\Theta$, empty set included. 
A body of evidence is a source of information characterized by 
a Basic Belief Assignment (BBA), or mass function, which 
is the mapping $m(.) : 2^\Theta \to [0,1]$ that satisfies $m(\emptyset) = 0$ 
and the normalization condition $\sum_{A \in 2^\Theta} m(A) = 1$. The 
belief (a.k.a. credibility) $Bel(.)$ and plausibility $Pl(.)$ functions, 
usually interpreted as lower and upper bounds of an unknown 
(subjective) probability measure $P(.)$, are defined from $m(.)$ 
respectively by 


$$Bel(A) = \sum_{B \subseteq A \,|\, B \in 2^\Theta} m(B), \tag{1}$$

$$Pl(A) = \sum_{B \cap A \neq \emptyset \,|\, B \in 2^\Theta} m(B). \tag{2}$$


An element $A \in 2^\Theta$ is called a focal element of the BBA 
$m(.)$ if and only if $m(A) > 0$. The set of all focal elements 
of $m(.)$ is called the core of $m(.)$ and is denoted $K(m)$. This 
formalism allows for modeling a completely ignorant source 
by taking $m(\Theta) = 1$. The Belief Interval (BI) of any element 
$A$ of $2^\Theta$ is defined by 


$$BI(A) \triangleq [Bel(A), Pl(A)]. \tag{3}$$


The width of the belief interval of $A$, denoted $U(A) = Pl(A) - Bel(A)$, 
characterizes the degree of imprecision of the unknown 
probability $P(A)$, often called the uncertainty of $A$. 
We define the uncertainty (or imprecision) index by 


$$U(m) = \sum_{A \in \Theta} U(A), \tag{4}$$


to characterize the overall imprecision of the subjective 
(unknown) probabilities committed to elements of the FoD 
bounded by the belief intervals computed with the BBA $m(.)$. 
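
As an illustration of Eqs. (1)-(4), the following minimal Python sketch computes belief, plausibility, the belief interval and the uncertainty index for a BBA stored as a dictionary from focal elements to masses. The representation (frozensets as focal elements) and all function names are assumptions made for this sketch; they do not come from [59] or [60].

```python
# Minimal sketch: a BBA is a dict {frozenset of hypotheses: mass}.
# Names and data layout are illustrative assumptions.

def bel(m, a):
    """Eq. (1): total mass of the non-empty focal elements included in a."""
    return sum(mass for b, mass in m.items() if b and b <= a)


def pl(m, a):
    """Eq. (2): total mass of the focal elements intersecting a."""
    return sum(mass for b, mass in m.items() if b & a)


def belief_interval(m, a):
    """Eq. (3): the pair [Bel(a), Pl(a)]."""
    return bel(m, a), pl(m, a)


def uncertainty_index(m, frame):
    """Eq. (4): sum of the belief-interval widths over the singletons of the frame."""
    return sum(pl(m, frozenset([x])) - bel(m, frozenset([x])) for x in frame)


# Example on a two-element frame {A, not_A}.
FRAME = frozenset({"A", "not_A"})
A = frozenset({"A"})
NOT_A = frozenset({"not_A"})
m = {A: 0.7, NOT_A: 0.1, FRAME: 0.2}

low, high = belief_interval(m, A)        # Bel(A) and Pl(A)
u = uncertainty_index(m, FRAME)          # width summed over A and not_A
vacuous = uncertainty_index({FRAME: 1.0}, FRAME)  # completely ignorant source
```

On a two-element frame both singletons share the same interval width $m(A \cup \bar{A})$, so the index is simply twice the mass left on ignorance.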


Shafer proposed Dempster’s rule of combination, which is the 
normalized conjunctive fusion rule, for combining multiple 
independent sources of evidence [59]. This rule has 
been strongly disputed in the BF community after Zadeh’s 
first criticism in 1979, and since the 1990s many rules have 
been proposed to combine (more or less efficiently) BBAs; the 
reader is advised to see the discussions in [60]. To 
combine the BBAs we use the proportional conflict redistribution 
rule number 6 (denoted PCR6) proposed by Martin 
and Osswald in [60] because it provides better fusion results 
than Dempster’s rule in situations characterized by both high 
and low conflict, as explained in detail in [61], [62]. 

The PCR6 rule is based on the PCR principle which 
transfers the conflicting mass only to the elements involved 
in the conflict and proportionally to their individual masses, 
so that the specificity of the information is entirely preserved. 
The steps in applying the PCR6 rule are: 


1) apply the conjunctive rule; 

2) calculate the total or partial conflicting masses; and 

3) redistribute the (total or partial) conflicting mass propor- 
tionally on non-empty sets. 


The general PCR6 formula for the combination of n > 2 
BBAs is very complicated (see [60] Vol. 2, Chap. 2). For 
convenience’s sake, we give here just the PCR6 formula for 
the combination of only two BBAs. When we consider two 
BBAs $m_1(.)$ and $m_2(.)$ defined on the same FoD $\Theta$, the PCR6 
fusion of these two BBAs is expressed as $m_{PCR6}(\emptyset) = 0$ and 
for all $X \neq \emptyset$ in $2^\Theta$ 


$$m_{PCR6}(X) = \sum_{\substack{X_1, X_2 \in 2^\Theta \\ X_1 \cap X_2 = X}} m_1(X_1)\, m_2(X_2) + \sum_{\substack{Y \in 2^\Theta \setminus \{X\} \\ X \cap Y = \emptyset}} \left[ \frac{m_1(X)^2\, m_2(Y)}{m_1(X) + m_2(Y)} + \frac{m_2(X)^2\, m_1(Y)}{m_2(X) + m_1(Y)} \right], \tag{5}$$


where all denominators in (5) are different from zero. If a 
denominator is zero, that fraction is discarded. A very basic 
(not optimized) Matlab code implementing the PCR6 rule can 
be found in [60] and [63], and also in the toolboxes repository 
on the web². 
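
The two-source formula (5) can be sketched in a few lines of Python. This is not the Matlab code of [60] or [63]; it is a small, unoptimized illustration in which a BBA is assumed to be a dictionary from frozensets to masses, and the function name `pcr6_two_sources` is hypothetical.

```python
# Minimal sketch of Eq. (5) for two BBAs given as {frozenset: mass} dicts.
# Data layout and names are illustrative assumptions.

def pcr6_two_sources(m1, m2):
    fused = {}
    for x1, a in m1.items():
        for x2, b in m2.items():
            if a == 0.0 or b == 0.0:
                continue  # zero products (and zero denominators) are discarded
            common = x1 & x2
            if common:
                # Conjunctive part: the product goes to the intersection.
                fused[common] = fused.get(common, 0.0) + a * b
            else:
                # Partial conflict a*b is sent back to x1 and x2
                # proportionally to the masses a and b involved.
                fused[x1] = fused.get(x1, 0.0) + a * a * b / (a + b)
                fused[x2] = fused.get(x2, 0.0) + b * b * a / (a + b)
    return fused


A = frozenset({"A"})
B = frozenset({"B"})
AB = A | B

m1 = {A: 0.6, B: 0.3, AB: 0.1}
m2 = {A: 0.2, B: 0.3, AB: 0.5}
fused = pcr6_two_sources(m1, m2)

# Two fully committed, totally conflicting sources share the mass equally.
split = pcr6_two_sources({A: 1.0}, {B: 1.0})
```

For the first pair of BBAs the conflicting products are 0.6 x 0.3 and 0.2 x 0.3; redistributing each one proportionally gives masses 0.584 on A, 0.366 on B and 0.05 on the ignorance, which sum to one without any normalization step.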

Instead of working with quantitative (numerical) BBA, it 
is also possible to work with qualitative BBA expressed by 
labels using the linear algebra of refined labels proposed in 
Dezert-Smarandache Theory (DSmT), [60] (Vol. 2 & 3). 


B. Trust formalization model 


Because beliefs are well defined mathematical concepts in 
the theory of belief functions, we prefer to use self-confidence 
terminology to represent the confidence declared by a source 
Y on its own assertion A. Let’s denote by A the assertion 
given by the source, for instance A = John is a terrorist. With 
respect to elements of the model, A (the assertion) corresponds 
to $i_a$, the assertive part of the item $I$, and v(A) is a numeric 
estimation of the subjective $i_s$ component of $I$. 

The valuation v(A) made by the source Y about the 
assertion A can be done either quantitatively (by a probability 
or a BBA) or qualitatively (by a label associated to a linguistic 
form). This paper considers quantitative representation of v(A) 
for simplicity³. 

The basic information item provided by a source Y consists 
of A (the assertion) and v(A) (its valuation). To be as general 
as possible, we suppose that v(A) is a basic belief mass 
assignment defined with respect to the very basic frame of 
discernment $\Theta_A = \{A, \bar{A}\}$, where $\bar{A}$ denotes the complement 
of A in $\Theta_A$, that is $v(A) = (m(A), m(\bar{A}), m(A \cup \bar{A}))$. Note 
that only two values of the triplet are really necessary to define 
v(A) because the third one is automatically derived from the 
normalization condition $m(A) + m(\bar{A}) + m(A \cup \bar{A}) = 1$. So one 
could also have chosen equivalently $v(A) = [Bel(A), Pl(A)]$ 
instead of the BBA. In a probabilistic context, one will take 
$m(A \cup \bar{A}) = 0$ and so $v(A) = P(A)$ because $Bel(A) = Pl(A) = P(A)$ 
in such a case. 

The self-confidence of the source Y is an extra factor $\alpha_Y \in [0, 1]$ 
which characterizes the self-estimation of the quality of 
the piece of information (A, v(A)) provided by the source 
itself. $\alpha_Y = 1$ means that the source Y is 100% confident in 
his valuation v(A) about assertion A, and $\alpha_Y = 0$ means that 
the source Y is not at all confident in his valuation v(A). In the 
theory of belief functions, this factor is often referred to as the 
discounting factor of the source because this factor is usually 
used to discount the original piece of information (A, v(A)) 
into a discounted one (A, v’(A)) as follows [59]: 


$$m'(A) = \alpha_Y \cdot m(A), \tag{6}$$
$$m'(\bar{A}) = \alpha_Y \cdot m(\bar{A}), \tag{7}$$
$$m'(A \cup \bar{A}) = \alpha_Y \cdot m(A \cup \bar{A}) + (1 - \alpha_Y). \tag{8}$$


The idea of Shafer’s discounting technique is to diminish 
the belief mass of all focal elements with the factor $\alpha_Y$ and 
redistribute the missing discounted mass $(1 - \alpha_Y)$ to the whole 
ignorance $A \cup \bar{A}$. Note that the valuation of the discounted 
piece of information is never improved because its uncertainty 
index is never smaller than the original one, that is, $U(m') \geq U(m)$, 
with equality only if $\alpha_Y = 1$ or $m(A \cup \bar{A}) = 1$, which is expected. 


² http://bfaswiki.iut-lannion.fr/wiki/index.php/Main_Page 

³ Without loss of generality one can always map a qualitative representation 
to a quantitative one by a proper choice of scaling and normalization (if 
necessary). 

The reliability factor r estimated by the analyst X on 
the piece of information (A, v(A)) provided by the source 
Y must take into account the competence $C_Y$, the 
reputation $R_Y$ and the intention $I_Y$ of the source Y. A simple 
model to establish the reliability factor r is to consider that the 
$C_Y$, $R_Y$ and $I_Y$ factors are represented by numbers in [0, 1] 
associated with selected subjective probabilities, that is $C_Y = P(\text{Y is competent})$, 
$R_Y = P(\text{Y has a good reputation})$ and 
$I_Y = P(\text{Y has a good intention (i.e. is fair)})$. If each of 
these factors has equal weight, then one could use 
$r = C_Y \times R_Y \times I_Y$ as a simple product of probabilities. However, 
in practice, such simple modeling does not fit well with 
what the analyst really needs in order to take into account epistemic 
uncertainties in Competence, Reputation and Intention. In fact, 
each of these factors can be viewed as a specific criterion 
influencing the level of the global reliability factor r. This is 
a multi-criteria valuation problem. Here we propose a method 
to solve the problem. 

We consider the three criteria $C_Y$, $R_Y$ and $I_Y$ with 
their associated importance weights $w_C$, $w_R$, $w_I$ in (0, 1] 
with $w_C + w_R + w_I = 1$. We consider the frame of 
discernment $\Theta_r = \{r, \bar{r}\}$ about the reliability of the source 
Y, where $r$ means that the source Y is reliable, and $\bar{r}$ 
means that the source Y is definitely not reliable. Each 
criterion provides a valuation on r expressed by a corresponding 
BBA. Hence, for the competence criterion $C_Y$, one has 
$(m_C(r), m_C(\bar{r}), m_C(r \cup \bar{r}))$, while for the reputation criterion 
$R_Y$, one has $(m_R(r), m_R(\bar{r}), m_R(r \cup \bar{r}))$ and for the intention 
criterion $I_Y$, one has $(m_I(r), m_I(\bar{r}), m_I(r \cup \bar{r}))$. 

To get the final valuation of the reliability r of the source 
Y, one needs to efficiently fuse the three BBAs $m_C(.)$, 
$m_R(.)$ and $m_I(.)$, taking into account their importance weights 
$w_C$, $w_R$, and $w_I$. This fusion problem can be solved by 
applying the importance discounting approach combined with 
the PCR6 fusion rule of DSmT [63] to get the resultant valuation 
$v(r) = (m_{PCR6}(r), m_{PCR6}(\bar{r}), m_{PCR6}(r \cup \bar{r}))$ from which 
the decision ($r$, or $\bar{r}$) can be drawn (using BI distance, for 
instance). If a firm decision is not required, an approximate 
probability P(r) can also be inferred with some lossy 
transformations of BBA to probability measure [60]. Note that 
Dempster’s rule of combination cannot be used here because it 
does not respond to the importance discounting, as explained 
in [63]. 
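
The fusion step can be sketched as follows. This Python code is only one reading of the procedure, under stated assumptions: importance discounting multiplies every mass by the weight and moves the remainder to the empty set, the empty set then takes part in the PCR6 proportional redistribution like any other focal element, and the result is renormalized over the non-empty sets. The exact variant defined in [63] may differ, so this sketch is not claimed to reproduce the numerical results reported in this paper. All function names are hypothetical.

```python
from itertools import product
from math import prod

EMPTY = frozenset()


def importance_discount(bba, weight):
    """Scale all masses by the weight and move the missing mass to the empty set."""
    discounted = {focal: weight * mass for focal, mass in bba.items()}
    discounted[EMPTY] = discounted.get(EMPTY, 0.0) + (1.0 - weight)
    return discounted


def pcr6_with_empty_set(bbas):
    """PCR6-style fusion of n BBAs in which the empty set may be a focal element."""
    fused = {}
    for combo in product(*(bba.items() for bba in bbas)):
        masses = [mass for _, mass in combo]
        joint = prod(masses)
        if joint == 0.0:
            continue
        common = frozenset.intersection(*(focal for focal, _ in combo))
        if common:
            fused[common] = fused.get(common, 0.0) + joint
        else:
            # Conflicting product: each element involved gets a share
            # proportional to the mass its own source committed to it.
            total = sum(masses)
            for focal, mass in combo:
                fused[focal] = fused.get(focal, 0.0) + joint * mass / total
    return fused


def normalize(bba):
    """Drop the empty set and rescale the remaining masses to sum to one."""
    kept = 1.0 - bba.get(EMPTY, 0.0)
    return {focal: mass / kept for focal, mass in bba.items() if focal}


def fuse_reliability(bbas, weights):
    discounted = [importance_discount(b, w) for b, w in zip(bbas, weights)]
    return normalize(pcr6_with_empty_set(discounted))


R = frozenset({"r"})
NOT_R = frozenset({"not_r"})

# Two criteria fully support r, one fully supports not_r, equal weights.
conflict_case = fuse_reliability([{R: 1.0}, {R: 1.0}, {NOT_R: 1.0}], [1 / 3] * 3)
# Unanimous criteria, unequal weights.
unanimous = fuse_reliability([{R: 1.0}, {R: 1.0}, {R: 1.0}], [0.6, 0.2, 0.2])
```

Under these assumptions the totally conflicting case yields a mass of 2/3 on r and 1/3 on its complement, and unanimous criteria yield a mass of 1 on r whatever the weights.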

The trust model consists of the piece of information 
(A, v(A)) and the self-confidence factor $\alpha_Y$ provided by the 
source Y, as well as the reliability valuation v(r) expressed by 
the BBA $(m(r), m(\bar{r}), m(r \cup \bar{r}))$ to infer the trust valuation 
about the assertion A. For this, we propose using the mass 
$m(r)$ of the reliability hypothesis r of the source Y as a new 
discounting factor for the BBA $m'(.)$ reported by the source 
Y, taking into account its self-confidence $\alpha_Y$. Hence, the trust 
valuation $v_t(A) = (m_t(A), m_t(\bar{A}), m_t(A \cup \bar{A}))$ of assertion 
A for the analyst X is defined by 


$$m_t(A) = m(r) \cdot m'(A), \tag{9}$$
$$m_t(\bar{A}) = m(r) \cdot m'(\bar{A}), \tag{10}$$
$$m_t(A \cup \bar{A}) = m(r) \cdot m'(A \cup \bar{A}) + (1 - m(r)), \tag{11}$$

or equivalently by 

$$m_t(A) = m(r)\,\alpha_Y \cdot m(A), \tag{12}$$
$$m_t(\bar{A}) = m(r)\,\alpha_Y \cdot m(\bar{A}), \tag{13}$$
$$m_t(A \cup \bar{A}) = m(r)\,\alpha_Y \cdot m(A \cup \bar{A}) + (1 - m(r)\,\alpha_Y). \tag{14}$$
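
The equivalence between the two-stage form (6)-(11) and the one-stage form (12)-(14) can be checked numerically with a short Python sketch. The tuple layout `(m(A), m(not A), m(A or not A))` and the function names are assumptions made for illustration; the input values are chosen for illustration only.

```python
# Minimal sketch: a BBA on {A, not A} is the tuple (m(A), m(not A), m(A or not A)).

def discount(bba, factor):
    """Shafer's discounting: scale the masses, move the rest to the ignorance."""
    m_a, m_not_a, m_unknown = bba
    return (factor * m_a, factor * m_not_a, factor * m_unknown + (1.0 - factor))


def trust_two_stage(bba, alpha_y, m_r):
    """Eqs. (6)-(8) with the self-confidence, then Eqs. (9)-(11) with m(r)."""
    return discount(discount(bba, alpha_y), m_r)


def trust_one_stage(bba, alpha_y, m_r):
    """Eqs. (12)-(14): a single discounting by the product m(r) * alpha_Y."""
    return discount(bba, m_r * alpha_y)


v_a = (0.7, 0.1, 0.2)     # valuation reported by the source
alpha_y = 0.75            # self-confidence of the source
m_r = 0.9335              # mass of the reliability hypothesis

two_stage = trust_two_stage(v_a, alpha_y, m_r)
one_stage = trust_one_stage(v_a, alpha_y, m_r)
```

Both calls return the same triple, approximately (0.4901, 0.0700, 0.4399): discounting twice in a row is the same as discounting once by the product of the two factors.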


The DSmT framework using the PCR6 fusion rule and 
the importance discounting technique provides an interesting 
solution for the fusion of attributes having different degrees 
of importance while making a clear distinction between those 
attributes. 

The discounting method proposed in this work is directly 
inspired by Shafer’s classical discounting approach [59]. In our 
application, the classical discounting factor that we propose 
integrates both the mass of reliability hypothesis m(r) and 
the self-confidence factor ay. It is worth noting that more 
sophisticated (contextual) belief discounting techniques [64] 
exist and they could also have been used, in theory, to refine 
the discounting but these techniques are much more compli- 
cated and they require additional computations. The evaluation 
of contextual belief discounting techniques for such types of 
application is left for further investigations and research works. 


VI. UNCERTAINTY ANALYSIS UNDER URREF CRITERIA 


Tracking uncertainties from problem description to model 
construction and formalization is done under criteria of the un- 
certainty representation and reasoning evaluation framework. 

The goal of URREF is to place the focus on the evaluation 
of uncertainty representation and reasoning procedures. The 
URREF ontology defines four main classes of evaluation 
criteria: Data Handling, Representation, Reasoning and Data 
Quality. These criteria make distinctions between the evalu- 
ation of the fusion system, the evaluation of its inputs and 
outputs, and the evaluation of the uncertainty representation 
and reasoning aspects. 

Listing all criteria is an extensive task and in this paper the 
authors will provide one piece of the puzzle by considering 
criteria that relate to the evaluation of uncertainty induced by 
the proposed model. In the model developed in this paper, 
uncertainty is due to imperfections of information gathering 
and reporting as well as constraints of the representation 
formalism. 

Uncertainty analysis is carried out by assigning uncertainty 
criteria to elements and functions of the trust model in order 
to make explicit the uncertainty arising when the problem is 
abstracted by the model and the model is then simplified in 
order to fulfill the constraints of a specific formalism (see Fig. 6). 

The URREF criteria selected are subclasses of two main 
concepts: Credibility, a subconcept under DataCriteria, and 
EvidenceHandling, a subconcept of RepresentationCriteria. 


[Figure 4 diagram; labels: Source, Analyst, Self Confidence, Reliability, Trust T(I, S).] 

Fig. 4. Trust estimation from source to analyst. 


To summarize, uncertainties of the model will be captured 
by the following URREF criteria: 


• Objectivity, subconcept of Credibility: indicates a 
source providing unbiased information; 

• ObservationalSensitivity, subconcept of Credibility: 
characterizes the skills and competences of sources; 

• SelfConfidence, subconcept of Credibility: measures the 
certainty degree about the piece of information reported, 
according to the source; 

• Ambiguity, subconcept of EvidenceHandling: captures 
if the sources provide data supporting different 
conclusions; 

• Dissonance, subconcept of EvidenceHandling: captures 
the ability of the formalism to represent inconsistent 
evidence; 

• Completeness, subconcept of EvidenceHandling: is a 
measure of how much is known given the amount of 
evidence; and 

• Conclusiveness, subconcept of EvidenceHandling: indicates 
how strongly the evidence supports a conclusion. 


Besides selecting uncertainty criteria relevant for trust es- 
timation, the analysis also discusses the mapping of URREF 
criteria to attributes of the model and sheds a light on imperfect 
matchings. This mapping offers a basis for identifying the 
limitations of the URREF ontology, by emphasizing those 
elements whose characterizations in terms of uncertainty are 
out of the ontology’s reach or beyond the ontology’s intended 
scope. 


A. Uncertainties from problem definition to model abstraction 


Let $M$ be the model for trust estimation, with elements 
introduced in Section IV: the source Y, the reported item 
$I$ with its assertive $i_a$ and subjective $i_s$ parts, and $x(I)$, the 
confidence level assigned by the source Y to $I$. 

From an information fusion standpoint, inputs of the model 
are the source and the information items, along with their 
uncertainty, captured with the following URREF criteria: Ob- 
jectivity, ObservationalSensitivity and SelfConfidence. These 
criteria are subclasses of the concept InputCriteria. 

Objectivity is an attribute of the source, related to its 
ability to provide factual, unbiased items, without adding its 
own points of view or opinions. For a source Y providing 
information item $I$, having $i_s$ and $i_a$ as the subjective and 
factual parts respectively, objectivity can be expressed as: 


$$Objectivity(Y, I) = \psi_o(i_s, i_a), \tag{15}$$


where $\psi_o(i_s, i_a)$ represents the mathematically quantified 
expression of the subjective over the factual content of $I$. 

ObservationalSensitivity is an attribute of the source which 
represents the source’s ability to provide accurate reports. 
In the proposed model, this criterion is an aggregation of 
competence $C$ and reputation $R$, two attributes of the model: 


$$ObservationalSensitivity(Y, I) = \psi_{os}(C, R), \tag{16}$$


where $\psi_{os}(C, R)$ is a function aggregating values of 
competence and reputation. 

Information items entering the system are described by 
SelfConfidence. Again, considering $i_s$ and $i_a$ as the subjective 
and factual items conveyed by $I$, SelfConfidence can be 
expressed as: 


$$SelfConfidence(I) = \psi_{sc}(i_s), \tag{17}$$


with $\psi_{sc}(i_s)$ a function quantifying the subjective content of 
item $I$. 


[Figure 5 diagram; labels on the model side: Reliability (Intention, Competence, Reputation), SelfConfidence; labels on the URREF input criteria side: Credibility (Objectivity, ObservationalSensitivity), Ambiguity, SelfConfidence.] 

Fig. 5. Mapping of model attributes to URREF criteria. 


Fig. 5 shows the mapping between the elements of the 
model and the set of relevant URREF uncertainty criteria. The 
mapping shows a perfect match between SelfConfidence as 
introduced by the model and the eponymous URREF criterion 
as well as several imperfect matches described later in this 
paper. 

At source level, URREF criteria are not able to capture 
in a distinct manner the features of competence, reputation 
and intentions, the main attributes of the sources added by 
the model under Reliability. To some extent, competence 
and reputation can be related to ObservationalSensitivity, but 
intentions clearly remain out of reach for URREF criteria. 


B. Uncertainties from model to formal representation 


Let $F$ be the DST formalization of the trust estimation 
model, with parameters introduced in Section V. The formalism 
induces two types of uncertainty related to its capacity 
to handle incomplete, ambiguous or contradictory evidence. 
The uncertainty of evidence handling is captured by Ambi- 
guity, Dissonance, Conclusiveness and Completeness. Those 
criteria are subclasses of the concept EvidenceHandling. 


Ambiguity measures the extent to which the formalism can 
handle data sets which support different conclusions. 


$$Ambiguity(F) = \phi_a(\alpha_Y, R_Y), \tag{18}$$


where the function $\phi_a(\alpha_Y, R_Y)$ considers the self-confidence 
factor $\alpha_Y$ provided by the source Y and the reliability $R_Y$ of 
Y provided by the analyst to estimate the degree of 
ambiguity. The measure is of particular interest in the case 
where items having high values of self-confidence are provided 
by unreliable sources. 

Dissonance captures the ability of the formalism to 
represent inconsistent evidence. For BBA representations, 
dissonance can be related to the capacity of the formalism to assign 
belief mass to an element and its negation, and can therefore 
be assessed for every BBA representation built for the model. 
For example, the dissonance for a source’s competence can be 
in the form: 


$$Dissonance(F) = \phi_d(m_C(r), m_C(\bar{r})), \tag{19}$$


where $\phi_d(m_C(r), m_C(\bar{r}))$ is a function combining the belief 
masses assigned to whether the source is considered to be 
competent or incompetent, respectively. 

Dissonance is useful for highlighting situations in which 
there are significant differences in belief masses assigned at 
the attribute level, such as when a source is considered to 
be incompetent (low $m_C(r)$, high $m_C(\bar{r})$) but has a good 
reputation (high $m_R(r)$, low $m_R(\bar{r})$). 

Conclusiveness is a measure expressing how strongly the 
evidence supports a specific conclusion or unique hypothesis: 


$$Conc.(F) = \phi_{cc}(m_t(A), m_t(\bar{A}), m_t(A \cup \bar{A})), \tag{20}$$


where $\phi_{cc}(m_t(A), m_t(\bar{A}), m_t(A \cup \bar{A}))$ is a function combining 
the belief masses estimated for truthful, untruthful and 
unknown qualifications of assertion A respectively. This measure 
indicates to which extent the result of inferences can support 
a conclusion, in this case whether or not the 
assertion under analysis is trustworthy. It can be used 
during the inference process to show how taking into account 
additional elements such as the competence of the source, its 
reputation or intentions impacts the partial estimations of trust. 

Completeness is a measure of the range of the available 
evidence, and captures the ability of the formalism to take into 
account how much is unknown. The measure is somewhat 
similar to Dissonance, as it can be assessed for every BBA 
representation built for the model. Thus, completeness of the 
source’s reliability is described as: 


$$Completeness(F) = \phi_{cp}(m(r \cup \bar{r})), \tag{21}$$


where $\phi_{cp}(m(r \cup \bar{r}))$ is a function depending on the belief 
mass assigned to unknown. 

The measure is used for estimation and analysis before 
entering the fusion process, in order to have a picture of how 
complete the evidence describing the various elements of the 
model is, and to avoid performing fusion on highly incomplete 
data sets. Both EvidenceHandling and KnowledgeHandling are 
subclasses of RepresentationCriteria. 


[Figure 6 diagram; labels on the formalism side: DST (Dempster-Shafer), basic belief assignment, Bel(A) and Pl(A) as in Eqs. (1)-(2), PCR6 rule, discounting; labels on the URREF Representation side: EvidenceHandling (Dissonance, Conclusiveness, Completeness).] 

Fig. 6. Mapping of formalism uncertainties to URREF criteria. 


This section has analyzed the nature of uncertainties arising 
when going from problem to model definition and then on to 
formalization with belief functions. The next section shows 
how uncertainties can be highlighted for particular scenarios 
of trust estimation. 


VII. UNCERTAINTY ANALYSIS FOR TRUST ESTIMATION 


A. Running example and method for uncertainty tracking 


As a running example, let’s consider an assertion A and its 
valuation v(A) provided by the source Y as follows: $m(A) = 0.7$, 
$m(\bar{A}) = 0.1$ and $m(A \cup \bar{A}) = 0.2$. Its self-confidence 
factor is $\alpha_Y = 0.75$. Hence, the discounted BBA $m'(.)$ is 
given by 

$m'(A) = 0.75 \cdot 0.7 = 0.525$, 
$m'(\bar{A}) = 0.75 \cdot 0.1 = 0.075$, 
$m'(A \cup \bar{A}) = 1 - m'(A) - m'(\bar{A}) = 0.4$. 

Let’s assume that the BBAs about the reliability of the 
source based on Competence, Reputation and Intention criteria 
are given as follows: 


$m_C(r) = 0.8$, $m_C(\bar{r}) = 0.1$, $m_C(r \cup \bar{r}) = 0.1$, 
$m_R(r) = 0.7$, $m_R(\bar{r}) = 0.1$, $m_R(r \cup \bar{r}) = 0.2$, 
$m_I(r) = 0.6$, $m_I(\bar{r}) = 0.3$, $m_I(r \cup \bar{r}) = 0.1$, 


with importance weights $w_I = 0.6$, $w_R = 0.2$ and $w_C = 0.2$. 

After applying the importance discounting technique 
presented in [63], which consists of discounting the BBAs with the 
importance factor and redistributing the missing mass onto the 
empty set, then combining the discounted BBAs with the PCR6 
fusion rule, we finally get, after normalization, the following 
BBA: 


$m(r) = 0.9335$, 
$m(\bar{r}) = 0.0415$, 
$m(r \cup \bar{r}) = 1 - m(r) - m(\bar{r}) = 0.025$. 

The final trust valuation of assertion A reported by the 
source Y, taking into account its self-confidence $\alpha_Y = 0.75$ 
and the reliability factor $m(r) = 0.9335$, is therefore given by Eqs. 
(12)-(14): 

$m_t(A) = 0.4901$, 
$m_t(\bar{A}) = 0.0700$, 
$m_t(A \cup \bar{A}) = 1 - m_t(A) - m_t(\bar{A}) = 0.4399$. 

Note that if $m_C(r) = m_R(r) = m_I(r) = 1$, then we will 
always get $m(r) = 1$ regardless of the choice of weighting 
factors, which is expected. If there is a total conflict between 
valuations of reliability based on Competence, Reputation and 
Intention criteria, then Dempster’s rule cannot be applied to 
get the global reliability factor $m(r)$ because of the 0/0 
indeterminacy in the formula of Dempster’s rule. For instance, 
if one has $m_C(r) = m_R(r) = 1$ and $m_I(\bar{r}) = 1$, then 
$m(r)$ is indeterminate with Dempster’s rule of combination, 
whereas it corresponds to the average value $m(r) = 2/3$ 
using the PCR6 fusion rule (assuming equal importance weights 
$w_C = w_R = w_I = 1/3$), which makes more sense. 
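
The 0/0 indeterminacy can be made concrete with a minimal Python sketch of Dempster's rule (conjunctive combination followed by normalization). The dictionary representation and the choice of returning `None` when the rule is undefined are assumptions of this sketch, not part of [59].

```python
from functools import reduce

EMPTY = frozenset()


def conjunctive(m1, m2):
    """Unnormalized conjunctive rule: products go to the intersections."""
    out = {}
    for x1, a in m1.items():
        for x2, b in m2.items():
            common = x1 & x2
            out[common] = out.get(common, 0.0) + a * b
    return out


def dempster(bbas, tol=1e-12):
    """Dempster's rule; returns None when all the mass is conflicting (0/0)."""
    joint = dict(reduce(conjunctive, bbas))
    conflict = joint.pop(EMPTY, 0.0)
    if 1.0 - conflict <= tol:
        return None  # normalization would divide zero by zero
    return {focal: mass / (1.0 - conflict) for focal, mass in joint.items()}


R = frozenset({"r"})
NOT_R = frozenset({"not_r"})
THETA = R | NOT_R

# Total conflict: competence and reputation say r, intention says not_r.
total_conflict = dempster([{R: 1.0}, {R: 1.0}, {NOT_R: 1.0}])

# Reliability BBAs of the running example (no importance weights here).
m_c = {R: 0.8, NOT_R: 0.1, THETA: 0.1}
m_rep = {R: 0.7, NOT_R: 0.1, THETA: 0.2}
m_i = {R: 0.6, NOT_R: 0.3, THETA: 0.1}
combined = dempster([m_c, m_rep, m_i])
```

In the total conflict case the whole unit mass falls on the empty set, so the sketch returns `None`; for the three running-example BBAs the rule is defined, but it has no means of using the importance weights.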

The following subsections explore several scenarios for 
trust assessment, corresponding to different situations of BBAs 
distributions, and track the uncertainty according to URREF 
criteria. Each scenario illustrates specific instances of the 
model developed for trust estimation. 

The method adopted to track uncertainty defines the follow- 
ing measures to estimate URREF criteria: 


Sel fCon fidence = ay, 
Ambiguity = |ay — m(r)|, 
Objectivity = myz(r), 
ObservationalSensitivity = min(mc(r),mrR(r)). 


As shown in previous formulas, URREF criteria are es- 
timated based on features of the BBA formalization and 
are assigned to the static elements of the model, i.e., the 
source and the information item. While Objectivity and 
ObservationalSensitivity captures imperfections of obser- 
vations, SelfConfidence and Ambiguity reflect inaccura- 
cies in reporting information to analysts. These criteria are 
assessed before entering the fusion phase, and describe the 
initial uncertainty present in the system before inferences. 
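
The four input criteria defined above translate directly into code. The sketch below is illustrative only: the function name `input_criteria` and its argument names are assumptions, and the sample values are the inputs and fused reliability reported for Scenario 2 later in this section.

```python
def input_criteria(alpha_y, m_r, m_c_r, m_rep_r, m_i_r):
    """Return the four URREF input criteria as a dictionary.

    alpha_y : self-confidence of the source
    m_r     : fused reliability mass m(r)
    m_c_r   : competence mass m_C(r)
    m_rep_r : reputation mass m_R(r)
    m_i_r   : intention mass m_I(r)
    """
    return {
        "SelfConfidence": alpha_y,
        "Ambiguity": abs(alpha_y - m_r),
        "Objectivity": m_i_r,
        "ObservationalSensitivity": min(m_c_r, m_rep_r),
    }


# Scenario 2 values: alpha_Y = 0.6, m(r) = 0.9846,
# m_C(r) = m_R(r) = 0.9 and m_I(r) = 0.3.
criteria = input_criteria(alpha_y=0.6, m_r=0.9846,
                          m_c_r=0.9, m_rep_r=0.9, m_i_r=0.3)
for name, value in criteria.items():
    print(f"{name}: {value:.4f}")
```

With these inputs the Ambiguity evaluates to about 0.3846, in line with the value 0.38 listed for Scenario 2.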

In addition, Dissonance, Conclusiveness and 
Completeness will be estimated at the scenario level 
by adopting the following formulas: 


$$\text{Dissonance} = 1 - |m_t(A) - m_t(\bar{A})|,$$
$$\text{Conclusiveness} = |m_t(A) - m_t(\bar{A})|,$$
$$\text{Completeness} = 1 - m_t(A \cup \bar{A}).$$


Criteria above will be assessed for elements impacted by the 
fusion process: the reliability of the source, the updated BBAs 
of the initial assertion and estimated trust. In the following 
subsection we illustrate several scenarios for trust estimation 
and the uncertainty analysis underlying each scenario. 
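
As a quick illustration of the scenario-level criteria, the sketch below evaluates them on the trust valuation obtained in the example at the beginning of this part (0.4901, 0.0700, 0.4399). The function name `fusion_criteria` and the tuple layout (mass of A, mass of not A, mass of the ignorance) are assumptions made for this example.

```python
def fusion_criteria(m_a, m_not_a, m_unknown):
    """Dissonance, Conclusiveness and Completeness of a BBA on {A, not A}."""
    conclusiveness = abs(m_a - m_not_a)
    return {
        "Dissonance": 1.0 - conclusiveness,
        "Conclusiveness": conclusiveness,
        "Completeness": 1.0 - m_unknown,
    }


# Trust valuation of the example: m_t(A), m_t(not A), m_t(A u not A).
trust_criteria = fusion_criteria(0.4901, 0.0700, 0.4399)
for name, value in trust_criteria.items():
    print(f"{name}: {value:.4f}")
```

By construction Dissonance and Conclusiveness always sum to 1, so the two criteria carry the same information seen from opposite sides.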


B. Scenarios for trust assessment and uncertainty analysis 


Scenarios introduced below provide examples of trust con- 
struction using various operators and highlight the uncertainty 
assigned to elements of the model and its propagation during 
the fusion process. 


Scenario 1 - Consensus: Suppose that Y provides the 
assertion A, while stating that A certainly holds and that X 
considers Y to be a reliable source. 

In this case, the trust will be constructed on the basis of two consensual opinions: that of the analyst X, who considers Y a reliable source, and the source's conviction that the information provided is certain. In this case, $m(A) = 1$, $\alpha_Y = 1$ and $m(r) = 1$, so that $m'(A) = 1$ and $m_t(A) = m(r) \cdot m'(A) = 1$. The result will be in the form (A, v(A)) initially provided by the source.


Uncertainty of inputs
Observation | Objectivity              | 1
            | ObservationalSensitivity | 1
Reporting   | SelfConfidence           | 1
            | Ambiguity                | 0
TABLE I
CONSENSUS: INPUT UNCERTAINTY.


[Table body not recoverable: fusion uncertainty of the updated BBAs, the reliability and the trust.]
TABLE II
CONSENSUS: FUSION UNCERTAINTY.


This scenario illustrates an ideal situation for trust assess- 
ment, where the source is trustworthy and well known to the 
analyst, and observations are reported in perfect conditions. As 
shown in table I, there is no uncertainty induced by the source,
and once fusion is performed the items impacted show high
values for conclusiveness and completeness, while dissonance
is 0 for the updated BBAs, the source's reliability and the
estimated trust, as shown in table II.


Scenario 2 - Uncertain utterances: Y is considered by 
X to be a reliable source and reports the assertion A, while 
showing a low level of certainty v(A) about the veracity of A. 
This example is relevant for situations where a reliable source 
provides (possibly) inaccurate descriptions of events due to, 
say, bad conditions for observation. This scenario corresponds, for example, to the following inputs: $\alpha_Y = 0.6$,

$$m(A) = 0.8, \quad m(\bar{A}) = 0.1, \quad m(A \cup \bar{A}) = 0.1,$$
$$m_C(r) = 0.9, \quad m_C(\bar{r}) = 0, \quad m_C(r \cup \bar{r}) = 0.1,$$
$$m_R(r) = 0.9, \quad m_R(\bar{r}) = 0, \quad m_R(r \cup \bar{r}) = 0.1,$$
$$m_I(r) = 0.3, \quad m_I(\bar{r}) = 0.3, \quad m_I(r \cup \bar{r}) = 0.6,$$

and $w_C = 0.5$, $w_R = 0.5$ and $w_I = 0$. This results in

$$m'(A) = 0.48, \quad m'(\bar{A}) = 0.06, \quad m'(A \cup \bar{A}) = 0.46,$$
and
$$m(r) = 0.9846, \quad m(\bar{r}) = 0, \quad m(r \cup \bar{r}) = 0.0154.$$
Therefore, one finally obtains the trust valuation
$$m_t(A) = 0.47, \quad m_t(\bar{A}) = 0.05, \quad m_t(A \cup \bar{A}) = 0.46.$$


This case shows that self-confidence has an important impact on the values of the discounted BBA, as $m'(A)$ is decreased from 0.8 to 0.48, and the remaining mass is redistributed on $m'(A \cup \bar{A})$.
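
The discounting by self-confidence described here can be reproduced with a few lines of code. This is a minimal sketch: the function name `discount` and the tuple layout (mass of A, mass of not A, mass of the ignorance) are assumptions made for illustration.

```python
def discount(bba, coefficient):
    """Scale the masses of A and not A; move the remainder to the ignorance."""
    m_a, m_not_a, _ = bba
    d_a = coefficient * m_a
    d_not_a = coefficient * m_not_a
    return (d_a, d_not_a, 1.0 - d_a - d_not_a)


# Scenario 2: initial BBA (0.8, 0.1, 0.1) and self-confidence alpha_Y = 0.6.
m_prime = discount((0.8, 0.1, 0.1), 0.6)
print([round(x, 2) for x in m_prime])
```

The mass removed from A and from its complement is not lost: it is moved to the ignorance, which grows from 0.1 to 0.46.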

The combination of competence, reputation and intention is in line with the assumption of the scenario, which states that Y is a reliable source. After normalization, values for trust assessment clearly highlight the impact of uncertain utterances, as the BBA shows a mass transfer from $m_t(A)$ to $m_t(A \cup \bar{A})$. Still, values of trust are close to the BBA integrating the self-confidence, which confirms the intuition that when the analyst X considers Y to be a reliable source, the assertion A is accepted with an overall trust level almost equal to the certainty level stated by the source.
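
The closeness between the trust values and the self-confidence-discounted BBA follows from the relation $m_t(A) = m(r) \cdot m'(A)$ used in Scenario 1. The sketch below applies it to the Scenario 2 values; the function names `trust` and `truncate` are assumptions, and the two-decimal truncation is only a guess at how the reported figures were obtained.

```python
import math


def trust(m_reliable, discounted_bba):
    """Weight the discounted BBA by m(r); the remainder goes to the ignorance."""
    m_a, m_not_a, _ = discounted_bba
    t_a = m_reliable * m_a
    t_not_a = m_reliable * m_not_a
    return (t_a, t_not_a, 1.0 - t_a - t_not_a)


def truncate(value, digits=2):
    """Drop (not round) the decimals beyond the requested number of digits."""
    factor = 10 ** digits
    return math.floor(value * factor) / factor


# Scenario 2: m(r) = 0.9846 and m' = (0.48, 0.06, 0.46).
m_t = trust(0.9846, (0.48, 0.06, 0.46))
reported = tuple(truncate(x) for x in m_t)
print(m_t, reported)
```

Because m(r) is close to 1, the trust valuation stays close to m', which is the behaviour described in the paragraph above.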


Uncertainty of inputs
Reporting | SelfConfidence | 0.6
          | Ambiguity      | 0.38
TABLE III
UNCERTAIN UTTERANCE: INPUT UNCERTAINTY.


[Table body not recoverable: fusion uncertainty of the updated BBAs, the reliability and the trust.]
TABLE IV
UNCERTAIN UTTERANCE: FUSION UNCERTAINTY.


This scenario illustrates uncertainty induced by observation
failures, as Objectivity and SelfConfidence are low, see
table III.

While the quality of the source is highlighted by high 
values of Conclusiveness and Completeness, showing the 
analyst’s confidence in the reports analyzed, the impact of im- 
perfect observation is shown in the overall estimation of trust, 
through a combination of Dissonance, Conclusiveness and 
Completeness which have values close to 0.5, see table IV. 


Scenario 3 - Reputation: Suppose that Y provides A 
and v(A) and X has no global description of Y in terms 
of reliability. As the reliability of Y is not available, Y’s 
reputation will be used instead, as derived from historical data 
and previous failures. This scenario corresponds, for example, to the following inputs: $\alpha_Y = 1$,

$$m(A) = 0.8, \quad m(\bar{A}) = 0.1, \quad m(A \cup \bar{A}) = 0.1,$$
$$m_C(r) = 0.1, \quad m_C(\bar{r}) = 0.1, \quad m_C(r \cup \bar{r}) = 0.8,$$
$$m_R(r) = 0.9, \quad m_R(\bar{r}) = 0.1, \quad m_R(r \cup \bar{r}) = 0,$$
$$m_I(r) = 0.1, \quad m_I(\bar{r}) = 0.1, \quad m_I(r \cup \bar{r}) = 0.8,$$

and $w_C = 0.1$, $w_R = 0.8$ and $w_I = 0.1$. Hence, one gets

$$m'(A) = 0.8, \quad m'(\bar{A}) = 0.1, \quad m'(A \cup \bar{A}) = 0.1,$$
and
$$m(r) = 0.94, \quad m(\bar{r}) = 0.01, \quad m(r \cup \bar{r}) = 0.03.$$
Therefore, one finally obtains the trust valuation
$$m_t(A) = 0.75, \quad m_t(\bar{A}) = 0.09, \quad m_t(A \cup \bar{A}) = 0.14.$$


For this scenario, the source is confident about its own assertions, and therefore
$$m(A) = 0.8, \quad m(\bar{A}) = 0.1, \quad m(A \cup \bar{A}) = 0.1$$
and
$$m'(A) = 0.8, \quad m'(\bar{A}) = 0.1, \quad m'(A \cup \bar{A}) = 0.1$$
have identical BBA distributions. The reliability of the source is built mainly on its reputation, as there are few clues about the competence and intentions of the source. Hence, the overall BBA
$$m(r) = 0.9449, \quad m(\bar{r}) = 0.0196, \quad m(r \cup \bar{r}) = 0.0355$$
is close to the initial reputation distribution
$$m_R(r) = 0.9, \quad m_R(\bar{r}) = 0.1, \quad m_R(r \cup \bar{r}) = 0.$$

Values of trust show the impact of using sources that are not completely reliable, which decreases the certainty level from the initial BBA
$$m'(A) = 0.8, \quad m'(\bar{A}) = 0.1, \quad m'(A \cup \bar{A}) = 0.1$$
to
$$m_t(A) = 0.75, \quad m_t(\bar{A}) = 0.09, \quad m_t(A \cup \bar{A}) = 0.14.$$

They also support the intuition that the trust assigned by the analyst to A will have an upper limit equal to the reputation of the source.


Uncertainty of inputs
Observation | Objectivity              | 0.10
            | ObservationalSensitivity | 0.10
Reporting   | SelfConfidence           | 1
            | Ambiguity                | 0.60
TABLE V
REPUTATION: INPUT UNCERTAINTY.


[Table body only partially recoverable; surviving cells: Updated BBAs 0.30; 0.95; Trust 0.84.]
TABLE VI
REPUTATION: FUSION UNCERTAINTY.


This scenario is similar to the previous one as, in both cases,
there are incomplete descriptions of the source. For this 
particular case, a historical recording of source’s failures offers 
a basis to overcome the missing pieces and, in spite of low 
values for Objectivity and ObservationalSensitivity (see 
table V), the final trust evaluation is improved with respect 
to the previous scenario and shows a better combination of 
Dissonance, Conclusiveness and Completeness, as shown 
in table VI. 


Scenario 4 - Misleading report: In this case, Y provides 
the assertion A, while stating that it certainly holds and X 
considers Y to be a completely unreliable source. For this case, the analyst knows that the report is somehow inaccurate; for example, it cannot be corroborated or it contradicts, at least in part, information from other (more reliable) sources. The analyst suspects the source of having misleading intentions, and can therefore assign a maximal uncertainty level to the information reported. This scenario corresponds, for example, to the following inputs: $\alpha_Y = 1$,

$$m(A) = 1, \quad m(\bar{A}) = 0, \quad m(A \cup \bar{A}) = 0,$$
$$m_C(r) = 0.1, \quad m_C(\bar{r}) = 0.1, \quad m_C(r \cup \bar{r}) = 0.8,$$
$$m_R(r) = 0.1, \quad m_R(\bar{r}) = 0.1, \quad m_R(r \cup \bar{r}) = 0.8,$$
$$m_I(r) = 0.1, \quad m_I(\bar{r}) = 0.8, \quad m_I(r \cup \bar{r}) = 0.1,$$

and $w_C = 0.1$, $w_R = 0.1$ and $w_I = 0.8$. Hence, one gets

$$m'(A) = 1, \quad m'(\bar{A}) = 0, \quad m'(A \cup \bar{A}) = 0,$$
and
$$m(r) = 0.02, \quad m(\bar{r}) = 0.91, \quad m(r \cup \bar{r}) = 0.06.$$

Therefore, one finally obtains the trust valuation
$$m_t(A) = 0.023, \quad m_t(\bar{A}) = 0, \quad m_t(A \cup \bar{A}) = 0.976.$$


The values for this scenario reflect the high self-confidence of the source and high accuracy of the assertion provided; therefore, the initial BBA is unchanged after fusion with self-confidence. Nevertheless, the impact of having misleading intention is visible first on the mass distribution assigned to reliability and then on the overall values of trust. With respect to the initial values
$$m(A) = 1, \quad m(\bar{A}) = 0, \quad m(A \cup \bar{A}) = 0,$$
and the partially fused ones
$$m'(A) = 1, \quad m'(\bar{A}) = 0, \quad m'(A \cup \bar{A}) = 0,$$
the integration of a misleading source transfers the mass almost exclusively to $m_t(A \cup \bar{A})$. Intuitively, the assertion A will be ignored, as the reliability of the source is dramatically decreased by a high mass assignment on misleading intentions.


Uncertainty of inputs
Observation | Objectivity              | 0.10
            | ObservationalSensitivity | 0.10
Reporting   | SelfConfidence           | 1.00
            | Ambiguity                | 0.97
TABLE VII
MISLEADING REPORT: INPUT UNCERTAINTY.


This scenario illustrates the impact of misleading sources on trust estimation. Hence, the use case has very good values for reporting-induced uncertainty, with high SelfConfidence and low Ambiguity (see table VII), but the overall trust characterization shows strong Dissonance, corroborated with low Conclusiveness and near zero Completeness, as shown in table VIII.

[Table body not recoverable: fusion uncertainty of the updated BBAs, the reliability and the trust.]
TABLE VIII
MISLEADING: FUSION UNCERTAINTY.


Scenario 5 - Ambiguous report: The source Y provides 
A and v(A), the uncertainty level. Suppose that v(A) has a 
low value, as the source is not very sure about the events 
reported, and that X considers Y to be unreliable. This scenario corresponds, for example, to the following inputs: $\alpha_Y = 0.3$,

$$m(A) = 0.6, \quad m(\bar{A}) = 0.2, \quad m(A \cup \bar{A}) = 0.2,$$
$$m_C(r) = 0.1, \quad m_C(\bar{r}) = 0.8, \quad m_C(r \cup \bar{r}) = 0.1,$$
$$m_R(r) = 0.1, \quad m_R(\bar{r}) = 0.8, \quad m_R(r \cup \bar{r}) = 0.1,$$
$$m_I(r) = 0.1, \quad m_I(\bar{r}) = 0.1, \quad m_I(r \cup \bar{r}) = 0.8,$$

and $w_C = 0.2$, $w_R = 0.4$ and $w_I = 0.4$. Hence, one gets

$$m'(A) = 0.18, \quad m'(\bar{A}) = 0.06, \quad m'(A \cup \bar{A}) = 0.76,$$
and
$$m(r) = 0.02, \quad m(\bar{r}) = 0.43, \quad m(r \cup \bar{r}) = 0.53.$$
Therefore, one finally obtains the trust valuation
$$m_t(A) = 0.0040, \quad m_t(\bar{A}) = 0.0013,$$
and
$$m_t(A \cup \bar{A}) = 0.9946.$$


This scenario is an illustration for the worst practical case and 
is relevant when the analyst receives a report provided by a 
source that lacks the skills or competence to provide accurate 
descriptions of events. In this case, the reports are incomplete, 
ambiguous, or even irrelevant. In addition to low competence 
and reliability, the source himself is also unsure about the 
statement. 

The first modification of the BBA shows the strong impact of self-confidence, which drastically changes the BBA of the initial assertions, from
$$m(A) = 0.6, \quad m(\bar{A}) = 0.2, \quad m(A \cup \bar{A}) = 0.2,$$
to
$$m'(A) = 0.18, \quad m'(\bar{A}) = 0.06, \quad m'(A \cup \bar{A}) = 0.76.$$
Unsurprisingly, the overall reliability is low:
$$m(r) = 0.0223, \quad m(\bar{r}) = 0.4398, \quad m(r \cup \bar{r}) = 0.5379,$$


Uncertainty of inputs
Observation | Objectivity              | 0.10
            | ObservationalSensitivity | 0.10
Reporting   | SelfConfidence           | 0.30
            | Ambiguity                | 0.27
TABLE IX
AMBIGUOUS REPORT: INPUT UNCERTAINTY.


[Table body only partially recoverable; surviving cells: Assertion; 0.47; Trust 0.006; 0383, 0.417; 0.973, 0.027.]
TABLE X
AMBIGUOUS REPORT: FUSION UNCERTAINTY.


and the results of the final combination show an important mass assigned to $m_t(A \cup \bar{A}) = 0.9946$. Intuitively, the information provided is useless, and considered as highly uncertain.
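
The figures of this scenario can be traced end to end. The sketch below chains the two discounting steps and then evaluates the scenario-level criteria. The helper name `discount` is an assumption, and the reliability $m(r) = 0.0223$ is taken as given from the text rather than recomputed from the weighted fusion of competence, reputation and intention.

```python
def discount(bba, coefficient):
    """Scale the masses of A and not A; move the remainder to the ignorance."""
    m_a, m_not_a, _ = bba
    d_a, d_not_a = coefficient * m_a, coefficient * m_not_a
    return (d_a, d_not_a, 1.0 - d_a - d_not_a)


m = (0.6, 0.2, 0.2)   # initial BBA of the assertion
alpha_y = 0.3         # self-confidence of the source
m_reliable = 0.0223   # fused reliability m(r), taken from the text

m_prime = discount(m, alpha_y)           # step 1: self-confidence
m_trust = discount(m_prime, m_reliable)  # step 2: reliability of the source

conclusiveness = abs(m_trust[0] - m_trust[1])
dissonance = 1.0 - conclusiveness
completeness = 1.0 - m_trust[2]
print(m_prime, m_trust)
print(dissonance, conclusiveness, completeness)
```

Under these assumptions the ignorance ends up near 0.9946, so Dissonance is close to 1 while Conclusiveness and Completeness are close to 0, which is the pattern described for this scenario.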

This scenario shows the combined effects of uncertain reporting and incomplete source description for trust estimation. First, the outcome is affected by high values of uncertainty induced during the observation and reporting phases (see table IX). Then, fusion leads to a trust estimation having high values of Dissonance, and very low values of Conclusiveness and Completeness.

The same criteria estimated for reliability show the main 
difference with respect to the previous case, which was also 
based on unreliable sources. While in scenario 4 the source 
still has important Completeness, this measure is drastically 
decreased for this scenario, as shown in table X. 


VIII. STRENGTHS AND LIMITATIONS OF BELIEF-BASED 
FORMALIZATION FOR TRUST ASSESSMENT 


This section discusses the strengths and limitations of the 
belief-based perspective in trust modeling in the light of results 
shown by previous scenarios. The main advantage of using 
belief functions is that the formalism is consistent with the 
cognitive perspective of trust adopted by the model, thanks 
to the notion of belief. It also captures uncertainties both of 
the analyst with respect to the source and of the source with 
respect to their own statements with different mechanisms. 
First, self-confidence is implemented thanks to a discounting 
coefficient, as, in practice, the values of self-confidence may 
rely upon linguistic clues of certainty/uncertainty that can be 
translated into numerical values. Second, the formalization in- 
troduces weighting factors in order to offer a flexible solution, 
which allow for situations in which the analyst has more or 
less complete knowledge about distinct attributes of the source, 
or wishes to emphasize one particular attribute. Moreover, 
the formalization is able to handle ignorance on various 
aspects, including missing data. The overall fusion mechanism 
performs trust estimation in several steps, which allows for 
a better traceability of the outcome and the mapping at 
different processing stages using URREF criteria. The results 
of these scenarios are in line with their specific hypotheses, 
reflecting the intuition that the fusion technique is appropriate 
for estimating trust. 

As with any user-centric approach, the main limitation of the solution discussed in this paper is the lack of guidance for choosing the set of numerical values with which to instantiate the model. For example, two different analysts may choose differing mass distributions and weight coefficients with respect to the same source, and they may also use slightly different approaches to infer a numerical value from linguistic clues when handling self-confidence. Thus, the outcome depends crucially on the interventions of users and their ability to build a model able to capture the situation under analysis. Also, the solution requires preexisting knowledge about the source's reputation, competence, and intention, whereas in practice it is difficult to have access to information on those aspects. If no other meta-data or domain knowledge is available, the model is likely to fail to produce an accurate trust evaluation in some contexts due to the shortage of knowledge on critical aspects.

As such, the belief-based formalization has limited capabilities to explain the outcome. To overcome this limitation, a mapping to URREF uncertainty criteria is used. The mapping highlights when uncertainties are added into the system and which partial results are affected. It facilitates the interpretation of results by adding information as to why the item is to be trusted or not; for example, when the fusion process outputs low values of trust for a given item, the mapping to URREF criteria makes it possible to highlight problems related to evidence collection or reporting, or dissonance or incompleteness during the fusion stages.

As shown in previous scenarios, using a belief-oriented 
formalism and URREF criteria mapping offers a pragmatic 
approach to develop a more comprehensive and easy to 
interpret solution for trust estimation. 


IX. CONCLUSION 


This paper presents a computational model by which an
analyst is able to assess trust in reported information based
on several possibly unknown attributes of the source as well
as additional characterization of the informational content by
the source itself. The paper also illustrates the use of URREF
criteria to track uncertainty affecting the results, from model 
construction to its formalization with belief functions. First, a 
model for trust estimation has been developed that combines 
several attributes of sources and their own assessment of 
the items reported. The model is implemented using belief 
functions, and takes advantage of its mathematical background 
to define fusion operators for trust assessment. Several scenar- 
ios are presented to illustrate uncertainty analysis, illustrating 
when uncertainty occurs and how it affects partial results for 
different applications. 

Tracking uncertainty is suitable for fusion systems in which various human sources send observations of questionable quality and there is a need to continuously update the trust associated with reports to be analyzed. The set of URREF criteria offers a unified basis to analyze inaccuracies affecting trust estimation during different phases: observation, reporting, and fusion. Selected use cases clearly illustrate the benefits of managing uncertainties arising during the modeling and formalization phases, with the twofold analysis offering additional details on results and improving their interpretation. The general approach taken in this paper could be adapted to investigate the general mechanisms by which fusion processes integrate information from multiple sources. The solution is especially useful for comparing different fusion approaches with respect to their implications for uncertainty management.


Abstract: In this paper, we present a decision technique for robotic dexterous hand configurations. The algorithm can be used to decide how to configure a robotic hand so that it can grasp objects in different scenarios. Receiving as input several sensor signals that provide information on the object's shape, the DSmT decision-making algorithm passes the information through several steps before deciding which hand configuration should be used for a given object and task. The proposed decision-making method for real-time control will decrease the feedback time between the command and the grasped object, and can be successfully applied to robotic dexterous hands. For this we have used the Dezert-Smarandache theory, which can provide information even on contradictory or uncertain systems.
Keywords: neutrosophy, DSmT, decision-making algorithms, robotic dexterous hands, grasping configurations, grasp type.


I. INTRODUCTION 


The purpose of autonomous robotics is to build systems that can fulfill all kinds of tasks without human intervention, in different environments which were not specially built for robot interaction. A major challenge for the autonomous robotics field comes from the high uncertainty within real environments, because the robot designer cannot know all the details of the environment. Most of the environment parameters are unknown, the position of humans and objects cannot be anticipated, and the motion path might be blocked. Besides these, the accumulated sensor information can be uncertain and error prone. The quality of this information is influenced by noise, visual field limitations, observation conditions and the complexity of the interpretation technique.

Artificial intelligence and heuristic techniques have been used by many scientists in the field of robot control [1] and motion planning. Regarding grasping and object manipulation, the main research activities were to design mechanisms for the hand [2-4] and for dexterous finger motion [5], which are high-complexity research tasks in controlling robotic hands.

Current robotics research aims to develop robotic systems for dynamic and unknown environments in which human lives would be at risk, such as natural or nuclear disaster areas, and for different fields of work, ranging from house chores or agriculture to military applications. In any of these areas, the robotic system must fulfill a series of tasks which imply object manipulation and transportation, or the use of equipment and tools. From here arises the necessity of developing grasping systems [6] that reproduce human hand motion as well as possible [7-9].

To achieve an accurate grasping system, the grasp taxonomy of the human hand was analyzed by Feix et al. [10], who found 33 different grasp types, sorted by opposition type, virtual finger assignments, type in terms of power, precision or intermediate grasp, and the position of the thumb. While Alvarez et al. [11] researched human grasp strategies within grasp types, Fermuller et al. [12] focused on manipulation actions of the human hand on different object types, including hand pre-configuration. Tsai et al. [13] found that classifying objects into primitive shapes can provide a way to select the best grasping posture, but a general approach can also be used for hand-object geometry fitting [14]. This classification works well for grasping problems in constrained work spaces using visual data combined with force sensors [15] and also for under-actuated grasping which uses rotational stiffness [16]. For unknown objects, however, scientists have found different approaches to solve the hand grasping problem. Choi et al. [17] used two different neural networks and data fusion to classify objects, Seredynski et al. [18] achieved fast grasp learning with probabilistic models, while Song et al. [19] used tactile-based blind grasping along with a discrete-time controller. The same approach is used by Gu et al. [20], who proposed a blind haptic exploration of unknown objects for grasp planning of a dexterous robotic hand. Using grasping methods, Yamakawa et al. [21] developed a robotic hand for knot manipulation, while Nacy et al. [22] used artificial neural network algorithms for slip prevention and Zaidi et al. [23] used a multi-fingered robot hand to grasp 3D deformable objects, applying the method to spheres and cubes.

While other scientists developed grasping strategies for different robotic hands [21-23], an anthropomorphic robotic hand has the potential to grasp regular objects of different shapes and sizes [24, 25], but selecting the grasping method for a certain object is a difficult problem. A series of papers have approached this problem by developing algorithms for classifying grasps by their contact points [26, 27]. These algorithms focus on finding a fixed number of contact areas without taking the hand geometry into consideration. Other methods developed grasping systems for a certain robotic hand architecture, scaling down the problem to finding a grasping method with the tips of the fingers [27]. These methods are useful for manipulating certain objects, but cannot be applied to a wide range of objects because they do not provide a stable grasp, since neither the fingers' interior surfaces nor the palm of the hand are used. A method for filtering the high number of hand configurations is to use predefined grasping hand configurations. Before grasping an object, humans unconsciously simplify the grasping action, choosing one of the few hand positions which match the object's shape and the task to accomplish. In the scientific literature there are papers which have tried to catalog grasping postures and taxonomies, and one of the best known is [28]. Cutkosky and Wright [29] extended Napier's [28] classification by adding the taxonomy required in the production environment, studying the way in which the weight and geometry of the object affect the choice of grasping posture. Iberall [30] analyzed different grasping taxonomies and generalized them by using the virtual finger concept. Stansfield [31] chose a simpler classification and built a rule-based system which provided a set of grasping postures, starting from a simplified description of the object obtained from a video system.

The algorithm presented in this paper determines the grasping position according to the object's shape. To prove the algorithm's efficiency we have chosen 3 types of grasping: cylindrical, spherical and prismatic. We start from the hypothesis that the environment data are captured through a stereovision system [32] and a Kinect sensor [33]. On the data provided by these two observers, we apply a template matching algorithm [34], which provides a matching percentage between the object that needs to be grasped and a template object. Thus, each of the two sources provides 3 matching values, one for each of the three grasping types. These values are the input of our detection algorithm, based on Dezert-Smarandache Theory (DSmT) [35] for data fusion. The algorithm takes input data from two or more observers; in the first phase, the data are processed through neutrosophication, which is similar to the fuzzification process. Then, the neutrosophic observers' data are passed through an algorithm which applies the classic DSm theory [35] in order to obtain a single data set on the system's states, by combining the observers' neutrosophic values. On this data set, we apply the developed DSmT decision-making algorithm, which decides the category to which the target object belongs. This decision facilitates the detection-recognition-grasping process which a robotic hand must follow, yielding a real-time decision that does not stop or delay the robot's task.
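
To make the fusion step of this pipeline more concrete, the sketch below applies the classic DSm combination rule under the free DSm model to two observers. It is only an illustration under strong assumptions: the mass values are hypothetical stand-ins for normalized template-matching scores, each observer commits mass to single shapes only, and neither the neutrosophication step nor the decision-making algorithm of this paper is reproduced. The names `dsm_classic` and `singleton` are likewise hypothetical.

```python
from itertools import product


def singleton(name):
    """A focal element made of a single hypothesis."""
    return frozenset({name})


def dsm_classic(m1, m2):
    """Classic DSm rule for focal elements that are intersections of hypotheses.

    A key such as frozenset({"sphere", "cylinder"}) stands for the
    intersection sphere n cylinder, which the free DSm model keeps as a
    separate (contradictory) element instead of discarding it.
    """
    fused = {}
    for (x, mass_x), (y, mass_y) in product(m1.items(), m2.items()):
        key = x | y  # intersecting two conjunctions merges their name sets
        fused[key] = fused.get(key, 0.0) + mass_x * mass_y
    return fused


# Hypothetical normalized matching values for the two observers.
stereo = {singleton("sphere"): 0.7,
          singleton("parallelepiped"): 0.1,
          singleton("cylinder"): 0.2}
kinect = {singleton("sphere"): 0.6,
          singleton("parallelepiped"): 0.3,
          singleton("cylinder"): 0.1}

fused = dsm_classic(stereo, kinect)
for element, mass in sorted(fused.items(), key=lambda item: -item[1]):
    print(" n ".join(sorted(element)), round(mass, 2))
```

The masses that land on intersections such as sphere n parallelepiped quantify the contradiction between the two observers, which is the kind of information a subsequent decision step can exploit.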

In recent years, using several sensors for a given application and then fusing their data has become more common in military and nonmilitary research fields. Data fusion techniques combine the information received from different sensors with the purpose of eliminating disturbances and improving precision compared to situations where a single sensor is used [36, 37]. This technique works on the same principle that humans use to sense the environment. For example, a human being cannot see around a corner or through vegetation, but with his hearing he can detect certain surrounding dangers. Besides the statistical advantage gained from combining the details of a certain object (through redundant observations), using more types of sensors increases the precision with which an object can be observed and characterized. For example, an ultrasonic sensor can detect the distance to an object, while a video sensor can estimate its shape, and combining these two information sources provides two distinct data on the same object.

The evolution of new sensors and the improvements in hardware processing techniques and capacity increasingly facilitate real-time data fusion. The latest progress in computational and detection systems provides the ability to reproduce, in hardware and software, the data fusion capacity of humans and animals. Data fusion systems are used for target tracking [38], automatic target identification [39] and automated reasoning applications [40]. Data fusion applications are widespread, ranging from military applications [41] (target recognition, autonomous moving vehicles, distance detection, battlefield surveillance, automatic danger detection) to civilian applications (monitoring of production processes, condition-based maintenance of complex tools, robotics [42], and medical applications [43]). Data fusion techniques draw on classic elements such as digital signal processing, statistical estimation, control theory, artificial intelligence and numerical methods [44].

Combined data interpretation requires automated reasoning techniques taken from the area of artificial intelligence. The purpose of developing recognition-based systems was to analyze issues like the data gathering context, the relationship between observed entities, and the hierarchical grouping of targets or objects, and to predict future actions of these targets or entities. This kind of reasoning is encountered in humans, and automated reasoning techniques can only approximate it. Regardless of the technique used, a knowledge-based system requires 3 elements: one or more reasoning diagrams, an automated evaluation process and a control diagram. Reasoning diagrams are techniques for representing facts, logical relations, procedural knowledge and uncertainty. For these techniques, uncertainty in the observed data and in the logical relations can be represented using probabilities, fuzzy theory [45, 46], Dempster-Shafer [47] evidence intervals or other methods. Dezert-Smarandache theory [35] extends these methods, providing advanced techniques for handling uncertainty. The purpose of developing an automated reasoning system is to reproduce the human capability of reasoning and decision making, by specifying rules and frames that define the studied situation. Given an information database, an evaluation process is required so that this information can be used. For this there are formal diagrams based on formal logic, fuzzy logic, probabilistic reasoning, template-based methods, case-based reasoning and many others. Each of these reasoning diagrams has a consistent internal formalism which describes how to use the knowledge database to obtain the final conclusion. An automated reasoning system needs a control diagram to carry out the thinking process. The techniques used include search methods, systems for maintaining the truth based on assumptions and justifications,
hierarchical decomposition, control theory, etc. Each of these
methods has the purpose of controlling the reasoning evolution
process. 

The results presented in this paper were obtained using the classic Dezert-Smarandache Theory (DSmT) to combine inputs from two different observers that want to classify objects into three categories: Sphere, Parallelepiped and Cylinder. These categories were chosen to include most of the objects that a manipulator can grasp. The algorithm's inputs were transformed into belief values of certainty, falsity, uncertainty and contradiction. Using these four values and their combinations according to DSmT, we applied Petri net diagram logic to decide on the shape type of the analysed objects. To our knowledge, this type of algorithm has never been used before for real-time decisions on hand grasping taxonomy. Compared to other algorithms [13-15] and methods [16-18], ours has the advantage of detecting high uncertainties and contradictions, which in practice occur very rarely but can have drastic effects on the decision or on the robot: if the object's shape is not detected properly, the robot might not be able to grasp it, which can lead to serious consequences. In deciding how to grasp objects, researchers have used different methods to choose the grasping taxonomy, using blind haptic exploration [20] or, in different applications, for tying knots [21] or grasping deformable objects [23]. Because the proposed algorithm can detect anomalies of contradicting and uncertain input values, we can say that the proposed method turns the decision process into a less difficult problem of choosing the grasping method [24, 25].


II. OBJECTS GRASPING AND ITS CLASSIFICATION 


Mechanical hands have been developed to provide the robots 
with the ability of grasping objects with different geometrical 
and physical properties [48]. To make an anthropomorphic 
hand seem natural, its movement and the grasping type must 
match the human hand. 

In this regard, the grasping position taxonomy for human hands
has long been studied and applied to robotic hands. Seventeen
different categories of human hand grasping positions have been
studied. Two points must be considered first. The
first is that these categories are derived from studies of the
human hand, which is more flexible and able
to perform a wider range of movements than any robotic
hand, so the grasping taxonomy for robot hands can be
only a simple subset of that of the human hand. Second, studies
of human behavior in grasping real objects have shown some
differences between the real observations and the classified
properties [49].

In conclusion, any proposed taxonomy is only a reference
point which the robot hand must attain. The most used grasping
positions (extracted from [50]), which should be considered
when developing a capable robotic hand, are described below:

1) Power grasping: contact with the object is made
   on large surfaces of the hand, including the phalanges
   and the palm. This kind of grasping can exert high
   forces on the object.

   • Spherical grasping: used to grasp spherical objects;
   • Cylindrical grasping: used to grasp long objects
     which cannot be completely surrounded by the hand;
   • Lateral grasping: the thumb exerts a force towards
     the lateral side of the index finger.

2) Precision grasping: contact is made only with the tips
   of the fingers.

   • Prismatic grasping (pinch): used to grasp long objects
     (with small diameter) or very small objects. Can be
     achieved with two to five fingers.
   • Circular grasping (tripod): used to grasp circular
     or round objects. Can be achieved with three, four
     or five fingers.

3) No grasping:

   • Hook: the hand forms a hook on the object and
     the hand force is exerted against an external force,
     usually gravity.
   • Button pressing or pointing.
   • Pushing with open hand.

Table I shows the manipulation activities that the
robotic hand can perform, correlated with the grasping
positions each activity requires [51].


III. OBJECT DETECTION USING STEREOVISION AND
KINECT SENSOR


Object recognition in artificial vision is the task of
searching for a certain object in a picture or a video sequence.
This problem can be approached as a learning problem. At
first, the system is trained with sample images which belong to
the target group, so that it learns to spot these among
other pictures. Thus, when the system receives new images, it
can 'feel' the presence of the searched object/sample/template.

Template matching is a technique used to find objects in
an image. A model is an image region, and the goal is to
find instances of this model in a larger picture. Template
matching techniques are a classic approach to localization
problems and object recognition in a picture. These
methods are used in applications like object tracking, image
compression, stereograms, image segmentation [52], and other
specific problems of artificial vision [53]. Object recognition
is very important for a robot that must fulfill a certain task.
To complete its task, the robot must avoid obstacles, obtain
the size of the object, manipulate it, etc. To manipulate a
detected object, the robot must determine the object's
shape, size and position in the environment. The main methods
for obtaining depth information use stereoscopic cameras,
laser scanners and depth cameras. For the proposed
decision-making algorithm, we assumed that the environment
information is captured with a stereoscopic system and a
Kinect sensor.

Stereovision systems [32] are a passive technique for
obtaining a virtual 3D image of the environment in which the
robot moves, by matching the common features of a set of
images of the same scene. Because this method works with images,
it needs high computational power. The depth information
can be noisy in certain cases, because the method depends
on the texture of the objects in the environment and on the
ambient light.

TABLE I: Grasping position for certain tasks.

Object                        Activity                 Grasping position
Bottles, cups and mugs        Transport                Force: Cylindrical grasping
                              Pouring/filling          (from the side or the top)
Cups (using handles)          Pouring/filling          Force: Lateral grasping
                                                       Precision: Prismatic grasping
Plates/trays                  Transport                Power: Lateral grasping
                              Receiving from humans    Precision: Prismatic grasping
                                                       No grasp: pushing (open hand)
Pens, cutlery                 Transport                Precision: Prismatic grasping
Door handle                   Open/Close               Force: Cylindrical grasping
                                                       No grasp: Hook
Small objects                 Transport                Power: Spherical grasping
                                                       Precision: Circular grasping (tripod)
Switches, buttons             Pushing                  No grasp: Button pressing
Round switches, bottle caps   Rotation                 Force: Lateral grasping
                                                       Precision: Circular grasping (tripod)

Kinect [33] is a fairly easy to obtain platform, which makes
it widespread. It uses a depth sensor based on structured light.
Using an ASIC board, the Kinect sensor generates an 11-bit depth
map with a resolution of 640 x 480 pixels at 30 Hz.
Given the price of the device, the information quality is fairly
good, but it has drawbacks: the depth images contain areas
where no depth reading could be obtained. This problem arises
because some materials do not reflect infrared light. In
addition, when the device is moved very fast it records blurry
pictures, like any other camera, which also leads to missing
information in the acquired picture.


IV. NEUTROSOPHIC LOGIC AND DSM THEORY 
A. Neutrosophic Logic 


The idea of the neutrosophic triplet (truth, falsity and
uncertainty) appeared in 1764, when J.H. Lambert investigated
the credibility of a witness affected by the testimony of another
person. He generalized Hooper's rule of combination of evidence
(1680), which was a non-Bayesian approach for finding a
probabilistic model. Koopman in 1940 introduced lower and
upper probabilities, followed by Good and by Dempster (1967), who
gave a rule for combining two arguments. Shafer (1976)
extended this rule to the Dempster-Shafer Theory of Belief
(Trust) Functions by defining the Belief and Plausibility
functions and using Dempster's rule of inference to combine two
pieces of evidence from two different sources. The belief function
is a connection between fuzzy reasoning and probability.
The Dempster-Shafer theory of belief functions is a generalization
of Bayesian probability (Bayes 1760, Laplace 1780). It uses
mathematical probability in a more general way and is
based on the probabilistic combination of evidence in artificial
intelligence. Lambert once said that "there is a chance p that the
witness can be trustworthy and fair, a chance q that he will be
deceiving and a chance 1 - p - q that he will be indifferent".
This idea was taken up by Shafer in 1986 and later used by
Smarandache to further develop neutrosophic logic [54,
35].


1) Neutrosophic Logic Definition: A logic in which each
proposition has its percentage of truth in a subset T, its
percentage of uncertainty in a subset I and its percentage
of falsity in a subset F is called neutrosophic logic [54,55].
This paper extends the general structure of the Neutrosophic
Robot Control (RNC), known as the Vladareanu-Smarandache
method [55]-[57], for hybrid force-position robot control in
a virtual platform [58,59], which applies neutrosophic science
to robotics using neutrosophic logic and set operators.
Thus, using two observers, a stereovision system and a Kinect
sensor, provides 3 matching values for the DSmT decision-making
algorithms. Subsets of truth, uncertainty and falsity
are used instead of single numbers because in many cases one
cannot know the percentage of truth or falsity precisely,
but these can be approximated. For example, a supposition
can be 30% to 40% true and 60% to 70% false [60].


2) Neutrosophic components definition: Let T, I, F be
three standard or non-standard subsets of $]^{-}0, 1^{+}[$ with

$$\sup T = t_{\sup}, \qquad \inf T = t_{\inf},$$
$$\sup I = i_{\sup}, \qquad \inf I = i_{\inf},$$
$$\sup F = f_{\sup}, \qquad \inf F = f_{\inf},$$

and

$$n_{\sup} = t_{\sup} + i_{\sup} + f_{\sup},$$
$$n_{\inf} = t_{\inf} + i_{\inf} + f_{\inf}.$$


The sets T, I, F are not necessarily intervals; they can be
any subsets: discrete or continuous; with a single element;
finite or infinite (with countably or uncountably many
elements); unions or intersections of subsets. These subsets
can also overlap, and the real subsets represent the relative
errors in finding the values t, i, f (when the subsets T, I, F
are reduced to single points).

T, I, F are called the neutrosophic components and represent
the truth, uncertainty and falsity values when referring
to neutrosophy, neutrosophic logic, neutrosophic sets,
neutrosophic probability or neutrosophic statistics. This
representation is closer to human reasoning and captures the
imprecision of knowledge or linguistic inaccuracy received from
different observers (this is why T, I, F are subsets and can be
more than a set of points), the uncertainty due to incomplete
knowledge or data acquisition errors (for this we have the set I)
and the vagueness caused by missing edges or limits.

After defining the sets, we need to specify their superior
(sup) and inferior (inf) limits because in most cases
they will be needed [61,62].


3) Dezert-Smarandache Theory (DSmT): To develop artifi- 
cial cognitive systems a good management of sensor informa- 
tion is required. When the input data are gathered by different 
sensors, according to the environment certain situations may 
appear when one of the sensors cannot give correct information 
or the information is contradictory between sensors. To resolve 
this issue a strong mathematical model is required, especially 
when the information is inaccurate or uncertain. 

The Dezert-Smarandache Theory (DSmT) [53,54,60] can
be considered an extension of Dempster-Shafer theory (DST)
[46]. DSmT allows combining information gathered from
different and independent sources, expressed as belief (trust)
functions. DSmT can be used for solving information fusion on
static or dynamic complex problems, especially when the
differences between the information provided by the observers
are very large.

DSmT starts by defining the notion of the free DSm model,
denoted by $\mathcal{M}^f(\Theta)$, in which $\Theta$ is a set of
exhaustive elements $\theta_i$, $i = 1,\ldots,n$, which can
potentially overlap. This model is free because no other
assumptions are made about the hypotheses. As long as the free
DSm model holds, we can apply the associative and commutative
DSm rule of combination.

DSm theory [62] is based on defining the Dedekind lattice,
known as the hyper power set of the frame $\Theta$. In DSmT,
$\Theta$ is considered a set $\{\theta_1,\ldots,\theta_n\}$ of $n$
exhaustive elements, without adding other constraints. DSmT can
handle pieces of evidence gathered from different information
sources which do not share the same interpretation of the
elements of $\Theta$. Let $\Theta = \{\theta_1, \theta_2\}$ be the
simple case, made of two hypotheses; then [54]:

• probability theory works (assuming exclusivity and
  completeness) with basic probability assignments
  (bpa) $m(\cdot) \in [0, 1]$ such that

  $$m(\theta_1) + m(\theta_2) = 1;$$

• Dempster-Shafer theory works (assuming exclusivity
  and completeness) with basic belief assignments
  (bba) $m(\cdot) \in [0, 1]$ such that

  $$m(\theta_1) + m(\theta_2) + m(\theta_1 \cup \theta_2) = 1;$$

• DSm theory works (assuming only completeness)
  with basic belief assignments (bba)
  $m(\cdot) \in [0, 1]$ such that

  $$m(\theta_1) + m(\theta_2) + m(\theta_1 \cup \theta_2) + m(\theta_1 \cap \theta_2) = 1.$$

The $D^\Theta$ hyper power set notion


One of the base elements of DSm theory is the notion of
hyper power set. Let $\Theta = \{\theta_1,\ldots,\theta_n\}$ be a
finite set (called frame) with $n$ exhaustive elements. The
Dedekind lattice, called hyper power set $D^\Theta$ within the
DSmT framework, is defined as the set of all statements built
from the elements of $\Theta$ with the $\cup$ and $\cap$
operators such that:

1) $\emptyset, \theta_1, \ldots, \theta_n \in D^\Theta$;

2) If $A, B \in D^\Theta$, then $A \cap B \in D^\Theta$ and $A \cup B \in D^\Theta$;

3) No other element is included in $D^\Theta$, except
   those obtained using rules 1 and 2.

The dual of $D^\Theta$ (obtained by swapping the operators
$\cap$ and $\cup$ within expressions) is $D^\Theta$ itself. In
$D^\Theta$ there are elements that are self-dual. The cardinality
of $D^\Theta$ grows very quickly with $n = |\Theta|$ and is
bounded above by $2^{2^n}$. Generating the hyper power set
$D^\Theta$ is closely connected with Dedekind's well-known
problem [54,55] of enumerating the isotone Boolean functions.
Because for any finite set $\Theta$, $|D^\Theta| \geq |2^\Theta|$,
$D^\Theta$ is called the hyper power set of $\Theta$.
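
The three construction rules above can be sketched in a few lines of Python. This is only an illustrative sketch: the encoding of each $\theta_i$ as the set of Venn-diagram regions it covers, and the function name `hyper_power_set`, are assumptions made here and are not part of the original method.

```python
from itertools import combinations


def hyper_power_set(n):
    """Build D^Theta for n freely overlapping hypotheses (free DSm model).

    Assumed encoding: a Venn region is the non-empty set of hypothesis
    indices that overlap there; theta_i is the set of regions containing i.
    """
    indices = range(1, n + 1)
    regions = [frozenset(c) for k in indices for c in combinations(indices, k)]
    thetas = [frozenset(r for r in regions if i in r) for i in indices]

    # Rule 1: the empty set and every theta_i belong to D^Theta.
    elements = {frozenset()} | set(thetas)

    # Rule 2: close the collection under union and intersection.
    changed = True
    while changed:
        changed = False
        for a in list(elements):
            for b in list(elements):
                for c in (a | b, a & b):
                    if c not in elements:
                        elements.add(c)
                        changed = True
    # Rule 3: nothing else is added.
    return elements


print(len(hyper_power_set(2)))  # 5
print(len(hyper_power_set(3)))  # 19
```

For two hypotheses the five elements are $\emptyset$, $\theta_1 \cap \theta_2$, $\theta_1$, $\theta_2$ and $\theta_1 \cup \theta_2$, which matches the four non-empty terms of the DSm normalization condition for $\Theta = \{\theta_1, \theta_2\}$.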

The elements $\theta_i$, $i = 1,\ldots,n$, of $\Theta$ form the
finite set of suppositions/concepts that characterize the fusion
problem. $D^\Theta$ represents the free DSm model
$\mathcal{M}^f(\Theta)$ and allows working with fuzzy concepts
that have an intrinsic and relative character. Such concepts
cannot be accurately distinguished within an absolute
interpretation because the universal truth is unreachable.

However, there are particular fusion problems that involve
discrete concepts, where the elements $\theta_i$ are truly
exclusive. In this case, all the exclusivity constraints on
$\theta_i$, $i = 1,\ldots,n$, must be included in the previous
model to properly characterize the true nature of the fusion
problem and to match reality. For this, the hyper power set
$D^\Theta$ is reduced to the classic power set $2^\Theta$, forming
the smallest hybrid DSm model, denoted $\mathcal{M}^0(\Theta)$,
which coincides with Shafer's model.

Besides the problem types that correspond to Shafer's
model $\mathcal{M}^0(\Theta)$ and those that correspond to the
free DSm model $\mathcal{M}^f(\Theta)$, there is an extensive class
of fusion problems whose frame $\Theta$ includes both continuous
fuzzy concepts and discrete hypotheses. In this case we must take
into consideration certain exclusivity constraints and some
non-existential constraints. Each such hybrid fusion problem is
described by a hybrid DSm model $\mathcal{M}(\Theta)$ with
$\mathcal{M}(\Theta) \neq \mathcal{M}^f(\Theta)$ and
$\mathcal{M}(\Theta) \neq \mathcal{M}^0(\Theta)$.


The generalized belief functions 


Starting from a general frame $\Theta$, we define a map
$m(\cdot) : D^\Theta \to [0,1]$ associated with an information
source $\mathcal{B}$ as [54]:

$$m(\emptyset) = 0 \quad \text{and} \quad \sum_{A \in D^\Theta} m(A) = 1. \tag{1}$$

The value $m(A)$ is called the generalized basic belief
assignment of $A$.

The generalized belief (trust) and plausibility are defined in
the same way as in Dempster-Shafer theory [47]:

$$\mathrm{Bel}(A) = \sum_{\substack{B \subseteq A \\ B \in D^\Theta}} m(B), \tag{2}$$

$$\mathrm{Pl}(A) = \sum_{\substack{B \cap A \neq \emptyset \\ B \in D^\Theta}} m(B). \tag{3}$$

These definitions are compatible with the classic belief function
definition from Dempster-Shafer theory when $D^\Theta$ is
reduced to $2^\Theta$ for fusion problems where Shafer's model
$\mathcal{M}^0(\Theta)$ can be applied. For all $A \in D^\Theta$
we still have $\mathrm{Bel}(A) \leq \mathrm{Pl}(A)$. Note that
when we work with the free DSm model $\mathcal{M}^f(\Theta)$, we
always have $\mathrm{Pl}(A) = 1$ for all $A \neq \emptyset$ in
$D^\Theta$, which is normal [54], since in the free model any two
non-empty elements of $D^\Theta$ overlap.
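
As an illustration of equations (2) and (3), the sketch below computes Bel and Pl for a small generalized basic belief assignment on $\Theta = \{\theta_1, \theta_2\}$ under the free DSm model. The region-based encoding of the elements (strings "1", "2", "12" for the three Venn regions) and the mass values are hypothetical choices made only for this example.

```python
def bel(m, a):
    """Generalized belief: sum of m(B) over all focal elements B included in A."""
    return sum(mass for b, mass in m.items() if b <= a)


def pl(m, a):
    """Generalized plausibility: sum of m(B) over all B that intersect A."""
    return sum(mass for b, mass in m.items() if b & a)


# Assumed encoding of the free model with two hypotheses:
# region "1" = theta1 only, "2" = theta2 only, "12" = the overlap.
theta1 = frozenset({"1", "12"})
theta2 = frozenset({"2", "12"})

# Hypothetical masses, chosen only for illustration; they sum to 1.
m = {theta1: 0.5, theta2: 0.3, theta1 & theta2: 0.2}

print(bel(m, theta1))  # m(theta1) + m(theta1 ∩ theta2), about 0.7
print(pl(m, theta1))   # every focal element overlaps theta1, so about 1.0
```

The second value shows why the plausibility of any non-empty element equals 1 in the free model: every non-empty element contains the common overlap region.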


The DSm classic rule of combination 


When the free DSm model $\mathcal{M}^f(\Theta)$ can be applied,
the combination rule
$m_{\mathcal{M}^f(\Theta)}(\cdot) \equiv m(\cdot) \triangleq [m_1 \oplus m_2](\cdot)$
of two independent sources $\mathcal{B}_1$ and $\mathcal{B}_2$
that provide information on the same frame $\Theta$, with the
belief functions $\mathrm{Bel}_1(\cdot)$ and $\mathrm{Bel}_2(\cdot)$
associated with the gbba $m_1(\cdot)$ and $m_2(\cdot)$, corresponds
to the conjunctive consensus of the sources. The combination is
computed using the formula [54]:

$$\forall C \in D^\Theta, \quad m_{\mathcal{M}^f(\Theta)}(C) \equiv m(C) = \sum_{\substack{A, B \in D^\Theta \\ A \cap B = C}} m_1(A)\, m_2(B). \tag{4}$$

Because $D^\Theta$ is closed under the $\cap$ and $\cup$
operators, this combination rule guarantees that $m(\cdot)$ is a
generalized belief assignment, meaning that
$m(\cdot) : D^\Theta \to [0,1]$. This rule of combination
is commutative and associative and can always be used
to fuse sources that involve fuzzy concepts. The rule
can easily be extended to combine $k > 2$ independent
information sources [55,56].
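
Equation (4) can be sketched as a double loop over the focal elements of the two sources. The code below is a minimal illustration, assuming elements of $D^\Theta$ are encoded as sets of Venn regions so that Python's set intersection plays the role of $\cap$; the function name `dsm_classic` and the mass values are hypothetical.

```python
from collections import defaultdict


def dsm_classic(m1, m2):
    """Classic DSm rule: m(C) is the sum of m1(A) * m2(B) over all A ∩ B = C."""
    fused = defaultdict(float)
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            fused[a & b] += mass_a * mass_b
    return dict(fused)


# Assumed encoding for Theta = {theta1, theta2} in the free model:
# region "1" = theta1 only, "2" = theta2 only, "12" = the overlap.
theta1 = frozenset({"1", "12"})
theta2 = frozenset({"2", "12"})
either = theta1 | theta2

# Hypothetical masses for the two sources.
m1 = {theta1: 0.6, either: 0.4}
m2 = {theta2: 0.7, either: 0.3}

fused = dsm_classic(m1, m2)
for element, mass in fused.items():
    print(sorted(element), round(mass, 2))
```

Here the product $m_1(\theta_1) m_2(\theta_2) = 0.42$ is assigned to $\theta_1 \cap \theta_2$ rather than being discarded or renormalized, and the fused masses still sum to 1. Only pairs of focal elements with non-zero mass are visited, so the cost depends on the size of the two cores and not on $|D^\Theta|$.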

Because of the high number of elements in $D^\Theta$, the
computational resources needed to process the DSm combination
rule increase as the cardinality of $\Theta$ increases. This
observation holds only if the cores (the sets of elements with
non-zero generalized basic belief assignment) $\mathcal{K}_1(m_1)$
and $\mathcal{K}_2(m_2)$ coincide with $D^\Theta$, that is, when
$m_1(A) > 0$ and $m_2(A) > 0$ for any $A \neq \emptyset$ in
$D^\Theta$. In most practical applications, the sizes of
$\mathcal{K}_1(m_1)$ and $\mathcal{K}_2(m_2)$ are much
smaller than $|D^\Theta|$ because the information sources usually
provide generalized basic belief assignments for
only a subset of the hyper power set. This makes the classic DSm
rule easier to implement.

Figure 1 presents the architecture of the DSm combination rule.
The first layer is formed by all the generalized basic belief
assignment values of the needed elements $A_i$, $i = 1,\ldots,n$,
of $m_1(\cdot)$. The second layer is made of all the generalized
basic belief assignment values of $B_j$, $j = 1,\ldots,k$, of
$m_2(\cdot)$. Each node of the first layer is connected with each
node of the second layer. The output layer is created by combining
the generalized basic belief assignment values of all the possible
intersections $A_i \cap B_j$, $i = 1,\ldots,n$ and
$j = 1,\ldots,k$. If a third source provided generalized basic
belief assignment values $m_3(\cdot)$, it would be combined by
placing it between the output layer and the second layer, which
provides the generalized basic belief assignment values
$m_2(\cdot)$. Due to the commutative and associative properties of
the classic DSm rule of combination, no particular order of layers
is required when building the DSm network [54].


Fig. 1: Graphical representation of the DSm classic rule of
combination for $\mathcal{M}^f(\Theta)$ [35]. (Each output node
computes a product of the form
$m(A_i \cap B_j) = m_1(A_i)\, m_2(B_j)$.)


V. DECISION-MAKING ALGORITHM 


As observed in this paper, according to the object shape
and assigned task, grasping is divided into 8 categories [63]:
spherical grasping, cylindrical grasping, lateral grasping,
prismatic grasping, circular grasping, hook grasping, button
pressing and pushing. Of these grasping types, the most used
are cylindrical and prismatic grasping (see table I). These
can be used in almost any situation, and spherical grasping can
be regarded as a particular case of these two. The
spherical grasp is used for power grasping, when contact
with the object is made with all the finger phalanges and
the palm of the hand. This is why classification
by the shape of the object is required. Because
these types are the most frequently encountered, they were the
ones considered in the fusion problem studied here.

The fusion problem aims to classify the objects to be grasped
by shape, so that they can be matched with
the three grasping types studied. The target objects
are classified into three categories: sphere, parallelepiped and
cylinder. For each category, a grasping type is assigned [56].

Following the theory presented in section IV, the information
is provided by two independent sources (observers): a
stereovision system and a Kinect sensor. The observers are
presented in section III and are used to scan the robot's
work environment. Using the information provided by the
two observers, a 3D virtual image of the environment is
obtained, from which the human operator chooses the object
to be grasped, thus defining the grasping task that the robot
must perform. The 3D image of the object, isolated
from the scene, is compared with three templates, formed by
similar methods, which represent a sphere, a parallelepiped
and a cylinder. A template matching algorithm is then
applied to place the object in one of the three categories,
with a certain matching percentage. This percentage can vary
according to the conditions in which the images are obtained
(weak light, reflective objects, etc.).
The data taken from each sensor are then individually processed
with a neutrosophication algorithm, in order to obtain
the generalized basic belief assignment values for each
hypothesis that can characterize the system. In the next step,
having the basic belief assignment values, we combine the data
provided by the two observers using the classic DSm rule of
combination. Finally, a de-neutrosophication
algorithm is applied to the obtained values to decide on
the shape of the object by placing it in one of the three
categories mentioned above. The entire process is represented
graphically in figure 2.


Fig. 2: Diagram of the proposed algorithm. (Blocks: the
stereovision system and the Kinect sensor each supply a 3D image
of the environment and a 3D image of the reference object to a
template matching block; these feed data neutrosophication, then
information fusion using DSmT, which outputs the belief masses,
then data de-neutrosophication and the decision.)


A. Data neutrosophication 


Each observer provides a truth percentage for each state of the
system. The state set $\Theta = \{\theta_1, \theta_2, \theta_3\}$
that characterizes the fusion problem is:

$$\Theta = \{Sp, Pa, Cy\}, \tag{5}$$

where $Sp$ = Sphere, $Pa$ = Parallelepiped and $Cy$ = Cylinder.

To compute the belief values for the elements of the hyper power
set $D^\Theta$ we developed an algorithm based on neutrosophic
logic. The hyper power set $D^\Theta$ is formed using the method
presented in the paragraph devoted to the $D^\Theta$ hyper power
set notion, and has the form:

$$\begin{aligned}
D^\Theta = \{ & \emptyset, Sp, Pa, Cy, Sp \cup Pa, Sp \cup Cy, Cy \cup Pa, \\
& Sp \cap Pa, Sp \cap Cy, Cy \cap Pa, Sp \cap (Cy \cup Pa), \\
& Cy \cap (Sp \cup Pa), Pa \cap (Cy \cup Sp), \\
& Sp \cup Cy \cup Pa, Sp \cap Cy \cap Pa \}.
\end{aligned} \tag{6}$$

The statements of each observer are handled in terms of
truth ($T$), uncertainty ($I$) and falsity ($F$), specific to
neutrosophic logic. Because $F = 1 - T - I$, the
statements of falsity are not taken into consideration.

The neutrosophication algorithm takes as input the certainty
probabilities (truth) provided by the observers for the system's
states. These probabilities are then processed using the rules
described in figure 3. If the difference between the two certainty
probabilities compared at a given point by the processing
algorithm is larger than a certain threshold, found by trial and
error, we consider the uncertainty between the compared states to
be null, and the probability that one of the states is true
increases. If this difference does not exceed the set threshold,
we compute the uncertainty probability using the formula

$$m(A \cup B) = 1 - \frac{|m(A) - m(B)|}{\mathrm{const}}, \tag{7}$$

where $A, B \in \Theta$ and "const" depends on the chosen
threshold. As the point determined by the two probabilities
approaches the main diagonal, the uncertainty approaches the
maximum probability value.

Fig. 3: Data neutrosophication rule for the observer's data.
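
The thresholding rule described above can be sketched as follows. This is a hedged illustration, not the authors' implementation: the threshold value 0.3 is hypothetical (the paper only says it was found by trial and error), the difference is taken in absolute value, and "const" is assumed equal to the threshold so that the uncertainty falls continuously to zero at the threshold.

```python
def pair_uncertainty(m_a, m_b, threshold=0.3, const=None):
    """Uncertainty mass m(A ∪ B) from two certainty values m(A) and m(B).

    Assumptions (not given in the paper): threshold = 0.3, and const
    defaults to the threshold.
    """
    if const is None:
        const = threshold
    diff = abs(m_a - m_b)
    if diff > threshold:
        # The certainties are far apart: no uncertainty between A and B.
        return 0.0
    # Close certainties: uncertainty grows as the point (m(A), m(B))
    # approaches the main diagonal m(A) = m(B).
    return 1.0 - diff / const


print(pair_uncertainty(0.5, 0.5))  # 1.0, on the main diagonal
print(pair_uncertainty(0.9, 0.1))  # 0.0, difference above the threshold
```

With equal certainties the uncertainty reaches its maximum, which agrees with the remark about the main diagonal. How the resulting values are rescaled so that the masses of one observer sum to 1 is not specified in the text and is left out of the sketch.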


From the hyper power set $D^\Theta$, we can determine belief
masses only for the elements $Obs_i(D^\Theta)$ listed below (the
information obtained after interpreting the observer's data),
because the intersection operator $\cap$ represents contradiction
in DSm theory and contradiction values cannot be computed for a
single observer:

$$Obs_i(D^\Theta) = \{Sp, Pa, Cy, Sp \cup Pa, Sp \cup Cy, Cy \cup Pa, Sp \cup Cy \cup Pa\}. \tag{8}$$

The neutrosophic probabilities are detailed in Table II.


B. Information fusion 


Knowing the belief values of the hyper power set
elements $Obs_i(D^\Theta)$, presented in table II, we apply the
fusion algorithm, using the classic DSm combination rule
described previously.

TABLE II: Neutrosophic probabilities.

Mathematical representation   Description
$Sp$                          Certainty that the target object is a 'sphere'
$Pa$                          Certainty that the target object is a 'parallelepiped'
$Cy$                          Certainty that the target object is a 'cylinder'
$Sp \cup Pa$                  Uncertainty that the target object is a 'sphere' or 'parallelepiped'
$Sp \cup Cy$                  Uncertainty that the target object is a 'sphere' or 'cylinder'
$Cy \cup Pa$                  Uncertainty that the target object is a 'cylinder' or 'parallelepiped'
$Sp \cup Cy \cup Pa$          Uncertainty that the target object is a 'sphere', 'cylinder' or 'parallelepiped'


TABLE III: Contradictions that may appear between the neutrosophic probabilities.

Mathematical representation   Description
$Sp \cap Pa$                  Contradiction between the certainties that the target object is a 'sphere' and 'parallelepiped'
$Sp \cap Cy$                  Contradiction between the certainties that the target object is a 'sphere' and 'cylinder'
$Cy \cap Pa$                  Contradiction between the certainties that the target object is a 'cylinder' and 'parallelepiped'
$Sp \cap (Cy \cup Pa)$        Contradiction between the certainty that the target object is a 'sphere'
                              and the uncertainty that the target object is a 'cylinder' or 'parallelepiped'
$Pa \cap (Sp \cup Cy)$        Contradiction between the certainty that the target object is a 'parallelepiped'
                              and the uncertainty that the target object is a 'sphere' or 'cylinder'
$Cy \cap (Pa \cup Sp)$        Contradiction between the certainty that the target object is a 'cylinder'
                              and the uncertainty that the target object is a 'parallelepiped' or 'sphere'
$Sp \cap Cy \cap Pa$          Contradiction between the certainties that the target object is a 'sphere', 'cylinder' and 'parallelepiped'


Applying equation (4), we get the following formulas for
the combination values:

$$\begin{aligned}
m(Sp) ={}& m_1(Sp)m_2(Sp) + m_1(Sp)m_2(Sp \cup Pa) \\
&+ m_1(Sp \cup Pa)m_2(Sp) + m_1(Sp)m_2(Sp \cup Cy) \\
&+ m_1(Sp \cup Cy)m_2(Sp) \\
&+ m_1(Sp)m_2(Sp \cup Cy \cup Pa) \\
&+ m_1(Sp \cup Cy \cup Pa)m_2(Sp) \\
&+ m_1(Sp \cup Pa)m_2(Sp \cup Cy) \\
&+ m_1(Sp \cup Cy)m_2(Sp \cup Pa),
\end{aligned} \tag{9}$$

$$\begin{aligned}
m(Pa) ={}& m_1(Pa)m_2(Pa) + m_1(Pa)m_2(Sp \cup Pa) \\
&+ m_1(Sp \cup Pa)m_2(Pa) + m_1(Pa)m_2(Cy \cup Pa) \\
&+ m_1(Cy \cup Pa)m_2(Pa) \\
&+ m_1(Pa)m_2(Sp \cup Cy \cup Pa) \\
&+ m_1(Sp \cup Cy \cup Pa)m_2(Pa) \\
&+ m_1(Sp \cup Pa)m_2(Cy \cup Pa) \\
&+ m_1(Cy \cup Pa)m_2(Sp \cup Pa),
\end{aligned} \tag{10}$$

$$\begin{aligned}
m(Cy) ={}& m_1(Cy)m_2(Cy) + m_1(Cy)m_2(Cy \cup Pa) \\
&+ m_1(Cy \cup Pa)m_2(Cy) + m_1(Cy)m_2(Sp \cup Cy) \\
&+ m_1(Sp \cup Cy)m_2(Cy) \\
&+ m_1(Cy)m_2(Sp \cup Cy \cup Pa) \\
&+ m_1(Sp \cup Cy \cup Pa)m_2(Cy) \\
&+ m_1(Sp \cup Cy)m_2(Cy \cup Pa) \\
&+ m_1(Cy \cup Pa)m_2(Sp \cup Cy),
\end{aligned} \tag{11}$$

$$\begin{aligned}
m(Sp \cup Pa) ={}& m_1(Sp \cup Pa)m_2(Sp \cup Pa) \\
&+ m_1(Sp \cup Pa)m_2(Sp \cup Cy \cup Pa) \\
&+ m_1(Sp \cup Cy \cup Pa)m_2(Sp \cup Pa),
\end{aligned} \tag{12}$$

$$\begin{aligned}
m(Sp \cup Cy) ={}& m_1(Sp \cup Cy)m_2(Sp \cup Cy) \\
&+ m_1(Sp \cup Cy)m_2(Sp \cup Cy \cup Pa) \\
&+ m_1(Sp \cup Cy \cup Pa)m_2(Sp \cup Cy),
\end{aligned} \tag{13}$$

$$\begin{aligned}
m(Cy \cup Pa) ={}& m_1(Cy \cup Pa)m_2(Cy \cup Pa) \\
&+ m_1(Cy \cup Pa)m_2(Sp \cup Cy \cup Pa) \\
&+ m_1(Sp \cup Cy \cup Pa)m_2(Cy \cup Pa),
\end{aligned} \tag{14}$$

$$m(Sp \cup Cy \cup Pa) = m_1(Sp \cup Cy \cup Pa)\, m_2(Sp \cup Cy \cup Pa). \tag{15}$$

During the fusion process, contradictions may appear between
the information provided by the two observers. These are
included in the hyper power set $D^\Theta$ and are
described in table III.

Fusion values for the contradictions are determined as follows:

$$m(Sp \cap Pa) = m_1(Sp)m_2(Pa) + m_1(Pa)m_2(Sp), \tag{16}$$

$$m(Sp \cap Cy) = m_1(Sp)m_2(Cy) + m_1(Cy)m_2(Sp), \tag{17}$$

$$m(Cy \cap Pa) = m_1(Cy)m_2(Pa) + m_1(Pa)m_2(Cy), \tag{18}$$

$$m(Sp \cap (Cy \cup Pa)) = m_1(Sp)m_2(Cy \cup Pa) + m_1(Cy \cup Pa)m_2(Sp), \tag{19}$$

$$m(Pa \cap (Sp \cup Cy)) = m_1(Pa)m_2(Sp \cup Cy) + m_1(Sp \cup Cy)m_2(Pa), \tag{20}$$

$$m(Cy \cap (Sp \cup Pa)) = m_1(Cy)m_2(Sp \cup Pa) + m_1(Sp \cup Pa)m_2(Cy). \tag{21}$$
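
The pattern behind equations (9)-(21) can be reproduced with a short routine. The sketch below is an illustration under stated assumptions: each focal element is encoded as a set of class labels, a product $m_1(A) m_2(B)$ goes to the common labels when $A$ and $B$ share at least one, and to a contradiction entry for the pair otherwise. The function name `fuse`, the `("conflict", ...)` key format and the mass values are hypothetical.

```python
from collections import defaultdict
from itertools import product


def fuse(m1, m2):
    """Fuse two observers following the term pattern of equations (9)-(21)."""
    fused = defaultdict(float)
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        # Shared labels: certainty or uncertainty mass, equations (9)-(15).
        # No shared label: contradiction between a and b, equations (16)-(21).
        key = common if common else ("conflict", frozenset({a, b}))
        fused[key] += mass_a * mass_b
    return dict(fused)


SP, PA, CY = (frozenset({name}) for name in ("Sp", "Pa", "Cy"))
ALL = SP | PA | CY

# Hypothetical belief masses for the two observers (each sums to 1).
m1 = {SP: 0.6, PA: 0.1, SP | CY: 0.3}
m2 = {SP: 0.5, CY: 0.2, ALL: 0.3}

fused = fuse(m1, m2)
print(round(fused[SP], 2))                                  # 0.63
print(round(fused[("conflict", frozenset({SP, CY}))], 2))   # 0.12
```

In this example $m(Sp)$ collects $m_1(Sp)m_2(Sp)$, $m_1(Sp)m_2(Sp \cup Cy \cup Pa)$ and $m_1(Sp \cup Cy)m_2(Sp)$, three of the terms of equation (9), while $m(Sp \cap Cy) = m_1(Sp)m_2(Cy)$ follows equation (17). The 49 possible products of the seven focal elements are each assigned to exactly one output, so the fused masses sum to 1.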
C. Data de-neutrosophication and decision-making


The combination values found in the previous section are
de-neutrosophicated using the logic diagram presented in
figure 4. For the decision-making algorithm we opted to use
Petri nets [64], because they make the system's state
transitions easier to follow.


Fig. 4: Petri diagram for decision-making algorithm.
(Recoverable labels: transitions guarded by
max(B) ≥ threshold_2 and max(B) < threshold_2; output places
Sphere, Parallelepiped and Cylinder.)


The decision-making diagram proved to be fairly complex,
which required adding three sub-diagrams:

1) sub_p1 (figure 5): this sub-diagram deals with the
   contradiction between:

   • the certainty probability that the target object is a
     'sphere' and the uncertainty probability that the
     target object is either a 'parallelepiped' or a 'cylinder';
   • the certainty probability that the target object is a
     'parallelepiped' and the uncertainty probability
     that the target object is either a 'sphere' or a
     'cylinder';
   • the certainty probability that the target object is
     a 'cylinder' and the uncertainty probability that
     the target object is either a 'parallelepiped' or a
     'sphere'.

2) sub_p2 (figure 6): this sub-diagram deals with the
   contradiction between:

   • the certainty probabilities that the target object is a
     'sphere' and a 'parallelepiped';
   • the certainty probabilities that the target object is a
     'sphere' and a 'cylinder';
   • the certainty probabilities that the target object is a
     'cylinder' and a 'parallelepiped'.

3) sub_p3 (figure 7): this sub-diagram deals with the
   uncertainty that the target object is:

   • a 'sphere' or a 'parallelepiped';
   • a 'sphere' or a 'cylinder';
   • a 'cylinder' or a 'parallelepiped'.

Fig. 5: Petri net for sub_p1. (Recoverable labels: transitions
selected by which element of A equals max(A).)

Fig. 6: Petri net for sub_p2.

Fig. 7: Petri net for sub_p3.

To avoid overloading figures 4-7 we have used the following
notations:

$$A = \{m(Sp \cap (Cy \cup Pa)),\ m(Pa \cap (Sp \cup Cy)),\ m(Cy \cap (Pa \cup Sp))\},$$

$$B = \{m(Sp \cap Pa),\ m(Sp \cap Cy),\ m(Cy \cap Pa)\},$$

$$C = \{m(Sp \cup Pa),\ m(Sp \cup Cy),\ m(Cy \cup Pa)\},$$

$$D = \{m(Sp),\ m(Pa),\ m(Cy)\},$$

$$a = m(Sp \cup Cy), \qquad b = m(Pa \cup Sp), \qquad c = m(Cy \cup Pa).$$


With the help of the Petri diagram (figure 4), we decide
which of the three categories the target object belongs to,
as follows:

1) Determine

   $$m^{\max} \triangleq \max\big(m(Sp \cap (Cy \cup Pa)),\ m(Pa \cap (Sp \cup Cy)),\ m(Cy \cap (Pa \cup Sp))\big).$$

   • If $m^{\max} = m(Sp \cap (Cy \cup Pa))$, the contradiction
     between the certainty value that the target object
     is 'sphere' and the uncertainty value that the target
     object is 'cylinder' or 'parallelepiped' is compared
     with a threshold determined through an experimental
     trial-and-error process. If it is greater than or equal
     to the chosen threshold, the target object is a 'sphere'.

   • If $m^{\max} = m(Pa \cap (Sp \cup Cy))$, the contradiction
     between the certainty value that the target object is
     'parallelepiped' and the uncertainty value that the
     target object is 'sphere' or 'cylinder' is compared
     with the threshold mentioned above. If it is greater
     than or equal to the chosen threshold, the target object
     is a 'parallelepiped'.

   • If $m^{\max} = m(Cy \cap (Pa \cup Sp))$, the contradiction
     between the certainty value that the target object is
     'cylinder' and the uncertainty value that the target
     object is 'parallelepiped' or 'sphere' is compared
     with the threshold mentioned above. If it is greater
     than or equal to the chosen threshold, the target object
     is a 'cylinder'.

   If none of the three conditions is met, we proceed to
   the next step:


2) Determine 
mm £ max(m(Spn Pa), m(SpnCy),m(CyN Pa)). 


e If m™* = m(SpN Pa), the contradiction between 
the certainty values that the target object is ‘sphere’ 
and ‘parallelepiped’ is compared with a threshold 
determined through an experimental trial-error pro- 
cess. If this is higher or equal with the chosen 
threshold, we check if m(Sp) + m(Sp U Cy) > 
m(Pa)+m(CyU Pa). If this condition if fulfilled, 
then the target objects is ‘sphere’. Otherwise, the 
target object is ‘parallelepiped’. 
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e If m™* = m(Spn Cy), the contradiction between 
the certainty values that the target object is ‘sphere’ 
and ‘cylinder’ is compared with the threshold men- 
tioned above. If this is higher or equal with the 
chosen threshold, we check if (Sip)-+m(S'pU Pa) > 
m(Cy)+m(CyU Pa). If this condition if fulfilled, 
then the target objects is ‘sphere’. Otherwise, the 
target object is ‘cylinder’. 

• If $m^{\max} = m(Cy \cap Pa)$, the contradiction between
the certainty values that the target object
is ‘cylinder’ and ‘parallelepiped’ is compared with
the threshold mentioned above. If it is greater than
or equal to the chosen threshold, we check if
$m(Cy) + m(Sp \cup Cy) > m(Pa) + m(Sp \cup Pa)$.
If this condition is fulfilled, then the target object
is ‘cylinder’. Otherwise, the target object is ‘parallelepiped’.


If in none of these situations the contradiction reaches
the chosen threshold, we go to the next step:


3) Determine
$m^{\max} \triangleq \max(m(Sp \cup Pa), m(Sp \cup Cy), m(Cy \cup Pa))$.


• If $m^{\max} = m(Sp \cup Pa)$ and the uncertainty probability
that the target object is ‘sphere’ or ‘parallelepiped’
is larger than a threshold determined through an experimental
trial-and-error process, we check if $m(Sp) > m(Pa)$.
If the condition is fulfilled, the target object
is ‘sphere’. Otherwise, the target object is ‘parallelepiped’.

• If $m^{\max} = m(Sp \cup Cy)$ and the uncertainty probability
that the target object is ‘sphere’ or ‘cylinder’ is
larger than the threshold mentioned above, we check
if $m(Sp) > m(Cy)$. If the condition is fulfilled, the
target object is ‘sphere’. Otherwise, the target object
is ‘cylinder’.

• If $m^{\max} = m(Cy \cup Pa)$ and the uncertainty probability
that the target object is ‘cylinder’ or ‘parallelepiped’
is larger than the threshold mentioned above, we
check if $m(Cy) > m(Pa)$. If the condition is
fulfilled, the target object is ‘cylinder’. Otherwise,
the target object is ‘parallelepiped’.


If none of the hypotheses mentioned above is
fulfilled, we go to the next step:


4) Determine
$m^{\max} \triangleq \max(m(Sp), m(Pa), m(Cy))$.


• If $m^{\max} = m(Sp)$, the target object is a ‘sphere’.

• If $m^{\max} = m(Pa)$, the target object is a ‘parallelepiped’.

• If $m^{\max} = m(Cy)$, the target object is a ‘cylinder’.
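
The four steps above can be sketched as a short cascade. This is only an illustrative sketch, not the authors' implementation: the dictionary keys (with "&" for intersection and "|" for union), the function name `classify` and the three threshold parameters are assumptions introduced here, and the threshold values used in the example are arbitrary, since the text only says they are found by trial and error.

```python
from typing import Dict


def classify(m: Dict[str, float], t_mixed: float, t_conflict: float,
             t_union: float) -> str:
    """Return 'sphere', 'parallelepiped' or 'cylinder' following steps 1) to 4).

    `m` maps hypothetical labels such as "Sp", "Sp|Pa" (union) or "Sp&Pa"
    (intersection) to fused masses; missing labels count as 0.
    Thresholds are assumed to be strictly positive.
    """
    def g(key: str) -> float:
        return m.get(key, 0.0)

    # Step 1: contradiction between one certainty and the uncertainty
    # on the two other classes (greater than or equal to the threshold).
    step1 = {"Sp&(Cy|Pa)": "sphere",
             "Pa&(Sp|Cy)": "parallelepiped",
             "Cy&(Pa|Sp)": "cylinder"}
    key = max(step1, key=g)  # ties go to the first label listed
    if g(key) >= t_mixed:
        return step1[key]

    # Step 2: contradiction between two certainties, resolved by the
    # sums given in the text for each pair.
    step2 = {
        "Sp&Pa": ("sphere", g("Sp") + g("Sp|Cy"),
                  "parallelepiped", g("Pa") + g("Cy|Pa")),
        "Sp&Cy": ("sphere", g("Sp") + g("Sp|Pa"),
                  "cylinder", g("Cy") + g("Cy|Pa")),
        "Cy&Pa": ("cylinder", g("Cy") + g("Sp|Cy"),
                  "parallelepiped", g("Pa") + g("Sp|Pa")),
    }
    key = max(step2, key=g)
    if g(key) >= t_conflict:
        first, score_first, second, score_second = step2[key]
        return first if score_first > score_second else second

    # Step 3: largest pairwise uncertainty, resolved by the certainties.
    step3 = {
        "Sp|Pa": ("sphere", g("Sp"), "parallelepiped", g("Pa")),
        "Sp|Cy": ("sphere", g("Sp"), "cylinder", g("Cy")),
        "Cy|Pa": ("cylinder", g("Cy"), "parallelepiped", g("Pa")),
    }
    key = max(step3, key=g)
    if g(key) > t_union:
        first, score_first, second, score_second = step3[key]
        return first if score_first > score_second else second

    # Step 4: fall back on the largest certainty.
    step4 = {"Sp": "sphere", "Pa": "parallelepiped", "Cy": "cylinder"}
    return step4[max(step4, key=g)]


# Made-up fused masses, for illustration only.
example = {"Sp": 0.2, "Pa": 0.1, "Cy": 0.05,
           "Sp|Cy": 0.1, "Cy|Pa": 0.05, "Sp&Pa": 0.5}
decision = classify(example, t_mixed=0.3, t_conflict=0.3, t_union=0.3)
```

Because step 4 always returns a class, the cascade yields an output for every input, which matches the behaviour described in the text.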

VI. DISCUSSION


As mentioned in the introduction chapter, the main goal 
of this paper is to find a way to grasp objects according to 
their shape. This is done by classifying the target objects 
into three main classes: sphere, parallelepiped and cylinder. 
To determine the shape of the target objects, the robot work 
environment was scanned with a stereovision system and a 
Kinect sensor, with the purpose of creating a 3D image of 
the surrounding space in which the robot must fulfill its task. 
From the two created images, the target object is selected
and then compared with three templates, which represent
a sphere, a cube and a cylinder. With a template matching
algorithm, the matching percentage is determined for each of
the templates. These percentages (figure 8) represent the
data gathered from the observers for the fusion problem.
Because we wanted to test and verify the decision-making
algorithm for as many cases as possible, the observers’ values
were simulated using sine signals with different frequencies
and an amplitude of 1 (figure 8). This amplitude represents the
maximum probability that a certain type of object
is found by the template matching algorithm.


Fig. 8: Simulation of the information provided by the two sen- 
sors/observers: (a) first observer detection; (b) second observer 
detection. 


On these input data we then apply a neutrosophication
algorithm with the purpose of obtaining the generalized belief
assignment values for each of the statements an observer
makes:

• The certainty probability that the object is a ‘sphere’
(figure 9.a, h)

• The certainty probability that the object is a ‘parallelepiped’
(figure 9.b, i)

• The certainty probability that the object is a ‘cylinder’
(figure 9.c, j)

• The uncertainty probability that the object is a ‘sphere’
or a ‘parallelepiped’ (figure 9.d, k)

• The uncertainty probability that the object is a ‘sphere’
or a ‘cylinder’ (figure 9.e, l)

• The uncertainty probability that the object is a ‘cylinder’
or a ‘parallelepiped’ (figure 9.f, m)

• The uncertainty probability that the object is a ‘sphere’
or a ‘cylinder’ or a ‘parallelepiped’ (figure 9.g, n)

After the belief values have been computed for each statement
of the observers, we go to the data fusion step (figure 10).

Fig. 9: Generalized trust values. Panels (a) to (g) correspond to
observer 1 and panels (h) to (n) to observer 2, as follows: (a)
$m_1(Sp)$; (b) $m_1(Pa)$; (c) $m_1(Cy)$; (d) $m_1(Sp \cup Pa)$; (e)
$m_1(Sp \cup Cy)$; (f) $m_1(Pa \cup Cy)$; (g) $m_1(Sp \cup Pa \cup Cy)$;
(h) $m_2(Sp)$; (i) $m_2(Pa)$; (j) $m_2(Cy)$; (k) $m_2(Sp \cup Pa)$; (l)
$m_2(Sp \cup Cy)$; (m) $m_2(Pa \cup Cy)$; (n) $m_2(Sp \cup Pa \cup Cy)$.


With the help of the belief values presented in figure 10 and
computed using the neutrosophication algorithm described in
section V-A, we find the fusion values, presented in figure 11.
As one can see in figure 11, the fusion values for certainty,
uncertainty and contradiction are at their minimum. The only exception
is the fusion value for the uncertainty that the target object is
‘sphere’ or ‘parallelepiped’ or ‘cylinder’, $m(Sp \cup Cy \cup Pa)$:
when the data received from the observers are identical and
not contradicting, this uncertainty is maximum. This means:


Obs1: 33.33% sphere, 33.33% parallelepiped, 33.33% cylinder, 


and 


Obs2: 33.33% sphere, 33.33% parallelepiped, 33.33% cylinder. 


Therefore, the system cannot decide on a single state. This
is why the robotic hand maintains its starting position
until the system decides the target object’s category. This
period of indecision lasts about 0.07 seconds. When the
sensor values for the target object change from the
equal values presented above, the algorithm is able to provide
a solution.


Fig. 10: Data fusion: (a) Observer 1 vs. Observer 2 for Sphere
objects; (b) Observer 1 vs. Observer 2 for Parallelepiped
objects; (c) Observer 1 vs. Observer 2 for Cylinder objects.


The indecision also reaches high values at times 3.14 s,
6.28 s and 9.42 s of the simulation, when
the observers’ statements are close in value to the case
presented above:


Obs1: 33.35% sphere, 33.46% parallelepiped, 33.19% cylinder

Obs2: 33.21% sphere, 33.10% parallelepiped, 33.69% cylinder

at time 3.14 s,

Obs1: 33.44% sphere, 33.23% parallelepiped, 33.33% cylinder

Obs2: 33.44% sphere, 33.23% parallelepiped, 33.33% cylinder

at time 6.28 s, and

Obs1: 33.39% sphere, 33.70% parallelepiped, 32.91% cylinder

Obs2: 32.96% sphere, 32.64% parallelepiped, 34.40% cylinder

at time 9.42 s.

In tables IV, V, and VI we present the percentage of the 
states’ occurrence, the general belief assignment values, the 
fusion values, and the decision made by the algorithm for the 
situations previously mentioned. 


TABLE IV: Percentages of states’ occurrence for each source 
at different time steps. 


state \ time   Obs1     Obs2     Obs1     Obs2     Obs1     Obs2
               3.14s    3.14s    6.28s    6.28s    9.42s    9.42s
Sp             33.35%   33.21%   33.44%   33.44%   33.39%   32.96%
Pa             33.46%   33.10%   33.23%   33.23%   33.70%   32.64%
Cy             33.19%   33.69%   33.33%   33.33%   32.91%   34.40%




Fig. 11: Fusion values: (a) Fusion values of $m(Sp)$, $m(Pa)$
and $m(Cy)$; (b) Fusion values of $m(Sp \cup Pa)$, $m(Sp \cup Cy)$,
$m(Pa \cup Cy)$; (c) Fusion value of $m(Sp \cup Pa \cup Cy)$; (d)
Fusion values of $m(Sp \cap Pa)$, $m(Sp \cap Cy)$, $m(Pa \cap Cy)$; (e)
Fusion values of $m(Sp \cup (Pa \cap Cy))$, $m(Pa \cup (Sp \cap Cy))$,
$m(Cy \cup (Sp \cap Pa))$.


In all three cases, the uncertainty is quite large, and the
algorithm asks for the decision process to be restarted and keeps
the decision taken in the previous decision process. In our case,
the decisions were that the object is a ‘cylinder’, a ‘sphere’ and
a ‘cylinder’ for the three analyzed points, respectively.

Analyzing figure 11(a), at the time of 4.08 seconds the
object is decided to be a ‘cylinder’ because the probability
that the target object is a cylinder is very high, with
$m(Cy) = 0.7777$.

For the time interval of 4.3-4.9 seconds, where in figure
11(d) the contradiction between the target object being a
‘sphere’ or a ‘parallelepiped’ is larger than the contradiction
between it being a ‘sphere’ or a ‘cylinder’, or a ‘cylinder’ or
a ‘parallelepiped’, the object is decided to be a ‘parallelepiped’
at first because the probability of it being a ‘parallelepiped’
is larger than the probability of it being a ‘sphere’ or a
‘cylinder’. This situation changes starting with second 5
of the simulation, when the probability that the target object
is a ‘sphere’ increases, the probability that the same object is
a ‘cylinder’ remains low and the probability that the target
object is a ‘parallelepiped’ decreases below the value of the
sphere probability.


TABLE V: Generalized belief assignment values.


Observations           Obs1     Obs2     Obs1     Obs2     Obs1     Obs2
at time                3.14s    3.14s    6.28s    6.28s    9.42s    9.42s
m_Obsi(Sp)             0.0001   0.0001   0.0001   0.0001   0.0005   0.0007
m_Obsi(Pa)             0.0001   0        0        0        0.0011   0.0067
m_Obsi(Cy)             0        0.0008   0        0        0        0
m_Obsi(Sp ∪ Pa)        0.0106   0.0234   0.0085   0.0085   0.0317   0.0692
m_Obsi(Sp ∪ Cy)        0.0106   0.0231   0.0085   0.0085   0.0315   0.0669
m_Obsi(Cy ∪ Pa)        0.0106   0.0231   0.0085   0.0085   0.0312   0.0662
m_Obsi(Sp ∪ Cy ∪ Pa)   0.9680   0.9295   0.9744   0.9744   0.9040   0.7903


TABLE VI: Fusion values and decision at different time steps.


Fusion                 m_Obs1 ⊕ m_Obs2   m_Obs1 ⊕ m_Obs2   m_Obs1 ⊕ m_Obs2
                       at time 3.14s     at time 6.28s     at time 9.42s
m(Sp)                  0.0006            0.0001            0.0054
m(Pa)                  0.0006            0.0003            0.0053
m(Cy)                  0.0012            0.0002            0.0106
m(Sp ∪ Pa)             0.0328            0.0166            0.0898
m(Sp ∪ Cy)             0.0325            0.0166            0.0875
m(Cy ∪ Pa)             0.0324            0.0166            0.0866
m(Sp ∪ Cy ∪ Pa)        0.8999            0.9495            0.7145
m(Sp ∩ Pa)             0                 0                 0
m(Sp ∩ Cy)             0                 0                 0
m(Cy ∩ Pa)             0                 0                 0
m(Sp ∩ (Cy ∪ Pa))      0                 0                 0.0001
m(Pa ∩ (Sp ∪ Cy))      0                 0                 0.0001
m(Cy ∩ (Sp ∪ Pa))      0                 0                 0.0002
Decision               Cylinder          Sphere            Cylinder

Using the fusion values and the decision-making diagram 
(figure 4), from section V-B, we can sort the desired object 
into the three categories: sphere, parallelepiped and cylinder. 
The obtained results are presented in figure 12. 




Fig. 12: Object category decision, obtained from the proposed 
algorithm. Value 1 represents decision for Sphere, value 2 
represents decision for Parallelepiped and value 3 represents 
decision for Cylinder. 


VII. CONCLUSIONS 


Any robot, no matter its purpose, has a task to fulfill.
That task can be either grasping and manipulation or just
transport. To successfully complete its task, the robot
must be equipped with a number of sensors that provide
enough information about the environment in which the
work is being done.

In this paper we studied the situation in which the robot 
is equipped with a stereovision system and a Kinect sensor, 
to detect the environment. The robot’s job was to grab and 
manipulate certain objects. With the help of the two different
systems, two 3D images of the environment can be created,
one for each sensor type. In these images, we isolated
the target object and compared it with three template images,
obtained through similar methods as the environment images.
The three template images represent the 3D virtual model of 
a sphere, a parallelepiped and a cylinder. The comparison is 
achieved with a template matching method, and following that 
we obtain a matching percentage for each template tested 
against the desired image. Because we wanted to develop 
the decision-making algorithm based on information received 
from certain template matching methods, we considered as 
known the information that these algorithms can provide. 
Moreover, to test different cases, we selected as input for 
our decision-making algorithm and output for the template 
matching methods, several sine signals that can provide all 
the different cases that can occur in practice. 

The goal of this paper is in part a data fusion problem with
the purpose of classifying the objects in the visual range of a
humanoid robot, so that it can fulfill its grasping and manipulation
task. We also wanted to label the target object with one of the three
categories mentioned above, so that during the approach phase
toward the target object, the robotic hand can prepare for grasping
the object, lowering the time needed to complete the task.


The stereovision system and the Kinect sensor presented in
section III represent the information sources, called
the observers in this paper, a name taken from neutrosophic logic.
These observers specify the state in which the system is. One
observer can specify 7 states for the searched object.

With the help of neutrosophic logic, we determine the gen- 
eralized belief values for each of the 7 states. The neutrosophic 
algorithm is applied to information gathered from both of the 
sensors. We have chosen the neutrosophic logic, because it 
extends fuzzy logic, providing instruments for approaching 
also the uncertainty situations besides the truth and falsity 
ones. 

Using these belief values, we compute the fusion values by
applying the classic DSm combination rule, and build
the decision-making algorithm presented in section V. To help
develop this decision-making algorithm we used a Petri net,
which provided us with a clear method of switching between system
states under certain conditions.
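
The classic DSm combination rule mentioned above sums the products $m_1(A) m_2(B)$ over all pairs whose intersection is a given proposition, and keeps intersections such as $Sp \cap Pa$ as propositions in their own right. The following is a minimal sketch under the free DSm model, not the code used in the paper: propositions are encoded as sets of Venn-diagram regions (an encoding chosen here for illustration), and the names `hypothesis`, `dsm_classic` and the example masses are assumptions.

```python
from collections import defaultdict
from itertools import combinations, product

ATOMS = ("Sp", "Pa", "Cy")

# The 7 disjoint Venn regions of three overlapping hypotheses; each region
# is named by the set of hypotheses it lies inside. In the free DSm model
# none of these regions is assumed to be empty.
REGIONS = [frozenset(c) for r in range(1, len(ATOMS) + 1)
           for c in combinations(ATOMS, r)]


def hypothesis(name):
    """Encode a hypothesis as the set of Venn regions it covers."""
    return frozenset(region for region in REGIONS if name in region)


Sp, Pa, Cy = (hypothesis(a) for a in ATOMS)
THETA = Sp | Pa | Cy  # total uncertainty Sp ∪ Pa ∪ Cy


def dsm_classic(m1, m2):
    """Classic DSm rule: m(C) is the sum of m1(A) * m2(B) over A ∩ B = C."""
    fused = defaultdict(float)
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        fused[a & b] += wa * wb  # set intersection of region sets
    return dict(fused)


# Made-up masses for two observers, for illustration only.
m1 = {Sp: 0.6, THETA: 0.4}
m2 = {Pa: 0.5, THETA: 0.5}
fused = dsm_classic(m1, m2)
# fused now holds masses on Sp ∩ Pa, Sp, Pa and Sp ∪ Pa ∪ Cy.
```

With this encoding, unions and intersections of propositions reduce to ordinary set operations, so mixed propositions such as $Sp \cap (Cy \cup Pa)$ can be written as `Sp & (Cy | Pa)`.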

The decision-making algorithm analyzes the probability of 
completing all the possible tasks that may appear in sensor 
data fusion and tackles these possibilities so that for every 
input the system will have an output. 

The presented method can be used successfully in real time 
applications, because it provides a decision in all the cases in 
a very short time (table VII). The algorithm can be extended 
so that it can use information received from multiple sources 
or provide a decision starting from a high number of system 
states. Neither the number of observers/data sources nor the number of
system states is limited. However, as the number
of observers and system states increases, the amount of data to be processed
increases and designing the decision-making algorithm
becomes a highly difficult task.


TABLE VII: Average execution time of the presented algorithm.


Method                                       Execution time (s)
Data neutrosophication for Obs. 1            0.0026
Data neutrosophication for Obs. 2            0.0026
Data fusion using DSmT                       0.0002
Data de-neutrosophication/decision-making    0.0092
Total time                                   0.0146


Autonomous robots must be taught what
to do and how to complete their tasks. From this arises the necessity
of developing new intelligent reasoning systems. The
algorithm developed in this paper can be used successfully
for target identification applications, object sorting, image
labeling, motion tracking, obstacle avoidance, edge detection,
etc.

Abstract—The identification of the subsoil constitutive materials,
as well as the detection of possible interfaces and
anomalies, are crucial for many site characterization applications. 
During investigation campaigns, complementary geophysical and 
geotechnical methods are usually used. These two sets of methods 
yield data with very different spatial scales and different levels 
of incompleteness, uncertainty and inaccuracy. In this work, 
a mathematical combination of geophysical and geotechnical 
information is proposed in order to produce a better subsoil 
characterization. It is shown that belief functions can be used 
for such a fusion process. A specific methodology is developed 
in order to manage conflictual information and different levels 
of uncertainties and inaccuracies from different investigation 
methods. In order to test and validate this methodology, we 
focus on the use of two selected methods, Electrical Resistivity 
Tomography (ERT) and Cone Penetration Test. First, a synthetic 
model with artificial data is considered, taking advantage of 
the results obtained to conduct a comparative study (effect of 
parameters and noise level). Then, an experimental test bench 
is considered, in which a two-layered model is placed (plaster 
and saturated sands) and geophysical and geotechnical data 
are generated, using a mini-ERT device and insertion depth 
values. This work also aims at providing a better graphical 
representation of a subsoil section with associated degrees of 
belief. The results highlight the ability of this fusion methodology 
to correctly characterize the considered materials as well as 
to specify the positions of the interfaces (both vertical and 
horizontal) and the associated levels of confidence. 


Keywords: data fusion, belief functions, geophysical data, 
geotechnical data, experimental test bench, electrical resistivity 
tomography. 


I. INTRODUCTION 


For subsoil characterization, investigation campaigns are
set up, usually consisting of geophysical and geotechnical
methods. These two families of methods are complementary
and are used for various issues such as the characterization
of slope stability [1-4], the characterization of potentially
dangerous sites [5], the characterization of construction sites
[6] or the characterization of river embankments [7].

On the one hand, geophysical methods are non-intrusive
and provide physical information on large volumes of soil, but
with significant potential uncertainties. These uncertainties are
due in particular to the integrative and indirect aspects of the
methods as well as to the resolution of the inverse problems.
On the other hand, geotechnical investigation methods are
intrusive and provide more punctual but also more
accurate information. An important issue for the assessment of subsoils is
to be able to combine acquired geophysical and geotechnical 
data, while taking into account their respective uncertainties, 
inaccuracies and spatial distributions [8]. The complementarity 
of these two sets of methods is often underused since the 
uncertainty and inaccuracy associated with each method are 
rarely considered. Furthermore, the results are usually only 
graphically superimposed [9] instead of being mathematically 
merged. 

To characterize a section of subsoil and its potentially risky 
areas, it is essential to distinguish the different materials 
in place. The horizontal and vertical interfaces, as well as 
possible anomalies, have to be located. For levee embankment, 
as an example, it is in these locations that internal erosion is 
likely to develop, which may lead to the complete rupture of 
the levee [10]. Such a section characterization, with associ- 
ated confidence indexes, could be included in failure hazard 
models. 

The use of belief functions [11-12] and different information 
combination rules to combine geotechnical and geophysical 
data is proposed. This makes it possible to take into ac- 
count at the same time the uncertainties, inaccuracies and 
incompleteness of data related to each method. In the field 
of geosciences, belief functions have already been used and 
provide interesting results for slope instability mapping [13- 
14], detection of precious metal [15], groundwater [16] or 
flood susceptibility mapping [17]. To our best knowledge, 
no work has been proposed, considering the combination of 
two sources of information with different spatial distribution 
(spatialized and punctual) and for an investigation campaign 
in the vertical section. 

Here, an innovative method of information fusion to com- 
bine electrical resistivity tomography results and cone pen- 
etrometer test data is proposed. First, work on data obtained 
from synthetic models is displayed. The obtained results make it possible
to conduct a comparative study, evaluating the effect of different
parameters (like the data noise level) on the fusion result.
The fusion methodology is then tested from data acquired on 
a test bench. In this work, the potential of such a methodology 
is shown by using insertion depth data, acquired by a labora- 
tory penetration cone, and electrical resistivity data acquired 
by a mini Electrical Resistivity Tomography (ERT) device. 
The depth of penetration data corresponds to geotechnical 
information while the electrical resistivity data correspond to 
geophysical information. The main concern is to highlight the 
ability of this information fusion algorithm to characterize the 
interfaces between materials and to discriminate three different 
types of materials with variation in thickness of one of them, 
and to present the variation of the results according to the 
number and position of the simulated boreholes. 

The main contributions of this work are as follows. First, 
this new methodology makes it possible to take into account 
the uncertainties, inaccuracies and incompleteness associated 
with the different methods of investigation used, proposing a
modeling of the Basic Belief Assignments (BBAs) specifically
adapted to the problem. Second, the proposed graphic representation
is innovative since it presents both the
different geological sets that would be present in the subsoil
and their layout, together with the confidence associated
with these results. This methodology is particularly suitable
for the characterization of interfaces and anomalous zones, 
which may correspond to areas where the risk of instability is 
potentially the greatest. This work also allows the implementa- 
tion of a small physical model to validate the fusion approach 
with real data. 

This article is organized as follows. In section II, a presentation
of the fusion approach used in the methodology is
given, which introduces the use of the evidence theory and
the combination methods used here. In section III, a synthetic
study then presents the fusion approach from artificial
data. It also presents the comparative results associated
with two parametric studies. Then, in section IV, a presentation
of the investigation methods used in the introduced experiment 
(laboratory penetration cone and mini ERT device) is given. 
Finally, the test bench fusion results are presented in section 
V and discussed in section VI, in order to understand the 
interests, limitations and perspectives of such a methodology. 


II. FUSION METHODOLOGY 
A. Belief functions and combination rules 


The belief functions were introduced by Shafer [11]
in 1976 in the development of his mathematical theory of evidence,
inspired by previous works of Dempster [12]. Shafer’s
theory is also referred to as Dempster-Shafer theory (DST) in
the literature. This theory proposes a method to calculate
the belief and the plausibility of an event (here a soil material
class) from distinct sources of evidence (measured data). The
practical advantage of using such a theory lies in its ability
to manage information from different sources, associated with
variable uncertainties and inaccuracies. In this work, only two
sources of information will be considered: geotechnical and
geophysical. Another advantage of this theory is its ability to
assess the degree of conflict between sources (e.g., contradictory
information between data obtained from a large-scale geophysical
campaign and from a punctual geotechnical investigation).
Uncertainties correspond to degrees of confidence that are
given to a value, whereas inaccuracies correspond to intervals
of values that can be directly associated with measurement
errors related to the method. For example, the uncertainty
of measuring a geotechnical parameter identical to the one
measured in a borehole increases with the distance to that
point. The inaccuracy can, for its part, be associated with the
error bar of the result. Belief functions make it possible to take into
account the ignorance and incompleteness of the information.
It is indeed possible to assign credit to the set of all possible results
in order to quantify the ignorance. For the reader eager to learn
more, the theory is detailed in [18].

A Bayesian approach as part of a subjective probability 
approach [19] could have been considered for geophysical and 
geotechnical data combination. However, the main limitation 
of such an approach is that probabilities essentially represent 
uncertainty and only very poorly the level of inaccuracy. 
Moreover, in the probabilistic modeling stage, the different 
decisions (events) are only represented on singletons (i.e. 
single events) and are necessarily considered exhaustive and 
exclusive. The exclusivity is implied by the assumption of 
the additivity of probabilities. However, this hypothesis may 
be too strong and limit the representation of the knowledge. 
Furthermore, with a Bayesian approach, it is difficult to model 
the lack of knowledge or the knowledge that is not expressed 
in probability distributions. 

In order to define and to use the belief functions, it is 
necessary (i) to set a frame of discernment, (ii) to assign 
belief mass values to the events of this framework (Basic 
Belief Assignments - BBAs), (iii) to choose a fusion rule for 
combining information; and (iv) to represent the combined 
information. 

The Frame of Discernment (FoD) $\Theta$ is made of all the
possible events about the problem under concern. The elements
of the FoD are exclusive and exhaustive, so that for n events:


$$\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}. \qquad (1)$$


In the problem considered, the possible events of the
FoD correspond to intervals of values of geophysical and
geotechnical parameters that can be associated with classes
of geological materials (for example, $\theta_1$ = clays, $\theta_2$ = sands, ...).
The space of belief mass functions, the set of all subsets of $\Theta$,
written $2^\Theta$, is fixed by all the disjunctions and by the possible
conflict between the sources of information (written $\emptyset$) such
that:


$$2^\Theta = \{\emptyset, \theta_1, \theta_2, \theta_1 \cup \theta_2, \theta_3, \theta_1 \cup \theta_3, \theta_2 \cup \theta_3, \theta_1 \cup \theta_2 \cup \theta_3, \ldots, \theta_1 \cup \theta_2 \cup \theta_3 \cup \ldots \cup \theta_n\}. \qquad (2)$$


As in the probability theory, the belief mass function $m_i$ is
defined, for a source of evidence $S_i$ (for $i = 1, 2$), as the mass attributed
to $A$ (defined on $2^\Theta$) in $[0, 1]$, such that the closer $m_i(A)$ is
to 1, the greater the confidence in $A$, with:


$$\sum_{A \in 2^\Theta} m_i(A) = 1. \qquad (3)$$


The difference with the probability theory lies in the fact
that $A$ can represent the union of several events (for example,
either $\theta_1$ OR $\theta_2$). It is therefore possible to model uncertainty
and lack of knowledge. For instance, when no information
is available about the achievement of an event member of
$\Theta$, one can set $m_i(\Theta) = 1$, avoiding the uniform distribution
that would have been considered in a probabilistic scheme.
Combination rules, as part of the belief functions theory,
can thus take different levels of uncertainty and imprecision
into account according to the source of information. If only
defined on singletons, the belief mass function is similar to a
probability distribution.
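
As a small illustration of the definitions above, the sketch below enumerates the power set of a three-class frame and contrasts a vacuous BBA (all mass on the whole frame) with a BBA defined on singletons only. It is an assumption-laden example rather than part of the original method: the class name "gravels", the helper names `power_set` and `is_valid_bba`, and the mass values are introduced here for illustration.

```python
from itertools import combinations


def power_set(frame):
    """All subsets of the frame, from the empty set up to the full frame."""
    items = sorted(frame)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_valid_bba(m, frame, tol=1e-9):
    """Check that masses lie in [0, 1], sit on subsets of the frame, sum to 1."""
    subsets = set(power_set(frame))
    on_subsets = all(a in subsets and 0.0 <= v <= 1.0 for a, v in m.items())
    return on_subsets and abs(sum(m.values()) - 1.0) <= tol


# Hypothetical frame of discernment with three material classes.
frame = {"clays", "sands", "gravels"}

# Total ignorance: all the mass is placed on the whole frame.
vacuous = {frozenset(frame): 1.0}

# Masses on singletons only behave like a probability distribution.
bayesian = {frozenset({"clays"}): 0.5,
            frozenset({"sands"}): 0.3,
            frozenset({"gravels"}): 0.2}

# A BBA mixing a singleton with a disjunction (clays OR sands).
mixed = {frozenset({"clays"}): 0.4,
         frozenset({"clays", "sands"}): 0.6}
```

The vacuous BBA expresses ignorance without committing to equal probabilities for the three classes, which is the distinction drawn in the paragraph above.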

Smets’ fusion approach, developed in his Transferable Belief
Model (TBM) [20] (i.e. conjunctive fusion), allows the attribution
of a mass of belief to the conflict, outside the FoD, so
that (open-world assumption):


$$m_{12}(\emptyset) \geq 0, \qquad (4)$$


where $m_{12}(\cdot)$ denotes the combined BBA resulting from
the combination of information of sources 1 and 2. The belief
mass resulting from the fusion of information from sources 1
and 2 is written:


$$m_{12}(A) = \sum_{X, Y \subseteq \Theta \mid X \cap Y = A} m_1(X)\, m_2(Y). \qquad (5)$$


And the level of conflict between the two considered sources
of information can therefore be quantified by:


$$m_{12}(\emptyset) = \sum_{X, Y \subseteq \Theta \mid X \cap Y = \emptyset} m_1(X)\, m_2(Y), \qquad (6)$$


with $m_1(X)$ and $m_2(Y)$ the belief masses respectively attributed
to events $X$ and $Y$ by sources 1 and 2.
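
The conjunctive combination and the conflict mass described above can be illustrated with a short sketch. It assumes that BBAs are stored as dictionaries mapping `frozenset` focal elements to masses; the function name `conjunctive` and the example masses are hypothetical.

```python
from collections import defaultdict
from itertools import product


def conjunctive(m1, m2):
    """Conjunctive rule: m12(A) is the sum of m1(X) * m2(Y) over X ∩ Y = A.

    The mass accumulated on the empty set is the conflict between sources.
    """
    out = defaultdict(float)
    for (x, mx), (y, my) in product(m1.items(), m2.items()):
        out[x & y] += mx * my
    return dict(out)


theta1 = frozenset({"theta1"})
theta2 = frozenset({"theta2"})
both = theta1 | theta2  # the disjunction theta1 OR theta2

# Made-up masses: source 1 favours theta1, source 2 favours theta2.
m1 = {theta1: 0.7, both: 0.3}
m2 = {theta2: 0.6, both: 0.4}

m12 = conjunctive(m1, m2)
conflict = m12.get(frozenset(), 0.0)  # 0.7 * 0.6 = 0.42
```

Here the two sources disagree on the singleton they support, so a mass of 0.42 ends up on the empty set while the remaining 0.58 is shared between the two singletons and their union.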

According to Shafer’s approach and unlike Smets’ rule,
Dempster-Shafer’s rule (DS) does not allow the attribution of
a mass of belief to the conflict (closed-world assumption):


$$m_{12}^{DS}(\emptyset) = 0. \qquad (7)$$


The conflict is then reallocated through a classical normalization
factor. The mass of belief in $A$, $m_{12}^{DS}(A)$, resulting
from the fusion of information from sources 1 and 2 is written:


$$m_{12}^{DS}(A) = \frac{1}{1 - m_{12}(\emptyset)} \sum_{X, Y \subseteq \Theta \mid X \cap Y = A} m_1(X)\, m_2(Y). \qquad (8)$$

The disadvantage of this method is that the conflict between 
the sources is no longer represented and it is possible to obtain 
counterintuitive results if the conflict is important because of 
this normalization. Even more problematic, even if the distinct 
sources are both informative whatever the level of conflict 
is, Dempster-Shafer’s fusion process can even not take into 
account the second source of information [21]. 
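
To make the normalization step concrete, here is a minimal sketch of Dempster-Shafer's rule for two sources. The dictionary-of-`frozenset` representation, the function name `dempster` and the example masses are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import product


def dempster(m1, m2):
    """Dempster-Shafer rule: conjunctive masses divided by (1 - conflict)."""
    raw = defaultdict(float)
    for (x, mx), (y, my) in product(m1.items(), m2.items()):
        raw[x & y] += mx * my
    conflict = raw.pop(frozenset(), 0.0)
    if conflict >= 1.0:
        # The normalization factor is undefined under total conflict.
        raise ValueError("sources are in total conflict")
    return {a: v / (1.0 - conflict) for a, v in raw.items()}


theta1 = frozenset({"theta1"})
theta2 = frozenset({"theta2"})
both = theta1 | theta2

# Made-up masses with a conflict of 0.7 * 0.6 = 0.42.
m1 = {theta1: 0.7, both: 0.3}
m2 = {theta2: 0.6, both: 0.4}

m_ds = dempster(m1, m2)
# The conflict no longer appears: the remaining masses are rescaled by 1/0.58.
```

After normalization, the result carries no trace of how much the sources disagreed, which is the loss of information pointed out in the paragraph above.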




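$$m_{12}^{PCR5}(A) = m_{12}(A) + \sum_{Y \in 2^\Theta \mid A \cap Y = \emptyset} \left[ \frac{m_1(A)^2\, m_2(Y)}{m_1(A) + m_2(Y)} + \frac{m_2(A)^2\, m_1(Y)}{m_2(A) + m_1(Y)} \right]. \qquad (9)$$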
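
The expression above has the form of the two-source proportional conflict redistribution rule (PCR5): each partial conflict $m_1(X) m_2(Y)$ with $X \cap Y = \emptyset$ is sent back to $X$ and $Y$ in proportion to the masses involved. The sketch below is an illustration under that reading, with hypothetical names and example masses, and it assumes input BBAs that put no mass on the empty set.

```python
from collections import defaultdict
from itertools import product


def pcr5(m1, m2):
    """Two-source PCR5: conjunctive rule plus proportional redistribution
    of every partial conflict to the two focal elements that caused it."""
    out = defaultdict(float)
    for (x, mx), (y, my) in product(m1.items(), m2.items()):
        inter = x & y
        if inter:
            out[inter] += mx * my  # non-conflicting part, as in the conjunctive rule
        elif mx + my > 0.0:
            # The partial conflict mx * my is split proportionally to mx and my.
            out[x] += mx * mx * my / (mx + my)
            out[y] += my * my * mx / (mx + my)
    return dict(out)


theta1 = frozenset({"theta1"})
theta2 = frozenset({"theta2"})
both = theta1 | theta2

# Made-up masses with a single partial conflict between theta1 and theta2.
m1 = {theta1: 0.7, both: 0.3}
m2 = {theta2: 0.6, both: 0.4}

m_pcr5 = pcr5(m1, m2)
```

In this example the conflict of 0.42 is split as 0.294/1.3 for theta1 and 0.252/1.3 for theta2, so the total mass stays equal to 1 without a global normalization.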


B. Construction of BBAs from geophysical and geotechnical 
data 


Belief masses have to be assigned to each considered event
of the FoD, for both sources of information. The combination
of the belief masses can only be initiated after this stage.
In the following, the geophysical source of information will
be identified as source 1 and the geotechnical source of
information as source 2. A 2D model assumption will be
made, corresponding to the x and z spatial axes, since vertical
sections of subsoil are considered.


Geophysical data 


The discretization of the considered subsoil section, as well
as the depth of investigation and the resolution, depend on
the acquisition method used [24]. The user sets,
through the inversion tool, the shape and dimensions of
the discretization grid. Starting from this
discretization, the aim is to associate with each cell masses
of belief for each event of the FoD.

The constitutive classes of the FoD are also fixed at the end
of the inversion process by the geophysicist, with the help
of a representation of the distribution of the set of inverted
geophysical values, in the form of modal classes (Figure 1.a).
This representation makes it possible to highlight
the centers, minima and maxima of the events considered in
order to fix the bounds of the intervals associated
with the events of the FoD. The number of cells of the subsoil
section is represented according to the geophysical parameter
values. The infima and suprema must be fixed so that the
intervals are of the same width in order to avoid introducing
a bias when calculating Wasserstein distances (detailed
below). To associate the belief masses with the FoD events, the
intervals of inverted values of the physical parameter (in red,
Figure 1.b) are considered. For some geophysical methods,
these intervals can correspond to the value obtained at the end
of the inversion with its associated inaccuracy.

Fig. 1. a) Modal class distribution of the geophysical parameter values from the considered subsoil section, allowing the selection of the geophysical classes
in b). The red interval corresponds to an interval of inverted values, from one cell of a 2D section of subsoil, used for the calculation of Wasserstein distances.


It is then necessary to associate belief mass values $m_1(\cdot)$
corresponding to each element of $2^\Theta$, for each cell of the
inverted section. The masses are obtained from the calculation
of Wasserstein distances [25], considering two geophysical
intervals $A = [a_1, a_2]$ and $B = [b_1, b_2]$, with $A$ and $B$
belonging to $\mathbb{R}$, $A$ being the interval corresponding to an event
of the FoD and $B$ being an interval of inverted values (Figure
1.b), Eq. (10):


2 


by +6 
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Z 2 
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This calculation estimates the distance between two inter- 
vals according to their size and the distance between them. 
The Wasserstein distances are calculated (using a logarithmic 
scale if the geophysical parameter requires it) between the 
inverted values with estimated inaccuracies, and the intervals 
associated with each event, chosen by the geophysicist. Each 
cell is finally associated with a standardized BBA respecting 
Eq. (3). This way, the more the distance of a geophysical 
interval resulting from the inversion is “close” to one event of 
the FoD, the more the mass of belief associated is important, 
and reciprocally. 
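
The distance computation described above can be sketched in Python. This is a minimal illustration, assuming the usual closed form of the L2 Wasserstein distance between uniform distributions on two intervals (squared difference of midpoints plus one third of the squared difference of half-widths). The conversion from distances to masses by normalized inverse distances is an assumption made for the example: the text only states that a smaller distance gives a larger mass. The names `wasserstein_interval` and `masses_from_distances` are hypothetical, and only single-interval events are handled.

```python
import math

def wasserstein_interval(a, b):
    """L2 Wasserstein distance between uniform distributions on intervals a and b."""
    mid_a, mid_b = (a[0] + a[1]) / 2, (b[0] + b[1]) / 2
    rad_a, rad_b = (a[1] - a[0]) / 2, (b[1] - b[0]) / 2
    return math.sqrt((mid_a - mid_b) ** 2 + (rad_a - rad_b) ** 2 / 3)

def masses_from_distances(inverted, events, eps=1e-9):
    """ASSUMPTION: mass proportional to 1/distance, normalized to sum to 1."""
    weights = {name: 1.0 / (wasserstein_interval(iv, inverted) + eps)
               for name, iv in events.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Hypothetical example on a log10 resistivity scale (bounds in ohm-meters).
events = {"theta1": (math.log10(25), math.log10(45)),
          "theta2": (math.log10(83), math.log10(149.4))}
inverted = (math.log10(28), math.log10(33))  # inverted value with its inaccuracy
bba = masses_from_distances(inverted, events)
```

Here the inverted interval lies inside the range of `theta1`, so `theta1` receives most of the mass. Events defined as unions of intervals would need an additional convention that is not covered by this sketch.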


Geotechnical data 


For the geotechnical part, the information collected during 
an investigation campaign is spatially localized (in the x-z 
plane) and usually comes from vertical soundings made from 
the surface. The aim is to associate belief masses with the 
different events of the FoD for each cell of the considered 
vertical soundings. To do so, the value measured at each depth 
is considered together with its inaccuracy, corresponding 
to the measurement error attributable to the measuring device 
(Figure 2.a). Thus, as for the geophysical part, intervals of 
values are obtained. 

A geotechnical mesh consisting of as many cells in depth 
as there are geotechnical values (Figure 2.b) is generated. 
At the drilling points, a belief mass $m_2(\cdot) = 1$ is 
assigned to the events corresponding to the measured 
geotechnical parameter. A value of 1 is set because the 
information inside the boreholes is considered highly reliable, 
unlike the spatialized geophysical information. A new mesh is 
then constructed (Figure 2.c), according to the size and depth 
of the boreholes. In order to characterize the entire section 
of the model, as the geophysical method does, and to associate 
mass values (BBA) with each new cell, an exponential lateral 
decay of the belief mass is imposed from each drilling point 
towards the nearest borehole, with a decay rate that depends on 
the values measured in that nearby borehole. For a specific 
depth, this gives Eq. (11): 

$$\mathrm{BBA}(x) = e^{-k C_v x}\,\mathrm{BBA}(0), \tag{11}$$

with x being the distance from the considered cell to the 
reference borehole (x = 0 in the borehole), k a decay factor 
fixed by the user to adjust the lateral decay rate, BBA(x) 
the belief mass values assigned to each event of the FoD 
for a position x, with BBA(0) = 1. $C_v$ corresponds to the 
coefficient of variation expressed in Eq. (12), as used in 
[26]: 


$$C_v = \frac{\sqrt{\dfrac{1}{m_{mesh}-1}\sum_{i=1}^{m_{mesh}} (Q - Q_i)^2}}{Q}, \tag{12}$$


where Q is the geotechnical value of the reference cell in 
the considered borehole and $Q_i$ the geotechnical values in 
the nearby borehole. For Figure 2.b, $m_{mesh} = 3$ has been 
considered. If $m_{mesh} = 5$ or 7, the computation of $C_v$ 
takes into account 5 or 7 cells in the nearby borehole. 
Thus, for two consecutive boreholes with similar values at 
similar depths, the confidence decays more slowly than 
for two consecutive boreholes presenting radically different 
values. This decay of belief mass is applied to the left and 
to the right of each borehole. 
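
The lateral decay can be illustrated with a short Python sketch. It assumes that the decay exponent is the product of k, the coefficient of variation and the distance x, and that the coefficient of variation is the root mean squared deviation of the neighbouring values from the reference value Q, divided by Q. The normalization by Q and the function names are assumptions made for this example.

```python
import math

def coefficient_of_variation(q_ref, q_neighbours):
    """Spread of the nearby-borehole values around the reference value q_ref.

    ASSUMPTION: deviation normalized by q_ref to obtain a dimensionless ratio.
    """
    n = len(q_neighbours)  # plays the role of m_mesh (3, 5 or 7 cells)
    variance = sum((q_ref - q) ** 2 for q in q_neighbours) / (n - 1)
    return math.sqrt(variance) / q_ref

def decayed_mass(x, k, c_v, mass_at_borehole=1.0):
    """Belief mass at lateral distance x from the reference borehole."""
    return math.exp(-k * c_v * x) * mass_at_borehole

# Hypothetical values: the reference cell reads 20 MPa.
cv_similar = coefficient_of_variation(20.0, [19.5, 20.5, 20.0])
cv_different = coefficient_of_variation(20.0, [0.2, 0.2, 0.2])
mass_similar = decayed_mass(10.0, 0.1, cv_similar)
mass_different = decayed_mass(10.0, 0.1, cv_different)
```

With similar neighbouring values the coefficient is small and the mass stays close to 1 at 10 m, whereas strongly different values make it drop quickly, which matches the behaviour described above.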




Fig. 2. Construction of a geotechnical discretization mesh from two vertical 
boreholes acquisition (SD1 and SD2). a) Representation of the geotechnical 
values for SD1 and SD2 according to the depth. b) The boreholes are divided 
in cells associated with belief mass equal to 1 for the considered event. c) 
Construction of a new mesh according to the size and depth of the boreholes. 


If, between two boreholes, the belief mass associated with 
a hypothesis A is less than 1 ($m_2(A) < 1$), then the remaining 
mass needed to satisfy Eq. (3) is assigned to the proposition 
“any type of material”, represented by the union of all events, 
as in Eq. (13): 


$$m_2(\theta_1 \cup \theta_2 \cup \theta_3 \cup \theta_4) = 1 - m_2(A). \tag{13}$$
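
As a small illustration of this completion step, the following Python sketch builds a geotechnical BBA from a single supported event and assigns the remaining mass to the union of all events. Focal elements are represented as `frozenset` objects; the function name and the event labels are hypothetical.

```python
def complete_bba(event, mass, frame):
    """Return a BBA with `mass` on `event` and the remainder on the whole frame."""
    if not 0.0 <= mass <= 1.0:
        raise ValueError("mass must lie in [0, 1]")
    bba = {frozenset([event]): mass}
    remainder = 1.0 - mass
    if remainder > 0.0:
        union = frozenset(frame)  # "any type of material"
        bba[union] = bba.get(union, 0.0) + remainder
    return bba

frame = ("theta1", "theta2", "theta3", "theta4")
between_boreholes = complete_bba("theta2", 0.7, frame)
inside_borehole = complete_bba("theta1", 1.0, frame)
```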


C. Dimensioning of the mesh prior to the fusion 


Each source of information imposes its own mesh, but in 
order to combine the belief masses from the geophysical 
information source (source 1) and the geotechnical source 
(source 2), a common mesh is needed that contains, for each 
cell, both the geophysical and the geotechnical BBAs. So as 
not to alter the quality of the information, no interpolation 
is carried out. Instead, the geophysical discretization grid 
resulting from the 2D inversion is superimposed on the 
geotechnical grid, which depends on the number and positions 
of the boreholes. An irregular mesh is thus obtained, without 
any approximation (Figure 3). 
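
The superposition of the two grids can be sketched in one dimension (the same reasoning applies along depth). This is only an illustration with hypothetical cell edges: the merged mesh is the sorted union of both sets of edges, so every merged cell lies entirely inside one cell of each source grid and can inherit both BBAs without interpolation.

```python
from bisect import bisect_right

def merge_edges(edges_a, edges_b):
    """Sorted union of the cell edges of two 1D grids."""
    return sorted(set(edges_a) | set(edges_b))

def parent_cell(edges, lo, hi):
    """Index of the source cell that contains the merged cell [lo, hi]."""
    return bisect_right(edges, (lo + hi) / 2) - 1

geophysical_x = [0.0, 1.0, 2.0, 3.0]  # hypothetical geophysical cell edges
geotechnical_x = [0.0, 1.5, 3.0]      # hypothetical geotechnical cell edges

merged_x = merge_edges(geophysical_x, geotechnical_x)
cells = list(zip(merged_x[:-1], merged_x[1:]))
parents = [(parent_cell(geophysical_x, lo, hi),
            parent_cell(geotechnical_x, lo, hi)) for lo, hi in cells]
```

Each entry of `parents` gives the indices of the geophysical and geotechnical cells whose BBAs are to be combined in the corresponding merged cell.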


Fig. 3. Example of a geophysical mesh (in black) and a geotechnical mesh 
(in red) superimposed to propose a new irregular mesh to carry out the 
combination calculations and present the fusion results. 


III. SYNTHETIC STUDY 


A synthetic study based on artificial data is proposed below 
in order to test the new methodology. It shows the impact of 
different noise levels on the geophysical information, as well 
as the influence of the lateral decay factor k (Eq. (11)) on 
the fusion results, so that a suitable value of k can be chosen 
when applying the methodology to real data. 


A. Considered methods 


For this study, the electrical resistivity tomography (ERT) 
method stands for the geophysical information source and the 
Cone Penetrometer Test (CPT) method for the geotechnical 
information source. 

The basic principle of DC-resistivity methods consists in 
injecting an electric current of known intensity [A] by means 
of two “current” electrodes and measuring a voltage [V] 
between two “potential” electrodes. Depending on the electrode 
layout, the topography, the properties of the materials and their 
distribution, apparent resistivity values can be computed. The 
depth of investigation depends on the spacing of the electrodes, 
the configuration of the electrodes and the nature of the soil 
[27]. By generalizing this principle, a two-dimensional (2D) 
ERT consists in aligning a series of electrodes and acquiring 
a large number of measurements based on four-electrode 
configurations. The apparent resistivity data acquired are then 
inverted using an inversion code or software to reconstruct a 
complete 2D section of electrical resistivity [Ω·m]. Here the 
Res2Dinv software (ver 3.71.118) [27] has been used. 

In order to obtain an artificial resistivity section of subsoil, 
a two-step procedure is followed. First, resistivity data are 
simulated with the Res2Dmod software [28] on the section 
to be considered. Second, the apparent electrical resistivity 
values are inverted with Res2Dinv, considering an L1 norm [29] 
and an extended model discretization, to obtain the synthetic 
inverted section of electrical resistivity. 

The CPT method consists in pushing rods with a conical tip at 
the end into the soil at a constant speed [30]. This test 
is often used to determine the mechanical resistance properties 
of soils. The two measured parameters are the tip resistance 
$q_c$ [MPa] and the sleeve friction $f_s$ [MPa]. Although the 
method provides two parameters, only $q_c$ is considered as the 
study parameter. 


B. FoD and considered model 


For this synthetic study, a two-layer model is considered, 
composed of materials that can be likened to silts for the upper 
layer and clays for the underlying layer. The FoD therefore 
contains three material class hypotheses, such as: 


$$\Theta = \{\theta_1, \theta_2, \theta_3\}, \tag{14}$$


with $\theta_1$ the event corresponding to the clayey material, 
$\theta_2$ to the silty material and $\theta_3$ to unknown 
materials. The latter is associated with the union of the 
geophysical and geotechnical value ranges that do not correspond 
to those associated with $\theta_1$ and $\theta_2$. Since it 
excludes the first two sets, the event $\theta_3$ provides a way 
to quantify the lack of knowledge of the environment. The 
construction of the BBAs then consists in associating the data 
of the two considered sources with the events of the FoD. 
Figure 4 shows the two-layer model, expressed in terms of the 
events of the FoD, used for this synthetic study. 




Fig. 4. Representation of the events of the FoD in the imposed model of 
subsoil of the synthetic study. 


C. Construction of BBAs from geophysical and geotechnical 
data 


Geophysical data 


The electrical acquisition is simulated with a Wenner 
acquisition mode and 96 electrodes spaced one meter apart. 
An electrical resistivity of 100 Ω·m is considered for the 
upper material and a resistivity of 30 Ω·m for the underlying 
one [31] (Figure 4). Electrical acquisitions are simulated with 
different noise levels (5, 10 and 15%). The results of the 
inversion (with 10% noise, Figure 5.a) highlight the presence 
of two layers, but the interface between them is not perfectly 
identified. A variation of thickness in the center of the model 
is visible. The interface is not sharp, and anomalies are 
present at the surface even though they are not part of the 
initial model. 

From these inversion results, it is possible to define the 
ranges of electrical resistivity that will be associated with 
the different events considered for the fusion process. A 
distribution in modal classes is used to visualize the number 
of cells of the discretized section of the 2D inversion 
associated with each range of resistivities (Figure 5.b). This 
distribution highlights the two large material classes of the 
model. From it, the bounds of the considered events can be 
defined (in Ω·m), so that the intervals have the same length 
(in logarithmic scale): 


$$\begin{aligned}
\theta_1 &= [25; 45],\\
\theta_2 &= [83; 149.4],\\
\theta_3 &= [13.89; 25[ \,\cup\, ]45; 83[ \,\cup\, ]149.4; 268.92].
\end{aligned} \tag{15}$$
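
The equal-length property on a logarithmic scale can be checked numerically. The short Python sketch below (with a hypothetical helper `log_width`) compares the log10 widths of the two material classes; their linear widths differ, but both intervals span the same ratio between upper and lower bound.

```python
import math

def log_width(lo, hi):
    """Width of the interval [lo, hi] on a log10 scale."""
    return math.log10(hi / lo)

theta1 = (25.0, 45.0)   # bounds in ohm-meters
theta2 = (83.0, 149.4)  # bounds in ohm-meters

width1 = log_width(*theta1)
width2 = log_width(*theta2)
```

Both intervals have an upper-to-lower bound ratio of 1.8, so the two widths coincide up to rounding.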


As explained in II.B, belief masses can be associated 
with each cell of the mesh from the values resulting 
from the inversion. The construction of the geophysical 
BBAs gives the values presented in Figure 6. This figure 
shows the association of the values of Figure 5.a with 
the events of the FoD, Eq. (15). The presence of a top layer 
($\theta_2$) and a base layer ($\theta_1$) can be detected 
(Figure 6.a). There appears to be a variation in the thickness 
of the layers in the center of the model, but the interface is 
not well characterized. Moreover, the intermediate values of 
electrical resistivity resulting from the inversion (Figure 5.a) 
between the $\theta_2$ and $\theta_1$ layers induce the 
representation of a third material ($\theta_3$), which does not 
exist in the imposed model. The belief masses are maximum when 
the resistivity values correspond to the center of the 
resistivity classes set for each event (Eq. (15)). 


Geotechnical data 


For the geotechnical source of information, four vertical CPT 
soundings spaced 19 meters apart are simulated (Figure 7). 
The boreholes are 20 cm wide and up to 15 m deep, 
and a value of $q_c$ is recorded every 
50 cm from the surface. An inaccuracy of 10~? MPa on the 
measurements is considered. For a fixed normalized friction 
ratio of 3%, a $q_c$ value of 20 MPa is considered for the 
upper silty material and a value of 0.2 MPa for the underlying 
clay material, as proposed in the Robertson diagram [32]. 

In order not to have uniform values of $q_c$ for the materials 
and to represent the noisy reality of a field acquisition, 
values are drawn from a normal distribution defined for each 
event. Mean $q_c$ values of 0.2 and 20 MPa are respectively 
used to define the normal distributions of the material 
classes. Standard deviations equal to 10% of the mean values 
are used, echoing the 10% noise used for the geophysical data. 
By keeping the minimum and maximum values, these random draws 
make it possible to define the limits, in MPa, of the intervals 
associated with the elements (i.e. material classes) of the FoD: 


$$\begin{aligned}
\theta_1 &= [0.14; 0.27],\\
\theta_2 &= [13.5; 23.5],\\
\theta_3 &= [0.1; 0.14[ \,\cup\, ]0.27; 13.5[ \,\cup\, ]23.5; 100].
\end{aligned} \tag{16}$$


The minimum and maximum values are fixed at 0.1 and 
100 respectively because they are the minimum and maximum 
values in Robertson’s diagram [32]. 
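
The way class bounds follow from random draws can be sketched as below. This is only an illustration: the number of draws, the seeds and the function name are assumptions, so the resulting bounds will not reproduce the intervals of Eq. (16) exactly.

```python
import random

def draw_class_bounds(mean, rel_std=0.10, n=200, seed=0):
    """Draw n tip-resistance values and return their (min, max).

    ASSUMPTION: n and seed are arbitrary; rel_std is the 10% ratio from the text.
    """
    rng = random.Random(seed)
    values = [rng.gauss(mean, rel_std * mean) for _ in range(n)]
    return min(values), max(values)

clay_bounds = draw_class_bounds(0.2)           # clay, mean q_c = 0.2 MPa
silt_bounds = draw_class_bounds(20.0, seed=1)  # silt, mean q_c = 20 MPa
```

The two resulting intervals are far apart, and the values falling between or outside them are what the unknown-material event is meant to capture.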

There are two types of sounding results, depending on 
their position (Figure 8). Once the values associated with 
the cells of the soundings are obtained, belief masses can be 
associated with the whole section by extending the 
geotechnical information, as explained in II.B. For the 
construction of the geotechnical BBAs with k = 0.1 (Eq. (11)), 
the values obtained are shown in Figure 9. This figure 
highlights the fact that the confidence is maximum 
in the soundings. This method characterizes the material 
$\theta_2$ in the first 5 meters of the model and the material 
$\theta_1$ from 10 to 15 meters deep. 

The greater thickness of the material $\theta_2$ in the center 
of the model is also well characterized. On the other hand, 
strong doubt appears in yellow (Figure 9.a) in certain areas, 
where no specific material can be determined 
($\theta_1 \cup \theta_2 \cup \theta_3$). For the base of the 
model, this can be explained by the fact that the soundings stop 
at 15 meters depth. For the areas between 5 and 10 meters deep 
to the right and left of the first and last soundings, this is 
because, for these two soundings, the closest soundings give 
different values at the same depths: the confidence attributed 
to the presence of $\theta_1$ therefore decreases very quickly 
laterally. This decay is also fast towards the edges of the 
model because no other sounding is present at the ends to 
constrain the information. 

Fig. 5. a) Subsoil section displaying inverted electrical resistivity values from 10% noise data acquisition and b) model classes’ distribution of the cells 
presented in a), according to the electrical resistivity values (Ω·m). 

Fig. 6. a) Representation of the event having the highest belief mass according to the BBA construction from geophysical data and b) the associated belief 
mass values, considering a 10% noise. The black lines represent the position of the interface. 

Fig. 7. 2D section of subsoil displaying true ER distribution with borehole positions in black and associated tip resistance vertical profiles in white. 

Fig. 8. Examples of the two types of simulated soundings with tip resistance values according to the investigation depth. a) corresponds to boreholes #1 and 
4 in Figure 7 while b) corresponds to boreholes #2 and 3 in Figure 7. 

Fig. 9. a) Representation of the events having the highest belief mass according to the BBA construction from geotechnical data and b) the associated belief 
mass values. The borehole positions are in dashed lines while the black lines represent the position of the interface. 


D. Effect of lateral decay factor and noise level on the fusion 
results 


This part examines the fusion of the belief masses established 
for the proposed synthetic model when varying the noise level 
of the geophysical information and the value of the lateral 
decay factor k (Eq. (11)), which controls the lateral decay 
rate of the geotechnical information. 

Figure 10 shows the fusion results for different values of 
k ($10^{-2}$, $5 \cdot 10^{-2}$, $10^{-1}$, $5 \cdot 10^{-1}$ 
and 1) for a simulated noise of 10% on the acquired geophysical 
information. Noise was set at 10% because the electrical 
resistivity classes of the FoD were defined from the modal 
classes of the inverted 10% noise image (Figure 5.b). For each 
value of k, Figures 10.a and 10.b represent the results 
obtained by Smets fusion, whereas Figures 10.c and 10.d 
represent the results obtained by PCR6 fusion. Figures 10.a and 
10.c show the material classes having the greatest belief mass 
at the end of the fusion process, while Figures 10.b and 10.d 
give the values of these belief masses, between 0 and 1. These 
figures display the events (materials) potentially present 
within the section, as well as their associated level of 
confidence. 
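
To make the difference between the two fusion rules concrete, the Python sketch below combines two BBAs with Smets' unnormalized conjunctive rule, which keeps the conflicting mass on the empty set, and with the PCR6 rule written for two sources (where it coincides with PCR5), which redistributes each partial conflict to the two elements involved in proportion to their masses. The function names and the numerical BBAs are hypothetical and chosen only for illustration.

```python
from itertools import product

def smets_conjunctive(m1, m2):
    """Unnormalized conjunctive rule: conflict stays on the empty set."""
    out = {}
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        out[inter] = out.get(inter, 0.0) + wa * wb
    return out

def pcr6_two_sources(m1, m2):
    """PCR6 for two sources: each partial conflict goes back to its two elements."""
    out = {}
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            out[inter] = out.get(inter, 0.0) + wa * wb
        elif wa + wb > 0.0:
            out[a] = out.get(a, 0.0) + wa * wa * wb / (wa + wb)
            out[b] = out.get(b, 0.0) + wb * wb * wa / (wa + wb)
    return out

T1, T2 = frozenset(["theta1"]), frozenset(["theta2"])
ALL = frozenset(["theta1", "theta2", "theta3"])

m_geophysical = {T1: 0.6, T2: 0.3, ALL: 0.1}  # hypothetical cell BBA, source 1
m_geotechnical = {T2: 0.8, ALL: 0.2}          # hypothetical cell BBA, source 2

smets = smets_conjunctive(m_geophysical, m_geotechnical)
pcr6 = pcr6_two_sources(m_geophysical, m_geotechnical)
```

In this example Smets' rule leaves a mass of 0.48 on the empty set, which is the kind of conflict displayed in red in the figures, whereas PCR6 shares that 0.48 between the two conflicting classes and returns a normalized BBA.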


The higher the value of k, the faster the confidence in the 
geotechnical information decays laterally. This can be seen, 
for example, from the last borehole to the right end of the 
section (Figure 10.b) or between the 2nd and 3rd boreholes 
(Figure 10.d). Increasing k implies that, for two soundings 
giving similar values at the same depth, the confidence 
associated with the corresponding type of material tends to 
decrease. On the other hand, for two soundings giving different 
values at the same depth, increasing k hardly has any impact 
on the belief masses associated with the selected events (e.g. 
between 5 and 10 m depth between boreholes 1 and 2, 
Figure 10.d). 

Fig. 10. Representation of the events having the highest belief mass in a) and c) and their associated mass values in b) and d), considering a 10% noise. In 
i) k = 0.01, ii) k = 0.05, iii) k = 0.1, iv) k = 0.5 and v) k = 1. Figures on the left side are results for Smets fusion while figures on the right side are 
results for PCR6 fusion. The sounding positions are in dashed lines while the black lines represent the position of the interface. 


With regard to the material classes identified after the fusion 
process, the more k increases, the more the amount of conflict 
decreases (in red, Figure 10.a). This is because when there is 
little trust in the geotechnical data, there is little conflict 
with the geophysical data. At the same time, an increase in the 
proportion of $\theta_3$ is observed close to the interface 
(Figure 10.c). This observation is explained by a larger 
mass attributed to the union of events and by geophysical data 
that give intermediate values at the interface. 


In the remainder of this work, an intermediate value of k 
equal to 0.1 is retained. With this value, good confidence is 
obtained in information that repeats between two successive 
soundings, while room is left for doubt, with enough unknown 
material ($\theta_3$) at the interfaces. The fusion results 
obtained with different noise levels added to the geophysical 
information (5, 10 and 15%) are shown in Figures 10.iii and 11 
with k = 0.1. 


The greater the amount of noise, the less clear the 
interfaces proposed by the inversion (Figures 5.a, 11.i.a, 
11.ii.a). More anomalies are also present when the noise level 
increases. Finally, the noise level affects the level of 
inaccuracy associated with the geophysical data used 
in the fusion process. Larger data inaccuracies induce wider 
value ranges in the calculation of Wasserstein distances, 
which in turn can spread belief mass over more events of the 
FoD. 

Fig. 11. a) Subsoil section displaying inverted electrical resistivity values from i) 5% noise and ii) 15% noise data acquisition. Representation of the events 
having the highest belief mass in b) and d) and their associated mass values in c) and e). Figures on the left side are results for Smets fusion while figures 
on the right side are results for PCR6 fusion. The sounding positions are in dashed lines while the black lines represent the position of the interface. 

Since the classes associated with the FoD elements were fixed 
from the values with 10% noise (section III.C and Figure 
5.b), it is “reasonable” to have a higher confidence (higher 
belief masses) in these results than in the results with 5% 
and 15% noise (Figures 11.c, 11.e, 10.iii.b, 10.iii.d). The 
fusion process makes it possible to overcome the noise effects, 
whether the noise level is 5 or 15%. This can be attributed to 
the computation of Wasserstein distances, which takes into 
account the data inaccuracies and considers all geophysical 
classes. 


IV. SETTING UP A TEST BENCH FOR REAL GEOTECHNICAL 
AND GEOPHYSICAL ACQUISITIONS 


A. Materials 


In order to assess the validity of the developed 
fusion methodology, two methods of data acquisition were 
retained: (i) a mini-ERT device acting as the geophysical 
source of information and (ii) a laboratory penetration cone 
acting as the geotechnical source of information. Before setting 
up the test bench, it was necessary to select the materials 
that could be placed in a tank to carry out the study. To 
validate the methodology, the materials used must meet several 
conditions: they must have (i) distinct electrical resistivity 
ranges, (ii) distinct penetration depths and (iii) sufficient 
spatial homogeneity to limit uncontrolled anomalous values. 

1) Mini ERT device: Expressly for the purposes of this 
study, a mini ERT device (Figure 12) has been set up. This 
device consists of 48 electrodes of 6 mm length, positioned at 
regular intervals of one centimeter. It can be moved along the 
test bench to make multiple acquisitions and to cover a longer 
section. 

2) Laboratory penetration cone: The laboratory penetration 
cone method is described in the French standard NF P 94-052- 
1 [33]. It consists in measuring a penetration depth of a cone, 
in millimeters, subjected to its own weight (Figure 13). The 
materials are tested individually, repeatedly, to determine an 
average value and a standard deviation of penetration depth 
for each material. These values can be used later in the study 
to simulate different drilling positions within the test bench. 
This method can be likened to the CPT method which is one 
of the most popular in situ geotechnical tests. 

3) Test bench and used materials: For the validation of 
the methodology, we wanted to build a test bench that could 
be easily set up and controlled, with two or three layers and 
variation of the interface positions. Fast-hardening natural 
fine-grained plaster and Hostun fine sand [34] were the retained 
constituents. These two materials meet the three conditions 
listed above. They were placed in a transparent PVC tank of 
100 × 30 × 17 cm³, as shown in Figure 14, with an underlying 
layer of 5 cm of plaster (setting time = 69 h) overlaid by a 
layer of 2.5 cm of water-saturated sand. 


Fig. 12. Mini ERT device with 48 electrodes spaced 1 cm apart, adjustable height, 
used for electrical acquisitions in the test bench. 


Fig. 13. Laboratory penetration cone. 


Fig. 14. Transparency view of the test bench. 


A formwork was used during the placement of the plaster 
so that a 20 cm long anomaly could be created: there, saturated 
sand of 7.5 cm thickness is present instead of plaster. The 
contact between the materials and the bottom of the tank 
forms an interface that is of interest to detect with the 
methodology. 16 kg of plaster were mixed with 8 kg of water to 
obtain the material finally put in place. 
The electrical resistivity of the plaster was measured before 
and after the placement of the saturated sand to verify that 
the presence of water had a negligible impact on the electrical 
properties of the plaster. 


TABLE I 
VALUES OF ELECTRICAL RESISTIVITY AND DEPTH OF PENETRATION OF 
THE MATERIALS SET UP WITHIN THE TEST BENCH. 


                     | Plaster before pluviation | Saturated Hostun sands 

Electrical resistivity (Ω·m) 
Mean                 | 31.28 | 78.15 
Standard deviation   | 3.23  | 11.18 
Number of measures   | 12    | 52 


Penetration depth (mm) 
Mean 

Standard deviation 1.61 

Number of measures 10 


17,31 


For the Hostun sand, 15.82 kg were pluviated in 5.8 kg of 
water, above the plaster to reach saturation. Trials had been 
carried out in advance to determine the proportions of water 
and sand required to achieve such a state as well as to validate 
the repeatability of such installation by pluviation. The values 
of electrical resistivities and penetration depths are displayed 
in Table I. 


B. FoD and BBA modeling 
1) FoD and target model: A FoD consisting of four elements 
(material classes) is considered so that: 


$$\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4\}, \tag{17}$$


with $\theta_1$ the element corresponding to the plaster 
material, $\theta_2$ corresponding to saturated sand, 
$\theta_3$ corresponding to the hard and electrically 
insulating bottom of the tank, simulating a substrate, and 
$\theta_4$ corresponding to unknown materials, being 
the union of the ranges of values not corresponding to those 
associated with the three previously described materials. 
Figure 15 presents the target model in the form of events 
constituting the FoD, following the arrangement of the materials 
within the test bench. Although the tank used is 1 m long, the 
ERT acquisition only covered an 83 cm long section, on the 
central line of the model, and allowed us to image down to 
18 cm depth. 

Fig. 15. Scheme of the idealized section model (with vertical exaggeration), 
including the FoD constituent events associated with the materials of the test 
bench. 


2) Construction of BBAs from geophysical and geotechnical 
data: The electrical acquisition was carried out over a length 
of 83 cm, on the central line of the model, with a first 
acquisition over 47 cm followed by three acquisitions, each 
after a displacement of 12 cm (roll-along method). The results 
obtained from the inversion of the acquired data are displayed 
in Figure 16.a. These results highlight the existence of 
three distinct sets, at depths relatively close to those of the 
target model (Figure 15), but with interfaces that are slightly 
shifted vertically and gradual rather than sharp. In addition, 
the variation in saturated sand thickness is poorly evaluated. 
Indeed, the anomalous zone is recognized but is associated here, 
in its lower part, with electrical resistivity values much 
larger than the real ones. 
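
The length covered by the roll-along acquisition follows from simple arithmetic, sketched below in Python with a hypothetical helper. It assumes the 48 electrodes spaced 1 cm apart of the mini ERT device described in section IV.A.

```python
def roll_along_length(n_electrodes, spacing_cm, n_shifts, shift_cm):
    """Return (length of one spread, total length covered after the shifts)."""
    spread = (n_electrodes - 1) * spacing_cm
    return spread, spread + n_shifts * shift_cm

spread_cm, total_cm = roll_along_length(48, 1, 3, 12)
```

One spread covers 47 cm, and three displacements of 12 cm extend the profile to 83 cm, in agreement with the values given in the text.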

The inverted values, although of the same order of magnitude, 
do not exactly match the ranges of values measured on 
the materials independently (Table I). In order to characterize 
the events (materials) of the FoD, a distribution in modal 
classes (Figure 16.b) is used to visualize the number of cells 
of the discretized section of the 2D inversion associated with 
each range of resistivities. This distribution highlights the 
three large sets of materials in the model. From it, the bounds 
of the events considered can be defined, in Ω·m, so that the 
intervals have the same length (in logarithmic scale), as 
presented in Eq. (18): 


$$\begin{aligned}
\theta_1 &= [10; 35],\\
\theta_2 &= [40; 140],\\
\theta_3 &= [9500; 33250],\\
\theta_4 &= [2.85; 10[ \,\cup\, ]35; 40[ \,\cup\, ]140; 9500[ \,\cup\, ]33250; 116375].
\end{aligned} \tag{18}$$


In contrast to the information from the geophysical source, 
the geotechnical data were obtained beforehand by laboratory 
penetration cone testing and then numerically simulated prior 
to fusion. Several simulations with various positions of the 
survey points were carried out. In order to simulate drilling 
points, the mean penetration depths (mm) and associated 
standard deviations (Table I) were used to draw values from 
a normal distribution defined for each event. A mean 
penetration depth of 0 mm, with a standard deviation of 
0.01 mm, is used for $\theta_3$ (bottom of the tank), meaning 
that negative values may be drawn. These random draws make it 
possible to define the limits, in mm, of the intervals 
associated with the events of the FoD, as presented in Eq. (19): 


$$\begin{aligned}
\theta_1 &= [0.04; 0.19],\\
\theta_2 &= [13; 21],\\
\theta_3 &= [-0.02; 0.02],\\
\theta_4 &= [-0.05; -0.02[ \,\cup\, ]0.02; 0.04[ \,\cup\, ]0.19; 13[ \,\cup\, ]21; 100].
\end{aligned} \tag{19}$$


Thus, 2 mm wide boreholes are simulated down to 15 cm depth, 
with a value acquired every 5 mm and an associated inaccuracy 
of 0.01 mm. The values of penetration depth obtained can then 
be associated with the different materials of the model. 


V. TEST BENCH DATA FUSION RESULTS 


The results of the fusion of the geophysical and geotechnical 
information are shown in Figure 17. The simulations were 
carried out for four distinct configurations of regularly 
spaced vertical boreholes, represented by dashed lines in the 
figures: i) 8 boreholes spaced 10 cm apart (Figure 17.i) 
(x = 10, 20, 30, 40, 50, 60, 70, 80 cm), ii) 5 boreholes spaced 
18 cm apart (Figure 17.ii) (x = 4, 22, 40, 58, 76 cm), iii) 3 
boreholes spaced 25 cm apart (Figure 17.iii) (x = 15, 40, 65 
cm), iv) 2 boreholes spaced 50 cm apart (Figure 17.iv) (x = 15, 
65 cm). The fusion results are presented under i) the 
hypothesis of Smets (Figures 17.a and 17.b) and ii) the 
hypothesis of a closed world (section II.1) with the PCR6 rule 
(Figures 17.c and 17.d). Figures 17.b and 17.d represent the 
belief mass values associated with the events having the largest 
mass, represented respectively in Figures 17.a and 17.c. The 
fusion results are analyzed and discussed in the next section. 


VI. FUSION RESULTS ANALYSIS AND DISCUSSION 


Different rules of combinations 


Let us compare the results obtained by the two combination 
rules in the 8-borehole simulation (Figure 17.i). With a model 
this rich in geotechnical information, the section proposed by 
the PCR6 method (Figure 17.i.c) is very close to the target 
model (Figure 15). The three sets are well characterized, and 
the interfaces at 2.5 cm depth (sand-plaster) and at 7.5 cm 
depth (plaster-PVC tank and sand-PVC tank) are much better 
defined than by ERT alone (Figure 16.a). Moreover, thanks 
to the geotechnical information, the sand thickness anomaly 
is correctly characterized as saturated sand ($\theta_2$) and 
not as a more resistive anomaly in continuity with the 
insulating material below, as suggested by the results 
of the inversion. The lateral extension of this anomaly is 
also well estimated (20 cm). Smets' combination rule 
highlights the significant conflict between the two 
sources of information (Figure 17.i.a) concerning 
the first two layers. 


Whatever method is used, the presence of a hypothesis 4 is 
found at the vertical and horizontal interfaces (Figures 17.1.a, 
i.c). This hypothesis does not correspond to any material set 
up in the test bench. The belief masses attributed to such 
a hypothesis, highlight the transition zones not conform to 
reality, proposed by the inversion of the electrical resistivity 
data (Figure 16.a). In comparison to the belief masses as- 
sociated with the other hypotheses of the model, the belief 
masses associated with 4 are the lowest (Figures 17.1.b, i.d), 
showing that the confidence granted to such a material remains 
quite relative. An overall confidence drop is also observed 
from 15 cm depth. This corresponds to the maximum depth 
reached by the simulated boreholes. As confidence is extended 
laterally, the belief masses are constrained only by geophysical 
information to such a depth and therefore rely only on one 
source of information. 
Fig. 16. a) Inverse model resistivity section obtained by roll-along acquisitions in the central line of the model and b) model classes' distribution of the cells presented in a), according to the electrical resistivity values (Ω·m).


Influence of the number of boreholes and positions


A first intuition would be that the fewer boreholes there
are, the harder it is for the method to properly characterize
the section of the test bench. Although this is partly true,
the quality of the results depends less on the number of
boreholes than on their positions. Indeed, the anomaly of
saturated sands contained between the two banks of plaster
(Figure 15) is as well characterized in terms of lateral
extension with three soundings as with five (Figures 17.ii.c,
iii.c). It also has an equivalent associated trust
(Figures 17.ii.d, iii.d). It turns out that the belief masses
associated with the event 1 (plaster) are even smaller for a
fusion including three soundings (Figure 17.iii.d) than for a
simulation of only two (Figure 17.iv.d).

The explanation of such results is that consecutive
boreholes reporting different materials at an equivalent depth
induce a rapid decrease in the confidence attributed to the
boreholes. Therefore, more credibility is given to the
geophysical information source, explaining the greater presence
of 4, which reflects the gradual transitions in electrical
resistivities. The masses associated with this event, however,
remain relatively small. On the other hand, if two consecutive
boreholes have the same geotechnical values at a specific
depth, the lateral decay rate will be low and no priority can
be given to a different material existing between these two
boreholes. That is why the sand anomaly in the center of the
model does not appear in the fusion results with two soundings
(Figure 17.iv.c): no borehole passes through the anomaly and
the geophysical source is unable to characterize this material
as saturated sand. The strength of these results is that they
suggest the presence of 4 in this location, indicating that the
survey campaign should be reinforced (with a new borehole
position for example).

The conflict presented by Smets' combination (results in
Figures 17.i.a, ii.a, iii.a and iv.a) is not a function of the
number of geotechnical soundings either. In this study, the
cases of fusion bringing the highest amount of conflict are in
fact the ones with eight and two soundings. Nor is conflict to
be confused with a lack of knowledge of the subsoil. Conflict
zones highlight contradictory information between the two
sources. These zones generally lie between two consecutive
boreholes providing the same information, but going against
the available geophysical information. These are therefore
potentially anomalous zones where the geophysical information
must be considered carefully, in particular if the belief mass
associated with the event retained after normalization is too
low.


Important considerations and potential in the application 


It is important to note that the effectiveness of this
fusion methodology has been assessed by comparing the
fusion results with a target model (Figure 15). However, this
remains an idealized representation of the test bench and
could be, in some places, quite far from reality (real interfaces
not perfectly horizontal or vertical, materials not perfectly
homogeneous, 3D effects neglected, ...). The approach differs
from that of the synthetic study (Figure 4), where the
model shown corresponds to the true model. In order to check
the effectiveness of the fusion methodology, it was envisaged
to carry out ex post verifications of the constituent materials.
Unfortunately, for practical reasons, this could not be done
(reworking of materials modifying their physical properties,
interaction with water, delicate cutting and extraction, ...).

Fig. 17. Representation of the events having the highest belief mass in a) and c) and their associated mass values in b) and d). For i) 8 boreholes, ii) 5 boreholes, iii) 3 boreholes, iv) 2 boreholes are considered. For each case, (a,b) figures are results of Smets fusion, while (c,d) figures are results of PCR6 fusion. The borehole positions are in dashed lines. Color scales: considered events; belief mass values from 0 to 1.

Regarding the fusion methodology developed, two aspects
are debatable: first, the choice to set a belief mass
equal to 1 on the geotechnical information in the boreholes;
second, the effect of different random draws on the
fusion results. The choice of a maximum pointwise confidence
(m = 1) in boreholes is defended in order to give full,
local confidence to geotechnical information, as is currently
done during investigation campaigns. Excessive risks are not
taken since the test bench is relatively well known and the
synthetic model is perfectly known. Thus, it is certain
that simulated borehole values refer to the right materials.
Furthermore, a value of m = 0.99 instead of m = 1, for
instance, does not significantly change the results and does
not change the interpretation and the resulting discussion.
Regarding the effect of random draws, since these draws follow
a normal distribution, the variations from one draw
to another are minimal and the fusion results differ little.

Such an information fusion algorithm, dedicated to the com- 
bination of data from geophysical and geotechnical sources, 
should prove useful for processing of data acquired during 
investigation campaigns for many different kinds of issues. It is 
possible to envisage its use with a larger number of materials, 
but also, and especially, with a larger number of data types 
from geophysical methods (seismics, ground penetrating radar) 
and geotechnical testing methods (penetration cone, core sam- 
pling with laboratory identification, permeability tests, ...) 
associated. 

In the framework of a reconnaissance campaign, the conflict
zones, or zones with a low associated confidence, would make
it possible to specify the locations where the investigation must
be reinforced. The ultimate goal is to obtain a more robust
and cost-effective diagnosis of the investigated structure, with
better targeted geotechnical investigation. This methodology has
particularly shown its ability to correctly characterize
interfaces, which correspond to areas where the risk of instability
is potentially the greatest. For a levee embankment issue, for
example, the results from such a methodology could
feed into models of breakage risks (e.g., CARDigues [35]).


VII. CONCLUSION 


In this work, a new methodology based on belief functions
has been presented to take advantage of and combine two different
and complementary kinds of information, geophysical and
geotechnical, each having its own spatial distribution and
related uncertainties and inaccuracies. A new representation of
the information has been proposed, taking into consideration
two different investigation methods, associated with degrees
of belief. This representation is more informative than a
superposition of data on different physical parameters.

In the first place, this new approach has been validated 
with a synthetic study, simulating data acquired by ERT and 
a CPT method, considering a 2D model with two layers and 
thickness variation. The results were obtained with different 
noise ratios applied to the geophysical data and different values 
of lateral decay coefficient for the geotechnical information. 
The most appropriate value for the coefficient has
been identified, and it has been shown that this approach is
able to manage the noise ratio, thanks to the use of Wasserstein
distances.

In order to address the problem of combining information 
acquired by geophysical and geotechnical methods during in- 
vestigation campaigns, and to acquire values from real devices, 
a test bench composed of plaster and saturated sands was set 
up. The methods used to characterize such a physical model
were the ERT method (geophysical) and the laboratory
penetration cone method (geotechnical). While the data were
acquired by a dedicated small-scale ERT device, on the surface
and along the central line of the complete model, boreholes were
simulated respecting the penetration depth ranges previously
established.

Fusion results were proposed following 2 combination rules 
(Smets and PCR6) as well as for four different simulations of 
number and positions of boreholes. The results highlighted 
the ability of this fusion approach to correctly characterize 
the test bench materials as well as to specify the positions of 
the interfaces (vertical and horizontal) between the materials. 
Moreover, for each result, thanks to a graphical representation, 
the associated confidence is proposed. 

Further research should include cases of material mixtures
and cases of different materials sharing common ranges
of physical properties in order to test the ability of this
methodology to differentiate them. We also wish to test
this new methodology in real investigation campaigns in
order to improve the available knowledge and strengthen the
characterization. The level of confidence associated with the
proposed results may be very relevant for decision support (e.g.,
models of failure hazards). The results of such a methodology
should make it possible to propose the most relevant borehole
positions (which are a function of conflictual and anomalous
areas), in order to make the acquisition of information more
cost-effective.
Abstract—In this paper, a dual iris authentication method using
Dezert-Smarandache theory is presented. The proposed method consists
of three main steps. In the first one, the iris images are segmented
in order to extract only the half iris disc that contains relevant
information and is less affected by noise. For that, a Hough
transform is used. The segmented images are normalized by the
Daugman rubber sheet model. In the second step, the normalized
images are analyzed by a bank of two 1D Log-Gabor filters to
extract the texture characteristics. The encoding is realized with
the phase quantization developed by J. Daugman to generate
the binary iris template. For the authentication and the similarity
measurement between two binary iris templates, Hamming
distances are used with a previously calculated threshold. The
score fusion is applied using the DSmC combination rule. The
proposed method has been tested on a subset of the iris database
CASIA-IrisV3-Interval. The obtained results give a satisfactory
performance with an accuracy of 99.96%, FAR of 0%, FRR of
3.89%, EER of 2% and a processing time for one iris image of
12.36 s.


Keywords: Biometric, Iris, Authentication, Dezert-Smarandache theory.


I. INTRODUCTION 


When individuals log onto computers, or access an ATM, 
or pass through airport security, they have to reveal their 
identities. For this, individuals use passwords, ATM cards, and 
passports to prove their identities. However, passwords can be 
forgotten, and ATM cards or passports can be lost or stolen. 
In contrast, biometric modalities (fingerprint, face, iris,
etc.) are tied to what we are, and they also allow us to prove
our identity.
However, unimodal biometric systems, which use one
biometric modality for recognition, cannot guarantee at
present an excellent recognition rate. Furthermore, these
systems suffer from limitations such as sensitivity to noise,
data quality, non-universality, and spoof attacks. To overcome
these problems, multimodal biometric systems, which
combine multiple biometric modalities, have been developed
in order to achieve a better recognition rate.


Fusion of biometric traits is commonly performed at
two stages of the recognition system:


A. Fusion at feature extraction level 


The data acquired from each sensor is used to generate
a feature vector. Then, the features are fused to form one
feature vector.

B. Fusion at matching score level


The matching score of each system is combined and com- 
pared with the stored template. 

We use the iris as the modality for recognizing individuals,
since its texture is:


- Stable throughout the life of a person, unlike the
fingerprint.

- Unique for each person, unlike a facial feature in identical
twins.

- Unfalsifiable, contrary to the characteristics of the voice.

- Carried by an internal organ that is well protected from the
external environment but nevertheless measurable, in a
minimally invasive way, by simple image acquisition.


Daugman’s algorithm [1] is one of the best-known iris
recognition algorithms in biometrics. The iris is segmented
using an integro-differential operator and normalized using
Daugman’s polar representation. Then,
iris encoding is applied using 2D Gabor filters to extract
a binary code of 256 bytes. Matching is performed by
computing the similarity between two iris codes using the
Hamming distance: the smaller the Hamming distance, the more
similar the two codes. A distance of 0 corresponds to a perfect
match between two iris images, while two iris images of
different persons will have a Hamming distance close to 0.50.
In 1997, Wildes [2] proposed an iris recognition system
that differs from Daugman's algorithm [1]. The iris image
is acquired by a CCD camera in low luminosity. Then, the iris
is segmented using circular and elliptic Hough transforms and
is normalized using a pixel transformation function. After
that, the iris is filtered by Laplacian of Gaussian filters at
four different resolution levels. A normalized correlation is
calculated for every resolution level, and the median of the
correlation values is computed for the filtered image. The
four values are fused using Fisher’s linear discriminant.

In 1998, W. Boles and B. Boashash [3] presented a new
algorithm for recognition of individuals from iris images. The
algorithm is insensitive to variation in the lighting conditions
and noise levels. A median filter is used for preprocessing.
The advantage of this technique is that it extracts a feature
vector from 1D signals, rather than from 2D images as in [1], [2],
using zero-crossings of the dyadic wavelet transform at various
resolution levels. Only a few selected intermediate resolutions
are used for matching. The matching is applied using different
dissimilarity functions. This makes the algorithm faster and
less sensitive to noise and quantization error.


In 2004, Ma et al. [4] presented an efficient algorithm for
iris recognition. The iris is segmented by a Canny filter and
Hough transform. Then, the iris is normalized by histogram
equalization. After that, a 1D wavelet transform is used to
represent the resulting 1D intensity signals. The positions of
local sharp variation points are recorded as features. The
matching is performed using a similarity function (exclusive
OR operation). This algorithm is efficient and faster than
Daugman’s algorithm [1].

In [5], the researchers proposed a modified Masek approach 
and a comparative study of the performance of the following 
methods: radial segmentation, Masek segmentation approach, 
modified Masek approach. The proposed method tested on 
Casia Iris Database V3 showed a good performance in terms 
of accuracy and processing time. 

R. Biswas [6] has introduced an iris recognition system that 
includes different steps: segmentation, normalization, feature 
extraction, and classification. The segmentation of the pupil 
is performed by the Hough transform. Experimental results 
showed a recognition rate of 92%. 

In [7], the authors proposed an iris recognition system based 
on “Fractal dimension of the box-counting method”. First, the 
iris is segmented by Hough transform and is normalized by 
Daugman’s rubber sheet model. Then, the feature extraction 
is processed by box counting. Finally, the matching is es- 
tablished using K-nearest Neighbor and Euclidean distance. 
Experiments tested on Casia Interval V4 database showed a 
good recognition rate equal to 92.63%. 

D. Bobeldyk and A. Ross [8] have developed a method for 
predicting eye color from NIR iris images. Researchers have 
shown that a texture based approach based on the BSIF is more 
efficient than the intensity based approach based on raw pixel 
values. Experiments tested on the BioCOP database showed a 
good recognition rate of 90%. The BSIF distinguished “light 
color iris” and “dark color irides” using the SVM classifier. 

The authors of [9] presented a new method of classifying
faked iris images of different kinds, such as printed irises and
contact lenses. The new classification method learns different
characteristics of faked iris images with a CNN and identifies
legitimate and faked iris images using a “Hierarchical
Multi-Class” scheme. The tests carried out on the databases
ND-contact, Casia-Iris-Interval, Casia-Iris-Syn and
LivDet-Iris-2017-Warsaw showed a recognition rate equal to 100%
with FAR = 0% and FRR = 0%.

H.G. Daway et al. [10] presented a new method for detecting 
the pupil. The method involves several steps, the most impor- 
tant of which depends on the difference in color and intensity 
between the pupil and its neighborhood. These characteristics 
are very important to locate and extract the pupil. Thus, the 
pupil is a region of very high intensity (color) compared to its 
neighborhood. The experimental results showed a recognition 
rate equal to 100%. 


In order to improve overall performance in terms of
recognition rate and to mitigate errors, researchers have used
more than one biometric trait, and thus multibiometric
systems have emerged. Numerous multibiometric systems
have been developed in which fusion is performed at the
matching score level.

In [11], the researchers proposed a new approach for 
recognition using both irises. The iris is segmented using 
the Canny filter and Hough transform, then the segmented 
iris is normalized by J. Daugman’s rubber sheet model. The 
iris feature extraction is carried out using convolution of the 
normalized iris with 1D Log-Gabor filters then the phase of 
filtered iris is quantized in order to generate a binary code. 
The Hamming distance is used for Matching. The Matching 
operation consists in comparing the two iris feature vectors 
of a person with the others; if the Hamming Distances are 
less than the threshold then the person is identified. The 
experimental results showed a good recognition rate equal to 
99.92% with an FRR = 9.96%, while for unimodal systems 
(left iris and right iris) the recognition rate is equal to 99.87% 
with an FRR = 14.62% and FPR = 15.68%. 

In [12], the authors presented the framework for multi- 
modal biometric fusion based on the uncertainty concept of 
Dempster-Shafer theory. A combination of quality measures 
and the accuracy of classifiers (equal error rate) are proposed 
to encode the uncertainty concept to improve the fusion. The 
proposed method revealed a good performance with an EER 
equal to 1%. 

R. Dwivedi and S. Dey [13] proposed a cancelable multi- 
biometric system using score level fusion. The fusion of scores 
was applied by MC weighting at first level and RA weighting 
at the second level. The comparative analysis shows that the 
proposed fusion method outperforms the existing weighting 
approaches. 

In this work, we extract only the interior half of the
iris disc rather than the whole iris disc, because this half
contains the most relevant information and is less affected by
noise. In addition, we combine two sources of information (left
iris and right iris) with a high degree of conflict using
Dezert-Smarandache Theory, which solves the problem of
redistributing highly conflicting masses that arises under
Dempster-Shafer theory.

The remainder of the paper is organized as follows: the research
method is described in section II, results and analysis are
presented in section III, and conclusions are provided in section
IV.


II. RESEARCH METHOD 


The key idea of our work is to extract only the
discriminant information from the iris texture and to apply
Dezert-Smarandache Theory (DSmT) at score level fusion to operate
under uncertainty, with the goal of achieving good performance.

The proposed method is composed of four main stages: 
preprocessing, feature extraction, fusion, and matching. 


A. Preprocessing stage 


First, the iris images go through a preprocessing
phase including segmentation and normalization.

1) Iris segmentation: The segmentation of the iris is realized
by a common edge detection method: the Hough transform.


- HOUGH TRANSFORM ALGORITHM

- Generate the edge map using the Canny filter.

- Canny parameters: standard deviation of the Gaussian
smoothing filter: σ = 2; weighting for vertical gradients
= 0; weighting for horizontal gradients = 1.

- Increase contrast in the dark iris region. Image gamma value
(to enhance the contrast of bright regions): γ = 1.9.

- Detect pixels corresponding to local maxima. Distance
in pixel units to be looked at on each side of each pixel
when determining whether it is a local maximum or not:
d = 1.5.

- Binarize the iris image using hysteresis thresholding. Low
threshold T1 = 0.19. High threshold T2 = 0.20.


Figure 1. Different steps of the Hough transform (panels: edge map, increase in contrast, non-maxima suppression, binary image).


Then, a circular Hough transform first detects the
iris/sclera boundary and then the iris/pupil boundary. The
eyelashes are detected by global thresholding (T’ = 100).

In this work, the objective is to extract only the relevant
information from the iris, which is represented by the structural
variation of the iris texture (high-gradient areas). Only the
internal half of the iris disc is exploited rather than the whole
disc, because it contains the most relevant information [14] and
is less affected by noise, as shown in Figure 2. The proposed
technique thus decreases the complexity and the
computation load without losing information (as shown in
Table I).

$$r_{\text{half of iris disc}} = r_{\text{pupil}} + (r_{\text{iris}} - r_{\text{pupil}})/2 \qquad (1)$$


Figure 2. Delimitation of only the internal half of iris disc. 


Table I
COMPARISON: TOTAL IRIS DISC VS HALF IRIS DISC.

                  Accuracy (%)   Processing time for one iris image (s)
Total iris disc   99.87          22.88
Half iris disc    99.96          12.36


From Table I, we note that processing only the half
iris disc is more efficient, with an accuracy of 99.96% and a
processing time for one iris image of 12.36 s, than processing
the whole iris disc, with an accuracy of 99.87% and a processing
time of 22.88 s.

2) Iris normalization: The iris disc does not always have 
the same dimension, even for eye images of the same person; 
this is due to various problems as follows: 

a) Different acquisition conditions of the eye images. Di- 
lation and contraction of the pupil due to the variation 
of the illumination level. 

b) The circles of iris and pupil are not concentric. 

In order to overcome these problems, a normalization stage
is applied. It consists of transforming the region of the iris
disc to rectify the dimensions of all the iris discs, by using the
homogeneous rubber sheet model proposed by Daugman [1]. It
maps each point in the iris area to the polar coordinates
(r, θ), where r is on the interval [0, 1] and θ is an angle in
[0, 2π], as illustrated in Figure 3.
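
As an illustration of the rubber sheet mapping described above, the following minimal Python sketch interpolates linearly between the pupil and iris boundaries for each (r, θ). The function names, the circle representation `(cx, cy, radius)` and the way the inner half is selected are assumptions made for this example, not the implementation used in the paper.

```python
import math


def rubber_sheet_point(r, theta, pupil, iris):
    """Map normalized polar coordinates (r, theta) to image coordinates.

    pupil and iris are (cx, cy, radius) circles; they need not be concentric.
    """
    px = pupil[0] + pupil[2] * math.cos(theta)
    py = pupil[1] + pupil[2] * math.sin(theta)
    ix = iris[0] + iris[2] * math.cos(theta)
    iy = iris[1] + iris[2] * math.sin(theta)
    # Linear interpolation: r = 0 is the pupil boundary, r = 1 the iris boundary.
    return ((1 - r) * px + r * ix, (1 - r) * py + r * iy)


def sampling_grid(pupil, iris, radial=20, angular=240, keep_inner_half=True):
    """Build the sampling grid; optionally keep only the inner radial rows."""
    rows = radial // 2 if keep_inner_half else radial
    grid = []
    for i in range(rows):
        r = i / (radial - 1)
        row = [
            rubber_sheet_point(r, 2 * math.pi * j / angular, pupil, iris)
            for j in range(angular)
        ]
        grid.append(row)
    return grid
```

With the default arguments the grid has 10 rows of 240 points, matching the (10 x 240) resolution retained for the internal half of the iris disc.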


Figure 3. Daugman rubber sheet model [1].


In our system, (20 x 240) points were used, but only 
(10 x 240) points corresponding to the internal half of the iris 
disc that contains the most relevant information and which is 
less affected by noise, are retained for the next steps of the 
processing, as shown in Figure 4. 


Figure 4. Normalization of the segmented iris. 


B. Feature extraction stage 


After that, the feature extraction stage is applied in order
to extract the most discriminating information present in the
iris region. For this purpose, a bank of two 1D Log-Gabor
filters is used.
1) 1D Log-Gabor filter:

- The Fast Fourier Transform is applied to each line of
the normalized matrix image (FFT of 1D signals).

- Then, the Inverse Fast Fourier Transform (IFFT) is applied
to the product of the FFT (1D signals) and a 1D Log-Gabor
filter.

- The frequency response of a 1D Log-Gabor filter is given
by:

$$G(f) = \exp\left( -\frac{\left(\log(f/f_0)\right)^2}{2\left(\log(\sigma/f_0)\right)^2} \right) \qquad (2)$$

Parameter setting:
- A bank of two 1D Log-Gabor filters is used.
- The standard deviation of the 1D Log-Gabor wavelet is
given by σ = 2.
- The center frequency of the 1D Log-Gabor wavelet is
given by f0 = 0.05.
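
The frequency response above can be evaluated directly. The sketch below takes the formula and the stated parameters (σ = 2, f0 = 0.05) literally; the function name and the choice to return 0 at f ≤ 0 (a log-Gabor filter has no DC component) are assumptions of this example.

```python
import math


def log_gabor(f, f0=0.05, sigma=2.0):
    """Frequency response G(f) of a 1D Log-Gabor filter."""
    if f <= 0:
        return 0.0  # log(f) is undefined at f <= 0; the filter has no DC term
    numerator = math.log(f / f0) ** 2
    denominator = 2 * math.log(sigma / f0) ** 2
    return math.exp(-numerator / denominator)


# Response sampled on the non-negative frequencies of a 240-point FFT row.
response = [log_gabor(k / 240) for k in range(121)]
```

The response peaks at the center frequency and is symmetric on a logarithmic frequency axis, so `log_gabor(2 * f0)` equals `log_gabor(f0 / 2)`.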

The phase of the filtered image is then quantized using
Daugman's four-quadrant scheme [1]: when going from one quadrant
to an adjacent quadrant, only one bit changes, as shown in
Figure 5.


Figure 5. Phase quantization [1]. Quadrant codes: (0, 1) and (1, 1) in the upper half-plane, (0, 0) and (1, 0) in the lower half-plane.


The encoding process produces a bitwise template containing
a number of information bits (as shown in Figure 6). The
total number of bits in the template (9600 bits) is the
angular resolution (240) times the radial resolution (10), times
2, times the number of filters used (2).
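
A minimal sketch of the four-quadrant phase quantization and of the template size arithmetic is given below. The function names are hypothetical, and assigning points that lie exactly on an axis to the non-negative side is an assumption of this example.

```python
def quantize_phase(z):
    """Encode the phase quadrant of a complex filter response as two bits.

    First bit: sign of the real part; second bit: sign of the imaginary part.
    Adjacent quadrants then differ by exactly one bit.
    """
    return (1 if z.real >= 0 else 0, 1 if z.imag >= 0 else 0)


def template_bits(angular=240, radial=10, bits_per_sample=2, n_filters=2):
    """Total number of bits in the iris template."""
    return angular * radial * bits_per_sample * n_filters
```

For example, `quantize_phase(-1 + 1j)` gives `(0, 1)`, and `template_bits()` gives the 240 x 10 x 2 x 2 = 9600 bits stated above.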


Figure 6. Quantization Phase [1]. 


C. Matching stage 

The matching score comes before Fusion stage. It consists 
in comparing two iris code using Hamming distance. The 
Hamming Distance (HD) is defined by: 


$$HD = \sum_{j=1}^{N} X_j \oplus Y_j \qquad (3)$$

where $X_j$ and $Y_j$ are the two bitwise iris codes, $N$ is the
number of bits in each iris code, and $\oplus$ is the XOR operation.


Literally, the Hamming distance counts the number of
valid bits that differ between the two iris codes $X_j$ and
$Y_j$.

The number of translation bits that compensates the rotation 
of the iris needs to be fixed. We applied a translation of the 
iris code in an interval [-3,+3] bits. We take into consideration 
the minimum Hamming distance. 
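
The matching step with rotation compensation can be sketched as follows. This is an illustrative simplification: the iris code is treated as a single row of bits, noise masks ("valid bits") are omitted, and the function names are hypothetical.

```python
def hamming_distance(x, y):
    """Number of positions at which two equal-length bit sequences differ."""
    if len(x) != len(y):
        raise ValueError("iris codes must have the same length")
    return sum(a ^ b for a, b in zip(x, y))


def min_shifted_hamming(x, y, max_shift=3):
    """Smallest Hamming distance over circular shifts of y in [-max_shift, +max_shift]."""
    n = len(y)
    best = None
    for s in range(-max_shift, max_shift + 1):
        k = s % n
        rotated = y[k:] + y[:k]  # circular shift by s positions
        d = hamming_distance(x, rotated)
        if best is None or d < best:
            best = d
    return best
```

Dividing the result by the code length gives a fractional distance between 0 and 1, the scale on which unrelated irises fall close to 0.50.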


D. Fusion stage 


In this stage, score level fusion using Dezert-Smarandache
theory (DSmT) is applied with the goal of improving the
performance of the dual iris system.

1) Score level fusion: Matching score level fusion combines 
the scores generated by multiple classifiers relating to the left 
and right iris to affirm the veracity of the claimed identity. 

The Dezert-Smarandache theory operates on the hyper-power
set $D^\Theta$. Thus, DSmT is able to function properly not only
with unions but also with intersections. The DSm Classic
combination rule is [15], [16], [17]:

$$m(C) = \sum_{\substack{A, B \in D^\Theta \\ A \cap B = C}} m_1(A)\, m_2(B), \quad \forall C \in D^\Theta \qquad (4)$$


Example

$\Theta = \{S_{left}, S_{right}\}$,

$D^\Theta = \{\emptyset, S_{left}, S_{right}, S_{left} \cup S_{right}, S_{left} \cap S_{right}\}$,

where
- $\emptyset$: Empty set;
- $S_{left}$: Hypothesis assuming that two individuals have the
same left iris;
- $S_{right}$: Hypothesis assuming that two individuals have the
same right iris;
- $S_{left} \cup S_{right}$: Hypothesis assuming that two individuals
have the same iris;
- $S_{left} \cap S_{right}$: Hypothesis assuming that two individuals
have different iris.
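
To make the DSm Classic rule concrete, the sketch below combines two mass functions over the hyper-power set of a two-element frame. It assumes the free DSm model and encodes each element as a set of Venn-diagram regions so that set intersection reproduces the intersection in the hyper-power set; the names and the numeric masses are illustrative only.

```python
from itertools import product

# Venn-region encoding for Theta = {S_left, S_right} under the free DSm model:
# "l" = only S_left, "r" = only S_right, "lr" = the overlap of both.
S_LEFT = frozenset({"l", "lr"})
S_RIGHT = frozenset({"r", "lr"})
S_INTER = S_LEFT & S_RIGHT  # S_left ∩ S_right
S_UNION = S_LEFT | S_RIGHT  # S_left ∪ S_right


def dsmc(m1, m2):
    """DSm Classic rule: m(C) is the sum of m1(A) * m2(B) over all A ∩ B = C."""
    fused = {}
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        c = a & b
        fused[c] = fused.get(c, 0.0) + mass_a * mass_b
    return fused


# Illustrative masses for two sources (hypothetical values).
m1 = {S_LEFT: 0.6, S_RIGHT: 0.4}
m2 = {S_LEFT: 0.3, S_RIGHT: 0.7}
fused = dsmc(m1, m2)
```

In this example the conflicting products are not discarded or renormalized; they are transferred to the intersection, so the fused masses still sum to 1.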


E. Decision 


The decision is made by fixing a threshold. The two irises 
compared will be considered as belonging to the same person 
if the calculated score is inferior to a threshold. 


III. RESULTS AND ANALYSIS 


A. Simulation environment 


The proposed method has been tested on a subset of iris 
database CASIA-IrisV3-Interval [18] in order to evaluate its 
performance in authentication mode. The subset contains 1180 
eye images of 118 individuals (classes), and each individual 
has five iris samples for the left eye and five iris samples for 
the right eye. 
B. Performance metrics


- False Reject Rate (FRR): also known as Type I error, is 
the measure of the probability that the biometric security 
system will incorrectly reject an access attempt by an 
authorized user; 

- False Accept Rate (FAR): also known as Type II error, is 
the measure of the probability that the biometric security 
system will incorrectly accept an access attempt by an 
unauthorized user; 

- EER (Equal Error Rate): The EER is the operating point
for which the False Reject Rate (FRR) is equal to the
False Accept Rate (FAR).


C. Decidability 


Decidability [1] is a metric that takes into
account the mean and standard deviation of the intra-class and
inter-class distributions:

$$d' = \frac{|\mu_s - \mu_d|}{\sqrt{(\sigma_s^2 + \sigma_d^2)/2}} \qquad (7)$$

Decidability d' is a distance in standard deviations calculated
using (7), which is a function of the magnitude of the
difference between the mean of the intra-class distribution $\mu_s$
and the mean of the inter-class distribution $\mu_d$, and of the
standard deviations of the intra-class and inter-class
distributions, $\sigma_s$ and $\sigma_d$ respectively.
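
As a sketch of how this measure can be computed from two lists of Hamming distances, the example below uses Python's `statistics` module. The function name, the sample values and the use of the population variance (rather than the sample variance) are assumptions of this example.

```python
import math
from statistics import mean, pvariance


def decidability(intra, inter):
    """Decidability d' between intra-class and inter-class distance samples."""
    mu_s, mu_d = mean(intra), mean(inter)
    var_s, var_d = pvariance(intra), pvariance(inter)
    return abs(mu_s - mu_d) / math.sqrt((var_s + var_d) / 2)


# Hypothetical Hamming distances, for illustration only.
intra_class = [0.28, 0.30, 0.32, 0.34]
inter_class = [0.46, 0.48, 0.50, 0.52]
d_prime = decidability(intra_class, inter_class)
```

A larger d' means the two distributions overlap less, which is why it is used to choose the number of bit shifts.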


Table II
DECIDABILITY TABLE FOR VARIOUS NUMBERS OF BIT-SHIFTS.

Number of shifts   μs       σs       μd       σd       d'
0                  0.3300   0.0723   0.4914   0.0284   3.4314
1                  0.3137   0.0697   0.4860   0.0279   3.8149
2                  0.3072   0.0668   0.4812   0.0269   4.0264
3                  0.3044   0.0653   0.4772   0.0258   4.0742
4                  0.3032   0.0646   0.4738   0.0247   4.0431
5                  0.3028   0.0642   0.4709   0.0238   3.9907
6                  0.3025   0.0639   0.4684   0.0230   3.9362
7                  0.0637   0.0642   0.0223   0.0216   3.8862
8                  0.3023   0.0635   0.4645   0.0216   3.8303
9                  0.3022   0.0634   0.4629   0.0211   3.7999
10                 0.2758   0.0639   0.4643   0.0201   4.3960


Figure 7. Decidability curve for various numbers of bit-shifts (x-axis: number of bits shifted in left and right).


Using Equation (7), the decidability was computed
for shifts from 0 bits to 10 bits, towards both the left and the
right, of the iris templates.

The highest decidability is equal to 4.3960 at a 10-bit shift
(as shown in Table II and Figure 7), which guarantees good
separation of the intra-class and inter-class distributions and
therefore allows for more accurate recognition.


D. Score level fusion 


We calculated the fusion score $S_f$ using the Hamming
distances obtained by comparing the individuals from their
irises, with the following notation:


- $HD_L$: Hamming distance obtained by comparing the
individuals from their left iris;

- $S_L$: Score obtained by comparing the individuals from
their left iris;

- $HD_R$: Hamming distance obtained by comparing the
individuals from their right iris;

- $S_R$: Score obtained by comparing the individuals from
their right iris.


Algorithm

for each individual indv
    for each different iris pair (i,j)
        % such that i,j belong to the iris set of indv
        Calculate the score S_L(i,j) = 1 - HD_L(i,j)
        Calculate the score S_R(i,j) = 1 - HD_R(i,j)
        Calculate the fusion of score
        S_f(i,j) = S_L(i,j) * S_R(i,j)
        k = 1
        for s = 0:0.05:1
            if S_f(i,j) < s then
                FN(k) = FN(k) + 1   % false negative counter
            end if
            k = k + 1   % threshold index advances for every s
        end
    end
end

for each different individual indvi, indvj
    for each different iris pair (i,j)
        % such that i belongs to the iris set of indvi
        % and j belongs to the iris set of indvj
        Calculate the score S_L(i,j) = 1 - HD_L(i,j)
        Calculate the score S_R(i,j) = 1 - HD_R(i,j)
        Calculate the fusion of score
        S_f(i,j) = S_L(i,j) * S_R(i,j)
        k = 1
        for s = 0:0.05:1
            if S_f(i,j) >= s then
                FP(k) = FP(k) + 1   % false positive counter
            end if
            k = k + 1   % threshold index advances for every s
        end
    end
end

maxindv = number of individuals
nbtr = number of iris images per individual
nbinter = maxindv * (nbtr × (nbtr-1) / 2)
nbintra = maxindv * (nbtr × (nbtr-1) / 2)
TN = nbinter - FP   % True Negative
TP = nbintra - FN   % True Positive
TPR = 100 * (TP / nbintra)   % True Positive Rate
TNR = 100 * (TN / nbinter)   % True Negative Rate
FAR = 100 * (FP / nbinter)   % False Accept Rate
FRR = 100 * (FN / nbintra)   % False Reject Rate
Accuracy = 100 * ((TP + TN) / (nbintra + nbinter))




where × is the product operator of two numbers.
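The counting procedure above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `fusion_rates`, the input layout (lists of left/right Hamming-distance pairs) and the sample distances are assumptions made for the example. Only the score definitions, the product fusion and the threshold grid 0:0.05:1 are taken from the algorithm.

```python
def fusion_rates(genuine, impostor, thresholds):
    """Return FAR, FRR and accuracy (in %) for every threshold.

    genuine / impostor: lists of (hd_left, hd_right) Hamming-distance
    pairs for same-person and different-person comparisons (assumed layout).
    """
    def fused(hd_left, hd_right):
        # S_L = 1 - HD_L, S_R = 1 - HD_R, S_f = S_L * S_R
        return (1.0 - hd_left) * (1.0 - hd_right)

    rows = []
    for s in thresholds:
        fn = sum(1 for hd in genuine if fused(*hd) < s)     # genuine pair rejected
        fp = sum(1 for hd in impostor if fused(*hd) >= s)   # impostor pair accepted
        tp = len(genuine) - fn
        tn = len(impostor) - fp
        rows.append({
            "threshold": s,
            "FAR": 100.0 * fp / len(impostor),
            "FRR": 100.0 * fn / len(genuine),
            "accuracy": 100.0 * (tp + tn) / (len(genuine) + len(impostor)),
        })
    return rows


thresholds = [k * 0.05 for k in range(21)]   # 0, 0.05, ..., 1
genuine = [(0.10, 0.20), (0.30, 0.25)]       # made-up distances
impostor = [(0.45, 0.50), (0.48, 0.47)]      # made-up distances
rates = fusion_rates(genuine, impostor, thresholds)
```

Sweeping the threshold in this way gives the FAR and FRR curves from which an operating point such as the EER can be read.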

The dual iris system using DSmC at score level fusion 
reaches an accuracy rate of 99.96% and FAR of 0%, FRR 
of 3.89%, EER of 2% as shown in Figure 8 and Figure 9. 


[Plot: accuracy (%) versus normalized threshold]

Figure 8. The accuracy of the dual iris system.

Figure 9. FRR and FAR of the dual iris system.


Figure 10 shows that the ROC (Receiver Operating Characteristic)
curve of the dual iris system using DSmC at score level fusion
lies close to the origin, which supports the performance of
our method.


[Plot: ROC curve, FRR (%) versus FAR (%)]

Figure 10. ROC of the dual iris system.


E. Comparison of various approaches 


Table III shows that the proposed dual iris authentication
system gives a competitive performance, with an accuracy of
99.96%, FAR of 0%, FRR of 3.89% and EER of 2%, in comparison
with other approaches. The DST approach [12] (EER of 1%) relies
on Dempster-Shafer theory, which has difficulty combining two
sources of information with a high degree of conflict. The
proposed method is based on the Dezert-Smarandache Classic rule
(DSmC), which resolves this problem. Iftakhar et al. [11] used a
fusion method based on the AND rule, which gives an accuracy of
99.92%, FAR of 0% and FRR of 9.96%. This rule is more drastic:
it favours the FAR at the cost of a higher FRR. Dwivedi and
Dey [13] obtained a lower accuracy of 98.89%, with an EER of
0.69%, using score fusion methods such as MC weighting and RA
weighting. These two methods, used for optimization, give only
a small improvement in the performance of the system.


IV. CONCLUSION 


The purpose of this work was to design a dual iris
authentication system that guarantees good performance and
a zero false acceptance rate, which is promising for
security applications. The proposed method consists of
segmenting, normalizing, characterizing and encoding the iris.
For the segmentation part, the iris and pupil circles were
detected by the circular Hough transform. Only the interior half
of the iris disc was extracted, since it contains the most
relevant information and is less affected by noise, which also
reduces time complexity. Iris normalization
was performed by the Daugman rubber sheet model with a
DECIDABILITY TABLE FOR VARIOUS NUMBERS OF BIT-SHIFTS.


Table III

Iris Reco. System                          Accuracy (%)  FAR (%)  FRR (%)  EER (%)
Iftakhar & Ashraful approach [11]          99.92         0        9.96     -
DST [12]                                   -             -        -        1
Dwivedi & Dey approach [13]                98.89         -        -        0.69
Proposed dual iris authentication system   99.96         0        3.89     2


resolution of 10 x 240. The normalized image was analyzed by a bank
of two 1D Log-Gabor filters to extract the texture characteristics,
and the encoding was realized with the phase quantization
developed by J. Daugman to generate the binary iris template.
For authentication, the similarity between two binary iris
templates was measured by the Hamming distance and compared
with a previously calculated threshold. Score fusion was
applied using the Dezert-Smarandache Classic (DSmC) rule. The
experiment on CASIA-Iris V3-Interval shows that the
proposed system performs well compared to other
approaches, with an accuracy of 99.96%, FAR of 0%, FRR of
3.89%, EER of 2% and a processing time of 12.37 s for one
iris image.
Abstract: This paper describes an original method of global
machine condition assessment for infrared condition monitoring
and diagnostics systems. This method integrates two approaches:
the first is processing and analysis of infrared images in the
frequency domain by the use of the 2D Fourier transform and a set
of F-image features, the second uses fusion of classification
results obtained independently for the F-image features. To find
the best condition assessment solution, two different types of
classifiers, k-nearest neighbours and support vector machine (SVM),
as well as a data fusion method based on Dezert-Smarandache theory,
have been investigated. The method has been verified using infrared
images recorded during experiments performed on a laboratory
model of rotating machinery. The results obtained during the
research confirm that the method can be successfully used for
identification of operational conditions that are difficult to
recognize.


Keywords: classification, decision fusion, PCR6, infrared
image analysis, Fourier analysis, infrared thermography,
condition-based monitoring.


I. INTRODUCTION 


Infrared thermography is a modern and popular technique 
for thermal condition monitoring of machinery, apparatus and 
industrial processes [1]. 

Infrared cameras can be used in continuous condition monitoring
systems for contactless detection and identification of
object faults at an early stage, which is useful for planning
object maintenance and overhauls.

A continuous condition monitoring system based on an infrared
device should include infrared image processing and recognition
to classify the current operating condition of the object.
Research connected with the application and development of 
infrared image processing and analysis, as well as artificial 
intelligence methods, to continuous thermographic objects 
monitoring and diagnostics has been carried out in several 
different academic and research centres [2], [3] and also by 
the authors [4]. In this article an original method of object 
condition identification, which can be used in continuous con- 
dition monitoring and diagnostics systems, has been proposed. 

The method can be generalised to any diagnostic data 
acquired during continuous monitoring of different objects or 
industrial processes. 
and reprinted with permission.


II. METHOD


It has been assumed that the general condition of an object
can be assessed on the basis of
the analysis of infrared images that are acquired continuously
by a monitoring system during the object's operation.

For a clear description of the method, let us assume that
the diagnosed object is a complex machine containing several
sub-assemblies (e.g. motor, couplings, journal bearing, pump,
etc.).

Having acquired an infrared image of machinery in any 
moment of its operation, it is possible to define regions of 
interests (ROIs) containing only important parts of the diag- 
nosed object. In such a way, the rest of the image content could 
be treated as an unwanted background that is not considered 
during the diagnostic process. 

In the proposed method, whose brief algorithm is presented
in Fig. 1, each defined region of interest contains a
sub-assembly of the machinery and can be treated as a kind of
sub-image. Each sub-assembly reflects the machine’s condition
in a different way; thus, analysis of the sub-images of the
sub-assemblies allows us to acquire partial diagnostic information
about the global condition of the object. Analysis of each
sub-image gives a set of features that represents the
condition of the corresponding sub-assembly at the moment of
infrared image acquisition. The local conditions of the
sub-assemblies are related to the machine’s global condition.

Figure 1: Idea of the method of identification of object
conditions based on infrared images.

Having determined the feature vectors for infrared images 
acquired during machine operation in different conditions 
(including faults), it is possible to design a set of local 
classifiers that allow us to identify conditions of the machinery. 
At this stage, the classifiers could be treated as local experts. 

Local diagnostic information provided by each classifier can
be joined together to get information about the global (overall)
machinery condition. In the elaborated method, decision fusion
methods were used to aggregate the diagnostic decisions and
maximize the final classification performance.


A. Processing and analysis of infrared images 


The versatile nature of the developed method allows us to
apply different image processing and analysis methods to
obtain a feature set. For method verification purposes, the
authors decided to use a spectral representation of infrared
images, obtained by use of the two-dimensional Fourier
transform. One of the reasons for applying the 2D Fourier
transform is its shift invariance property [5], which makes the
method less sensitive to deviations in the location of the
imaging device while observing an object. A spectral
representation of an infrared image can also emphasise
diagnostic information that could be hidden in the
real image.

The result of the Fourier transform of an infrared image is a
two-dimensional spectrum, which can be represented by
two images, of magnitude and of phase, also called F-images.
Frequency components on the F-images are distributed
symmetrically, and in many cases it is enough
to consider one quarter of the magnitudegrams and/or two
adjacent quarters of the phasegrams. Nevertheless, in most
considered cases the entire F-image is shown and analysed [4], [5].
This approach is most convenient for F-image interpretation
purposes because the frequency components generate specific
symmetrically distributed patterns (similar to stars) (cf. Fig.
2), whose shapes and locations depend on the content of the
original infrared image.
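As a rough illustration of how the two F-images arise, the sketch below computes the magnitude and phase of a naive 2D discrete Fourier transform using only the Python standard library. The function names and the 4 x 4 sample patch are hypothetical, and a library FFT would normally be used instead of this slow direct sum. The example also checks that the magnitude image is unchanged by a circular shift of the input; for a real camera displacement this invariance holds only approximately.

```python
import cmath
import math


def dft2(image):
    """Naive 2D discrete Fourier transform of a list-of-rows image."""
    rows, cols = len(image), len(image[0])
    spectrum = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0j
            for y in range(rows):
                for x in range(cols):
                    angle = -2.0 * math.pi * (u * y / rows + v * x / cols)
                    total += image[y][x] * cmath.exp(1j * angle)
            spectrum[u][v] = total
    return spectrum


def f_images(image):
    """Return the (magnitude, phase) F-images of an image."""
    spectrum = dft2(image)
    magnitude = [[abs(c) for c in row] for row in spectrum]
    phase = [[cmath.phase(c) for c in row] for row in spectrum]
    return magnitude, phase


def circular_shift(image, dy, dx):
    """Shift an image by (dy, dx) pixels with wrap-around."""
    rows, cols = len(image), len(image[0])
    return [[image[(y - dy) % rows][(x - dx) % cols] for x in range(cols)]
            for y in range(rows)]


patch = [[0, 0, 0, 0],
         [0, 9, 5, 0],
         [0, 3, 1, 0],
         [0, 0, 0, 0]]   # made-up intensity values
mag, _ = f_images(patch)
mag_shifted, _ = f_images(circular_shift(patch, 1, 1))
```

Here `mag` and `mag_shifted` agree up to rounding error, while the corresponding phase images differ.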


To analyse the F-images, the three following features are 
defined: 
HFP Horizontal F-image Parameter,
VFP Vertical F-image Parameter, 
CFP Circular F-image Parameter. 


Figure 2: An exemplary infrared image (a) and its F-images
of magnitude (b) and phase (c) obtained on the basis of 2D
Fourier analysis.
The features are mean values of the F-image frequency
components calculated over rectangular and circular areas placed
in the centres of the F-images in the way presented in Fig.
3. The dimensions of the areas used to calculate the
feature values were set experimentally (cf. III-A).


Figure 3: Graphical illustration of considered features of
F-images: (a) HFP, (b) VFP, (c) CFP.
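The idea of averaging F-image values over centred areas can be sketched as below. The text does not give the exact geometry of the areas, so this is an assumption-laden illustration: both helper functions are hypothetical, the rectangle takes its width and height as free arguments (how W and H map onto HFP and VFP is not stated here), and the circular area is taken as all pixels whose centres lie within D/2 of the image centre.

```python
def centred_rect_mean(fimg, width, height):
    """Mean of the F-image over a centred width x height rectangle."""
    rows, cols = len(fimg), len(fimg[0])
    top = (rows - height) // 2
    left = (cols - width) // 2
    values = [fimg[y][x]
              for y in range(top, top + height)
              for x in range(left, left + width)]
    return sum(values) / len(values)


def centred_disc_mean(fimg, diameter):
    """Mean of the F-image over a centred disc of the given diameter."""
    rows, cols = len(fimg), len(fimg[0])
    cy, cx = (rows - 1) / 2.0, (cols - 1) / 2.0
    radius = diameter / 2.0
    values = [fimg[y][x]
              for y in range(rows)
              for x in range(cols)
              if (y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2]
    if not values:
        raise ValueError("diameter too small: the disc covers no pixel")
    return sum(values) / len(values)


fimg = [[1, 1, 1, 1],
        [1, 5, 7, 1],
        [1, 9, 3, 1],
        [1, 1, 1, 1]]   # made-up F-image values
rect_feature = centred_rect_mean(fimg, width=2, height=2)
disc_feature = centred_disc_mean(fimg, diameter=2)
```

In the method described here, the free parameters (W, H and D) would then be tuned experimentally.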
B. Classification of the machine’s conditions


To classify machine operation conditions, a number of 
possible approaches could be chosen [6]. In practice, the 
choice of a classifier is a difficult problem and it is often based 
on a data specificity, as well as a researcher’s experience. 

The authors decided to apply two classifiers: a simple k- 
Nearest Neighbour (k-NN) classifier [7] and Support Vector 
Machine (SVM) [8], which is recognised as a very effective 
classification solution. 

The authors’ intention was to show how to use the method
and how the different classifiers behave.

To obtain a reliable estimate of classification efficiency,
the leave-one-out cross-validation (LOOCV) algorithm [9] has
been applied.

The LOOCV validation method has a high variance but 
estimates of generalization error are comparable with other 
partitioning schemes used for classification efficiency evalua- 
tion [10]. 

The classifier accuracy measure that we used was the
relative number of misclassifications, which is calculated as
follows:

$\mathrm{err} = N_e / N$, (1)


where $N$ was the number of considered samples and $N_e$ was
the number of misclassified samples. On the basis of the
err measure, the classifier efficiency was calculated in the
following way:


$\mathrm{eff} = (1 - \mathrm{err}) \cdot 100\%$. (2)
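A minimal sketch of how Equations (1) and (2) combine with leave-one-out cross-validation is given below, using a plain k-NN majority vote with Euclidean distance. The helper names and the one-dimensional sample data are assumptions made for illustration. This is not the authors' Matlab code.

```python
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k nearest training samples (Euclidean)."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def loocv_efficiency(samples, k):
    """Leave each sample out once, classify it, and return eff in %."""
    errors = 0
    for i, (features, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]     # all samples except the i-th
        if knn_predict(train, features, k) != label:
            errors += 1
    err = errors / len(samples)                   # Eq. (1): err = N_e / N
    return (1.0 - err) * 100.0                    # Eq. (2): eff = (1 - err) * 100%


samples = [([0.0], "S1"), ([0.1], "S1"), ([0.2], "S1"),
           ([1.0], "S5"), ([1.1], "S5"), ([1.2], "S5")]   # made-up feature values
eff = loocv_efficiency(samples, k=1)
```

With two well-separated groups, every left-out sample is classified correctly and the efficiency is 100%.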


C. Decision fusion 


In the elaborated method, joining of the classification results 
is proposed. There are some methods which allow treatment 
of the data jointly [11]. One of the interesting approaches is 
a decision fusion. 

Decision fusion, which is also called classifier fusion, is a
method that combines the results of classification obtained from
different classifiers trained over different types of data gathered
from the same object. In this approach, classifiers are treated
as “local experts” that make decisions about the machine’s
condition.

The use of classifiers in technical diagnostics is connected
with the uncertainty of the data on which those classifiers
are trained. The sources of uncertainty include, for
example [12]: random events, measurement
deviations, incompleteness of the set of considered diagnostic
parameters and lack of knowledge about the diagnosed object or
process.

In general, most types of uncertainty can be characterised
by the use of classical probability theory based on Bayes’
theorem [13], [14].

An alternative to the Bayesian methods is the Dempster-Shafer
Theory (DST), also called the mathematical theory of
evidence. The DST can deal with imprecise or incomplete
data. In addition, DST can be interpreted as a generalisation of
probability theory in which probabilities are assigned to multiple
possible events (e.g. sets of events) as opposed to mutually
exclusive singletons [15], [16].

DST offers very important mechanisms for aggregating
information coming from multiple sources by the
use of rules for combining evidence. Many rules have been
developed since DST was established.

Several interesting examples, including a detailed analysis 
of validity of Dempster’s combination rule in different con- 
texts, can be found in [17]-[19]. 

A generalisation, and in some respects an extension, of the
Dempster-Shafer evidence theory is the Dezert-Smarandache
Theory (DSmT) [20] of plausible and paradoxical reasoning.
DSmT overcomes some limitations of DST [20], [21] because
it allows us to formally combine any kind of information.
DSmT is built on terms similar to those of DST. It introduces
the generalised frame of discernment $\Theta$, which contains
$n$ exhaustive elements $(\theta_1, \ldots, \theta_n)$. In the
classification case, the elements of $\Theta$ are all considered
classes (class labels).

On the basis of the generalised frame of discernment, the
hyper-power set $D^\Theta$ can be created from all single class
labels and also from the allowed logical combinations of class
labels. This means that classification can be made not only for
single classes: the tested sample can also be assigned
simultaneously to several classes
($\theta_i \cap \theta_j \neq \emptyset$), or there can be some
uncertainty in the reasoning process and the same test sample can
be a member of one or another class
($\theta_i \cup \theta_j \neq \emptyset$). Each of these
combinations is called a focal element. For each element of
$D^\Theta$, a Generalized Basic Belief Assignment (GBBA) is
possible. In other words, as the result of classification some
belief is assigned to the hypotheses that test sample $x$ is a
member of a certain class, of several classes, or that there is
some doubt about the class to which it should be assigned.
Formally, $m(\cdot) : D^\Theta \to [0,1]$, so the GBBA takes values
from 0 to 1, and if $m_x(A) = 1$ there is 100% belief that test
element $x$ belongs to class $A$. In contrast, for the empty set
(e.g. an unknown class), $m(\emptyset) = 0$. The beliefs assigned
to all elements of $D^\Theta$ should sum to 1:
$\sum_{A \in D^\Theta} m(A) = 1$. This means that, within the frame
of discernment, tested elements are certainly members of one of
the classes or class combinations defined by $D^\Theta$, so no
other unknown classes are allowed.

Similarly to DST, DSmT also allows information to be aggregated
with the use of combination rules. For this purpose,
many combination rules have been elaborated [20], [22].
During the research, the PCR6 rule was used. The key idea of
the PCR6 rule is to transfer the partial conflicting Basic Belief
Assignment (BBA) proportionally to the individual BBAs of the
non-empty elements involved in the conflict [23].
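The proportional redistribution idea can be sketched for the simplest case of two sources, where PCR6 coincides with PCR5. The sketch below is only an illustration under stated assumptions: classes are treated as mutually exclusive (Shafer's model), focal elements are represented as Python `frozenset` objects, and the function name and mass values are made up. The fusion of three or four classifiers used in this work requires the general PCR6 formula, which is not reproduced here.

```python
from itertools import product


def pcr6_two_sources(m1, m2):
    """Combine two mass functions; keys are frozensets of class labels."""
    fused = {}
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        w = wa * wb                      # mass of the pair (a, b)
        if w == 0.0:
            continue
        common = a & b
        if common:
            # Non-conflicting pair: conjunctive consensus on the intersection.
            fused[common] = fused.get(common, 0.0) + w
        else:
            # Conflicting pair: give w back to a and b in proportion
            # to the masses wa and wb that produced the conflict.
            fused[a] = fused.get(a, 0.0) + w * wa / (wa + wb)
            fused[b] = fused.get(b, 0.0) + w * wb / (wa + wb)
    return fused


S2, S3 = frozenset({"S2"}), frozenset({"S3"})
expert_1 = {S2: 0.6, S3: 0.4}            # made-up local expert outputs
expert_2 = {S2: 0.2, S3: 0.8}
fused = pcr6_two_sources(expert_1, expert_2)
```

Because all conflicting mass is returned to the elements involved, the fused masses still sum to 1 without a separate normalisation step.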


D. GBBA calculation 


The calculation of evidence is crucial for classifier fusion 
based on the methods demanding the BBA or the GBBA for 
each class [24], [25]. 

A simple method, which is ideal for research at the preliminary
stage, has been developed for the evidence calculation
from k-NN classifiers [26]. To obtain the output for a given
sample, a set of distance measures to a number of known
samples is calculated, and it can be regarded as a class
distribution. The k nearest neighbours of an element
$x$ are identified irrespective of class label. Then, the number of
neighbours $k_i$ supporting assignment of element $x$ to class $C_i$
is calculated. Accordingly, the GBBA function of class $C_i$ is
calculated as follows [26]:


$m(\{C_i\}) = k_i / k$ (3)
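Equation (3) can be sketched as a short function that counts the class labels among the k nearest neighbours. The function name, the Euclidean distance and the sample data are assumptions made for this illustration.

```python
from collections import Counter


def knn_bba(train, query, k):
    """Return masses m({C_i}) = k_i / k from the k nearest neighbours."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    counts = Counter(label for _, label in nearest)
    # Divide by the number of neighbours actually found (k, unless the
    # training set is smaller than k) so that the masses sum to 1.
    return {label: n / len(nearest) for label, n in counts.items()}


train = [([0.0], "S2"), ([0.1], "S2"), ([0.2], "S3"),
         ([0.9], "S3"), ([1.0], "S3")]    # made-up feature values
bba = knn_bba(train, [0.05], k=3)
```

In this example two of the three nearest neighbours belong to S2, so the sample receives mass 2/3 for S2 and 1/3 for S3.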


In the case of the SVM classifier, which by itself gives only
class labels, the probabilities of the class distribution were
obtained by applying the extension introduced by Wu [27]. In the
presented research, only one condition occurs
at a time, therefore probabilities are very useful. It can be
assumed that the SVM classifier outputs are degrees of support for
each class representing the identified machine conditions. These
outputs can be directly transformed into mass assignments,
$p_i \to m(i)$, where $p_i$ is the probability of occurrence of
condition $i$ and $m(i)$ is the belief that condition $i$
occurred, provided by a single SVM classifier on the basis of the
available evidence (in the form of a feature space).


III. METHOD VERIFICATION 


Verification of the method considered several aspects
of its application. First of all, verification should
confirm that the method can be useful in condition monitoring
of machinery. The second important task was
to indicate what kind of classifier should be used and what
is the best way to perform data fusion. To do this, we
used the two classifiers described earlier and compared the
results obtained with single classifiers against the results
of classifier fusion, as well as against the results obtained
for a multidimensional feature space. This investigation helps
to answer the question: is fusion
of simple classifiers a better solution than applying a
classifier to a single or multidimensional feature space?

The method was verified on the basis of digital infrared 
images taken during diagnostic experiments. All of our com- 
putations were performed using Matlab 2007b software. 


A. Considered experimental data 


The experiments were performed on a laboratory
stand that consists of a laboratory model of rotating machinery
and an infrared imaging system (Fig. 4).

During the experiment, a sequence of 840 infrared images
with a resolution of 320 x 480 pixels was recorded. The
thermographic images were taken every 30 s. The images
represent the machine operating in the conditions presented
in Table I.

For the reference condition S1, we decided to record twice as
many images to make it easier for the classifiers to recognise.
It should be pointed out that conditions S2, S3 and S4 are
difficult to distinguish and were simulated intentionally
to check whether it was possible to notice a small change in
operational condition. Such small changes were also desirable
for testing the ability of the classifiers to recognise nearly
indistinguishable changes in the machine’s condition.
Table I: Description of conditions simulated during the experiment.


Condition Id   Description of fault                                                  No of acquired images
S1             machine without faults                                                240
S2             50% throttling of air pump                                            120
S3             90% throttling of air pump                                            120
S4             90% throttling of air pump and clearance of second bearing mounting   120
S5             load of disk brake                                                    120
S6             faulty bearing no 2                                                   120


Figure 4: Visualisation of the laboratory stand. 1-frame,
2-motor (1.5 kW, 2500 rpm), 3-coupling, 4-bearings set no 1,
5-shaft, 6-bearings set no 2, 7-brake set, 8-air pump,
9-throttle valve, 10-infrared camera connected to PC, 11-motor
controller.


The infrared images acquired during the experiment have 
been pre-processed. The first step of the pre-processing was 
the selection of two Regions of Interest of size 20 x 30 pixels 
(ROI1 and ROI2) (Fig. 5). These ROIs represented the bearing 
housings. It was expected that changes in the machine’s 
condition would affect changes of bearing temperature and 
should be revealed in the infrared images. 


Figure 5: Infrared image of the operating laboratory stand, 
with marked ROIs of the first (left, ROI1) and the second 
(right, ROI2) bearings. 


According to the proposed method (cf. II), the sub-images
corresponding to the regions of interest (ROI1 and ROI2) were
transformed to the frequency domain using the Fast Fourier
Transform (FFT) algorithm. The F-images (magnitude and phase)
obtained after the transformation were analysed and image
features were calculated. Each infrared image was represented
by 12 features, whose names were coded in the following way:


EstimatorId_FImageType_ROIId


e.g. HFP_P_R1 means that the value of the feature HFP was
calculated for the F-image of phase determined in ROI1.

The values of the presented F-image features
depend on the dimensions and content of the region of interest
(ROI), as well as on the type of F-image (magnitude or phase). To
account for the variety in content of each type of F-image, each
of the proposed features can be fitted to the image content
by setting the value of its feature parameter W, H, or D.

To find the optimal values of the F-image feature parameters
W, H, and D, an exhaustive search of the feature space was
performed, based on the criterion of maximum performance of the
machine condition classifier. Features were calculated
for each acceptable value of the feature parameters (from 1 to
the maximal value H_max = 30, W_max = 20, and D_max = 20).
These constraints follow from the size (20 x 30 pixels) of the
considered F-images.

For optimisation purposes, a k-Nearest Neighbour (k-NN)
classifier was used. The number of nearest neighbours
was set to k = 10 according to the recommendations presented in
[28]. Classification efficiency was calculated in the way
presented in the theoretical background (cf. II-B), and the
leave-one-out cross-validation (LOOCV) algorithm was used. The
optimal values of the feature parameters are presented in Tab. II.


Table II: Optimal values of feature parameters and basic
statistics of classification efficiencies.


feat.   feat.         estimator        estimator         mean
num.    name          parameter name   parameter value   eff [%]
1       ROI1_VFP_A    H                20                59.6
2       ROI2_VFP_A    H                18                80.6
3       ROI1_VFP_P    H                2                 23.9
4       ROI2_VFP_P    H                -                 25.2
5       ROI1_HFP_A    W                29                56.9
6       ROI2_HFP_A    W                26                80.8
7       ROI1_HFP_P    W                9                 28.3
8       ROI2_HFP_P    W                9                 24.3
9       ROI1_CFP_A    D                14                62.3
10      ROI2_CFP_A    D                20                75.6
11      ROI1_CFP_P    D                6                 37.1
12      ROI2_CFP_P    D                6                 43.4


The feature values, calculated using determined optimal 
parameters of F-image features, were data source for clas- 
sification of machine conditions. 
B. Classification results for one and multidimensional feature
space


The first step of the method verification was to assess the
use of one-dimensional and multidimensional F-image feature
spaces for classification of the machine condition. As
mentioned earlier, k-NN and SVM classifiers were applied. In
the case of the k-NN classifier, k = 10 neighbours were used and
the Euclidean distance was used as the distance metric.
In the case of the SVM classifier, a one-against-all
strategy was implemented for multi-class classification, and a
Gaussian kernel was applied. Mean classifier efficiencies for the
considered machine conditions as a function of the feature space
dimension are shown in Fig. 6. As one can expect, classification
efficiency increases with the size of the feature space and, for
almost all conditions, exceeds 80% for feature spaces of
dimension 4 and more.

A detailed analysis of maximal classifier efficiencies is
presented in Table III. The results show that for
conditions S1, S3, S4, S5 and S6 maximum efficiencies can
be achieved with a one-dimensional space of feature values for
both types of applied classifiers. Values of maximal
classification efficiencies are given in bold. The highest
classification efficiency values were obtained on the basis of
the CFP feature, which indicates its usefulness in analysing the
F-images. The greatest number of maximum classification
efficiencies (100%) was obtained using the SVM classifier. SVM
gave the best results for features of the F-images of phase,
whereas k-NN gave good results for the F-images of magnitude
within the region of interest ROI2. Region ROI2 covered the more
heavily loaded bearing support, which resulted in its higher
temperature and thus more intensive infrared radiation.


The plot of classification efficiencies for condition S2
presented in Fig. 6 and the values in Table III clearly show that
condition S2 is poorly recognisable. For condition S2, a
one-dimensional feature space gave a maximal efficiency of
58.3% with the k-NN classifier. The
SVM classifier was unable to correctly recognise condition
S2: its maximal efficiency was only
8.3%.


Looking at the distribution of feature values for condition S2
presented in Figure 7, it is clear that SVM was unable to
find proper global decision boundaries. An exemplary decision
boundary for condition S2 vs all other conditions can be
Figure 6: Evolution of the belief in a cell crossed by an obstacle observed by a sensor.

Table III: Classification efficiencies obtained for individual
F-image features


                  Simulated machine conditions
Feature space   | S1     S2     S3     S4     S5     S6
KNN(HFP_A_R1)   | 62.5   16.7   91.7   33.3   91.7   91.7
SVM(HFP_A_R1)   | 87.5   0.0    100.0  0.0    83.3   91.7
KNN(HFP_P_R1)   | 54.2   16.7   8.3    25.0   25.0   16.7
SVM(HFP_P_R1)   | 87.5   0.0    100.0  0.0    83.3   91.7
KNN(HFP_A_R2)   | 91.7   58.3   41.7   83.3   100.0  91.7
SVM(HFP_A_R2)   | 83.3   8.3    100.0  83.3   100.0  83.3
KNN(HFP_P_R2)   | 50.0   16.7   8.3    8.3    16.7   8.3
SVM(HFP_P_R2)   | 100.0  0.0    0.0    0.0    0.0    0.0
KNN(VFP_A_R1)   | 58.3   8.3    100.0  58.3   91.7   91.7
SVM(VFP_A_R1)   | 79.2   0.0    100.0  16.7   75.0   91.7
KNN(VFP_P_R1)   | 50.0   0.0    0.0    8.3    16.7   16.7
SVM(VFP_P_R1)   | 100.0  0.0    0.0    0.0    0.0    0.0
KNN(VFP_A_R2)   | 91.7   58.3   25.0   83.3   100.0  91.7
SVM(VFP_A_R2)   | 83.3   8.3    100.0  83.3   100.0  83.3
KNN(VFP_P_R2)   | 62.5   0.0    16.7   25.0   8.3    16.7
SVM(VFP_P_R2)   | 95.8   0.0    0.0    0.0    25.0   16.7
KNN(CFP_A_R1)   | 87.5   16.7   91.7   0.0    91.7   83.3
SVM(CFP_A_R1)   | 100.0  8.3    100.0  25.0   75.0   83.3
KNN(CFP_P_R1)   | 62.5   0.0    0.0    8.3    75.0   16.7
SVM(CFP_P_R1)   | 100.0  0.0    100.0  75.0   0.0    100.0
KNN(CFP_A_R2)   | 91.7   58.3   75.0   91.7   100.0  91.7
SVM(CFP_A_R2)   | 83.3   0.0    100.0  91.7   91.7   83.3
KNN(CFP_P_R2)   | 62.5   50.0   16.7   33.3   8.3    [illegible]
SVM(CFP_P_R2)   | 100.0  0.0    16.7   100.0  0.0    58.3


seen in Fig. 7. Taking into consideration the distribution of
feature values for condition S2, the strategy of classification
using an SVM classifier with linear boundaries is insufficient
to distinguish between S2 and the other classes. Using
feature spaces of dimension 3 to 8 increased the maximal
classification performance for condition S2 to
83% for the k-NN classifier and 75% for the SVM classifier.
The minimal space giving maximal classification performance
with the k-NN classifier was constructed from either of the
following two sets of features:

VFP_P_R2, HFP_A_R2, HFP_P_R2 and

VFP_A_R1, HFP_P_R2, CFP_A_R1
Figure 7: Distribution of CFP_A_R2 feature for condition S2.


To assess which classes are most similar, a confusion matrix
was prepared (Fig. 8). Each column gives the percentage
of samples of one class that were assigned to the various
predicted classes. Taking into consideration only single kNN
classifiers trained over 1D data sets, it can be seen that
conditions S2 and S3 are the most difficult to distinguish. This
is connected with the way in which those conditions were
simulated: only the degree of throttling of the air pump was
changed.

Figure 8: Normalized confusion matrix for single kNN
classifier.
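A column-normalised confusion matrix of the kind described above can be sketched as follows. The function name and the label lists are made up for illustration. The layout (one column per true class, one row per predicted class) follows the description in the text.

```python
def normalised_confusion(true_labels, predicted_labels, classes):
    """Return a matrix of percentages: rows = predicted, columns = true."""
    index = {c: i for i, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        counts[index[p]][index[t]] += 1
    # Number of samples of each true class (column totals).
    totals = [sum(row[j] for row in counts) for j in range(len(classes))]
    return [[100.0 * counts[i][j] / totals[j] if totals[j] else 0.0
             for j in range(len(classes))]
            for i in range(len(classes))]


true_labels = ["S2", "S2", "S2", "S2", "S3", "S3"]        # made-up labels
predicted_labels = ["S2", "S3", "S3", "S2", "S3", "S3"]   # made-up predictions
matrix = normalised_confusion(true_labels, predicted_labels, ["S2", "S3"])
```

In this toy example half of the S2 samples are predicted as S3, which appears as 50% in the off-diagonal cell of the S2 column.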


IV. CLASSIFIER FUSION RESULTS 


The results of the classification of machine conditions shown
in Section III-B (Tab. III) indicate that the proposed features
of the F-images are useful for assessing machine conditions.
For the majority of the considered machine conditions, it was
possible to obtain the maximum classification efficiency on the
basis of selected individual features of F-images. However, for
condition S2, reliable condition assessment was not possible.
To increase classification efficiency, fusion of classifiers was
applied. We carried out an exhaustive computation considering
all combinations of two, three and four k-NN and SVM
classifiers of all considered F-image features.

Mean classification efficiencies after classifier fusion as a
function of the number of fused classifiers, for all considered
conditions, are presented in Fig. 9. Table IV presents the highest
classification efficiencies obtained for all conditions after
fusion of two, three and four different individual k-NN
and SVM classifiers. Our results show that fusion of two
classifiers is sufficient to obtain maximal classification
efficiency for almost all conditions. Classifier fusion also
allowed us to raise the highest classification efficiency for
condition S2 by 8.3%
(from 58.3% to 66.7%) in comparison to the results obtained
for the individual classifiers. The maximum efficiency of the
classification was obtained as a result of the fusion of k-NN
classifiers only. Fusion of the SVM classifiers does not
increase the classification efficiency for this class.

The most interesting observation made after the analysis
of classification performance is that the efficiency for
condition S2 does not increase with the number
of fused single classifiers. This is caused by the presence
of classifiers that assign a high degree of belief to the
wrong states. Fusing more than two classifiers did not
increase the relative number of classifier combinations
giving maximal performance. These results find confirmation
Table IV: Comparison of maximal classification efficiencies
for all conditions after fusion of different numbers of single kNN
and SVM classifiers using PCR6 rule.


                               Simulated machine conditions
Fus. Class. #   Class. Type    S1    S2     S3    S4    S5    S6
2               kNN(.)         100   66.7   100   100   100   100
2               SVM(.)         100   8.3    100   100   100   100
3               kNN(.)         100   66.7   100   100   100   100
3               SVM(.)         100   8.3    100   100   100   100
4               kNN(.)         100   66.7   100   100   100   100
4               SVM(.)         100   8.3    100   100   100   100


in [29], which showed that adding further experts at some
point leads to totally conflicting and useless classifier
combinations. Analysis of the classifier combinations giving the
highest performance indicates that they are composed of
complementary rather than individually best performing
classifiers. Taking into account the obtained results, it can be
concluded that the fusion of two selected classifiers is
sufficient. For the considered data, the pair of classifiers
giving the highest efficiency of 66.7% was HFP_A_R2, CFP_A_R1.


Taking into consideration the very good classification results 
obtained for multidimensional feature spaces, it was 
decided to perform fusion of k-NN classifiers calculated for 
two-dimensional feature spaces. As could have been expected, 
the results were very good (Table V). Maximal classification 
efficiency for condition S2 was increased to 83.3% for the four 
following combinations of classifiers and feature spaces: 
PCR6{kNN{VFP_A_R2,CFP_A_R2}, 
kNN{HFP_P_R2,CFP_A_R1}}, 
PCR6{kNN{HFP_P_R2,CFP_A_R1}, 
kNN{HFP_P_R2,CFP_A_R2}}, 
PCR6{kNN{HFP_P_R2,CFP_A_R1}, 
kNN{HFP_P_R2,CFP_P_R2}}, 
PCR6{kNN{HFP_P_R1,CFP_P_R1}, 
kNN{HFP_P_R2,CFP_A_R1}}. 

It is worth mentioning that the maximal classification efficiency 
using a single k-NN classifier for condition S2 with the use 
of three- and four-dimensional (3D and 4D) feature spaces 
was also 83.3%. The presented results confirm the ability of 
decision fusion algorithms to identify machinery conditions 
which are difficult to recognise. In contrast, the SVM 
classifier result for the 2D feature space was at most 
62.5%. Accordingly, the increase of classification performance 
in comparison to a single feature space is visible, and in this 
respect the k-NN classifier proved to be better than the SVM 
classifier. 

Figure 9: Plots of mean classification efficiencies as a function 
of the number of fused classifiers for the considered machine 
conditions. 


Table V: Comparison of maximal classification efficiencies [%] for 
all conditions after fusion of 2 classifiers trained over 2D feature 
spaces. 

                              Simulated machine conditions 
Fused class. #  Class. type   S1     S2     S3     S4     S5     S6 
2               kNN(.)        100    83.3   100    100    100    100 
2               SVM(.)        100    62.5   100    100    100    100 

V. CONCLUSIONS 


In this paper, the method of object condition assessment 
using multiple classifiers fusion approach based on the gener- 
alised evidence theory is proposed. Fused classifiers have been 
trained over the data represented by three parametric spectral 
features of F-images. The F-images were the result of the 2D 
Fourier transform of infrared images acquired during object 
observation. During the research, optimal parameters of the 
features were evaluated and F-image features were computed. 
Based on the spectral features of the infrared images the 
classification process was performed. For comparison purposes 
k-NN and SVM classifiers were used. The results of the 
classification have shown that the proposed features of an F- 
image of thermograms could be useful for the evaluation of a 
machine’s condition. Circular Fourier Power (CFP) seemed to 
be suitable enough for the estimation of magnitude, as well as 
phase F-images. 

The proposed approach of classifier fusion is suitable for 
the assessment of machine global condition on the basis of 
pre-selected features of spectral infrared images. Classification 
efficiencies obtained using classifier fusion are higher than 
those calculated taking into consideration a single classifier. 
It must be mentioned that features chosen for the member 
classifiers in fusion process should be heterogeneous to assure 
high classification efficiency. Moreover, the increase of the 
number of considered ROIs should entail a reduction of the 
uncertainty of the information, which is used in the decision 
making about the machine’s global condition. Although the 
connection between diversity of features and the classification 
performance is not always straightforward, the analysis of the 
obtained results leads to the statement that in the considered 
case, the influence of feature heterogeneity degree on the 
fusion results is quite noticeable. 

The problem of high homogeneity of data could be resolved 
by classification of multidimensional space of homogeneous 
feature values and the next application of the fusion of such 
a classifier. This strategy was verified during the presented 
research and the obtained results confirmed the ability of 
classifier fusion to increase classification efficiency of condition 
S2, which was difficult to recognise. 

It can be expected that conclusions made from the research 
could be generalised to data represented by other infrared 
image features and diagnostic signals. However, it needs 
further investigation. 

Abstract: In this paper we present an application of a new 
Belief Function-based Inter-Criteria Analysis (BF-ICrA) approach 
for Global Positioning System (GPS) Surveying Problems (GSP). 
GPS surveying is an NP-hard problem. For designing a Global 
Positioning System surveying network, a given set of earth points 
must be observed consecutively. The survey cost is the sum of 
the distances to go from one point to another one. This kind 
of problem is hard to solve with traditional numerical 
methods. In this paper we use BF-ICrA to analyze an Ant Colony 
Optimization (ACO) algorithm developed to provide near-optimal 
solutions for the Global Positioning System surveying problem. 


Keywords: Inter-Criteria Analysis, BF-ICrA, GPS surveying, 
PCR6, belief functions. 


I. INTRODUCTION 


In our previous work [1] we applied the classical Atanassov's 
Inter-Criteria Analysis (ICrA) to examine some relations 
between the considered GSPs and the ACO algorithm performance. In 
this paper we consider a recent improved version of ICrA 
based on belief functions [2] and show how to apply it to the same 
GSP problematic to revise and refine our previous analysis. 

After a short presentation of the GSP problematic and ACO 
in the next section, and brief basics of BF in Section III, we 
recall the classical Atanassov's ICrA method in Section IV and 
we present the new ICrA method based on Belief Functions, 
called BF-ICrA, in Section V. In Section VI, we show how to 
apply BF-ICrA to the GSP problematic. Concluding remarks are 
given in Section VII. 


II. PRESENTATION OF ACO AND GSP PROBLEMATIC 
A. GPS surveying problem description 


GPS satellites continuously transmit radio signals to the 
Earth while orbiting it. A receiver, with unknown position on 
Earth, has to detect and convert the signals received from all of 
the satellites into useful measurements. These measurements 
would allow a user to compute a three-dimensional coordinate 
position: location of the receiver. Any GPS observation is 
proven to have biases; hence, in order to survey, an appropriate 
combination of measurement processing strategies must 
be used to minimize their effect on the positioning results. 
Differencing data collected simultaneously from two or more 
GPS receivers to several GPS satellites makes it possible to eliminate 
or significantly reduce most of the biases. The GPS network 
can be defined as a set of stations $(a_1, a_2, \ldots, a_n)$, which are 
co-ordinated by placing receivers $(X_1, X_2, \ldots)$ on them to 
determine sessions $(a_1a_2, a_1a_3, a_2a_3, \ldots)$ among them. The 
problem is to search for the best order in which these sessions 
can be organized to give the best schedule. Thus, the schedule 
can be defined as a sequence of sessions to be observed 
consecutively. The solution is represented by a linear graph 
with weighted edges. The nodes represent the stations and 
the edges represent the moving cost. The objective function 
of the problem is the cost of the solution, which is the sum 
of the costs (time) to move from one point to another one, 
$C(V) = \sum C(a_i, a_j)$, where $a_ia_j$ is a session in solution $V$. 
For example, if the number of points (stations) is 4, a possible 
solution is $V = (a_1, a_3, a_2, a_4)$ and it can be represented by 
the linear graph $a_1 \to a_3 \to a_2 \to a_4$. The moving costs are 
as follows: $C(a_1, a_3)$, $C(a_3, a_2)$, $C(a_2, a_4)$. Thus the cost of 
the solution is $C(V) = C(a_1, a_3) + C(a_3, a_2) + C(a_2, a_4)$. In 
practice, one must determine how each GPS receiver should be moved 
between the stations to be surveyed in an efficient manner, taking 
into account important factors such as time and cost. The 
problem is to search for the best order, with respect to time, 
in which these sessions can be observed to give the cheapest 
schedule, i.e. to minimize $C(V)$. The initial data is a cost matrix, 
which represents the cost (time, or distance) of moving a 
receiver from one point to another. Solving such problems 
(GSPs) to optimality requires a very high computational time. 
Therefore, meta-heuristic methods are used to provide 
near-optimal solutions for large networks within an acceptable amount 
of computational effort. In this paper, we consider the Max-Min 
Ant System (MMAS) meta-heuristic [3] and we present 
it briefly in the next subsection. 
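
As a minimal sketch of the objective function described above, the following Python fragment evaluates $C(V)$ for the four-station example $V = (a_1, a_3, a_2, a_4)$. The cost matrix values and the names `schedule_cost` and `COST` are purely illustrative assumptions and do not come from the GSP data; the exhaustive search at the end is only feasible for tiny instances, which is precisely why a meta-heuristic is used for real networks.

```python
from itertools import permutations
from typing import Sequence


def schedule_cost(order: Sequence[int], cost: Sequence[Sequence[float]]) -> float:
    """Sum the moving costs C(a_i, a_j) over consecutive stations of a schedule."""
    return sum(cost[a][b] for a, b in zip(order, order[1:]))


# Hypothetical symmetric cost matrix for 4 stations (made-up values).
COST = [
    [0.0, 4.0, 2.0, 7.0],
    [4.0, 0.0, 3.0, 5.0],
    [2.0, 3.0, 0.0, 6.0],
    [7.0, 5.0, 6.0, 0.0],
]

# V = (a1, a3, a2, a4) from the text, written with 0-based indices.
V = [0, 2, 1, 3]
total = schedule_cost(V, COST)  # C(a1,a3) + C(a3,a2) + C(a2,a4)

# Exhaustive search over all orders: only practical for very small networks.
best = min(permutations(range(len(COST))), key=lambda p: schedule_cost(p, COST))
best_cost = schedule_cost(best, COST)
```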


B. Ant colony optimization for GPS surveying problem 


Real ants foraging for food lay down quantities of 
pheromone (chemical cues) marking the path that they follow. 
An isolated ant moves essentially at random but an ant 
encountering a previously laid pheromone will detect it and 
decide to follow it with high probability and thereby reinforce 
it with a further quantity of pheromone. The repetition of the 
above mechanism represents the auto-catalytic behavior of real 
ant colony where the more the ants follow a trail, the more 
attractive that trail becomes. 

The ACO algorithm uses a colony of artificial ants that 
behave as cooperative agents in a mathematical space where 
they are allowed to search and reinforce pathways (solutions) 
in order to find the optimal ones. The problem is represented 
by a graph and the ants walk on the graph to construct 
solutions. A solution is represented by a path in the graph. After 
initialization of the pheromone trails, ants construct feasible 
solutions, starting from random nodes, then the pheromone 
trails are updated. At each step ants compute a set of feasible 
moves and select the best one (according to some probabilistic 
rules) to carry out the rest of the tour. The transition probability 
$p_{ij}$, to choose the node $j$ when the current node is $i$, is based 
on the heuristic information $\eta_{ij}$ and pheromone trail level $\tau_{ij}$ 
of the move, where $i, j = 1, \ldots, n$: 

$$p_{ij} = \frac{\tau_{ij}^{\alpha}\,\eta_{ij}^{\beta}}{\sum_{k \in \mathrm{Unused}} \tau_{ik}^{\alpha}\,\eta_{ik}^{\beta}} \qquad (1)$$ 


The higher the value of the pheromone and the heuristic 
information, the more profitable it is to select this move and resume 
the search. In the beginning, the initial pheromone level is set 
to a small positive constant value $\tau_0$ and then ants update this 
value after completing the construction stage. ACO algorithms 
adopt different criteria to update the pheromone level. 
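
The selection step can be sketched in Python as follows. This is only an illustration under stated assumptions: the exponent names `alpha` and `beta` (weights of the pheromone and of the heuristic information), their default values, the roulette-wheel draw and the four-station cost matrix are hypothetical choices, not details taken from the authors' implementation.

```python
import random
from typing import Dict, List, Sequence


def transition_probabilities(
    i: int,
    unused: Sequence[int],
    tau: List[List[float]],
    eta: List[List[float]],
    alpha: float = 1.0,
    beta: float = 1.0,
) -> Dict[int, float]:
    """Return p_ij for every feasible next node j, in the spirit of Eq. (1)."""
    weights = {j: (tau[i][j] ** alpha) * (eta[i][j] ** beta) for j in unused}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


def choose_next(i: int, unused: Sequence[int], tau, eta, rng: random.Random) -> int:
    """Draw the next node by roulette-wheel selection on the probabilities p_ij."""
    probs = transition_probabilities(i, unused, tau, eta)
    nodes = list(probs)
    return rng.choices(nodes, weights=[probs[j] for j in nodes], k=1)[0]


# Hypothetical instance: uniform initial pheromone, eta = 1 / cost.
cost = [[0, 4, 2, 7], [4, 0, 3, 5], [2, 3, 0, 6], [7, 5, 6, 0]]
n = len(cost)
tau = [[0.01] * n for _ in range(n)]
eta = [[0.0 if a == b else 1.0 / cost[a][b] for b in range(n)] for a in range(n)]

p = transition_probabilities(0, [1, 2, 3], tau, eta)
next_node = choose_next(0, [1, 2, 3], tau, eta, random.Random(0))
```

With uniform pheromone the cheapest move from station 0 (to station 2) receives the largest probability, which is the behaviour the paragraph above describes.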

In our implementation we use the MAX-MIN Ant System 
(MMAS) [3], [4], which is one of the best ant approaches. The 
main idea of MMAS is to use a fixed upper bound $\tau_{max}$ and lower 
bound $\tau_{min}$ on the pheromone trails. Thus the accumulation of a large 
amount of pheromone on a part of the possible movements and the 
repetition of the same solutions is partially prevented. The main 
features of MMAS are: 

The aim of using only one solution is to make solution 
elements, which frequently occur in the best found solutions, 
get large reinforcement. The pheromone trail update is given by: 

$$\tau_{ij} \leftarrow \rho\,\tau_{ij} + \Delta\tau_{ij}, \qquad (2)$$ 

where 

$$\Delta\tau_{ij} = \begin{cases} 1/C(V_{best}) & \text{if } (i,j) \in \text{best solution}, \\ 0 & \text{otherwise}, \end{cases}$$ 

and $V_{best}$ is the iteration best solution and $i, j = 1, \ldots, n$. 


To avoid stagnation of the search, the range of possible 
pheromone values on each movement is limited to an interval 
$[\tau_{min}, \tau_{max}]$. $\tau_{max}$ is an asymptotic maximum of $\tau_{ij}$ and 
$\tau_{max} = 1/((1 - \rho)\,C(V^*))$, while $\tau_{min} = 0.087\,\tau_{max}$. Here 
$V^*$ is the optimal solution, but it is unknown, therefore we 
use $V_{best}$ instead of $V^*$. 

When all ants have completed their solutions, the 
pheromone level is updated by applying the global update 
rule. Only the pheromone corresponding to the best found 
solution is increased, in the MMAS manner described above. The 
global update rule is intended to provide a greater amount of 
pheromone on the paths of the best solution. It is a kind of 
intensification of the search around the best found solution. 

We use heuristic information equal to one over the cost of 
the session. 
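
The update rule (2) together with the trail limits can be sketched as below. This is a hedged illustration, not the authors' code: the function name `mmas_update`, the in-place update, the symmetric reinforcement of both edge directions, and the numerical values of `rho`, the initial trail and the best cost are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def mmas_update(
    tau: List[List[float]],
    best_order: Sequence[int],
    best_cost: float,
    rho: float,
) -> Tuple[float, float]:
    """Apply Eq. (2) in place, then clamp every trail to [tau_min, tau_max]."""
    tau_max = 1.0 / ((1.0 - rho) * best_cost)
    tau_min = 0.087 * tau_max
    # Assumption: the graph is undirected, so both directions are reinforced.
    best_edges = set(zip(best_order, best_order[1:]))
    best_edges |= {(b, a) for a, b in best_edges}
    n = len(tau)
    for i in range(n):
        for j in range(n):
            delta = 1.0 / best_cost if (i, j) in best_edges else 0.0
            tau[i][j] = min(tau_max, max(tau_min, rho * tau[i][j] + delta))
    return tau_min, tau_max


# Hypothetical values: 4 stations, initial trail 0.01, best schedule of cost 10.
tau = [[0.01] * 4 for _ in range(4)]
tau_min, tau_max = mmas_update(tau, best_order=[0, 2, 1, 3], best_cost=10.0, rho=0.8)
```

In this toy run the trails on the best schedule grow to 0.8 * 0.01 + 1/10, while every other trail would fall below the lower limit and is therefore raised to `tau_min`, which is how the bounds keep all moves selectable.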


III. BASICS OF THE THEORY OF BELIEF FUNCTIONS 


Let us consider a finite discrete frame of discernment (FoD) 
$\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$, with $n > 1$, and where $\theta_i \cap \theta_j = \emptyset$ for 
$i \neq j$. The power-set of $\Theta$ (i.e. the set of all subsets of $\Theta$) 
is denoted $2^\Theta$. A basic belief assignment (BBA) associated 
with a given source of evidence is defined [5] as the mapping 
$m(\cdot) : 2^\Theta \to [0, 1]$ satisfying $m(\emptyset) = 0$ and $\sum_{A \in 2^\Theta} m(A) = 1$. 
The quantity $m(A)$ is called the mass of $A$ committed by 
the source of evidence. Belief and plausibility functions are 
usually interpreted respectively as lower and upper bounds of an 
unknown (possibly subjective) probability measure [6]. They 
are defined by¹ 

$$Bel(A) \triangleq \sum_{B \subseteq A,\, B \in 2^\Theta} m(B), \quad \text{and} \quad Pl(A) \triangleq 1 - Bel(\bar{A}). \qquad (3)$$ 

If $m(A) > 0$, $A$ is called a focal element of $m(\cdot)$. When all 
focal elements are singletons then $m(\cdot)$ is called a Bayesian 
BBA and its corresponding $Bel(\cdot)$ function is homogeneous to 
a probability measure. Historically the combination of BBAs is 
accomplished by Dempster's rule in Dempster-Shafer Theory 
(DST) [5]. Because of serious problems of Dempster's rule², 
we recommend the Proportional Conflict Redistribution rule 
no. 6 (PCR6) proposed by Martin and Osswald in [10] (Vol. 
3) which remains the most appealing alternative rule for BBA 
combination so far. 
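
For illustration, formula (3) can be evaluated with a few lines of Python. This is a minimal sketch in which a BBA is represented, by assumption, as a dictionary from frozensets of hypothesis labels to masses; the labels `t1`, `t2` and the mass values are made up.

```python
from typing import Dict, FrozenSet

BBA = Dict[FrozenSet[str], float]


def bel(m: BBA, a: FrozenSet[str]) -> float:
    """Bel(A): total mass of the non-empty focal elements included in A."""
    return sum(mass for b, mass in m.items() if b and b <= a)


def pl(m: BBA, a: FrozenSet[str], frame: FrozenSet[str]) -> float:
    """Pl(A) = 1 - Bel(complement of A in the frame)."""
    return 1.0 - bel(m, frame - a)


# Hypothetical BBA on a two-element frame of discernment.
frame = frozenset({"t1", "t2"})
m = {
    frozenset({"t1"}): 0.6,
    frozenset({"t2"}): 0.1,
    frame: 0.3,  # mass committed to the whole frame (ignorance)
}

bel_t1 = bel(m, frozenset({"t1"}))
pl_t1 = pl(m, frozenset({"t1"}), frame)
```

Here the belief interval of `t1` is [0.6, 0.9]: the 0.3 mass left on the whole frame is what separates the lower bound from the upper bound.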


IV. ATANASSOV’S INTER-CRITERIA ANALYSIS (ICRA) 


Based on Intuitionistic Fuzzy Sets (IFS) [11], the Inter- 
Criteria Analysis (ICrA) has been introduced in 2014 by 
Atanassov et al. in [12], and then improved in [13], [15]. 
ICrA aims to identify the possible links between the criteria 
involved in a process of evaluation of multiple objects against 
multiple criteria. The aim of ICrA is to discover any existing 
correlations between the criteria themselves. Such analysis 
can permit (when possible) to reduce the complexity of 
large multiple criteria decision-making (MCDM) problems [2]. 
Until now the classical³ ICrA has been applied in different 
fields: medicine [16], [17], optimization [18]-[21], workforce 
planning [22], competitiveness analysis [23], radar detection 
[24], ranking [25]—[27], etc. In this section we just recall the 
basic principles of classical ICrA. 

Let us consider a set of alternatives (or objects) $\mathbf{A} \triangleq 
\{A_1, A_2, \ldots, A_M\}$ ($M > 2$), and a set of criteria $\mathbf{C} \triangleq 
\{C_1, C_2, \ldots, C_N\}$ ($N > 1$). The available information is 
expressed by an $M \times N$ score matrix⁴ $S = [S_{ij} = C_j(A_i)]$, 
and (possibly) the importance factor $w_j \in [0,1]$ of each 
criterion $C_j$ with $\sum_{j=1}^{N} w_j = 1$. The ICrA method consists in 
building an $N \times N$ Inter-Criteria (IC) matrix $K$ from the score 
matrix $S$. The elements of the IC matrix $K$ consist of all 
Intuitionistic Fuzzy (IF) pairs $(\mu_{jj'}, \nu_{jj'}) \in [0,1] \times [0,1]$ whose 
components express respectively the degree of agreement and 
the degree of disagreement between criteria $C_j$ and $C_{j'}$, for 
$j, j' \in \{1, 2, \ldots, N\}$. For a given column $j$ (i.e. criterion $C_j$), 
it is always possible to compare with >, < and = operators 
all the scores $S_{ij}$ for $i = 1, 2, \ldots, M$ because the scores of 
each column are expressed in the same unit. The construction 
of the IC matrix $K$ can be used to search relations between 
the criteria because the method compares homogeneous data 
relative to the same column. Atanassov in [14] prescribes⁵ the 
normalization of the element $S_{ij}$ of the score matrix $S$ by taking 

$$\hat{S}_{ij} = (S_{ij} - S_j^{min})/(S_j^{max} - S_j^{min}) \in [0,1], \qquad (4)$$ 

where 

$$S_j^{min} = \min\{S_{1j}, \ldots, S_{Mj}\}, \qquad S_j^{max} = \max\{S_{1j}, \ldots, S_{Mj}\}. \qquad (5)$$ 


¹ In the notations, the symbol $\triangleq$ means equal by definition. 

² that is: 1) insensitivity to the level of conflict between sources in some 
cases and dictatorial behavior [7], [8], and 2) inconsistency of Shafer's belief 
conditioning [9] with bounds of conditional probabilities. 

³ We refer to Atanassov's ICrA as the classical approach in the sequel. 

⁴ also called benefit or payoff matrix in the Multi-Criteria Decision-Making 
framework. 

The construction of the $N \times N$ IC matrix $K$ is based on 
the pairwise comparisons between every two criteria along all 
evaluated alternatives. More precisely, in [14] the degree of 
agreement $\mu_{jj'}$ between criteria $C_j$ and $C_{j'}$, and their degree 
of disagreement $\nu_{jj'}$ are calculated by 

$$\mu_{jj'} \triangleq \frac{2 K^{\mu}_{jj'}}{M(M-1)} \quad \text{and} \quad \nu_{jj'} \triangleq \frac{2 K^{\nu}_{jj'}}{M(M-1)}, \qquad (6)$$ 

where $K^{\mu}_{jj'}$ is the number of cases in which the inequalities 
$S_{ij} > S_{i'j}$ and $S_{ij'} > S_{i'j'}$ hold simultaneously, and $K^{\nu}_{jj'}$ is 
the number of cases in which the inequalities $S_{ij} > S_{i'j}$ and 
$S_{ij'} < S_{i'j'}$ hold simultaneously. 

By construction the IC matrix $K$ is always a symmetric 
matrix. Atanassov provides explicit formulas in [14] for $K^{\mu}_{jj'}$ 
and $K^{\nu}_{jj'}$, which depend on a particular choice of the signum 
function. Because of this the results for $K^{\mu}_{jj'}$ and $K^{\nu}_{jj'}$ are 
disputable and that is why some authors [22], [28] propose 
other methods to calculate the $K^{\mu}_{jj'}$ and $K^{\nu}_{jj'}$ values for making 
the Inter-Criteria Analysis. 
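
The counting behind formula (6) can be sketched as follows. The fragment is an illustration under assumptions: ties are counted neither as agreement nor as disagreement (one possible convention among those the signum-function remark alludes to), and the function name `icra_pair` is hypothetical. The two score columns are the GSP1 and GSP8 rows of Table I in Section VI, reading the second GSP1 entry as 398.00.

```python
from itertools import combinations
from typing import Sequence, Tuple


def icra_pair(col_j: Sequence[float], col_jp: Sequence[float]) -> Tuple[float, float]:
    """Return (mu, nu) for two criteria from their score columns, as in Eq. (6)."""
    m = len(col_j)
    k_mu = k_nu = 0
    for i, k in combinations(range(m), 2):
        d_j = col_j[i] - col_j[k]
        d_jp = col_jp[i] - col_jp[k]
        if d_j * d_jp > 0:      # both criteria order A_i and A_k the same way
            k_mu += 1
        elif d_j * d_jp < 0:    # the two criteria order them in opposite ways
            k_nu += 1
    n_pairs = m * (m - 1) / 2
    return k_mu / n_pairs, k_nu / n_pairs


# Averaged costs A1..A6 for GSP1 and GSP8 (Table I).
gsp1 = [399.00, 398.00, 398.33, 398.50, 399.40, 399.50]
gsp8 = [3758.20, 3755.70, 3758.73, 3760.50, 3760.80, 3765.80]

mu_18, nu_18 = icra_pair(gsp1, gsp8)
```

Of the 15 pairs of alternatives, 13 are ordered identically by the two criteria and 2 are ordered oppositely, so the pair (mu, nu) rounds to (0.87, 0.13), the value reported for GSP1 and GSP8 in Section VI.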

Once the IC matrix $K = [K_{jj'}]$ of intuitionistic fuzzy pairs 
is calculated one needs to analyze it to decide which criteria 
$C_j$ and $C_{j'}$ are in strong agreement (or positive consonance) 
reflecting the correlation between $C_j$ and $C_{j'}$, in strong 
disagreement (or negative consonance) reflecting non correlation 
between $C_j$ and $C_{j'}$, or in dissonance reflecting the uncertainty 
situation where nothing can be said about the non correlation 
or the correlation between $C_j$ and $C_{j'}$. 

At the beginning of ICrA development it was not very clear 
how these intuitionistic fuzzy (IF) pairs $(\mu_{jj'}, \nu_{jj'})$ had to 
be used and that is why Atanassova [29], [30] proposed to 
handle both components of the IF pair. For this, she interpreted 
pairs $(\mu_{jj'}, \nu_{jj'})$ as points located in the elementary TFU 
triangle, where the point $T$ of coordinates (1,0) represents the 
maximal positive consonance (i.e. the true consonance), the 
point $F$ with coordinates (0,1) represents the maximal negative 
consonance (i.e. the falsity), and the point $U$ with coordinates 
(0,0) represents the maximal dissonance (i.e. the uncertainty). 
From this interpretation it becomes quite easy to identify the 
top of consonant IF pairs $(\mu_{jj'}, \nu_{jj'})$ that fall in the bottom right 
corner of the (TFU) triangle limited by the vertical line $x = \alpha$ 
from the x-axis and the horizontal line $y = \beta$ from the y-axis, 
where $\alpha$ and $\beta$ are two ad-hoc threshold values in [0,1]. The set 
of consonant IF pairs is then ranked according to their (Euclidean) 
distance $d^{T}_{C_jC_{j'}}$ with respect to the $T$ point of coordinates (1,0), defined by 

$$d^{T}_{C_jC_{j'}} = d((1,0), (\mu_{jj'}, \nu_{jj'})) = \sqrt{(1 - \mu_{jj'})^2 + \nu_{jj'}^2}. \qquad (7)$$ 

⁵ Although this normalization is not strictly necessary for carrying out ICrA. 
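
A short sketch of this selection and ranking step is given below. The threshold values (0.75 and 0.25), the use of non-strict inequalities and the function names are illustrative assumptions; the three IF pairs are read from the matrices of agreement and disagreement degrees reported in Section VI.

```python
import math
from typing import Dict, Tuple


def distance_to_t(mu: float, nu: float) -> float:
    """Euclidean distance of the IF pair (mu, nu) to the point T = (1, 0), Eq. (7)."""
    return math.hypot(1.0 - mu, nu)


def in_positive_consonance(mu: float, nu: float, alpha: float, beta: float) -> bool:
    """True when the pair lies in the bottom right corner of the TFU triangle."""
    return mu >= alpha and nu <= beta


# IF pairs (mu, nu) for three pairs of criteria, taken from Section VI.
pairs: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("C1", "C8"): (0.87, 0.13),
    ("C4", "C5"): (0.93, 0.07),
    ("C1", "C3"): (0.27, 0.73),
}

# Hypothetical ad-hoc thresholds.
ALPHA, BETA = 0.75, 0.25
consonant = {k: v for k, v in pairs.items() if in_positive_consonance(*v, ALPHA, BETA)}
ranking = sorted(consonant, key=lambda k: distance_to_t(*consonant[k]))
```

With these thresholds the pair (C1, C3) is discarded and (C4, C5) is ranked ahead of (C1, C8), since it lies closer to the point T.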


It is worth noting that the $\mu_{jj'}$ and $\nu_{jj'}$ values are in fact linked 
with belief functions through the following formulas 

$$Bel_{jj'}(\theta) = \mu_{jj'}, \qquad (8)$$ 
$$Pl_{jj'}(\theta) = 1 - \nu_{jj'}, \qquad (9)$$ 
$$U_{jj'}(\theta) = Pl_{jj'}(\theta) - Bel_{jj'}(\theta) = 1 - \mu_{jj'} - \nu_{jj'}, \qquad (10)$$ 

where $\theta$ means: the criteria $C_j$ and $C_{j'}$ are totally positively 
consonant (i.e. totally correlated), whereas $\bar{\theta}$ means: the criteria 
$C_j$ and $C_{j'}$ are totally negatively consonant (uncorrelated). 
The FoD is defined as $\Theta \triangleq \{\theta, \bar{\theta}\}$. $U_{jj'}(\theta)$ represents 
the dissonance (the uncertainty about the correlation) of the 
criteria $C_j$ and $C_{j'}$. From this, one can easily define any BBA 
$m_{jj'}(\theta)$, $m_{jj'}(\bar{\theta})$ and $m_{jj'}(\theta \cup \bar{\theta})$ of $2^\Theta$ by taking 

$$m_{jj'}(\theta) = \mu_{jj'}, \qquad (11)$$ 
$$m_{jj'}(\bar{\theta}) = \nu_{jj'}, \qquad (12)$$ 
$$m_{jj'}(\theta \cup \bar{\theta}) = 1 - \mu_{jj'} - \nu_{jj'}. \qquad (13)$$ 
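
Formulas (8) to (13) amount to a direct re-labelling of an IF pair, which the following minimal sketch makes explicit. The dictionary keys, the function names and the numerical pair (0.6, 0.3) are assumptions chosen for the example.

```python
from typing import Dict, Tuple


def if_pair_to_bba(mu: float, nu: float) -> Dict[str, float]:
    """Map an IF pair (mu, nu) to the BBA of Eqs. (11)-(13) on {theta, not_theta}."""
    if not (0.0 <= mu <= 1.0 and 0.0 <= nu <= 1.0 and mu + nu <= 1.0 + 1e-12):
        raise ValueError("an IF pair must satisfy 0 <= mu, nu and mu + nu <= 1")
    return {"theta": mu, "not_theta": nu, "theta_or_not_theta": 1.0 - mu - nu}


def belief_interval(bba: Dict[str, float]) -> Tuple[float, float]:
    """Return [Bel(theta), Pl(theta)] as in Eqs. (8) and (9)."""
    return bba["theta"], 1.0 - bba["not_theta"]


bba = if_pair_to_bba(0.6, 0.3)
low, high = belief_interval(bba)
dissonance = high - low  # U(theta) of Eq. (10)
```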


Remark 1: The construction of the Inter-Criteria Matrix $K$ 
is not unique and depends on the choice of the algorithm of 
construction of $\mu_{jj'}$ and $\nu_{jj'}$ (and the choice of the signum 
function) as reported in [28]. This can yield different ICrA 
results in general. 
Remark 2: The construction of $\mu_{jj'}$ and $\nu_{jj'}$ appears to be 
only a crude approximation of the true values because they are 
only based on counting the valid ">" or "<" inequalities. In 
fact, their calculations do not exploit how much bigger or 
smaller the score values are in each comparison. This 
yields a lack of precision in the estimation of the $\mu_{jj'}$ and $\nu_{jj'}$ values. 
ICrA can be very useful for the verification of algorithm 
correctness. When the optimization problem has a lot of 
constraints, with ICrA we can find whether some constraint 
is a sub-constraint of another one and exclude it. With the help 
of ICrA we can divide the constraints into two or more groups, more 
sensitive and less sensitive, and solve the problem first according 
to the more sensitive constraints and later according to the less 
sensitive ones. 
To circumvent the aforementioned drawbacks, we present 
succinctly in the next section a new ICrA approach based 
on belief functions which is presented in more detail with 
examples in [2]. 


V. A NEW ICRA METHOD BASED ON BELIEF FUNCTIONS 


The new Belief Function based ICrA method, called BF-ICrA 
for short, presented in this section improves Atanassov's 
ICrA. It provides a more precise construction of the $\mu_{jj'}$ and $\nu_{jj'}$ 
values because it exploits all available information included in 
the score matrix. 

BF-ICrA starts with the construction of an $M \times N$ BBA 
matrix $\mathbf{M} = [m_{ij}(\cdot)]$ from the score matrix $S = [S_{ij}]$. The 
elements $m_{ij}$ of the BBA matrix $\mathbf{M}$ are obtained as follows 
(see [31] for details and justification): 

$$m_{ij}(A_i) = Bel_{ij}(A_i), \qquad (14)$$ 
$$m_{ij}(\bar{A}_i) = Bel_{ij}(\bar{A}_i) = 1 - Pl_{ij}(A_i), \qquad (15)$$ 
$$m_{ij}(A_i \cup \bar{A}_i) = Pl_{ij}(A_i) - Bel_{ij}(A_i), \qquad (16)$$ 

where⁶ 

$$Bel_{ij}(A_i) \triangleq Sup_j(A_i)/A^{j}_{max}, \qquad (17)$$ 
$$Bel_{ij}(\bar{A}_i) \triangleq Inf_j(A_i)/A^{j}_{min}, \qquad (18)$$ 

with 

$$Sup_j(A_i) \triangleq \sum_{k \in \{1,\ldots,M\} \,|\, S_{kj} \leq S_{ij}} |S_{ij} - S_{kj}|, \qquad (19)$$ 
$$Inf_j(A_i) \triangleq -\sum_{k \in \{1,\ldots,M\} \,|\, S_{kj} \geq S_{ij}} |S_{ij} - S_{kj}|, \qquad (20)$$ 

and 

$$A^{j}_{max} \triangleq \max_i Sup_j(A_i), \qquad (21)$$ 
$$A^{j}_{min} \triangleq \min_i Inf_j(A_i). \qquad (22)$$ 

For another criterion $C_{j'}$ and the $j'$-th column of the score 
matrix we will obtain another set of BBA values $m_{ij'}(\cdot)$. 
Applying this method to each column of the score matrix we 
are able to compute the BBA matrix $\mathbf{M} = [m_{ij}(\cdot)]$, each 
component of which is in fact a triplet $(m_{ij}(A_i), m_{ij}(\bar{A}_i), m_{ij}(A_i \cup \bar{A}_i))$ 
of BBA values in [0,1] such that $m_{ij}(A_i) + m_{ij}(\bar{A}_i) + 
m_{ij}(A_i \cup \bar{A}_i) = 1$ for all $i = 1, \ldots, M$ and $j = 1, \ldots, N$. 
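
The construction of one column of the BBA matrix can be sketched as follows. This is an illustration of Eqs. (14) to (22) under assumptions: the function name `bba_column`, the tuple layout of each triplet and the toy score column [1, 2, 3] are hypothetical, and the degenerate cases (a zero normalization constant) are handled by setting the corresponding belief to zero.

```python
from typing import List, Sequence, Tuple

Triplet = Tuple[float, float, float]  # (m(A_i), m(not A_i), m(A_i or not A_i))


def bba_column(scores: Sequence[float]) -> List[Triplet]:
    """Build the BBA triplets of one criterion (one column of the score matrix)."""
    sup = [sum(abs(s - t) for t in scores if t <= s) for s in scores]   # Eq. (19)
    inf = [-sum(abs(s - t) for t in scores if t >= s) for s in scores]  # Eq. (20)
    a_max, a_min = max(sup), min(inf)                                   # Eqs. (21)-(22)
    column = []
    for sp, nf in zip(sup, inf):
        bel_a = sp / a_max if a_max != 0 else 0.0       # Eq. (17)
        bel_not_a = nf / a_min if a_min != 0 else 0.0   # Eq. (18)
        column.append((bel_a, bel_not_a, 1.0 - bel_a - bel_not_a))
    return column


# Hypothetical column of scores for three alternatives.
col = bba_column([1.0, 2.0, 3.0])
```

For this toy column the worst alternative gets the triplet (0, 1, 0), the best one (1, 0, 0), and the middle one splits its mass equally between the three components, which shows how the magnitudes of the score differences, and not only their signs, enter the masses.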


⁶ assuming that $A^{j}_{max} \neq 0$ and $A^{j}_{min} \neq 0$. If $A^{j}_{max} = 0$ then 
$Bel_{ij}(A_i) = 0$; and if $A^{j}_{min} = 0$ then $Pl_{ij}(A_i) = 1$, 
i.e. $Bel_{ij}(\bar{A}_i) = 0$. 

The next step of the BF-ICrA approach is the construction of 
the $N \times N$ Inter-Criteria Matrix $K = [K_{jj'}]$ from the $M \times N$ 
BBA matrix $\mathbf{M} = [m_{ij}(\cdot)]$, where the element $K_{jj'}$ corresponds 
to the BBA $(m_{jj'}(\theta), m_{jj'}(\bar{\theta}), m_{jj'}(\theta \cup \bar{\theta}))$ about positive 
consonance $\theta$, negative consonance $\bar{\theta}$ and uncertainty between 
criteria $C_j$ and $C_{j'}$, respectively. The construction of the triplet 
$K_{jj'} = (m_{jj'}(\theta), m_{jj'}(\bar{\theta}), m_{jj'}(\theta \cup \bar{\theta}))$ is based on two steps: 

• Step 1 (BBA construction): Getting $m^{i}_{jj'}(\cdot)$. 

For each alternative $A_i$ for $i = 1, \ldots, M$, we 
first compute the BBA $(m^{i}_{jj'}(\theta), m^{i}_{jj'}(\bar{\theta}), m^{i}_{jj'}(\theta \cup \bar{\theta}))$ 
for any two criteria $j, j' \in \{1, 2, \ldots, N\}$. For 
this, we consider two sources of evidence (SoE) 
indexed by $j$ and $j'$ providing the BBAs $m_{ij}$ and $m_{ij'}$ 
defined on the simple FoD $\{A_i, \bar{A}_i\}$ and denoted 
$m_{ij} = [m_{ij}(A_i), m_{ij}(\bar{A}_i), m_{ij}(A_i \cup \bar{A}_i)]$ and $m_{ij'} = 
[m_{ij'}(A_i), m_{ij'}(\bar{A}_i), m_{ij'}(A_i \cup \bar{A}_i)]$. We also denote $\Theta = 
\{\theta, \bar{\theta}\}$ the FoD about the relative state of the two SoE, 
where $\theta$ means that the two SoE agree, $\bar{\theta}$ means that they 
disagree and $\theta \cup \bar{\theta}$ means that we don't know. Hence, 
two SoE are in total agreement if both commit their 
maximum belief mass to the same element $A_i$ or to 
the same element $\bar{A}_i$. Similarly, two SoE are in total 
disagreement if each one commits its maximum mass 
of belief to one element and the other to its opposite, 
that is if one has $m_{ij}(A_i) = 1$ and $m_{ij'}(\bar{A}_i) = 1$, or 
if $m_{ij}(\bar{A}_i) = 1$ and $m_{ij'}(A_i) = 1$. Based on this very 
simple and natural principle, one can now compute the 
belief masses as follows: 

$$m^{i}_{jj'}(\theta) = m_{ij}(A_i)\,m_{ij'}(A_i) + m_{ij}(\bar{A}_i)\,m_{ij'}(\bar{A}_i), \qquad (23)$$ 
$$m^{i}_{jj'}(\bar{\theta}) = m_{ij}(A_i)\,m_{ij'}(\bar{A}_i) + m_{ij}(\bar{A}_i)\,m_{ij'}(A_i), \qquad (24)$$ 
$$m^{i}_{jj'}(\theta \cup \bar{\theta}) = 1 - m^{i}_{jj'}(\theta) - m^{i}_{jj'}(\bar{\theta}). \qquad (25)$$ 

$m^{i}_{jj'}(\theta)$ represents the degree of agreement between the 
BBAs $m_{ij}(\cdot)$ and $m_{ij'}(\cdot)$ for the alternative $A_i$, $m^{i}_{jj'}(\bar{\theta})$ 
represents the degree of disagreement of the two BBAs 
and $m^{i}_{jj'}(\theta \cup \bar{\theta})$ the level of uncertainty (i.e. how much 
we don't know if they agree or disagree). By construction 
$m^{i}_{jj'}(\cdot) = m^{i}_{j'j}(\cdot)$, $m^{i}_{jj'}(\theta), m^{i}_{jj'}(\bar{\theta}), m^{i}_{jj'}(\theta \cup \bar{\theta}) \in [0, 1]$ 
and $m^{i}_{jj'}(\theta) + m^{i}_{jj'}(\bar{\theta}) + m^{i}_{jj'}(\theta \cup \bar{\theta}) = 1$. This BBA 
modeling permits to build a set of $M$ symmetrical 
Inter-Criteria Belief Matrices (ICBM) $K^{i} = [K^{i}_{jj'}]$ of 
dimension $N \times N$ relative to each alternative $A_i$ whose 
components $K^{i}_{jj'}$ correspond to the triplet of BBA values 
$m^{i}_{jj'} = (m^{i}_{jj'}(\theta), m^{i}_{jj'}(\bar{\theta}), m^{i}_{jj'}(\theta \cup \bar{\theta}))$ modeling the 
belief of agreement and of disagreement between $C_j$ and 
$C_{j'}$ based on $A_i$. 


• Step 2 (fusion): Getting $m_{jj'}(\cdot)$. 


In this step, one needs to combine the BBAs $m^{i}_{jj'}(\cdot)$ for 
$i = 1, \ldots, M$ altogether to get the component $K_{jj'} = 
(m_{jj'}(\theta), m_{jj'}(\bar{\theta}), m_{jj'}(\theta \cup \bar{\theta}))$ of the Inter-Criteria Belief 
matrix (ICBM) $K = [K_{jj'}]$. For this, we recommend to 
use the PCR6 fusion rule [10] (Vol. 3) because of known 
deficiencies of Dempster's rule. Because of the computational 
complexity of the PCR6 fusion rule when $M$ becomes large, 
one may prefer to approximate the fusion result by 
using the simple averaging rule. Simple Matlab™ code 
for the PCR6 rule can be found in [32] for convenience. 
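
The two steps above can be sketched as follows. Step 1 implements Eqs. (23) to (25) directly; for Step 2 the sketch uses the simple averaging rule mentioned in the text as an approximation, and it does not implement PCR6. The function names and the triplet layout `(m(A_i), m(not A_i), m(A_i or not A_i))` are assumptions made for this example.

```python
from typing import Sequence, Tuple

Triplet = Tuple[float, float, float]


def local_agreement(m_j: Triplet, m_jp: Triplet) -> Triplet:
    """Step 1: BBA (agree, disagree, unknown) of two sources for one alternative."""
    a, not_a, _ = m_j
    b, not_b, _ = m_jp
    agree = a * b + not_a * not_b        # Eq. (23)
    disagree = a * not_b + not_a * b     # Eq. (24)
    return agree, disagree, 1.0 - agree - disagree  # Eq. (25)


def fuse_by_averaging(triplets: Sequence[Triplet]) -> Triplet:
    """Step 2 (approximation): component-wise average over the M alternatives."""
    n = len(triplets)
    return (
        sum(t[0] for t in triplets) / n,
        sum(t[1] for t in triplets) / n,
        sum(t[2] for t in triplets) / n,
    )


same = local_agreement((1.0, 0.0, 0.0), (1.0, 0.0, 0.0))      # total agreement
opposite = local_agreement((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))  # total disagreement
vacuous = local_agreement((0.0, 0.0, 1.0), (0.7, 0.2, 0.1))   # one source knows nothing
k_jj = fuse_by_averaging([same, opposite])
```

Note that a vacuous source makes the local result fully uncertain, which is the behaviour one would expect from the agreement principle stated in Step 1.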


The computational complexity of BF-ICrA is of course 
higher than the complexity of ICrA because it makes a more 
precise evaluation of local and global inter-criteria belief 
matrices with respect to IF inter-criteria matrices of ICrA. The 
overall reduction of the computational burden of the original 
MCDM problem thanks to BF-ICrA depends highly on the 
problem under concern, the complexity and cost to evaluate 
each criterion involved in it, as well as the number of redundant 
criteria identified by BF-ICrA method. 

Once the global Inter-Criteria Belief Matrix (ICBM) $K = 
[K_{jj'} = (m_{jj'}(\theta), m_{jj'}(\bar{\theta}), m_{jj'}(\theta \cup \bar{\theta}))]$ is calculated, we 
can identify the criteria that are in strong agreement, in 
strong disagreement, and those on which we are uncertain. 
For identifying the criteria that are in strong agreement, we 
evaluate the distance of each component $K_{jj'}$ with the 
BBA representing the best agreement state and characterized 
by the specific BBA⁷ $m_T(\theta) = 1$. From a similar approach 
we can also identify, if we want, the criteria that are in 
very strong disagreement using the distance of $m_{jj'}(\cdot)$ with 
respect to the BBA representing the best disagreement state 
characterized by the specific BBA $m_F(\bar{\theta}) = 1$. We use the 
$d_{BI}(\cdot,\cdot)$ distance presented in [33] for measuring the distance 
$d(m_1, m_2)$ between the two BBAs⁸ $m_1(\cdot)$ and $m_2(\cdot)$ over the 
same FoD. It is defined by 

$$d_{BI}(m_1, m_2) \triangleq \sqrt{N_c \cdot \sum_{X \in 2^\Theta} d_W^2(BI_1(X), BI_2(X))}, \qquad (26)$$ 

where the Belief-Intervals are defined by $BI_1(X) \triangleq 
[Bel_1(X), Pl_1(X)]$ and $BI_2(X) \triangleq [Bel_2(X), Pl_2(X)]$ and 
computed from $m_1(\cdot)$ and $m_2(\cdot)$ thanks to formula (3). 
$d_W(BI_1(X), BI_2(X))$ is Wasserstein's distance between 
intervals calculated by 

$$d_W([a_1, b_1], [a_2, b_2]) = \sqrt{\left[\frac{a_1 + b_1}{2} - \frac{a_2 + b_2}{2}\right]^2 + \frac{1}{3}\left[\frac{b_1 - a_1}{2} - \frac{b_2 - a_2}{2}\right]^2},$$ 

and $N_c = 1/2^{|\Theta| - 1}$ is a normalization factor to get 
$d_{BI}(m_1, m_2) \in [0, 1]$. 
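
As a numerical illustration of this distance, the sketch below evaluates $d_{BI}$ on the two-element FoD $\{\theta, \bar{\theta}\}$, with each BBA stored, by assumption, as a triplet `(m(theta), m(not theta), m(theta or not theta))`. The function names are hypothetical, and the Wasserstein term uses the usual midpoint and half-width form for intervals. The two input triplets are built from the K(θ) and K(θ̄) entries reported in Section VI for the pairs (C1, C1) and (C1, C2).

```python
import math
from typing import Dict, Tuple

Interval = Tuple[float, float]
Triplet = Tuple[float, float, float]


def wasserstein_interval(i1: Interval, i2: Interval) -> float:
    """Wasserstein distance between two intervals (midpoint and half-width terms)."""
    (a1, b1), (a2, b2) = i1, i2
    mid = (a1 + b1) / 2 - (a2 + b2) / 2
    half_width = (b1 - a1) / 2 - (b2 - a2) / 2
    return math.sqrt(mid ** 2 + half_width ** 2 / 3)


def belief_intervals(m: Triplet) -> Dict[str, Interval]:
    """[Bel, Pl] of theta, not theta and the whole frame for a triplet BBA."""
    t, not_t, _ = m
    # The empty set has interval [0, 0] for every BBA, so it contributes nothing.
    return {"theta": (t, 1 - not_t), "not_theta": (not_t, 1 - t), "frame": (1.0, 1.0)}


def d_bi(m1: Triplet, m2: Triplet) -> float:
    """Belief-interval distance of Eq. (26) on a two-element frame."""
    bi1, bi2 = belief_intervals(m1), belief_intervals(m2)
    n_c = 1 / 2 ** (2 - 1)  # N_c = 1 / 2^(|Theta| - 1) with |Theta| = 2
    total = sum(wasserstein_interval(bi1[x], bi2[x]) ** 2 for x in bi1)
    return math.sqrt(n_c * total)


M_T = (1.0, 0.0, 0.0)  # full agreement
M_F = (0.0, 1.0, 0.0)  # full disagreement
d_11 = d_bi((0.9098, 0.0207, 1 - 0.9098 - 0.0207), M_T)
d_12 = d_bi((0.6732, 0.1941, 1 - 0.6732 - 0.1941), M_T)
```

Under these assumptions the two values round to 0.0590 and 0.2633, which agree with the corresponding entries of the distance matrix reported in Section VI, and the distance between full agreement and full disagreement equals 1.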


VI. APPLICATION OF BF-ICRA TO GSP 


In this section, we analyze the experimental results obtained 
using the MMAS algorithm described in Section II. For 
this, we use real data from the Malta and Seychelles GPS networks, 
composed of 38 sessions and 71 sessions respectively and denoted 
GSP1 and GSP2. We also use 6 larger test problems ranging 
from 100 to 443 sessions, denoted GSP3, ..., GSP8. The results 
are obtained by performing 30 independent runs for every 
experiment. The details of our MMAS implementation are 
given in [1]. So in our GSP example we consider 8 GSP 
criteria $C_i$ = GSP$i$, $i = 1, \ldots, 8$ and six average costs as 
results $A_1, \ldots, A_6$, where $A_1$ is the cost average for the first 
5 runs, $A_2$ the cost average for the first 10 runs, $A_3$ for the 
first 15 runs, ... and finally $A_6$ for all the 30 runs. Table I 
shows the values of averaged costs obtained for this problem. 
It corresponds to the transpose of the score matrix $S$. 


              A1         A2         A3         A4         A5         A6 
C1 = GSP1     399.00     398.00     398.33     398.50     399.40     399.50 
C2 = GSP2     916.40     915.60     922.47     924.80     924.72     922.07 
C3 = GSP3     41336.40   41052.40   40991.93   40935.90   40832.20   40910.60 
C4 = GSP4     3244.80    3303.30    3327.00    3344.55    3345.60    3341.93 
C5 = GSP5     1656.20    1660.80    1663.93    1664.95    1666.96    1665.90 
C6 = GSP6     1673.60    1683.50    1690.73    1688.75    1690.24    1692.67 
C7 = GSP7     3420.00    3430.70    3433.13    3426.85    3429.44    3428.57 
C8 = GSP8     3758.20    3755.70    3758.73    3760.50    3760.80    3765.80 


Table I 
TRANSPOSE OF THE SCORE MATRIX $S = [S_{ij}]$ OF GSP PROBLEM. 


⁷ We use the index $T$ in the notation $m_T(\cdot)$ to indicate that the agreement is 
true, and $F$ in $m_F(\cdot)$ to specify that the agreement is false. 

⁸ Here we will take $m_1(\cdot) = m_{jj'}(\cdot)$ and $m_2(\cdot) = m_T(\cdot)$, or $m_2(\cdot) = 
m_F(\cdot)$. 


Hence in this problem $M = 6$ and $N = 8$, and $S = [S_{ij}]$ is 
a $6 \times 8$ score matrix. Based on the classical ICrA approach, one 
gets the following IC matrices⁹ 


$K^{\mu}$ = 

        C1     C2     C3     C4     C5     C6     C7     C8 
C1      1      0.60   0.27   0.67   0.73   0.67   0.33   0.87 
C2      0.60   1      0.27   0.80   0.73   0.53   0.47   0.73 
C3      0.27   0.27   1      0.07   0      0.20   0.40   0.13 
C4      0.67   0.80   0.07   1      0.93   0.73   0.53   0.80 
C5      0.73   0.73   0      0.93   1      0.80   0.60   0.87 
C6      0.67   0.53   0.20   0.73   0.80   1      0.67   0.80 
C7      0.33   0.47   0.40   0.53   0.60   0.67   1      0.47 
C8      0.87   0.73   0.13   0.80   0.87   0.80   0.47   1 

$K^{\nu}$ = 

        C1     C2     C3     C4     C5     C6     C7     C8 
C1      0      0.40   0.73   0.33   0.27   0.33   0.67   0.13 
C2      0.40   0      0.73   0.20   0.27   0.47   0.53   0.27 
C3      0.73   0.73   0      0.93   1      0.80   0.60   0.87 
C4      0.33   0.20   0.93   0      0.07   0.27   0.47   0.20 
C5      0.27   0.27   1      0.07   0      0.20   0.40   0.13 
C6      0.33   0.47   0.80   0.27   0.20   0      0.33   0.20 
C7      0.67   0.53   0.60   0.47   0.40   0.33   0      0.53 
C8      0.13   0.27   0.87   0.20   0.13   0.20   0.53   0 


The element $K^{\mu}_{jj'}$ of matrix $K^{\mu}$ expresses the degree of 
agreement between criteria $C_j$ = GSP$j$ and $C_{j'}$ = GSP$j'$, 
whereas the element $K^{\nu}_{jj'}$ of matrix $K^{\nu}$ expresses the degree 
of disagreement between $C_j$ = GSP$j$ and $C_{j'}$ = GSP$j'$. 
Based on these results, one sees that the ACO algorithm performs 
similarly for GSP2, GSP4, GSP5 and GSP8 because they are 
all in high agreement. Indeed the $\mu_{jj'}$ values for $j, j' \in \{2, 4, 5, 8\}$ 
are quite high (greater than 70%). They are GPS networks with 
different numbers of sessions, but they may have a similar structure; 
therefore, the value of agreement is high. For the other networks, 
we can conclude that they have a very different structure. What 
is worth noting is that there also appears a strong agreement 
of GSP1 with GSP8 because $\mu_{18} = 0.87$. But because GSP8 
is also in strong agreement with GSP2, GSP4, GSP5 and with 
GSP1, it is logically expected that GSP1 should also be in 
agreement with GSP2, GSP4, GSP5, which is unfortunately 
not the case based on this classical ICrA. This example points 
out some inconsistency of the ICrA result because of the too 
crude method of estimation of the degree of agreement and 
disagreement between criteria based on IFS. 


Now if we consider the same example with the same score 
matrix $S$ (built from Table I), we obtain the following IC Belief 
matrices¹⁰ 


$K(\theta)$ = 

        C1       C2       C3       C4       C5       C6       C7       C8 
C1      0.9098   0.6732   0.1791   0.5968   0.6106   0.5620   0.1659   0.7789 
C2      0.6732   0.9546   0.0364   0.8983   0.8783   0.8341   0.5532   0.7016 
C3      0.1791   0.0364   0.8722   0.0172   0.0154   0.0178   0.0366   0.1137 
C4      0.5968   0.8983   0.0172   0.9552   0.9146   0.9163   0.7395   0.6092 
C5      0.6106   0.8783   0.0154   0.9146   0.8917   0.8778   0.6922   0.6315 
C6      0.5620   0.8341   0.0178   0.9163   0.8778   0.9060   0.7630   0.6441 
C7      0.1659   0.5532   0.0366   0.7395   0.6922   0.7630   0.8587   0.2484 
C8      0.7789   0.7016   0.1137   0.6092   0.6315   0.6441   0.2484   0.8508 

$K(\bar{\theta})$ = 

        C1       C2       C3       C4       C5       C6       C7       C8 
C1      0.0207   0.1941   0.5385   0.2578   0.1757   0.2117   0.5335   0.0399 
C2      0.1941   0.0166   0.8323   0.0486   0.0298   0.0513   0.1808   0.0682 
C3      0.5385   0.8323   0.0117   0.9002   0.8754   0.8548   0.7062   0.5486 
C4      0.2578   0.0486   0.9002   0.0187   0.0216   0.0204   0.0606   0.1193 
C5      0.1757   0.0298   0.8754   0.0216   0.0170   0.0201   0.0558   0.0832 
C6      0.2117   0.0513   0.8548   0.0204   0.0201   0.0154   0.0390   0.0726 
C7      0.5335   0.1808   0.7062   0.0606   0.0558   0.0390   0.0110   0.3495 
C8      0.0399   0.0682   0.5486   0.1193   0.0832   0.0726   0.3495   0.0100 


From the ICBMs $\mathbf{K}(\theta)$ and $\mathbf{K}(\bar{\theta})$ we compute the matrix $\mathbf{D}(\theta)$
of distances of $m_{jj'}(\cdot)$ to the full agreement state with BBA
$m_T(\theta) = 1$, based on the $d_{BI}(\cdot,\cdot)$ distance. We get the following
distances to full agreement:

$\mathbf{D}(\theta) = [D_{jj'} = d_{BI}(m_{jj'}, m_T)]$ =

        C1      C2      C3      C4      C5      C6      C7      C8
  C1  0.0590  0.2633  0.6845  0.3331  0.2892  0.3314  0.6893  0.1406
  C2  0.2633  0.0321  0.8987  0.0767  0.0803  0.1135  0.3230  0.1950
  C3  0.6845  0.8987  0.0774  0.9418  0.9306  0.9192  0.8381  0.7241
  C4  0.3331  0.0767  0.9418  0.0326  0.0566  0.0552  0.1706  0.2668
  C5  0.2892  0.0803  0.9306  0.0566  0.0679  0.0770  0.1958  0.2404
  C6  0.3314  0.1135  0.9192  0.0552  0.0770  0.0592  0.1494  0.2293
  C7  0.6893  0.3230  0.8381  0.1706  0.1958  0.1494  0.0849  0.5626
  C8  0.1406  0.1950  0.7241  0.2668  0.2404  0.2293  0.5626  0.0892

⁹ For presentation convenience and due to typesetting column width, we
decompose and present the IC matrix $\mathbf{K} = [K_{jj'} = (K^{\mu}_{jj'}, K^{\nu}_{jj'})]$ into two
distinct matrices $\mathbf{K}^{\mu} = [K^{\mu}_{jj'}]$ and $\mathbf{K}^{\nu} = [K^{\nu}_{jj'}]$.

¹⁰ For presentation convenience, the ICBM $\mathbf{K} = [K_{jj'} = (m_{jj'}(\theta), m_{jj'}(\bar{\theta}), m_{jj'}(\theta \cup \bar{\theta}))]$
is decomposed into three matrices $\mathbf{K}(\theta) = [K^{\theta}_{jj'} = m_{jj'}(\theta)]$,
$\mathbf{K}(\bar{\theta}) = [K^{\bar{\theta}}_{jj'} = m_{jj'}(\bar{\theta})]$ and
$\mathbf{K}(\theta \cup \bar{\theta}) = [K^{\theta \cup \bar{\theta}}_{jj'} = 1 - m_{jj'}(\theta) - m_{jj'}(\bar{\theta})]$.


The element $D_{jj'}$ represents the agreement distance between
$C_j$ and $C_{j'}$ (the lower, the better). From the values of the
elements of the $\mathbf{D}(\theta)$ matrix one sees clearly that ACO performs
similarly for GSP2, GSP4 and GSP5 because the distances $D_{24}$,
$D_{25}$ and $D_{45}$ are very small. We also see that GSP6 is
in good agreement with GSP4 and GSP5 but is relatively less
in agreement with GSP2 because $D_{26} = 0.1135$. As we see,
the inconsistency observed with classical ICrA does not appear
with this new BF-ICrA method, because BF-ICrA provides a much
better and more precise estimation of the degrees of
agreement and disagreement between criteria thanks to a
proper belief functions modeling.


VII. CONCLUSION 


The GPS surveying problem and a new InterCriteria Analysis
based on belief functions were addressed in this paper to
overcome the potential inconsistencies of the results generated
by the classical ICrA method. This technique provides a
more precise and refined method for estimating the degrees of
agreement and disagreement between criteria, which uses the
whole information available in the data. Instances containing
from 38 to 443 sessions have been solved using the MMAS
algorithm, and we compared the performance of ACO
algorithms applied to eight GPS networks. Our results show
that ACO can provide fast near-optimal solutions for observing
GPS networks, and could help to improve the services based
on GPS networks. From this new Inter-Criteria Analysis we
are able to identify some relations and dependences between
the considered eight GSPs and the MMAS algorithm performance.

Abstract: In this paper we propose a new Belief Function-based
Inter-Criteria Analysis (BF-ICrA) for the assessment of
the degree of redundancy of criteria involved in a multicriteria
decision making (MCDM) problem. This BF-ICrA method makes it
possible to simplify the original MCDM problem by withdrawing all
redundant criteria, and thus to diminish the complexity of the MCDM
problem. This is of prime importance for solving large MCDM
problems whose solution requires the fusion of many belief
functions. We provide simple examples to show how this new
BF-ICrA works.


Keywords: Inter-Criteria Analysis, ICrA-BF, MultiCriteria 
Decision Making, MCDM, belief functions, information fu- 
sion. 


I. INTRODUCTION 


In a Multi-Criteria Decision-Making (MCDM) problem
we consider a set of alternatives (or objects)
$\mathbf{A} \triangleq \{A_1, A_2, \ldots, A_M\}$ ($M > 2$), and a set of criteria
$\mathbf{C} \triangleq \{C_1, C_2, \ldots, C_N\}$ ($N > 1$). We search for the best alternative
$A^\star$ given the available information expressed by an $M \times N$
score matrix (also called benefit or payoff matrix) $\mathbf{S} = [S_{ij} = C_j(A_i)]$,
and (possibly) the importance factor $w_j \in [0, 1]$ of
each criterion $C_j$ with $\sum_{j=1}^{N} w_j = 1$. The set of normalized
weighting factors is denoted by $\mathbf{w} = \{w_1, w_2, \ldots, w_N\}$.
Depending on the context of the MCDM problem, the score
$S_{ij}$ of each alternative $A_i$ with respect to each criterion $C_j$
can be interpreted either as a cost (i.e. an expense), or as
a reward (i.e. a benefit). By convention and without loss of
generality¹ we will always interpret the score as a reward
having monotonically increasing preference. Thus, the best
alternative $A_j^\star$ for a given criterion $C_j$ will be the one providing
the highest reward/benefit.

The MCDM problem is not easy to solve because the
scores are usually expressed in different (physical) units and
different scales. This necessitates a choice of score/data
normalization yielding rank reversal problems [1], [2]. Usually
no single alternative $A^\star$ is the best choice for all criteria,
so a compromise must be established to provide a reasonable
and acceptable solution of the MCDM problem for
decision-making support.

¹ because it suffices to multiply the score values by $-1$ to reverse the
preference ordering.

Many MCDM methods exist, see references in [3]. The most
popular methods are AHP² [4], ELECTRE³ [5] and TOPSIS⁴ [6],
[7]. In 2016 and 2017, we developed BF-TOPSIS methods
[3], [8] based on Belief Functions (BF) to improve the original
TOPSIS approach, to avoid data normalization and to deal
with imprecise score values as well. It appears however
that the complexity of these new BF-TOPSIS methods can
become a bottleneck for their use in large MCDM problems
because of the fusion step of basic belief assignments required
for the implementation of BF-TOPSIS. That is why a
simplification of the MCDM problem (if possible) is very
welcome in order to save computational time and resources.
This is the motivation of the present work.

For this aim we propose a new Inter-Criteria Analysis
(ICrA) based on belief functions for identifying and estimating
the possible degree of agreement (i.e. redundancy) between
some criteria, driven by the data (score values). This makes it
possible to remove all redundant criteria of the original MCDM
problem and thus to solve a simplified (almost) equivalent MCDM
problem faster and at lower computational cost. ICrA was
originally developed by Atanassov et al. [9]-[11] based
on Intuitionistic Fuzzy Sets [12], and it has been applied in
different fields like medicine [13]-[15], optimization [16]-[20],
workforce planning [21], competitiveness analysis [22],
radar detection [23], ranking [24]-[27], etc. In this paper we
improve the ICrA approach thanks to belief functions introduced
by Shafer in [28] from Dempster's original works [29]. We
will refer to it as the BF-ICrA method in the sequel.

After a short presentation of the basics of belief functions in
Section II, we present Atanassov's ICrA method in Section
III and discuss its limitations. In Section IV we present the
new BF-ICrA approach based on a new construction of the Basic
Belief Assignment (BBA) matrix from the score matrix and
a new establishment of the Inter-Criteria belief matrix. In Section
V a method of simplification of MCDM using BF-ICrA is
proposed. Examples are given in Section VI, with concluding remarks
in Section VII.

² Analytic Hierarchy Process
³ ELimination Et Choix Traduisant la REalité
⁴ Technique for Order Preference by Similarity to Ideal Solution


II. BASICS OF THE THEORY OF BELIEF FUNCTIONS 


To follow classical notations of the theory of belief functions,
also called Dempster-Shafer Theory (DST) [28], we
assume that the answer (i.e. the solution, or the decision to
take) of the problem under concern belongs to a known finite
discrete frame of discernment (FoD) $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$,
with $n > 1$, and where all elements of $\Theta$ are exclusive. The
set of all subsets of $\Theta$ (including the empty set $\emptyset$ and $\Theta$) is the
power-set of $\Theta$ denoted by $2^\Theta$. A BBA (or mass function)
associated with a given source of evidence is defined [28] as
the mapping $m(\cdot) : 2^\Theta \to [0, 1]$ satisfying $m(\emptyset) = 0$ and
$\sum_{A \in 2^\Theta} m(A) = 1$. The quantity $m(A)$ is called the mass of
$A$ committed by the source of evidence. Belief and plausibility
functions are usually interpreted respectively as lower and
upper bounds of an unknown (possibly subjective) probability
measure [29]. They are defined by⁵

$$Bel(A) \triangleq \sum_{B \subseteq A,\, B \in 2^\Theta} m(B), \quad \text{and} \quad Pl(A) \triangleq 1 - Bel(\bar{A}). \tag{1}$$

If $m(A) > 0$, $A$ is called a focal element of $m(\cdot)$. When all
focal elements are singletons, $m(\cdot)$ is called a Bayesian BBA
[28] and its corresponding $Bel(\cdot)$ function is homogeneous to
a probability measure. The vacuous BBA, or VBBA for short,
representing a totally ignorant source is defined as $m_v(\Theta) = 1$.
The main challenge of the decision-maker is to combine
efficiently the possible multiple BBAs $m_s(\cdot)$ given by $s > 1$
distinct sources of evidence to obtain a global (combined)
BBA, and to make a final decision from it. Historically the
combination of BBAs is accomplished by Dempster's rule
proposed by Shafer in DST. Because Dempster's rule presents
several serious problems (insensitivity to the level of conflict
between sources in some cases, inconsistency with bounds of
conditional probabilities when used for belief conditioning,
dictatorial behavior, counter-intuitive results), many fusion
rules have been proposed in the literature as alternatives to
Dempster's rule, see [30], Vol. 2 for a detailed list of fusion
rules. We will not detail here all the possible combination rules
but just mention that the Proportional Conflict Redistribution
rule no. 6 (PCR6) proposed by Martin and Osswald in [30]
(Vol. 3) is one of the most serious alternative rules for BBA
combination available so far.
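The definitions of formula (1) can be illustrated with a minimal Python sketch. The representation of a BBA as a dictionary mapping `frozenset` focal elements to masses, the function names `bel` and `pl`, and the three-element frame are assumptions made for illustration only; they do not come from the paper.

```python
def bel(m, A):
    """Belief of A: sum of the masses of all non-empty subsets of A, Eq. (1)."""
    A = frozenset(A)
    return sum(mass for B, mass in m.items() if B and B <= A)


def pl(m, A, frame):
    """Plausibility of A: 1 - Bel(complement of A), Eq. (1)."""
    return 1.0 - bel(m, frozenset(frame) - frozenset(A))


# Hypothetical BBA on the frame {a, b, c} with three focal elements.
frame = {"a", "b", "c"}
m = {
    frozenset({"a"}): 0.5,
    frozenset({"a", "b"}): 0.3,
    frozenset(frame): 0.2,
}

print(bel(m, {"a"}), pl(m, {"a"}, frame))   # lower and upper bound for {a}
print(bel(m, {"b"}), pl(m, {"b"}, frame))   # lower and upper bound for {b}
```

In this sketch the interval $[Bel(A), Pl(A)]$ is obtained directly from the masses, which is the quantity reused later by the belief-interval distance.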


III. ATANASSOV’S INTER-CRITERIA ANALYSIS (ICRA) 


Atanassov's Inter-Criteria Analysis (ICrA) approach is
based on an $M \times N$ score matrix⁶ $\mathbf{S} = [S_{ij} = C_j(A_i),\ i = 1, \ldots, M,\ j = 1, \ldots, N]$,
and intuitionistic fuzzy pairs [12]
including two membership functions $\mu(\cdot)$ and $\nu(\cdot)$.
Mathematically, an intuitionistic fuzzy set (IFS) $A$ is denoted by
$A = \{(x, \mu_A(x), \nu_A(x)) \mid x \in E\}$, where $E$ is the set of
possible values of $x$, $\mu_A(x) \in [0, 1]$ defines the membership of
$x$ to the set $A$, and $\nu_A(x) \in [0, 1]$ defines the non-membership
of $x$ to the set $A$, with the restriction $0 \le \mu_A(x) + \nu_A(x) \le 1$.
The ICrA method consists in building an $N \times N$ Inter-Criteria (IC)
matrix from the score matrix $\mathbf{S}$. The elements of the IC matrix
are the intuitionistic fuzzy pairs $(\mu_{jj'}, \nu_{jj'})$ whose
components express respectively the degree of agreement and
the degree of disagreement between criteria $C_j$ and $C_{j'}$, for
$j, j' \in \{1, 2, \ldots, N\}$. For a given column $j$ (i.e. criterion $C_j$),
it is always possible to compare with the $>$, $<$ and $=$ operators
all the scores $S_{ij}$ for $i = 1, 2, \ldots, M$ because the scores of
each column are expressed in the same unit. The construction of the
IC matrix can be used to search for relations between the criteria
because the method compares homogeneous data within
a same column. In [32] Atanassov prescribes to normalize the
score matrix before applying ICrA as follows

$$S_{ij}^{norm} = \frac{S_{ij} - S_j^{min}}{S_j^{max} - S_j^{min}}, \tag{2}$$

if one wants to apply it in the dual manner for the search of
InterObjects analysis (IObA).

⁵ where the symbol $\triangleq$ means equal by definition.
⁶ called index matrix by Atanassov in [31].

Because we focus on ICrA only, we do not need to apply a
score matrix normalization because each column of the score
matrix represents the values of a same criterion for different
alternatives, and the criterion values are expressed with the
same unit (e.g. m, m², sec, kg, or €, etc.).


A. Construction of Inter-Criteria matrix 


The construction of the $N \times N$ IC matrix, denoted⁷ $\mathbf{K}$, is
based on the pairwise comparisons between every two criteria
along all evaluated alternatives. Let $K^{\mu}_{jj'}$ be the number of
cases in which the inequalities $S_{ij} > S_{i'j}$ and $S_{ij'} > S_{i'j'}$
hold simultaneously, and let $K^{\nu}_{jj'}$ be the number of cases
in which the inequalities $S_{ij} > S_{i'j}$ and $S_{ij'} < S_{i'j'}$ hold
simultaneously. Because the total number of comparisons
between the alternatives is $M(M-1)/2$, one always has

$$0 \le K^{\mu}_{jj'} + K^{\nu}_{jj'} \le \frac{M(M-1)}{2}, \tag{3}$$

or equivalently, after division by $\frac{M(M-1)}{2} > 0$,

$$0 \le \frac{2K^{\mu}_{jj'}}{M(M-1)} + \frac{2K^{\nu}_{jj'}}{M(M-1)} \le 1. \tag{4}$$

This inequality permits to define the elements of the $N \times N$ IC
matrix $\mathbf{K} = [K_{jj'}]$ as intuitionistic fuzzy (IF) pairs
$K_{jj'} = (\mu_{jj'}, \nu_{jj'})$ where

$$\mu_{jj'} \triangleq \frac{2K^{\mu}_{jj'}}{M(M-1)} \quad \text{and} \quad \nu_{jj'} \triangleq \frac{2K^{\nu}_{jj'}}{M(M-1)}. \tag{5}$$

$\mu_{jj'}$ measures the degree of agreement between criteria $C_j$
and $C_{j'}$, and $\nu_{jj'}$ measures their degree of disagreement. By
construction the IC matrix $\mathbf{K}$ is always a symmetric matrix.


⁷ We use $\mathbf{K}$ because it corresponds to the first letter of the word Kriterium,
meaning criterion in German. The letter $\mathbf{C}$ is already in use.

The computation of $K^{\mu}_{jj'}$ and $K^{\nu}_{jj'}$ can be done explicitly
thanks to Atanassov's formulas [32]

$$K^{\mu}_{jj'} = \sum_{i=1}^{M-1} \sum_{i'=i+1}^{M} \left[ \mathrm{sgn}(S_{ij} - S_{i'j})\,\mathrm{sgn}(S_{ij'} - S_{i'j'}) + \mathrm{sgn}(S_{i'j} - S_{ij})\,\mathrm{sgn}(S_{i'j'} - S_{ij'}) \right], \tag{6}$$

and

$$K^{\nu}_{jj'} = \sum_{i=1}^{M-1} \sum_{i'=i+1}^{M} \left[ \mathrm{sgn}(S_{ij} - S_{i'j})\,\mathrm{sgn}(S_{i'j'} - S_{ij'}) + \mathrm{sgn}(S_{i'j} - S_{ij})\,\mathrm{sgn}(S_{ij'} - S_{i'j'}) \right], \tag{7}$$

where the signum function $\mathrm{sgn}(\cdot)$ used by Atanassov is
defined as follows

$$\mathrm{sgn}(x) = \begin{cases} 1, & \text{if } x > 0, \\ 0, & \text{if } x \le 0. \end{cases} \tag{8}$$

Actually the values of $K^{\mu}_{jj'}$ and $K^{\nu}_{jj'}$ depend on the choice
of the $\mathrm{sgn}(x)$ function⁸. That is why in [21], [33] the authors
propose different algorithms, implemented in Java in an
ICrA software, yielding different $K^{\mu}_{jj'}$ and $K^{\nu}_{jj'}$ values for
making the analysis and reducing the dimension (complexity)
of the original MCDM problem.
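As a minimal sketch of formulas (5)-(8), the pair counts and the resulting IF pairs could be computed as shown below. The function names `sgn` and `icra_matrices`, the list-of-rows representation of the score matrix, and the small 5 × 4 data set are assumptions chosen for illustration; this is not the Java ICrA software mentioned above.

```python
from itertools import combinations


def sgn(x):
    """Signum function of Eq. (8): 1 if x > 0, otherwise 0."""
    return 1 if x > 0 else 0


def icra_matrices(S):
    """Return (mu, nu): N x N agreement and disagreement matrices, Eq. (5)."""
    M, N = len(S), len(S[0])
    n_pairs = M * (M - 1) / 2              # number of pairs of alternatives
    mu = [[0.0] * N for _ in range(N)]
    nu = [[0.0] * N for _ in range(N)]
    for j in range(N):
        for jp in range(N):
            k_mu = k_nu = 0
            for i, ip in combinations(range(M), 2):    # all pairs i < i'
                a = S[i][j] - S[ip][j]                 # S_ij  - S_i'j
                b = S[i][jp] - S[ip][jp]               # S_ij' - S_i'j'
                k_mu += sgn(a) * sgn(b) + sgn(-a) * sgn(-b)   # Eq. (6)
                k_nu += sgn(a) * sgn(-b) + sgn(-a) * sgn(b)   # Eq. (7)
            mu[j][jp] = k_mu / n_pairs
            nu[j][jp] = k_nu / n_pairs
    return mu, nu


# Illustrative score matrix (an assumption): rows are alternatives,
# columns are criteria.
S = [[6, 7, 4, 4],
     [5, 7, 3, 5],
     [3, 8, 5, 6],
     [7, 1, 9, 7],
     [6, 3, 1, 8]]
mu, nu = icra_matrices(S)
for row_mu, row_nu in zip(mu, nu):
    print([(m_, n_) for m_, n_ in zip(row_mu, row_nu)])
```

With the strict signum of (8), tied scores contribute to neither count, which is why a diagonal element of `mu` can be smaller than 1 when a column contains equal values.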


B. Inter-criteria analysis 

Once the Inter-Criteria matrix $\mathbf{K} = [K_{jj'}]$ of intuitionistic
fuzzy pairs is calculated, one needs to analyze it to decide
which criteria $C_j$ and $C_{j'}$ are in strong agreement (or positive
consonance) reflecting the correlation between $C_j$ and $C_{j'}$, in
strong disagreement (or negative consonance) reflecting
non-correlation between $C_j$ and $C_{j'}$, or in dissonance reflecting the
uncertainty situation where nothing can be said about the
non-correlation or the correlation between $C_j$ and $C_{j'}$. If one wants
to identify the set of criteria $C_{j'}$ for $j' \neq j$ that are strongly
correlated with $C_j$, one can sort the $\mu_{jj'}$ values in descending
order to identify those in strong positive consonance with
$C_j$. In [25], [26], the authors propose a qualitative scale
to refine the levels of consonance and dissonance and to
help the decision-making procedure. A dual approach based
on the $\nu_{jj'}$ values can be used to determine the set of criteria
that are not correlated with $C_j$. Another approach [10],
[27] proposes to define two thresholds $\alpha, \beta \in [0, 1]$ for the
positive and negative consonance respectively, against which
the components $\mu_{jj'}$ and $\nu_{jj'}$ of $K_{jj'} = (\mu_{jj'}, \nu_{jj'})$ will be
compared. The correlations between the criteria $C_j$ and $C_{j'}$
are called "positive consonance", "negative consonance" or
"dissonance" depending on their $\mu_{jj'}$ and $\nu_{jj'}$ values with
respect to the chosen thresholds $\alpha$ and $\beta$, see [22] for details.
More precisely, $C_j$ and $C_{j'}$ are in

• $(\alpha, \beta)$ positive consonance (i.e. correlated):
  if $\mu_{jj'} > \alpha$ and $\nu_{jj'} < \beta$.
• $(\alpha, \beta)$ negative consonance (i.e. not correlated):
  if $\mu_{jj'} < \beta$ and $\nu_{jj'} > \alpha$.
• $(\alpha, \beta)$ dissonance (i.e. full uncertainty): otherwise.

⁸ for instance if we use $\mathrm{sgn}(x) = 1$ if $x \ge 0$ and $\mathrm{sgn}(x) = 0$ if $x < 0$,
we will obtain, in general, other $K^{\mu}_{jj'}$ and $K^{\nu}_{jj'}$ values.


At the beginning of ICrA development it was not very clear
how these intuitionistic fuzzy (IF) pairs $(\mu_{jj'}, \nu_{jj'})$ had to
be used, and that is why Atanassova [34], [35] proposed to
handle both components of the IF pair. For this, she interpreted
the pairs $(\mu_{jj'}, \nu_{jj'})$ as points located in the elementary TFU
triangle, where the point T with coordinates (1,0) represents the
maximal positive consonance (i.e. the true consonance), the
point F with coordinates (0,1) represents the maximal negative
consonance (i.e. the falsity), and the point U with coordinates
(0,0) represents the maximal dissonance (i.e. the uncertainty).
From this interpretation it becomes easy to identify the top
consonant IF pairs $(\mu_{jj'}, \nu_{jj'})$ that fall in the bottom right corner
of the TFU triangle, limited by the vertical line $x = \alpha$
and the horizontal line $y = \beta$. The consonant
IF pairs are then ranked according to their Euclidean distance
$d_{C_j C_{j'}}$ to the point T with coordinates (1,0), defined by

$$d_{C_j C_{j'}} = d\big((1,0), (\mu_{jj'}, \nu_{jj'})\big) = \sqrt{(1 - \mu_{jj'})^2 + \nu_{jj'}^2}. \tag{9}$$
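The threshold test and the ranking distance (9) can be sketched as follows. The function names `consonance` and `distance_to_T` and the threshold values $\alpha = 0.75$, $\beta = 0.25$ are hypothetical choices made for illustration; the text does not prescribe particular threshold values.

```python
import math


def consonance(mu, nu, alpha, beta):
    """Classify an IF pair (mu, nu) with respect to thresholds alpha and beta."""
    if mu > alpha and nu < beta:
        return "positive consonance"
    if mu < beta and nu > alpha:
        return "negative consonance"
    return "dissonance"


def distance_to_T(mu, nu):
    """Euclidean distance of the point (mu, nu) to T = (1, 0), Eq. (9)."""
    return math.hypot(1.0 - mu, nu)


ALPHA, BETA = 0.75, 0.25          # hypothetical thresholds
pairs = [(0.9, 0.05), (0.0, 0.8), (0.5, 0.4)]
for mu, nu in sorted(pairs, key=lambda p: distance_to_T(*p)):
    print((mu, nu), consonance(mu, nu, ALPHA, BETA), round(distance_to_T(mu, nu), 4))
```

Sorting by `distance_to_T` ranks the pairs from the most to the least positively consonant, which is the ranking use described for the TFU triangle.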


In the MCDM context only the criteria that are negatively
consonant (or uncorrelated) must be kept for solving MCDM
and saving computational resources because they have no (or
only very low) dependency on each other, so that each
uncorrelated criterion provides useful information. The set
of criteria that are positively consonant (if any), called the
consonant set, indicates a redundancy of information
between the criteria belonging to it in terms of decisional
behavior. Therefore all these positively consonant criteria must
be represented by only one representative criterion that will
be kept in the MCDM analysis to simplify the MCDM problem.
Also, all the criteria that are deemed strongly dissonant (if any)
could be taken out of the original MCDM problem because
they only introduce uncertainty in the decision-making.


C. General comments on ICrA 


Although appealing at first glance, the classical ICrA
approach calls for the following comments:

1) The IF values $\mu_{jj'}$ and $\nu_{jj'}$ can be easily interpreted
in the belief function framework. Indeed, the belief and
plausibility of (positive) consonance between criteria $C_j$
and $C_{j'}$ can be directly linked to the values $\mu_{jj'}$ and $\nu_{jj'}$
by taking $Bel_{jj'}(\theta) = \mu_{jj'}$ and $Pl_{jj'}(\theta) = 1 - \nu_{jj'}$.
Moreover $U_{jj'}(\theta) = Pl_{jj'}(\theta) - Bel_{jj'}(\theta) = 1 - \nu_{jj'} - \mu_{jj'}$
represents the dissonance (the uncertainty about
the correlation) of the criteria $C_j$ and $C_{j'}$. Here the
proposition $\theta$ means: the criteria $C_j$ and $C_{j'}$ are totally
positively consonant (i.e. totally correlated), and the
frame of discernment is defined as $\Theta = \{\theta, \bar{\theta}\}$, where
$\bar{\theta}$ means: the criteria $C_j$ and $C_{j'}$ are totally negatively
consonant (uncorrelated). From this, one can define the
corresponding BBA by

$$m_{jj'}(\theta) = \mu_{jj'}, \tag{10}$$
$$m_{jj'}(\bar{\theta}) = \nu_{jj'}, \tag{11}$$
$$m_{jj'}(\theta \cup \bar{\theta}) = 1 - \mu_{jj'} - \nu_{jj'}. \tag{12}$$


2) The construction of $\mu_{jj'}$ and $\nu_{jj'}$ proposed in the
classical ICrA is disputable because it is only based on
counting the valid "$>$" or "$<$" inequalities, but it does not
exploit how much bigger or smaller the score values
are in each comparison done in the construction of the
Inter-Criteria Matrix $\mathbf{K}$. Therefore the construction of
$\mu_{jj'}$ and $\nu_{jj'}$ is actually only a very crude method to
estimate IF pairs.

3) The construction of the Inter-Criteria Matrix K is in fact 
not unique as reported in [33]. This will yield different 
results in general. 

4) The exploitation of the ICrA method depends on the
choice of the $\alpha$ and $\beta$ thresholds, which will impact the final
result.

5) The classical ICrA method cannot deal directly with 
imprecise or missing score values. 


IV. A NEW ICRA METHOD BASED ON BELIEF FUNCTIONS 


In this paper we propose a new ICrA method, called BF- 
ICrA for short, based on belief functions that circumvents most 
of the aforementioned drawbacks of classical ICrA. Here we 
show how to get more precisely the Inter-Criteria Belief Matrix 
and how to exploit it for MCDM simplification. 


A. Construction of BBA matrix from the score matrix 


From any non-zero score matrix $\mathbf{S} = [S_{ij}]$, we can construct
the $M \times N$ BBA matrix $\mathbf{M} = [m_{ij}(\cdot)]$ as follows

$$m_{ij}(A_i) = Bel_{ij}(A_i), \tag{13}$$
$$m_{ij}(\bar{A}_i) = Bel_{ij}(\bar{A}_i) = 1 - Pl_{ij}(A_i), \tag{14}$$
$$m_{ij}(A_i \cup \bar{A}_i) = Pl_{ij}(A_i) - Bel_{ij}(A_i). \tag{15}$$

Assuming $A^j_{\max} \neq 0$ and $A^j_{\min} \neq 0$, we take⁹

$$Bel_{ij}(A_i) \triangleq Sup_j(A_i) / A^j_{\max}, \tag{16}$$
$$Bel_{ij}(\bar{A}_i) \triangleq Inf_j(A_i) / A^j_{\min}, \tag{17}$$

where $A^j_{\max} \triangleq \max_i Sup_j(A_i)$ and $A^j_{\min} \triangleq \min_i Inf_j(A_i)$,
and with

$$Sup_j(A_i) \triangleq \sum_{k \in \{1, \ldots, M\} \mid S_{kj} \le S_{ij}} |S_{ij} - S_{kj}|, \tag{18}$$
$$Inf_j(A_i) \triangleq - \sum_{k \in \{1, \ldots, M\} \mid S_{kj} \ge S_{ij}} |S_{ij} - S_{kj}|. \tag{19}$$

The entire justification of these formulas can be found in our
previous works [3]. For example, consider the $j$-th column
corresponding to a criterion $C_j$ of a score matrix $\mathbf{S} = [S_{ij}]$
with seven rows given by $\mathbf{s}_j = [10, 20, -5, 0, 100, -11, 0]^T$,
where $T$ denotes the transpose. Then based on the above formulas
we get the BBA values listed in Table I.

For another criterion $C_{j'}$ and the $j'$-th column of the
score matrix we will obtain another set of BBA values
$m_{ij'}(\cdot)$. Applying this method for each column of the
score matrix we are able to compute the BBA matrix
$\mathbf{M} = [m_{ij}(\cdot)]$, each component of which is in fact a triplet
$(m_{ij}(A_i), m_{ij}(\bar{A}_i), m_{ij}(A_i \cup \bar{A}_i))$ of BBA values in $[0, 1]$
such that $m_{ij}(A_i) + m_{ij}(\bar{A}_i) + m_{ij}(A_i \cup \bar{A}_i) = 1$ for all
$i = 1, \ldots, M$ and $j = 1, \ldots, N$.
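The construction (13)-(19) can be sketched in Python for a single column of the score matrix. The function name `bba_column` and the tuple layout (mass of $A_i$, mass of its complement, mass of the ignorance) are assumptions made for illustration, and the handling of the zero cases follows the convention stated in the footnote attached to (16). The input is the seven-row column $\mathbf{s}_j$ of the example above.

```python
def bba_column(scores):
    """Build one BBA triplet per alternative from one score column."""
    # Eq. (18): positive support, Eq. (19): negative support.
    sup = [sum(abs(s - t) for t in scores if t <= s) for s in scores]
    inf = [-sum(abs(s - t) for t in scores if t >= s) for s in scores]
    a_max, a_min = max(sup), min(inf)
    bbas = []
    for sup_i, inf_i in zip(sup, inf):
        bel = sup_i / a_max if a_max != 0 else 0.0            # Eq. (16)
        bel_not = abs(inf_i / a_min) if a_min != 0 else 0.0   # Eq. (17), ratio is >= 0
        # Eqs. (13)-(15): masses of A_i, of its complement, and of the ignorance.
        bbas.append((bel, bel_not, 1.0 - bel - bel_not))
    return bbas


s_j = [10, 20, -5, 0, 100, -11, 0]
for i, (m_a, m_not_a, m_unc) in enumerate(bba_column(s_j), start=1):
    print(f"A{i}: {m_a:.4f} {m_not_a:.4f} {m_unc:.4f}")
```

In this sketch the alternative with the highest score receives all its mass on $A_i$ and the one with the lowest score receives all its mass on $\bar{A}_i$, while intermediate scores keep a non-zero mass on $A_i \cup \bar{A}_i$.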


B. Construction of Inter-Criteria Matrix from BBA matrix 


The next step of the BF-ICrA approach is the construction of
the $N \times N$ Inter-Criteria Matrix $\mathbf{K} = [K_{jj'}]$ from the $M \times N$
BBA matrix $\mathbf{M} = [m_{ij}(\cdot)]$, where each element $K_{jj'}$ corresponds
to the BBA $(m_{jj'}(\theta), m_{jj'}(\bar{\theta}), m_{jj'}(\theta \cup \bar{\theta}))$ about positive
consonance $\theta$, negative consonance $\bar{\theta}$ and uncertainty between
criteria $C_j$ and $C_{j'}$ respectively. The principle of construction
of the triplet $K_{jj'} = (m_{jj'}(\theta), m_{jj'}(\bar{\theta}), m_{jj'}(\theta \cup \bar{\theta}))$ is based
on two steps that will be detailed in the sequel:

• Step 1: For each alternative $A_i$, we first compute the BBA
  $(m^i_{jj'}(\theta), m^i_{jj'}(\bar{\theta}), m^i_{jj'}(\theta \cup \bar{\theta}))$ for any two criteria
  $j, j' \in \{1, 2, \ldots, N\}$.

• Step 2: The BBA $(m_{jj'}(\theta), m_{jj'}(\bar{\theta}), m_{jj'}(\theta \cup \bar{\theta}))$ is then
  obtained by the combination of the $M$ BBAs $m^i_{jj'}(\cdot)$.

Construction of BBA $m^i_{jj'}(\cdot)$

The mass of belief $m^i_{jj'}(\theta)$ represents the degree of agreement
between the BBAs $m_{ij}(\cdot)$ and $m_{ij'}(\cdot)$ for the alternative
$A_i$, and $m^i_{jj'}(\bar{\theta})$ represents the degree of disagreement
between $m_{ij}(\cdot)$ and $m_{ij'}(\cdot)$. The mass $m^i_{jj'}(\theta \cup \bar{\theta})$ is the degree
of uncertainty about the agreement (or disagreement) between
$m_{ij}(\cdot)$ and $m_{ij'}(\cdot)$ for the alternative $A_i$. The calculation of
$m^i_{jj'}(\theta)$ could be envisaged in several manners.

The first manner would consist in considering the degree of
conflict [28] $k^i_{jj'} \triangleq \sum_{X, Y \subseteq \Theta \mid X \cap Y = \emptyset} m_{ij}(X)\, m_{ij'}(Y)$ and
taking the Bayesian BBA $m^i_{jj'}(\theta) = 1 - k^i_{jj'}$, $m^i_{jj'}(\bar{\theta}) = k^i_{jj'}$
and $m^i_{jj'}(\theta \cup \bar{\theta}) = 0$. Instead of using Shafer's conflict,
the second manner would consist in using a normalized distance
$d^i_{jj'} = d(m_{ij}, m_{ij'})$ to measure the closeness between $m_{ij}(\cdot)$
and $m_{ij'}(\cdot)$, and then considering the Bayesian BBA modeling
defined by $m^i_{jj'}(\theta) = 1 - d^i_{jj'}$, $m^i_{jj'}(\bar{\theta}) = d^i_{jj'}$ and
$m^i_{jj'}(\theta \cup \bar{\theta}) = 0$. These two manners however are not very satisfying because
they always set to zero the degree of uncertainty between the
agreement and disagreement of the BBA, and the second manner
depends also on the choice of the distance metric.

⁹ If $A^j_{\max} = 0$ then $Bel_{ij}(A_i) = 0$, and if $A^j_{\min} = 0$ then $Pl_{ij}(A_i) = 1$.

[Table I: BBAs constructed from score values.]

So, we propose a more appealing third manner of the BBA modeling
of $m^i_{jj'}(\theta)$, $m^i_{jj'}(\bar{\theta})$ and $m^i_{jj'}(\theta \cup \bar{\theta})$. For this, we consider
two sources of evidence (SoE) indexed by $j$ and $j'$ providing
the BBAs $m_{ij}$ and $m_{ij'}$ defined on the simple FoD $\{A_i, \bar{A}_i\}$
and denoted $m_{ij} = [m_{ij}(A_i), m_{ij}(\bar{A}_i), m_{ij}(A_i \cup \bar{A}_i)]$ and
$m_{ij'} = [m_{ij'}(A_i), m_{ij'}(\bar{A}_i), m_{ij'}(A_i \cup \bar{A}_i)]$. We also denote
$\Theta = \{\theta, \bar{\theta}\}$ the FoD about the relative state of the two SoE,
where $\theta$ means that the two SoE agree, $\bar{\theta}$ means that they
disagree and $\theta \cup \bar{\theta}$ means that we do not know. Then the BBA
modeling is based on the following important remarks:


• Two SoE are in total agreement if both commit their
  maximum belief mass to the element $A_i$ or to the element
  $\bar{A}_i$. So they perfectly agree if $m_{ij}(A_i) = m_{ij'}(A_i) = 1$,
  or if $m_{ij}(\bar{A}_i) = m_{ij'}(\bar{A}_i) = 1$. Therefore the pure degree
  of agreement¹⁰ between two sources is modeled by

  $$m^i_{jj'}(\theta) = m_{ij}(A_i)\, m_{ij'}(A_i) + m_{ij}(\bar{A}_i)\, m_{ij'}(\bar{A}_i). \tag{20}$$

• Two SoE are in total disagreement if each one commits
  its maximum mass of belief to one element and the other
  to its opposite, that is if one has $m_{ij}(A_i) = 1$ and
  $m_{ij'}(\bar{A}_i) = 1$, or if $m_{ij}(\bar{A}_i) = 1$ and $m_{ij'}(A_i) = 1$.
  Hence the pure degree of disagreement¹¹ between two
  sources is modeled by

  $$m^i_{jj'}(\bar{\theta}) = m_{ij}(A_i)\, m_{ij'}(\bar{A}_i) + m_{ij}(\bar{A}_i)\, m_{ij'}(A_i). \tag{21}$$

• All possible remaining products between components of
  $m_{ij}$ and $m_{ij'}$ reflect the part of uncertainty we have about
  the SoE (i.e. we do not know if they agree or disagree).
  Hence the degree of uncertainty between the two sources
  is modeled by

  $$\begin{aligned} m^i_{jj'}(\theta \cup \bar{\theta}) = {} & m_{ij}(A_i)\, m_{ij'}(A_i \cup \bar{A}_i) + m_{ij}(\bar{A}_i)\, m_{ij'}(A_i \cup \bar{A}_i) \\ & + m_{ij}(A_i \cup \bar{A}_i)\, m_{ij'}(A_i) + m_{ij}(A_i \cup \bar{A}_i)\, m_{ij'}(\bar{A}_i) \\ & + m_{ij}(A_i \cup \bar{A}_i)\, m_{ij'}(A_i \cup \bar{A}_i). \end{aligned} \tag{22}$$

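By construction $m^i_{jj'}(\cdot) = m^i_{j'j}(\cdot)$, hence this BBA modeling
permits to build a set of $M$ symmetrical Inter-Criteria Belief
Matrices (ICBM) $\mathbf{K}^i = [K^i_{jj'}]$ of dimension $N \times N$ relative
to each alternative $A_i$, whose component $K^i_{jj'}$ corresponds to
the triplet of BBA values $m^i_{jj'} = (m^i_{jj'}(\theta), m^i_{jj'}(\bar{\theta}), m^i_{jj'}(\theta \cup \bar{\theta}))$
modeling the belief of agreement and of disagreement
between $C_j$ and $C_{j'}$ based on $A_i$. One has also¹²
$m^i_{jj'}(\theta), m^i_{jj'}(\bar{\theta}), m^i_{jj'}(\theta \cup \bar{\theta}) \in [0, 1]$ and
$m^i_{jj'}(\theta) + m^i_{jj'}(\bar{\theta}) + m^i_{jj'}(\theta \cup \bar{\theta}) = 1$.
This BBA construction can be easily extended
for modeling the agreement, disagreement and uncertainty of
$n > 2$ criteria $C_{j_1}, \ldots, C_{j_n}$ altogether, if needed, by taking

$$m^i_{j_1 \ldots j_n}(\theta) = \prod_{k=1}^{n} m_{ij_k}(A_i) + \prod_{k=1}^{n} m_{ij_k}(\bar{A}_i),$$

$$m^i_{j_1 \ldots j_n}(\bar{\theta}) = \sum_{\substack{X_{j_1}, \ldots, X_{j_n} \in \{A_i, \bar{A}_i\} \\ X_{j_1} \cap \ldots \cap X_{j_n} = \emptyset}} \ \prod_{k=1}^{n} m_{ij_k}(X_{j_k}),$$

$$m^i_{j_1 \ldots j_n}(\theta \cup \bar{\theta}) = 1 - m^i_{j_1 \ldots j_n}(\theta) - m^i_{j_1 \ldots j_n}(\bar{\theta}).$$

¹⁰ or positive consonance according to Atanassov's terminology.
¹¹ or negative consonance according to Atanassov's terminology.
¹² because $(m_{ij}(A_i) + m_{ij}(\bar{A}_i) + m_{ij}(A_i \cup \bar{A}_i))\,(m_{ij'}(A_i) + m_{ij'}(\bar{A}_i) + m_{ij'}(A_i \cup \bar{A}_i)) = 1 \cdot 1 = 1$.

Construction of BBA $m_{jj'}(\cdot)$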


Once all the BBAs $m^i_{jj'}(\cdot)$ ($i = 1, \ldots, M$) are calculated,
one combines them to get the component
$K_{jj'} = (m_{jj'}(\theta), m_{jj'}(\bar{\theta}), m_{jj'}(\theta \cup \bar{\theta}))$ of the Inter-Criteria Belief
Matrix (ICBM) $\mathbf{K} = [K_{jj'}]$. This fusion step can be done
in many ways depending on the combination rule chosen by
the user. If the number of alternatives $M$ is not too large, we
recommend combining the BBAs $m^i_{jj'}(\cdot)$ with the PCR6 fusion
rule [30] (Vol. 3) because of the known deficiencies of Dempster's
rule. If $M$ is too large for PCR6 to run on a computer,
we can just use the simple averaging rule of combination in
these high dimensional MCDM problems.
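Only the averaging fallback mentioned above is sketched here; PCR6 is not implemented in this example. The function name `average_rule` and the triplet layout (agreement, disagreement, uncertainty) are assumptions made for illustration.

```python
def average_rule(bbas):
    """Combine M local BBA triplets by simple component-wise averaging."""
    n = len(bbas)
    return tuple(sum(b[k] for b in bbas) / n for k in range(3))


# Hypothetical local BBAs m^i_jj'(.) obtained for M = 4 alternatives.
local_bbas = [
    (0.7, 0.1, 0.2),
    (0.6, 0.2, 0.2),
    (0.9, 0.0, 0.1),
    (0.4, 0.3, 0.3),
]
print(average_rule(local_bbas))
```

Averaging keeps the result a valid BBA (non-negative masses summing to one) and its cost grows only linearly with $M$, which is the reason it is usable when $M$ is large.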


V. SIMPLIFICATION OF ORIGINAL MCDM 


Once the global Inter-Criteria Belief Matrix
$\mathbf{K} = [K_{jj'} = (m_{jj'}(\theta), m_{jj'}(\bar{\theta}), m_{jj'}(\theta \cup \bar{\theta}))]$ is calculated, we need to
identify and cluster the criteria that are in strong agreement, in
strong disagreement, and those on which we are uncertain. For
identifying the criteria that are in very strong agreement, we
evaluate the distance of each component $K_{jj'}$ to the BBA
representing the best agreement state, characterized by the
specific BBA¹³ $m_T(\theta) = 1$. With a similar approach we can
also identify, if we want, the criteria that are in very strong
disagreement using the distance of $m_{jj'}(\cdot)$ to the
BBA representing the best disagreement state, characterized by
the specific BBA $m_F(\bar{\theta}) = 1$. As an alternative to Jousselme's
distance [37], we use the $d_{BI}(\cdot,\cdot)$ distance based on belief
intervals [36] because it is a good method for measuring the
distance $d(m_1, m_2)$ between two BBAs¹⁴ $m_1(\cdot)$ and $m_2(\cdot)$
over the same FoD. It is defined by

$$d_{BI}(m_1, m_2) = \sqrt{N_c \cdot \sum_{X \in 2^\Theta} d_W^2\big(BI_1(X), BI_2(X)\big)}, \tag{23}$$

where the belief intervals are defined by $BI_1(X) \triangleq [Bel_1(X), Pl_1(X)]$
and $BI_2(X) \triangleq [Bel_2(X), Pl_2(X)]$ and are
computed from $m_1(\cdot)$ and $m_2(\cdot)$ thanks to formula (1).
$d_W(BI_1(X), BI_2(X))$ is Wasserstein's distance between
intervals, calculated by

$$d_W([a_1, b_1], [a_2, b_2]) = \sqrt{\left[\frac{a_1 + b_1}{2} - \frac{a_2 + b_2}{2}\right]^2 + \frac{1}{3}\left[\frac{b_1 - a_1}{2} - \frac{b_2 - a_2}{2}\right]^2},$$

and $N_c = 1/2^{|\Theta| - 1}$ is a normalization factor to get $d_{BI}(m_1, m_2) \in [0, 1]$.

¹³ We use the index $T$ in the notation $m_T(\cdot)$ to indicate that the agreement is
true, and $F$ in $m_F(\cdot)$ to specify that the agreement is false.

¹⁴ Here $m_1(\cdot) = m_{jj'}(\cdot)$, and $m_2(\cdot) = m_T(\cdot)$ or $m_2(\cdot) = m_F(\cdot)$.
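For the two-element frame $\Theta = \{\theta, \bar{\theta}\}$ used here, formula (23) can be sketched as follows. The function names and the triplet layout (mass of $\theta$, mass of $\bar{\theta}$, mass of $\theta \cup \bar{\theta}$) are assumptions made for illustration. The subsets $\emptyset$ and $\Theta$ are omitted from the sum because their belief intervals are $[0,0]$ and $[1,1]$ for every BBA, so they contribute nothing.

```python
import math


def belief_intervals(m):
    """Belief intervals [Bel, Pl] of theta and of not-theta for a triplet BBA."""
    m_t, m_nt, m_u = m
    return [(m_t, m_t + m_u), (m_nt, m_nt + m_u)]


def d_w_squared(i1, i2):
    """Squared Wasserstein distance between two intervals [a1,b1] and [a2,b2]."""
    (a1, b1), (a2, b2) = i1, i2
    mid = (a1 + b1) / 2 - (a2 + b2) / 2
    half_width = (b1 - a1) / 2 - (b2 - a2) / 2
    return mid ** 2 + half_width ** 2 / 3


def d_bi(m1, m2):
    """Belief-interval distance of Eq. (23) for a frame with |Theta| = 2."""
    n_c = 1 / 2 ** (2 - 1)          # normalization factor 1 / 2^(|Theta| - 1)
    total = sum(d_w_squared(x, y)
                for x, y in zip(belief_intervals(m1), belief_intervals(m2)))
    return math.sqrt(n_c * total)


m_T = (1.0, 0.0, 0.0)               # full agreement state
for m in [(0.9, 0.0, 0.1), (0.0, 0.8, 0.2), (0.5, 0.5, 0.0)]:
    print(m, round(d_bi(m, m_T), 4))
```

A distance close to 0 indicates that the pair of criteria is close to full agreement, and a distance close to 1 indicates the opposite.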


Because all criteria that are in strong agreement somehow
contain redundant (correlated) information and behave similarly
from a decision-making standpoint, we propose to simplify
the original MCDM problem by keeping in the MCDM only the
criteria that are non-redundant. The remaining criteria can
then be weighted by their degree of importance, reflecting
the number of different criteria that are in agreement through
this BF-ICrA approach.


For instance, if one has a seven-criteria MCDM problem
and if criteria $C_1$, $C_2$ and $C_3$ are in strong agreement, we
will select only one remaining criterion among $\{C_1, C_2, C_3\}$
and we give it a weight of $w_1 + w_2 + w_3$. Moreover, if $C_4$
and $C_5$ are also in strong agreement, we will select only one
remaining criterion among $\{C_4, C_5\}$ and we give it a weight
of $w_4 + w_5$, and we will use the weight $w_6$ for $C_6$, and $w_7$
for $C_7$. Hence the original MCDM problem will reduce to
a simplified four-criteria MCDM problem that can be solved using
the BF-TOPSIS method already presented in detail in [3] and in
[8], or with AHP [4] if one prefers, or with any other
method that the system designer may prefer.

The strategy for selecting the most representative criterion
among a set of redundant criteria is not unique and depends
mainly on the cost (i.e. human efforts, data mining,
computational resources, etc.) necessary for getting the values of the
score matrix of the problem under concern. The least costly
criterion may be a good option of selection. In the next section
we provide simple examples for BF-ICrA and, for simplicity,
we will select the representative criterion as being the one
with the smallest index. So in the aforementioned example the
simplified MCDM problem will reduce to an $M \times 4$ MCDM
problem involving only four criteria $C_1$, $C_4$, $C_6$ and $C_7$.
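The weight merging of the seven-criteria example can be sketched as follows, assuming the clusters of criteria in strong agreement have already been identified. The function name `simplify_criteria`, the numerical weights, and the cluster encoding are hypothetical; the smallest-index selection rule is the one adopted in the text for simplicity.

```python
def simplify_criteria(clusters, weights):
    """Keep one representative per cluster and sum the weights of its members."""
    representatives = [min(cluster) for cluster in clusters]   # smallest index
    merged = [sum(weights[c] for c in cluster) for cluster in clusters]
    return representatives, merged


# Clusters of the example: {C1, C2, C3}, {C4, C5}, {C6}, {C7}.
clusters = [[1, 2, 3], [4, 5], [6], [7]]
# Hypothetical normalized weights w_1, ..., w_7.
weights = {1: 0.2, 2: 0.1, 3: 0.1, 4: 0.2, 5: 0.1, 6: 0.2, 7: 0.1}

reps, new_weights = simplify_criteria(clusters, weights)
print(reps)          # indices of the criteria kept in the simplified problem
print(new_weights)   # merged weights, still summing to one
```

The simplified score matrix is then obtained by keeping only the columns of the representative criteria.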

The BF-ICrA method proposed in this work also allows, in
principle, a refined analysis (if necessary) based on the
IC matrices $\mathbf{K}^i$ about the origin of disagreement between
criteria with respect to each alternative $A_i$, in order to identify
the potential inconsistencies in the original MCDM problem. This
aspect is not developed in this paper and has been left for
future investigations. It is worth mentioning that the analysis
of the number of redundant criteria versus time improvements,
which could be proposed as an effective measure of performance
of this approach, depends highly on the application under
consideration and on the difficulty (and cost) of getting the value
of each criterion. For convenience, Figure 1 shows the
flow chart of BF-ICrA to help the reader to have a better
understanding of this new proposed method.


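[Figure 1: flow chart with the following successive steps.]

  Score matrix $\mathbf{S} = [S_{ij}]$
  → Compute BBA matrix $\mathbf{M} = [m_{ij}(\cdot)]$, Eqs. (13), (14), (15)
  → Compute local ICB matrices $\mathbf{K}^i = [K^i_{jj'}] = [(m^i_{jj'}(\theta), m^i_{jj'}(\bar{\theta}), m^i_{jj'}(\theta \cup \bar{\theta}))]$, Eqs. (20), (21), (22)
  → Compute global ICB matrix $\mathbf{K} = [K_{jj'}] = [(m_{jj'}(\theta), m_{jj'}(\bar{\theta}), m_{jj'}(\theta \cup \bar{\theta}))]$ by PCR6 fusion of $m^1_{jj'}(\cdot), \ldots, m^M_{jj'}(\cdot)$
  → Criteria clustering: identify criteria in strong agreement using the $d_{BI}$ distance (23)
  → MCDM simplification: select one criterion in each cluster and adjust its weight
  → Simplified MCDM problem solved by any preferred method (BF-TOPSIS, AHP, etc.)

Figure 1. Flow chart of BF-ICrA method.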


VI. EXAMPLES 


A. Example I (Comparison of K matrices) 


Here we compare the construction of the global IC matrix
$\mathbf{K}$ based on Atanassov's ICrA and on our new BF-ICrA approach.
For this, we use the $5 \times 4$ MCDM example given in [33], based
on the following score matrix (called sample data matrix in
[33]). Each row of $\mathbf{S}$ corresponds to an alternative, and each
column to a criterion. In [33], the authors use rows for criteria
and columns for alternatives, so they work with $\mathbf{S}^T$.


$$\mathbf{S} = [S_{ij}] = \begin{bmatrix} 6 & 7 & 4 & 4 \\ 5 & 7 & 3 & 5 \\ 3 & 8 & 5 & 6 \\ 7 & 1 & 9 & 7 \\ 6 & 3 & 1 & 8 \end{bmatrix}$$


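Based on Atanassov's ICrA method (using the unbiased algorithm
presented in detail in [33]) we get the following
$4 \times 4$ global Inter-Criteria matrices $\mathbf{K}^{\mu}$ and $\mathbf{K}^{\nu}$:

$$\mathbf{K}^{\mu} = [K^{\mu}_{jj'}] = \begin{bmatrix} 0.9 & 0 & 0.5 & 0.5 \\ 0 & 0.9 & 0.5 & 0.3 \\ 0.5 & 0.5 & 1 & 0.5 \\ 0.5 & 0.3 & 0.5 & 1 \end{bmatrix}, \qquad
\mathbf{K}^{\nu} = [K^{\nu}_{jj'}] = \begin{bmatrix} 0 & 0.8 & 0.4 & 0.4 \\ 0.8 & 0 & 0.4 & 0.6 \\ 0.4 & 0.4 & 0 & 0.5 \\ 0.4 & 0.6 & 0.5 & 0 \end{bmatrix}.$$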


Regrouping these two matrices into one matrix $K = [K_{jj'}]$
with components $K_{jj'} = (K^{\mu}_{jj'}, K^{\nu}_{jj'}, 1 - K^{\mu}_{jj'} - K^{\nu}_{jj'})$, one
gets the following global Inter-Criteria matrix K

$$K = \begin{bmatrix} (0.9, 0, 0.1) & (0, 0.8, 0.2) & (0.5, 0.4, 0.1) & (0.5, 0.4, 0.1) \\ (0, 0.8, 0.2) & (0.9, 0, 0.1) & (0.5, 0.4, 0.1) & (0.3, 0.6, 0.1) \\ (0.5, 0.4, 0.1) & (0.5, 0.4, 0.1) & (1, 0, 0) & (0.5, 0.5, 0) \\ (0.5, 0.4, 0.1) & (0.3, 0.6, 0.1) & (0.5, 0.5, 0) & (1, 0, 0) \end{bmatrix}$$


According to this K matrix, it appears intuitively that no
criterion is in strong agreement with the others. We observe
that criteria $C_1$ and $C_2$ are in relatively strong disagreement
because $K^{\nu}_{12} = K^{\nu}_{21} = 0.8$, which is quite close to one.
Criteria $C_2$ and $C_4$ are in relatively medium disagreement
because $K^{\nu}_{24} = K^{\nu}_{42} = 0.6$. In this example no MCDM
simplification is prescribed based on Atanassov's ICrA. To
get a more precise evaluation of the degree of agreement between
criteria based on Atanassov's ICrA, we apply formula (23) to
get the $D_{BI}$ distance matrix from each component of K to the
total agreement state $m_T = [m(\theta), m(\bar{\theta}), m(\theta \cup \bar{\theta})] = [1, 0, 0]$.
Hence we get

$$D_{BI} = \begin{bmatrix} 0.0577 & 0.9018 & 0.4509 & 0.4509 \\ 0.9018 & 0.0577 & 0.4509 & 0.6506 \\ 0.4509 & 0.4509 & 0 & 0.5000 \\ 0.4509 & 0.6506 & 0.5000 & 0 \end{bmatrix}$$


As we see from this $D_{BI}$ matrix, the distance of the inter-criteria
BBA for $C_1$ and $C_2$ with respect to the total agreement
state $m_T(\theta) = 1$ is very large (i.e., 0.9018), which means that
$C_1$ and $C_2$ strongly disagree in this example, as we expect
from a more intuitive reasoning based on the $K^{\nu}_{12} = 0.8$ value.
Similar analyses can be done for all (non-diagonal) elements
of $D_{BI}$ to identify which criteria are in strong agreement,
if any.
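
The distances to the total agreement state can be checked numerically. The sketch below assumes that formula (23), which is not restated in this section, is the Euclidean belief-interval distance built on the Wasserstein distance between intervals, normalized by $1/2^{|\Theta|-1}$ for the two-element frame $\{\theta, \bar{\theta}\}$. The function names are illustrative. Under this assumption the sketch reproduces the values 0.0577, 0.9018, 0.4509, 0.6506 and 0.5 listed for Atanassov's K matrix.

```python
import math

def belief_intervals(m):
    """Belief intervals [Bel, Pl] of theta, not-theta and their union
    for a BBA m = (m(theta), m(not theta), m(theta U not theta))."""
    agree, disagree, unknown = m
    return [(agree, agree + unknown),
            (disagree, disagree + unknown),
            (1.0, 1.0)]

def wasserstein_sq(i1, i2):
    """Squared Wasserstein distance between two intervals:
    (midpoint gap)^2 + (half-width gap)^2 / 3."""
    (a1, b1), (a2, b2) = i1, i2
    mid_gap = (a1 + b1) / 2 - (a2 + b2) / 2
    half_gap = (b1 - a1) / 2 - (b2 - a2) / 2
    return mid_gap ** 2 + half_gap ** 2 / 3

def d_bi(m1, m2, frame_size=2):
    """Assumed form of formula (23): belief-interval distance between two BBAs."""
    norm = 1 / 2 ** (frame_size - 1)
    total = sum(wasserstein_sq(x, y)
                for x, y in zip(belief_intervals(m1), belief_intervals(m2)))
    return math.sqrt(norm * total)

TOTAL_AGREEMENT = (1.0, 0.0, 0.0)
for component in [(0.9, 0.0, 0.1), (0.0, 0.8, 0.2), (0.5, 0.4, 0.1),
                  (0.3, 0.6, 0.1), (0.5, 0.5, 0.0)]:
    print(component, round(d_bi(component, TOTAL_AGREEMENT), 4))
```

For this frame the distance to total agreement reduces to $\sqrt{(b + c/2)^2 + c^2/12}$ for a component $(a, b, c)$, which shows that both disagreement and uncertainty move a pair of criteria away from the agreement state.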


Based on our new BF-ICrA method, we first compute the
5 x 4 BBA matrix $M = [m_{ij}(\cdot)]$ from the score matrix S
based on formulas (13)-(15). We get (all results
have been rounded to two decimal places)

$$M \approx \begin{bmatrix} (0.5, 0.08, 0.42) & (0.71, 0.05, 0.24) & (0.18, 0.35, 0.47) & (0, 1, 0) \\ (0.25, 0.33, 0.42) & (0.71, 0.05, 0.24) & (0.09, 0.53, 0.38) & (0.1, 0.6, 0.3) \\ (0, 1, 0) & (1, 0, 0) & (0.30, 0.24, 0.46) & (0.3, 0.3, 0.4) \\ (1, 0, 0) & (0, 1, 0) & (1, 0, 0) & (0.6, 0.1, 0.3) \\ (0.5, 0.09, 0.41) & (0.14, 0.62, 0.24) & (0, 1, 0) & (1, 0, 0) \end{bmatrix}$$
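
The construction of M can be illustrated with a short sketch. It rests on two assumptions that are not stated in this section: that formulas (13)-(15) follow the BF-TOPSIS construction (belief in $A_i$ proportional to the summed score gaps to the alternatives that $A_i$ dominates, belief in its complement proportional to the summed gaps to the alternatives that dominate it), and that the score matrix is the integer matrix `S` written in the code. With these assumptions the computed masses agree with the entries of M above up to rounding.

```python
def bba_column(scores):
    """BBAs (m(A_i), m(not A_i), m(A_i U not A_i)) for one criterion,
    assuming a BF-TOPSIS-style construction for formulas (13)-(15)."""
    # Evidence for A_i: gaps to alternatives scoring no better than A_i.
    sup = [sum(s - t for t in scores if t <= s) for s in scores]
    # Evidence against A_i (as a magnitude): gaps to alternatives scoring no worse.
    inf = [sum(t - s for t in scores if t >= s) for s in scores]
    sup_max, inf_max = max(sup), max(inf)
    column = []
    for p, q in zip(sup, inf):
        bel = p / sup_max if sup_max else 0.0
        bel_not = q / inf_max if inf_max else 0.0
        column.append((bel, bel_not, 1.0 - bel - bel_not))
    return column

def bba_matrix(score_matrix):
    """Apply bba_column to every criterion (column) and return rows of BBAs."""
    columns = [bba_column(list(col)) for col in zip(*score_matrix)]
    return [list(row) for row in zip(*columns)]

# Assumed reading of the 5 x 4 score matrix (rows: alternatives, columns: criteria).
S = [[6, 7, 4, 4],
     [5, 7, 3, 5],
     [3, 8, 5, 6],
     [7, 1, 9, 7],
     [6, 3, 1, 8]]

M = bba_matrix(S)
for row in M:
    print([tuple(round(v, 2) for v in bba) for bba in row])
```

Because each mass depends only on ratios of score differences within a column, any positive affine rescaling of a criterion leaves its BBAs unchanged.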


The construction of the Inter-Criteria Matrices $K^i = [K^i_{jj'}]$ (for
$i = 1, \ldots, 5$) from the BBA matrix M based on formulas (20)-(22)
yields the following five matrices

$$K^1 \approx \begin{bmatrix} (0.26, 0.08, 0.66) & (0.36, 0.08, 0.56) & (0.12, 0.19, 0.69) & (0.08, 0.5, 0.42) \\ (0.36, 0.08, 0.56) & (0.51, 0.07, 0.42) & (0.14, 0.26, 0.6) & (0.05, 0.71, 0.24) \\ (0.12, 0.19, 0.69) & (0.14, 0.26, 0.6) & (0.15, 0.13, 0.72) & (0.35, 0.18, 0.47) \\ (0.08, 0.5, 0.42) & (0.05, 0.71, 0.24) & (0.35, 0.18, 0.47) & (1, 0, 0) \end{bmatrix}$$

$$K^2 \approx \begin{bmatrix} (0.17, 0.17, 0.66) & (0.19, 0.25, 0.56) & (0.20, 0.16, 0.64) & (0.23, 0.18, 0.59) \\ (0.19, 0.25, 0.56) & (0.51, 0.07, 0.42) & (0.09, 0.38, 0.53) & (0.10, 0.43, 0.47) \\ (0.20, 0.16, 0.64) & (0.09, 0.38, 0.53) & (0.29, 0.09, 0.62) & (0.33, 0.10, 0.57) \\ (0.23, 0.18, 0.59) & (0.10, 0.43, 0.47) & (0.33, 0.10, 0.57) & (0.37, 0.12, 0.51) \end{bmatrix}$$

$$K^3 \approx \begin{bmatrix} (1, 0, 0) & (0, 1, 0) & (0.24, 0.3, 0.46) & (0.3, 0.3, 0.4) \\ (0, 1, 0) & (1, 0, 0) & (0.30, 0.24, 0.46) & (0.3, 0.3, 0.4) \\ (0.24, 0.3, 0.46) & (0.30, 0.24, 0.46) & (0.15, 0.14, 0.71) & (0.16, 0.16, 0.68) \\ (0.3, 0.3, 0.4) & (0.3, 0.3, 0.4) & (0.16, 0.16, 0.68) & (0.18, 0.18, 0.64) \end{bmatrix}$$

$$K^4 \approx \begin{bmatrix} (1, 0, 0) & (0, 1, 0) & (1, 0, 0) & (0.6, 0.1, 0.3) \\ (0, 1, 0) & (1, 0, 0) & (0, 1, 0) & (0.1, 0.6, 0.3) \\ (1, 0, 0) & (0, 1, 0) & (1, 0, 0) & (0.6, 0.1, 0.3) \\ (0.6, 0.1, 0.3) & (0.1, 0.6, 0.3) & (0.6, 0.1, 0.3) & (0.37, 0.12, 0.51) \end{bmatrix}$$

$$K^5 \approx \begin{bmatrix} (0.26, 0.08, 0.66) & (0.12, 0.32, 0.56) & (0.08, 0.5, 0.42) & (0.5, 0.08, 0.42) \\ (0.12, 0.32, 0.56) & (0.40, 0.18, 0.42) & (0.62, 0.14, 0.24) & (0.14, 0.62, 0.24) \\ (0.08, 0.5, 0.42) & (0.62, 0.14, 0.24) & (1, 0, 0) & (0, 1, 0) \\ (0.5, 0.08, 0.42) & (0.14, 0.62, 0.24) & (0, 1, 0) & (1, 0, 0) \end{bmatrix}$$
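
A single component of a local matrix can be recomputed from two BBAs of the same row of M. The sketch below assumes that formulas (20)-(22), which are not restated in this section, take the product form in which two criteria agree when both support $A_i$ or both support its complement, and disagree when one supports $A_i$ and the other its complement. The function name is illustrative. Under this assumption the sketch reproduces, for instance, $K^1_{12}$ and $K^4_{44}$ above.

```python
def local_icb(m_j, m_k):
    """Local inter-criteria BBA (agreement, disagreement, uncertainty)
    from two BBAs (m(A_i), m(not A_i), m(A_i U not A_i)) of one alternative.
    Assumed product form of formulas (20)-(22)."""
    for_j, against_j, _ = m_j
    for_k, against_k, _ = m_k
    agree = for_j * for_k + against_j * against_k
    disagree = for_j * against_k + against_j * for_k
    return agree, disagree, 1.0 - agree - disagree

# First row of M: criteria C1 and C2 evaluated on alternative A1.
k1_12 = local_icb((0.5, 0.08, 0.42), (0.71, 0.05, 0.24))
print(tuple(round(v, 2) for v in k1_12))

# Fourth row of M: criterion C4 compared with itself on alternative A4.
k4_44 = local_icb((0.6, 0.1, 0.3), (0.6, 0.1, 0.3))
print(tuple(round(v, 2) for v in k4_44))
```

Note that a criterion is in total agreement with itself only when its BBA is categorical; otherwise part of the mass stays on uncertainty, as in the diagonal entries of the matrices above.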


The componentwise PCR6 fusion of all five $K^i$ matrices
provides the following global Inter-Criteria matrix $K_{PCR6}$

$$K_{PCR6} \approx \begin{bmatrix} (0.90, 0.02, 0.08) & (0.06, 0.83, 0.11) & (0.55, 0.15, 0.30) & (0.49, 0.25, 0.26) \\ (0.06, 0.83, 0.11) & (0.95, 0.01, 0.04) & (0.18, 0.58, 0.24) & (0.08, 0.80, 0.12) \\ (0.55, 0.15, 0.30) & (0.18, 0.58, 0.24) & (0.89, 0.02, 0.09) & (0.22, 0.48, 0.30) \\ (0.49, 0.25, 0.26) & (0.08, 0.80, 0.12) & (0.22, 0.48, 0.30) & (0.90, 0.02, 0.08) \end{bmatrix}$$


Applying formula (23), we get the following $D_{BI}$ distance
matrix from each component of $K_{PCR6}$ to the total agreement
state $m_T = [m(\theta), m(\bar{\theta}), m(\theta \cup \bar{\theta})] = [1, 0, 0]$

$$D_{BI} = \begin{bmatrix} 0.0601 & 0.8845 & 0.3124 & 0.3909 \\ 0.8845 & 0.0327 & 0.7037 & 0.8618 \\ 0.3124 & 0.7037 & 0.0668 & 0.6355 \\ 0.3909 & 0.8618 & 0.6355 & 0.0622 \end{bmatrix} \tag{24}$$

We see that K and $K_{PCR6}$ are different, especially $K_{23} =
K_{32} = (0.5, 0.4, 0.1)$ compared with $K^{PCR6}_{23} = K^{PCR6}_{32} =
(0.18, 0.58, 0.24)$. Based on the $D_{BI}$ matrix (24), it is clear that
no criteria strongly agree in this example, so no
MCDM simplification is recommended according to BF-ICrA.


B. Example 2 (MCDM simplification)


Here we consider a more interesting example showing how
an MCDM simplification is possible. We consider a 6 x 5
MCDM problem with the following score matrix.

$$S = [S_{ij}] = \begin{bmatrix} 7.5914 & 18.1828 & 18.3221 & 95.6739 & 4.5674 \\ 8.7753 & 20.5506 & 20.8240 & 48.0229 & -0.1977 \\ -1.3492 & 0.3017 & 0.7804 & 79.8283 & 2.9828 \\ 8.8739 & 20.7478 & 21.2302 & 13.3305 & -3.6669 \\ 5.2207 & 13.4413 & 13.5201 & 41.5979 & -0.8402 \\ -1.7320 & -0.4639 & 0.0213 & 91.4893 & 4.1489 \end{bmatrix}$$

It is not obvious by inspection how close these criteria are
(if at all), or whether there is some underlying relationship
between them. For the analysis, we apply the BF-ICrA approach
proposed in this work. After applying all derivations (similarly
to those presented in Example 1), we finally get the following
$D_{BI}$ distance matrix from each component of $K_{PCR6}$ to the
total agreement state $m_T = [m(\theta), m(\bar{\theta}), m(\theta \cup \bar{\theta})] = [1, 0, 0]$

$$D_{BI} = \begin{bmatrix} 0.0239 & 0.0239 & 0.0250 & 0.7512 & 0.7512 \\ 0.0239 & 0.0239 & 0.0250 & 0.7512 & 0.7512 \\ 0.0250 & 0.0250 & 0.0262 & 0.7595 & 0.7595 \\ 0.7512 & 0.7512 & 0.7595 & 0.0568 & 0.0568 \\ 0.7512 & 0.7512 & 0.7595 & 0.0568 & 0.0568 \end{bmatrix}$$

From the analysis of the upper off-diagonal components of $D_{BI}$,
it is clear that criteria $C_1$, $C_2$
and $C_3$ are in almost total agreement because their distance
is close to zero. We can also observe from $D_{BI}$ that criteria
$C_4$ and $C_5$ are very close. So the original 6 x 5 MCDM
problem in this example can be simplified into a 6 x 2 MCDM
problem, considering only the simplified score matrix involving
$C_1$ and $C_4$, because $C_2$ and $C_3$ behave similarly to $C_1$
for decision-making, and $C_5$ behaves similarly to $C_4$. The
simplified MCDM problem can then be solved by any preferred
technique.

Does BF-ICrA make sense in this example? The answer
is positive: the columns of
the score matrix are not independent, because $C_2(A_i) =
2 \cdot C_1(A_i) + 3$, $C_3(A_i) = C_2(A_i) + \epsilon$ ($\epsilon$ being a small
contamination noise), and $C_5(A_i) = 0.1 \cdot C_4(A_i) - 5$. Hence
the decisions based on $C_1$, $C_2$ or $C_3$ will be very close, as
will the decisions based on $C_4$ or $C_5$. Therefore the result
of BF-ICrA makes sense, and the expected simplification of the
MCDM problem is indeed obtained. If we apply AHP
(which is nothing but the weighted arithmetic average) to
the normalized score matrix based on (2), or BF-TOPSIS
methods, to solve the original MCDM problem (assuming equal importance
of criteria), or if we solve the simplified MCDM problem obtained by
BF-ICrA, we get the same preference order $A_1 \succ A_2 \succ A_4 \succ
A_5 \succ A_6 \succ A_3$. So the best decision to make is $A_1$ in this
example.
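
The dependence between the columns can be verified directly on the score matrix of this example. The sketch below is only a numerical check with illustrative variable names: it measures how far $C_2$ is from $2 \cdot C_1 + 3$ and $C_5$ from $0.1 \cdot C_4 - 5$, and lists the contamination noise $C_3 - C_2$.

```python
# Score matrix of Example 2 (rows: alternatives A1..A6, columns: criteria C1..C5).
S = [
    [7.5914, 18.1828, 18.3221, 95.6739, 4.5674],
    [8.7753, 20.5506, 20.8240, 48.0229, -0.1977],
    [-1.3492, 0.3017, 0.7804, 79.8283, 2.9828],
    [8.8739, 20.7478, 21.2302, 13.3305, -3.6669],
    [5.2207, 13.4413, 13.5201, 41.5979, -0.8402],
    [-1.7320, -0.4639, 0.0213, 91.4893, 4.1489],
]

c1, c2, c3, c4, c5 = (list(col) for col in zip(*S))

# Largest deviation from the exact affine relations (only rounding error remains).
err_c2 = max(abs(b - (2 * a + 3)) for a, b in zip(c1, c2))
err_c5 = max(abs(e - (0.1 * d - 5)) for d, e in zip(c4, c5))

# Contamination noise added to C2 to obtain C3.
noise = [round(c - b, 4) for b, c in zip(c2, c3)]

print(err_c2, err_c5)
print(noise)
```

Both deviations stay at the level of the four-decimal rounding of the printed scores, and the noise on $C_3$ stays below 0.5, which is small compared with the spread of $C_2$.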


VII. CONCLUSION 


In this paper we have proposed a new method called BF-ICrA
to simplify (when possible) Multi-Criteria Decision-Making
problems based on inter-criteria analysis and belief
functions. This method is in the spirit of Atanassov's method
but proposes a better construction of the Inter-Criteria Matrix that
fully exploits all information of the score matrix, together with a
closeness measure of agreement between criteria based on the
belief interval distance. The BF-ICrA approach for simplifying
MCDM could also deal with imprecise or missing score
values using the technique presented in [8]. An application
of BF-ICrA to a GPS surveying problem is presented in [38],
and applications of BF-ICrA for simplifying and solving
real MCDM problems for the prevention of natural risks in
mountains will be the object of forthcoming investigations.


Abstract: The decision-making trial and evaluation laboratory
(DEMATEL) method employs expert assessments expressed by
crisp values to construct a group initial direct-relation (IDR)
matrix. However, crisp values tend to be a low-precision expression,
especially in complex practical problems. Although significant
efforts have been made to improve the DEMATEL method,
these improvements tend to neglect individual characteristics
and group consensus, resulting in unconvincing decision results.
This study provides a Dezert-Smarandache theory (DSmT)-based
group DEMATEL method with consensus reaching. In order to
determine the group IDR matrix reasonably, the basic belief assignment
(BBA) function is employed to extract expert assessments,
and the proportional conflict redistribution rule no. 5 (PCR5) of
DSmT is employed to fuse them into a temporary group
IDR matrix. Moreover, consensus measures at both the expert
level and the pair-factors level are calculated to determine whether
the acceptable consensus level has been reached. If the
required consensus level is not reached, a feedback mechanism
is activated to help experts reach a consensus. A consensus
group IDR matrix for the group DEMATEL can be obtained with
the help of the feedback mechanism, based on which an algorithm is
summarized for the proposed method to identify major factors in
a complex system. Finally, a numerical comparison and discussion
are provided to verify the effectiveness and applicability of the
proposed method and algorithm.


Keywords: DEMATEL, group decision making, Dezert-Smarandache
theory (DSmT), consensus reaching, evidence distance,
expert weight.


I. INTRODUCTION 


Between 1972 and 1976, the Science and Human Affairs
Program of the Battelle Memorial Institute of Geneva developed
the decision-making trial and evaluation laboratory (DEMATEL)
method. This method aims to describe the basic concept
of contextual relations and to identify cause-effect chain
factors for a complex decision problem in an understandable
manner by addressing the influence relations among factors
given by experts [1], [2]. It is considered to be a credible
decision-making method.

The DEMATEL method has been extensively used to solve
complex decision problems because of its simplicity and
effectiveness, including problems pertaining to hospital service
quality [3], decision making [4], sustainable supply chain
management [5], [6], etc. In particular, in order to determine
the weights of factors by considering their relations, the
DEMATEL method has also been combined with other decision-making
methods, such as the analytic hierarchy process (AHP) [7], [8],
the analytic network process (ANP) [9]-[14], and the technique for
order preference by similarity to ideal solution (TOPSIS)
[15], [16]. In the DEMATEL method, the initial decision
information is always subjectively given by experts in the form
of crisp values and aggregated into an individual or group
initial direct-relation (IDR) matrix by simple operations (e.g.,
weighted sum). However, such descriptions and operations
hardly reflect the vagueness of the real world
[17]. Therefore, scholars have sought to
improve DEMATEL by combining it with fuzzy theory [18],
and several fuzzy DEMATEL methods have been introduced. For
example, Abdullah [19] introduced interval-valued intuitionistic
fuzzy numbers to improve the judgement of DEMATEL
in a group decision-making (GDM) environment. Addae [20]
used a two-step fuzzy DEMATEL method to solve a practical
problem. Asan [21] proposed a new interval-valued hesitant
fuzzy approach to DEMATEL to deal explicitly with hesitation
in expert assessments and to offer a better representation of
uncertainty; see also [22]-[25].

Obviously, all these extensions have made great contribu- 
tions to the DEMATEL method, and its ability of dealing 
with complex problems can be strengthened to some extent. 
As currently defined, instead of one expert, a panel gives the 
assessments on the influence relations of factors, and multiple 
experts arrive at an acceptable result [26]. This process makes 
it necessary to extend the traditional DEMATEL method to a 
group DEMATEL method, which belongs to GDM problems. 
Although some literature considers the DEMATEL from the 
perspective of GDM, we believe that these group DEMATEL 
methods have a lot of room for further improvement in both 
expert assessment extraction and group IDR matrix construc- 
tion. 

Firstly, the expert assessment extraction of the group DEMATEL
method should be improved to obtain accurate decision
information in accordance with experts' cognitive competence.
As introduced earlier, DEMATEL is a GDM method that relies entirely
on expert assessments for its later computation
and analysis. Considering the complexity of reality and the
individual characteristics of experts, there is no doubt that not
every expert is proficient in all areas. In other words, experts
may give incomplete or uncertain assessments of the influence
relations among factors for a specific complex practical problem.
However, all of the existing group DEMATEL methods
assume that each expert can give a definite assessment for every
pair of factors by means of crisp values or fuzzy numbers, and
neglect the problem that the assessments given by experts may
be incomplete in practice. If experts who cannot give definite
assessments with crisp values or fuzzy numbers are required
to give their assessments in those forms, the final decision of
DEMATEL resulting from these ineffective assessments may be
erroneous. Therefore, understanding how to represent and fuse
incomplete information from experts is of great importance
for improving the group DEMATEL method.
Fortunately, the basic belief assignment (BBA) function, a
key concept in the Dezert-Smarandache theory (DSmT) of evidence
[27], can directly express uncertainty by assigning
probability to the subsets of a set composed of multiple objects
rather than to each individual object [28]. The BBA functions
generated from different evidence sources (experts) can be
fused by the proportional conflict redistribution rule no.
5 (PCR5). All these features meet the requirements
of the group DEMATEL method. Accordingly, DSmT is used
to extract and fuse expert assessments to derive the group
IDR matrix in this paper. Moreover, the differences in experts'
knowledge backgrounds and cognitive abilities on a particular
problem are reflected by expert weights [29]. Expert weights
are used as discounting parameters that reflect each expert's relative
importance in the group during the fusion process. In this paper,
expert weights are calculated based on similarity functions of
expert assessments.

Secondly, the group IDR matrix construction of the group DEMATEL
method should be improved to obtain acceptable
decision results in accordance with experts' satisfaction. In
GDM, "group" refers not merely to a number of experts,
but to experts who have a common interest
in reaching a consensus on a satisfactory final result
despite individual differences. This principle helps to reduce
biased evaluations and inherent partiality in GDM processes
[30]. Unfortunately, the group IDR matrix in the existing group
DEMATEL methods is frequently constructed by taking the
arithmetic average of the individual IDR matrices, while
whether experts agree with the group result is scarcely considered.
The group IDR matrix plays a fundamental role in the
entire DEMATEL process and has a significant influence
on the effectiveness of the final results. If strong inconsistency
and conflict exist among experts, the group IDR matrix may
not be able to describe precisely the real influence relations
of factors. Therefore, it is particularly important to construct the
group IDR matrix according to consensus rules, that is, to
construct a group consensus IDR matrix by reaching general
or widespread agreement among the experts involved in the
GDM process [31]. Fortunately, Herrera-Viedma proposed
a rational consensus model in GDM composed of a selection
process and a consensus-reaching process (CRP), which has
become a hot issue in the recent GDM literature. The CRP has
been successfully applied to GDM in different
situations, such as hesitant fuzzy preference relations, Delphi
processes, multi-attribute large-scale GDM, sentiment analysis,
and the virtual reality industry [3]-[6], [32]. Traditionally,
unanimous agreement of all experts in CRP is required.
However, the desired result can hardly be achieved because 
of the diversity of opinions, knowledge, and experiences of 
experts. Therefore, the concept of “soft consensus” has been 
provided, in which, “soft” means better reflecting all possible 
levels of agreement by setting an acceptable consensus level 
(CL) threshold value (such as 0.8 rather than 1) and guiding 
the CRP until a high-level agreement is achieved among the 
individuals. Soft consensus can be reached through an iterative 
dynamic process with several collection and adjustment rounds 
[33], [34]. Hence, in this paper, we consider a soft CRP in 
the group IDR matrix construction, and transform the original 
static group DEMATEL problems into dynamic ones. 

The motivation of this paper is to improve the group DEMATEL
method in the following three aspects. Firstly, the
BBA function is used to extract expert assessments so as to
express uncertainty and incompleteness accurately, thus reducing
the loss of accuracy. Secondly, the initial assessments are
discounted with expert weights by using Shafer's discounting
method, and the PCR5 rule of DSmT is used to fuse
the discounted assessments, which avoids the counter-intuitive
results of Dempster's combination rule. Thirdly,
we apply a soft CRP to help reach an acceptable CL in the
construction of the group IDR matrix to ensure consistency
and satisfaction among experts. This paper is organized as
follows. Section II briefly introduces the basic knowledge of
DEMATEL and DSmT. In Section III, the DSmT-based group
DEMATEL method with consensus reaching is proposed, and
the corresponding algorithm is summarized. In Section IV,
a numerical comparison and discussion are provided to
demonstrate the performance of the proposed method and
algorithm. In Section V, the conclusion is drawn and future
research directions are briefly discussed.


II. PRELIMINARIES 


In order to facilitate the later formulation, some basic 
concepts of DEMATEL and the Dezert-Smarandache theory 
(DSmT) are given in this section. 


A. DEMATEL method 


The DEMATEL method is an effective way to analyze
the influence relations among the factors of a system. Through
an analysis of the total influence relation of factors by the
DEMATEL method, we can obtain a better understanding of
structural relations and an ideal way to solve complicated system
problems. Consider that a group of experts is invited to
assess the influence relations for a set of factors $F = \{f_1, \ldots, f_L\}$
with a set of grade levels {0, 1, 2}, where the expert set is
$E = \{e_1, \ldots, e_K\}$, and the degree to which expert $e_k$
believes factor $f_i$ has an effect on factor $f_j$ (denoted by
$f_i \to f_j$) is denoted as $g_{ij}^k$, $\forall i, j, k$.
The meaning of each element in the set of grade levels is that
0 denotes "no influence", 1 denotes a "low influence" and 2
denotes a "very high influence". The central concepts in the
DEMATEL are defined as follows.


Definition 1 [35]. Suppose a pairwise comparison of influence
degree from the i-th to the j-th factor given by the k-th
expert $e_k$ is denoted as $g_{ij}^k$ with 0-2 grade levels, and
the grade levels given by each expert form an L x L non-negative
answer matrix $G^k = [g_{ij}^k]_{L \times L}$, $k = 1, \ldots, K$.
The group IDR matrix, which represents the initial direct
relation between each pair of factors derived from experts,
is obtained by calculating the average values of all experts'
answer matrices as $G = [g_{ij}]_{L \times L}$, where $g_{ij} = \sum_{k=1}^{K} g_{ij}^k / K$,
for $i, j = 1, \ldots, L$.

In Definition 1, the 0, 1, 2 grade levels mean “no influence’, 
“medium influence”, and “high influence’, respectively. Note 
that, the diagonal elements of each answer matrix G* are all 
set to zero, which means that the factors do not influence 
themselves. 


Definition 2 [35]. The maximum of the row-wise and
column-wise sums of matrix G is calculated as
$g' = \max\left(\max_{1 \le i \le L} \sum_{j=1}^{L} g_{ij}, \ \max_{1 \le j \le L} \sum_{i=1}^{L} g_{ij}\right)$; then
the normalized IDR matrix $D = [d_{ij}]_{L \times L}$ can be computed
according to (1).

$$D = G / g'. \tag{1}$$


Definition 3 [35]. The direct and indirect relations
among the factors are represented by the total relation
matrix A, which is defined as in (2).

$$A = \lim_{h \to \infty} (D + D^2 + \cdots + D^h) = D(I - D)^{-1}. \tag{2}$$

Several extensions have been discussed to strengthen
the original DEMATEL. One of them
overcomes the drawback that the powers of the normalized IDR
matrix may not converge to zero, in which case
the total relation matrix may not converge (see (2)). A
very small positive number $\mu$ (e.g., $\mu = 10^{-5}$) is introduced in
the maximal row-wise and column-wise sum of matrix G as
$g'' = \max\left(\max_{1 \le i \le L} \sum_{j=1}^{L} g_{ij}, \ \mu + \max_{1 \le j \le L} \sum_{i=1}^{L} g_{ij}\right)$.
The other steps remain unchanged from the original DEMATEL.
The revised DEMATEL guarantees that the powers of the normalized IDR
matrix converge to zero and that the
total relation matrix can be obtained [36].


Definition 4 [35]. Suppose r and c represent the sum of
rows and the sum of columns of the total relation matrix A.
With $A = [a_{ij}]_{L \times L}$, r and c are defined as follows:

$$r = [r_i]_{L \times 1} = \Big( \sum_{j=1}^{L} a_{ij} \Big)_{L \times 1}, \qquad c = [c_j]_{1 \times L} = \Big( \sum_{i=1}^{L} a_{ij} \Big)_{1 \times L}, \tag{3}$$

where $r_i$ shows the total influence, both direct and indirect,
given by the factor $f_i$ to other factors; $c_i$ shows the total
influence, both direct and indirect, received by the factor $f_i$
from other factors; $r_i + c_i$ is defined as the prominence,
showing the degree of the important role that the factor $f_i$
plays in the complex system; and $r_i - c_i$ shows the net
influence that the factor $f_i$ contributes to the complex system.
Note that, if $r_i - c_i$ is positive, the factor $f_i$ is a net causer;
if $r_i - c_i$ is negative, the factor $f_i$ is a net receiver.
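
Definitions 1 to 4 chain into a short computation. The sketch below is one possible pure-Python reading of them, with hypothetical function names and a hand-written Gauss-Jordan inverse so that no external library is needed. It averages the answer matrices, normalizes by the largest row or column sum (with the optional $\mu$ of the revised DEMATEL), evaluates $A = D(I - D)^{-1}$, and returns the row and column sums. With `mu=0` the inverse may not exist, which is exactly the drawback that the revised method addresses.

```python
def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def mat_inv(a):
    """Gauss-Jordan inverse of a small square matrix (partial pivoting)."""
    n = len(a)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def dematel(answer_matrices, mu=0.0):
    """Return (A, r, c): total relation matrix, row sums, column sums."""
    k_experts = len(answer_matrices)
    size = len(answer_matrices[0])
    # Definition 1: group IDR matrix as the element-wise average.
    G = [[sum(g[i][j] for g in answer_matrices) / k_experts for j in range(size)]
         for i in range(size)]
    # Definition 2 (mu > 0 gives the revised normalization).
    g_norm = max(max(sum(row) for row in G),
                 mu + max(sum(col) for col in zip(*G)))
    D = [[v / g_norm for v in row] for row in G]
    # Definition 3: A = D (I - D)^{-1}.
    i_minus_d = [[float(i == j) - D[i][j] for j in range(size)] for i in range(size)]
    A = mat_mul(D, mat_inv(i_minus_d))
    # Definition 4: influence given (r) and received (c).
    r = [sum(row) for row in A]
    c = [sum(col) for col in zip(*A)]
    return A, r, c

# One expert, two factors: f1 strongly influences f2, f2 weakly influences f1.
A, r, c = dematel([[[0, 2], [1, 0]]])
print(A, r, c)
```

In this toy case $r_1 - c_1 = 1$ and $r_2 - c_2 = -1$, so $f_1$ is a net causer and $f_2$ a net receiver, while both factors have the same prominence $r_i + c_i = 5$.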


B. Dezert-Smarandache theory 


DSmT, jointly proposed by Dezert and Smarandache [37],
can be used to obtain more accurate fusion results of BBA
functions, especially in highly conflicting cases. It
has a series of proportional conflict redistribution rules for
fusing evidence [38], among which PCR5 is the most
widely used one because of its advantages in dealing with conflicting
belief functions. For example, it provides an appropriate
redistribution of conflicting beliefs and can produce a reasonable
fusion result even in highly conflicting cases. These attractive
features motivate the use of DSmT in GDM problems, such
as map reconstruction of robot [39], decision making support
[40], [41], target type tracking [42], image processing [43],
data classification [44]-[48], clustering [47], [49], [50], and
so on.


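In the DSmT framework, the frame $\Theta = \{\theta_1, \ldots, \theta_Y\}$ is a
finite set of Y exhaustive propositions that are not necessarily
mutually exclusive. The hyper-power set $D^\Theta$ is defined as the
set of all composite propositions built from elements of $\Theta$ with
the $\cup$ and $\cap$ operators, such that [51]:

(i) $\emptyset, \theta_1, \ldots, \theta_Y \in D^\Theta$;
(ii) if $\theta_a, \theta_b \in D^\Theta$, then $\theta_a \cup \theta_b \in D^\Theta$ and $\theta_a \cap \theta_b \in D^\Theta$;
(iii) no other elements belong to $D^\Theta$, except those obtained
by using rules (i) or (ii).

Definition 5 [37]. Suppose $\Theta = \{\theta_1, \ldots, \theta_Y\}$ is a set of
exhaustive propositions; then the basic belief assignment is
defined over the hyper-power set $D^\Theta$. If the mapping function
$m : D^\Theta \to [0, 1]$ fulfills the following:

$$m(\emptyset) = 0, \qquad \sum_{\theta \in D^\Theta} m(\theta) = 1, \tag{4}$$

then $m(\cdot)$ is called the BBA function. If $m(\theta) > 0$, $\theta$ is
called a focal element.

In the DSmT framework, the PCR5 rule for fusing two
pieces of evidence is introduced as follows.


Definition 6 [37]. Suppose the BBA functions of two pieces
of evidence are $m_1$ and $m_2$ on $D^\Theta$; then, the PCR5 fusion of $m_1$
and $m_2$ is defined by $m_{PCR5}(\emptyset) = 0$ and, for all $\theta \in D^\Theta$ with $\theta \neq \emptyset$:

$$m_{PCR5}(\theta) = \sum_{\substack{\theta', \theta'' \in D^\Theta \\ \theta' \cap \theta'' = \theta}} m_1(\theta') m_2(\theta'') + \sum_{\substack{\theta' \in D^\Theta \\ \theta \cap \theta' = \emptyset}} \left[ \frac{m_1(\theta)^2 m_2(\theta')}{m_1(\theta) + m_2(\theta')} + \frac{m_2(\theta)^2 m_1(\theta')}{m_2(\theta) + m_1(\theta')} \right]. \tag{5}$$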
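
The PCR5 rule for two sources can be sketched in a few lines. The example below makes a simplifying assumption that is not part of the definition: focal elements are represented as Python `frozenset` objects of mutually exclusive hypotheses (Shafer's model), so two elements conflict exactly when their set intersection is empty. The hyper-power set elements of the free DSm model, such as $\theta_2 \cap \theta_3$, are not covered, and the function name and the numerical BBAs are illustrative only.

```python
from itertools import product

def pcr5(m1, m2):
    """PCR5 fusion of two BBAs given as {frozenset: mass} dictionaries,
    assuming exclusive hypotheses (Shafer's model)."""
    fused = {}
    for (x, mass_x), (y, mass_y) in product(m1.items(), m2.items()):
        common = x & y
        if common:
            # Conjunctive consensus on the intersection.
            fused[common] = fused.get(common, 0.0) + mass_x * mass_y
        elif mass_x + mass_y > 0:
            # Conflicting pair: the product mass_x * mass_y goes back to
            # x and y proportionally to mass_x and mass_y.
            fused[x] = fused.get(x, 0.0) + mass_x ** 2 * mass_y / (mass_x + mass_y)
            fused[y] = fused.get(y, 0.0) + mass_y ** 2 * mass_x / (mass_x + mass_y)
    return fused

A = frozenset({"a"})
B = frozenset({"b"})
m1 = {A: 0.6, B: 0.4}
m2 = {A: 0.2, B: 0.8}
fused = pcr5(m1, m2)
print({tuple(k): round(v, 4) for k, v in fused.items()})
```

Each conflicting product is returned only to the two elements involved in that partial conflict, so the fused masses still sum to one without any global renormalization.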


III. THE PROPOSED METHOD


In the proposed method, we construct the group IDR matrix 
in the DEMATEL by following two steps. Firstly, we develop 
an expert assessments extraction and fusion mechanism with 
DSmT to obtain a temporary group IDR matrix (see III-A). 
Secondly, we activate the soft CRP based on the current 
individual and group IDR matrices to reach a soft consensus 
among experts (see III-B). 


A. Expert Assessment Extraction and Fusion 


In traditional DEMATEL, experts are asked to give
their assessments of the influence relations among factors by
means of crisp values to construct the group IDR matrix
$G = [g_{ij}]_{L \times L}$ (see Definition 1), which is a rough extraction
method with low precision. Because experts are restricted by their
experience and knowledge, their assessments
may reflect ignorance or be only partially credible. Taking the influence
degree for $f_i \to f_j$ and expert $e_k$ as an example, the expert
may know this problem well, in which case he or she can give a definite
result by using one of the given grade levels {0, 1, 2}. On the
contrary, the expert may not know this problem at all, in which
case he or she cannot give any assessment information about
the influence degree. In most instances, the expert knows this
problem to a certain extent and may point out that
the influence degree belongs to several grade levels without
being sure which is the best one. Obviously, experts cannot
give their assessments by means of crisp values in the latter
two situations; that is, the original extraction method is not
practical and does not consider experts' individual characteristics. Thus, we
employ the BBA function to extract expert assessments in
this paper, as shown in Fig. 1.

As shown in Fig. 1, the expert assessment extraction and 
fusion mechanism includes three major steps. (1) Extracting 
expert assessments with BBA function; (2) Discounting these 
assessments with Shafer’s discounting method; (3) Fusing the 
discounted assessments with PCR5. Next, we will give a 
detailed definition and explanation of the procedures involved 
in the above processes. 

Let the frame of discernment be $\Theta = \{\theta_1, \theta_2, \theta_3\} =
\{0, 1, 2\}$, which can be seen as the discernment frame of
DSmT. Expert $e_k$ is asked to assess the influence degree for
$f_i \to f_j$, and his/her assessment is allowed to be expressed by
the BBA function as given in Definition 5 and shown in (6).

$$b_{ij}^k(\cdot) = \Big\{ \big(\theta, b_{ij}^k(\theta)\big) \ \Big| \ \theta \in D^\Theta, \ \sum_{\theta \in D^\Theta} b_{ij}^k(\theta) = 1 \Big\}. \tag{6}$$

All of the BBA functions make up the individual IDR matrix
for expert $e_k$, denoted as $B^k = [b_{ij}^k]_{L \times L}$, with $i, j = 1, \ldots, L$
and $k = 1, \ldots, K$. Note that the basic beliefs in $b_{ij}^k$ can
be assigned not only to singleton grade levels but also to
any subset of $\Theta$, so that such an assessment
(also called a piece of evidence) can be profiled by a BBA
defined on the hyper-power set $D^\Theta$. It is capable of reflecting
ignorance in expert assessments: the basic beliefs in $b_{ij}^k$
can be given to $\Theta$ (global ignorance) or to $\theta \subset \Theta$ (local
ignorance) according to the unknown and partial assessments
[52]. Expert assessments can thus be expressed precisely through
BBA functions, laying a solid foundation for the later
DEMATEL procedures.


Example 1. Assume expert $e_1$ points out that the influence degree
for $f_i \to f_j$ belongs to $\theta_1$ with probability 30% and to
$\theta_2$ with probability 70%. Thus, his or her assessment is
described as $b_{ij}^1(\cdot) = \{(\theta_1, 0.30), (\theta_2, 0.70)\}$. Expert $e_2$ points
out that the influence degree belongs to $\theta_1$ with probability 20%
and to $\theta_2 \cup \theta_3$ with probability 80%, but is not
sure which is the best one. Therefore, his or her assessment
is described as $b_{ij}^2(\cdot) = \{(\theta_1, 0.20), (\theta_2 \cup \theta_3, 0.80)\}$. Expert
$e_3$ points out that the influence degree for $f_i \to f_j$ belongs to
$\theta_2 \cap \theta_3$ with probability 100%. Thus, his or her assessment
is described as $b_{ij}^3(\cdot) = \{(\theta_2 \cap \theta_3, 1.00)\}$. Expert $e_4$ cannot
give any information about the influence degree. Therefore,
his or her assessment is described as $b_{ij}^4(\cdot) = \{(\Theta, 1.00)\}$.


Normally, expert weight is subjective and relative, reflecting
the importance of one's assessments in a group, and it is
usually denoted by w in [0, 1], with 0 and 1 respectively
standing for not important at all and the most important [53],
[54]. The expert weight can be
determined by AHP [55], ANP [56] or Delphi [57], and it can
also be determined subjectively according to the requirements
of actual issues [58]. As discussed, expert weight is used to
account for one's relative importance among all experts; that
is, the closer one's assessments are to others' assessments,
the more important the expert is likely to be [40]. Hence,
in our opinion, expert weight can be calculated indirectly
from the similarity between one expert and the other experts:
the higher the similarity of assessments between the
expert and the others, the larger the weight of the expert. Expert
weight is thus directly proportional to the similarity between an expert's
assessments and others' assessments. We therefore use an evidential
distance to measure the similarity of expert assessments, from
which expert weights are easily calculated.
The Euclidean evidential distance and Euclidean evidential
similarity, which have low computational complexity and fast
convergence speed, are defined as follows.

Definition 7 [59]. Suppose $m_1$ and $m_2$ are two BBA functions
for the same frame of discernment $\Theta$, $\theta_n$ is the n-th element of
$D^\Theta$, and $|D^\Theta|$ is the cardinality of $D^\Theta$. The distance between
$m_1$ and $m_2$ is defined as follows.

$$Dist_E(m_1, m_2) = \frac{1}{\sqrt{2}} \sqrt{\sum_{n=1}^{|D^\Theta|} [m_1(\theta_n) - m_2(\theta_n)]^2}. \tag{7}$$

Definition 8 [59]. Suppose $m_1$ and $m_2$ are two BBA functions
for the same frame of discernment $\Theta$, $\theta_n$ is the n-th element
of $D^\Theta$, and $|D^\Theta|$ is the cardinality of $D^\Theta$. The Euclidean
similarity function $Sim_E(m_1, m_2)$ is defined based on the
Euclidean evidential distance as follows.

$$Sim_E(m_1, m_2) = 1 - \frac{1}{\sqrt{2}} \sqrt{\sum_{n=1}^{|D^\Theta|} [m_1(\theta_n) - m_2(\theta_n)]^2}. \tag{8}$$

Figure 1. Expert assessments extraction and fusion mechanism.


Example 2. Assume two pieces of evidence, $b_1 = \{(\theta_2, 1)\}$
and $b_2 = \{(\theta_1, 0.4), (\theta_2, 0.6)\}$. The distance between
them is computed by (7): $Dist_E(b_1, b_2) =
\frac{1}{\sqrt{2}} \sqrt{(0 - 0.4)^2 + (1 - 0.6)^2} = 0.4$. The similarity between
them is computed by (8): $Sim_E(b_1, b_2) = 1 - 0.4 = 0.6$.
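
The arithmetic of this example can be reproduced with a few lines of code. The sketch below represents a BBA as a dictionary from focal-element labels to masses (the labels `"t1"` and `"t2"` are placeholders for $\theta_1$ and $\theta_2$) and uses the $1/\sqrt{2}$ normalization that the value 0.4 of the example implies. The function names are illustrative.

```python
import math

def dist_e(m1, m2):
    """Euclidean evidential distance between two BBAs ({label: mass} dicts).
    Focal elements missing from a dictionary carry zero mass."""
    labels = set(m1) | set(m2)
    squared = sum((m1.get(x, 0.0) - m2.get(x, 0.0)) ** 2 for x in labels)
    return math.sqrt(0.5 * squared)

def sim_e(m1, m2):
    """Euclidean evidential similarity: one minus the distance."""
    return 1.0 - dist_e(m1, m2)

b1 = {"t2": 1.0}
b2 = {"t1": 0.4, "t2": 0.6}
print(round(dist_e(b1, b2), 4), round(sim_e(b1, b2), 4))
```

The factor 1/2 under the square root keeps the distance in [0, 1]: two categorical BBAs on different elements are at distance 1 and therefore have similarity 0.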

On the basis of the previous definitions, the similarity 
between experts can be directly calculated. Thus, we define 
the following expert weight calculation method. 


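Definition 9 [60]. Suppose the individual IDR matrix consisting 
of all influence relations among factors given by expert $e_k$ 
is $B^k = [b_{ij}^k]_{L \times L}$, with $i, j = 1, \ldots, L$ and $k = 1, \ldots, K$. The 
similarity between experts $e_k$ and $e_{k'}$ on each pair of factors 
$f_i \to f_j$ is $\mathrm{Sim}_E(b_{ij}^k, b_{ij}^{k'})$, and the similarity between any two 
experts is computed as $s^{kk'} = \sum_{i,j} \mathrm{Sim}_E(b_{ij}^k, b_{ij}^{k'}) / L^2$, 
where $L^2$ denotes the quantity of factor pairs, making up the 
similarity matrix as follows. 

$$S = [s^{kk'}]_{K \times K} = \begin{pmatrix} 1 & \cdots & s^{1k} & \cdots & s^{1K} \\ \vdots & & \vdots & & \vdots \\ s^{k1} & \cdots & 1 & \cdots & s^{kK} \\ \vdots & & \vdots & & \vdots \\ s^{K1} & \cdots & s^{Kk} & \cdots & 1 \end{pmatrix} \tag{9}$$
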


The support of $e_k$ can be obtained by adding all of the 
elements in the similarity matrix $S$ that are related to expert 
$e_k$ except for self-similarity, i.e., $Sup(e_k) = \sum_{k'=1, k' \neq k}^{K} s^{kk'}$, 
$k = 1, \ldots, K$, where $Sup(e_k)$ represents the support degree 
of expert $e_k$ received from other experts. By normalizing 
them, we obtain the expert credibility $Crd(e_k)$, which 
is generally regarded as the expert weight $w^k$, as follows. 

$$w^k = Crd(e_k) = Sup(e_k) \Big/ \sum_{k=1}^{K} Sup(e_k), \quad k = 1, \ldots, K. \tag{10}$$


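Example 3. Assume experts $e_1$, $e_2$, $e_3$ give their assessments 
for the influence relations among the factor set $F = \{f_1, f_2, f_3\}$, 
respectively, as follows: 

$$B^1 = \begin{pmatrix} \{(\theta_1, 1)\} & \{(\theta_2, 1)\} & \{(\theta_3, 1)\} \\ \{(\theta_2, 0.7), (\theta_3, 0.3)\} & \{(\theta_1, 1)\} & \{(\theta_1, 0.4), (\theta_3, 0.6)\} \\ \{(\theta_1, 1)\} & \{(\theta_2, 1)\} & \{(\theta_1, 1)\} \end{pmatrix},$$

$$B^2 = \begin{pmatrix} \{(\theta_1, 1)\} & \{(\theta_2, 1)\} & \{(\theta_3, 1)\} \\ \{(\theta_1, 0.8), (\theta_2, 0.2)\} & \{(\theta_1, 1)\} & \{(\theta_1, 1)\} \\ \{(\theta_1, 1)\} & \{(\theta_1, 0.5), (\theta_2, 0.5)\} & \{(\theta_1, 1)\} \end{pmatrix},$$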


LGD} £0201) 
B= {(02 U 63, 1)} a 


{(01,0.2), (02 U = 
{(A1, 1)} | 


{( 11 
{(1,1)} 


The similarity matrix can be calculated by Definition 9 as 
follows: 

$$S = \begin{pmatrix} 1.00 & 0.47 & 0.29 \\ 0.47 & 1.00 & 0.41 \\ 0.29 & 0.41 & 1.00 \end{pmatrix}$$

We calculate the support degree of experts as $Sup(e_1) = 
0.47 + 0.29 = 0.76$, $Sup(e_2) = 0.47 + 0.41 = 0.88$, 
$Sup(e_3) = 0.29 + 0.41 = 0.70$. Then we compute the expert 
weights as $w^1 = Crd(e_1) = 0.76/(0.76 + 0.88 + 0.70) = 
0.76/2.34 \approx 0.32$, $w^2 = 0.88/2.34 \approx 0.38$, and 
$w^3 = 0.70/2.34 \approx 0.30$. 
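
The weight calculation of Definition 9 can be sketched in Python as follows. The dictionary encoding of BBAs and the labels `t1`, `t2`, `t3` are illustrative assumptions. The sketch also assumes that the pairwise sum runs over the off-diagonal pairs $i \neq j$ while still dividing by $L^2$; this convention is not stated explicitly in the text and is chosen here only because it reproduces the value $s^{12} = 0.47$ reported in Example 3.

```python
from math import sqrt

def sim(b1, b2):
    """Euclidean similarity (8) between two BBAs stored as dicts."""
    focal = set(b1) | set(b2)
    return 1.0 - sqrt(0.5 * sum((b1.get(x, 0.0) - b2.get(x, 0.0)) ** 2 for x in focal))

def expert_similarity(Bk, Bl):
    """Similarity between two experts; off-diagonal pairs only (assumption)."""
    L = len(Bk)
    total = sum(sim(Bk[i][j], Bl[i][j]) for i in range(L) for j in range(L) if i != j)
    return total / L ** 2

def support_and_weights(S):
    """Support degrees and normalized weights (10) from a similarity matrix."""
    K = len(S)
    support = [sum(S[k][l] for l in range(K) if l != k) for k in range(K)]
    total = sum(support)
    return support, [s / total for s in support]

# Individual IDR matrices of experts e1 and e2 in Example 3.
B1 = [[{"t1": 1}, {"t2": 1}, {"t3": 1}],
      [{"t2": 0.7, "t3": 0.3}, {"t1": 1}, {"t1": 0.4, "t3": 0.6}],
      [{"t1": 1}, {"t2": 1}, {"t1": 1}]]
B2 = [[{"t1": 1}, {"t2": 1}, {"t3": 1}],
      [{"t1": 0.8, "t2": 0.2}, {"t1": 1}, {"t1": 1}],
      [{"t1": 1}, {"t1": 0.5, "t2": 0.5}, {"t1": 1}]]
print(round(expert_similarity(B1, B2), 2))  # 0.47

# Similarity matrix reported in Example 3.
S = [[1.00, 0.47, 0.29],
     [0.47, 1.00, 0.41],
     [0.29, 0.41, 1.00]]
support, weights = support_and_weights(S)
print([round(s, 2) for s in support])  # [0.76, 0.88, 0.7]
print([round(w, 2) for w in weights])  # [0.32, 0.38, 0.3]
```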

As mentioned above, expert weight should be considered 
through Shafer's discounting method, which multiplies the 
masses of focal elements by the expert weight and transfers 
all of the remaining discounted mass to the full ignorance $\Theta$. 
Mathematically, Shafer's discounting method can be given as 
in Definition 10. 

Definition 10 [61]. Suppose the BBA function is $b$ as in (4), 
and $w$ is a parameter to discount the evidence, $0 \le w \le 1$. 
Then, Shafer's discounting method is defined as follows: 

$$m(\theta) = \begin{cases} 0, & \theta = \emptyset, \\ w \cdot b(\theta), & \theta \subset \Theta,\ \theta \neq \emptyset, \\ w \cdot b(\theta) + 1 - w, & \theta = \Theta. \end{cases} \tag{11}$$


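If the sum of weights is equal to 1, then the discounting 
parameter is usually derived by standardizing the weights as 
$w^k = w^k / \max(w^k \mid k = 1, \ldots, K)$. The discounted expert 
assessments could be described as follows. 

$$m_{ij}^k = \Big\{ (\theta, m_{ij}^k(\theta)) \;\Big|\; \sum_{\theta \subseteq \Theta} m_{ij}^k(\theta) = 1,\ m_{ij}^k(\theta) \ge 0,\ \theta \subseteq \Theta,\ \forall i, j, k \Big\}. \tag{12}$$

The discounted individual IDR matrix is denoted as $M^k = 
[m_{ij}^k]_{L \times L}$, with $i, j = 1, \ldots, L$ and $k = 1, \ldots, K$. 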


Example 4. Assume the expert weights are $W = \{w_1 = 0.40, w_2 = 
0.20, w_3 = 0.20, w_4 = 0.20\}$. The assessments given by the 
four experts are the same as in Example 1. The discounting 
parameters for the four experts can be standardized 
as $w^1 = w_1/\max(w_1, \ldots, w_4) = 0.40/0.40 = 1.00$, 
and similarly we get $w^2 = w^3 = w^4 = 0.20/0.40 = 
0.50$. Taking them into (11), the discounted assessments are 
as follows: m;;(0) = {(1,0.30), (ala, 0: 70)}, m3,(0) = 
{(61, 0.10), (@2 U 3, 0.70), ies 0.50)}, m3 (0) = "(00 a 
3,0.50), (©, 0.50)}, and m4 (9) = {(9, 1. 00)}. 
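
A minimal Python sketch of the standardization step and of Shafer's discounting (11) is given below. Focal elements are encoded as `frozenset` objects over the hypothetical labels `t1`, `t2`, `t3`, and the assessment `b` is an illustrative BBA rather than a quotation of Example 1.

```python
THETA = frozenset({"t1", "t2", "t3"})  # frame of discernment (full ignorance)

def shafer_discount(b, w, frame=THETA):
    """Apply (11): scale every mass by w and move the remaining 1 - w to the frame."""
    m = {focal: w * mass for focal, mass in b.items() if focal != frame}
    m[frame] = w * b.get(frame, 0.0) + (1.0 - w)
    # Drop focal elements whose mass vanished after discounting.
    return {focal: mass for focal, mass in m.items() if mass > 0}

# Standardized discounting parameters for the weights of Example 4.
weights = [0.40, 0.20, 0.20, 0.20]
params = [w / max(weights) for w in weights]  # 1.0, 0.5, 0.5, 0.5

# Illustrative assessment: all mass on the local ignorance t2 U t3.
b = {frozenset({"t2", "t3"}): 1.0}
discounted = shafer_discount(b, params[1])
# discounted carries mass 0.5 on t2 U t3 and mass 0.5 on the whole frame
```

With a discounting parameter of 1 the assessment is returned unchanged, and with a parameter of 0 all mass is transferred to $\Theta$.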


The DSmT framework with the PCR5 rule can be used to fuse 
the individual discounted assessments in (12), and the 
fusion result can be described as follows: 

$$m_{ij} = \Big\{ (\theta, m_{ij}(\theta)) \;\Big|\; \sum_{\theta \subseteq \Theta} m_{ij}(\theta) = 1,\ m_{ij}(\theta) \ge 0,\ \theta \subseteq \Theta,\ \forall i, j \Big\}. \tag{13}$$

Example 5. Assume two pieces of evidence are 
$m_{ij}^1(\theta) = \{(\theta_1, 0.20), (\theta_2, 0.30), (\Theta, 0.50)\}$ and 
$m_{ij}^2(\theta) = \{(\theta_2, 0.70), (\Theta, 0.30)\}$. Taking them into (5), the fusion results 
are obtained as $m_{ij}(\theta) = \{(\theta_1, 0.09), (\theta_2, 0.76), (\Theta, 0.15)\}$. 
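
The fusion in Example 5 can be reproduced with the following sketch of the two-source PCR5 rule. It assumes Shafer's model, i.e. exclusive singletons on the classical power set, so that two focal elements conflict exactly when their set intersection is empty. The `frozenset` encoding and the labels `t1`, `t2`, `t3` are illustrative assumptions.

```python
def pcr5(m1, m2):
    """Two-source PCR5 fusion on the power set (Shafer's model assumed)."""
    fused = {}
    for x, mx in m1.items():
        for y, my in m2.items():
            z = x & y
            if z:
                # Non-conflicting product goes to the intersection.
                fused[z] = fused.get(z, 0.0) + mx * my
            elif mx + my > 0:
                # Conflicting product mx*my is split back to x and y
                # proportionally to the masses mx and my.
                fused[x] = fused.get(x, 0.0) + mx * mx * my / (mx + my)
                fused[y] = fused.get(y, 0.0) + my * my * mx / (mx + my)
    return fused

T1, T2 = frozenset({"t1"}), frozenset({"t2"})
THETA = frozenset({"t1", "t2", "t3"})

m1 = {T1: 0.20, T2: 0.30, THETA: 0.50}
m2 = {T2: 0.70, THETA: 0.30}
fused = pcr5(m1, m2)

print(round(fused[T1], 2), round(fused[T2], 2), round(fused[THETA], 2))  # 0.09 0.76 0.15
```

The only conflicting pair here is $(\theta_1, \theta_2)$ with product $0.20 \times 0.70 = 0.14$, which is redistributed as roughly 0.031 to $\theta_1$ and 0.109 to $\theta_2$.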


B. The Soft CRP for Group DEMATEL 


Soft CRP can be designed with the aim of supporting 
experts until a group consensus is reached by following several 
discussion and adjustment rounds. As discussed in Section 
I, situations in which all of the experts agree with each 
other unanimously are rare or not desirable in the decision 
making process. Since the “unanimous consensus” of conflict 
tolerance, which has the ability to satisfy all pairs of BBA 
functions, rarely exists, the choice of a “soft consensus” is 
largely subjective and application oriented [62]. That is, an 
acceptable CL threshold value (say 0.8) is set 
to guide the whole process towards group consensus. 
Basically, this process relies on making assumptions about 
experts' willingness to change their opinions or preferences 
[63], [64]. Initially, consensus measures are computed based 
on individual and group IDR matrices to determine whether an 
acceptable CL has been reached or not. If so (CL > 0.8), the 
soft CRP is finished and a consensus group IDR matrix can 
be obtained. Otherwise (CL < 0.8), the feedback mechanism 
will be activated, and the experts who are not contributing to 
the consensus are identified and the advice about how to alter 
their assessments is generated [65]. 

This consensus measure can indicate the current consensus 
situation throughout the soft CRP. According to the char- 
acteristics of group DEMATEL, the consensus measure is 
categorized into two levels, i.e., pair-factors level and expert 
level. The pair-factors level is the most basic level and reflects 
the original conflict degree. At this level, experts make some 
adjustments to increase consistency among the group. The 
expert level is the highest level, which reflects the conflict 
degree between a specific expert and the group as a whole 
for the collected results on all pairs of factors [66]. We 
employ these two levels of consensus measures to identify 
inconsistencies between experts and group. 

The calculation methods for consensus measures can be 
derived based on the similarities between these assessments. 
At the pair-factors level, the consensus measure values can 
be calculated based on the Euclidean similarity function as 
in Definition 11. At the expert level, the consensus measure 
values can be calculated by adding the consensus measure 
values on all of pair-factors for an expert as in Definition 
12. Obviously, the order for calculating the two levels of 
consensus measure is pair-factors level at first and then expert 
level. When determining the inconsistency assessments that 
need to be modified, the order is the highest level at first and 
then the most basic level. Experts only adjust the conflicting 
assessments based on the recommendations at the pair-factors 
level. 


Definition 11. In the $t$-th round, suppose the assessments 
of the pair of factors $f_i \to f_j$ given/calculated by expert $e_k$ 
and the group are $b_{ij}^{k,t}$ and $m_{ij}^t$. Then, the consensus measure 
values can be calculated by the Euclidean similarity function 
as follows. 

$$c_{ij}^{k,t} = 1 - \sqrt{\frac{1}{2} \sum_{n=1}^{|D^\Theta|} [b_{ij}^{k,t}(\theta_n) - m_{ij}^t(\theta_n)]^2}. \tag{14}$$

Definition 12. In the $t$-th round, suppose the consensus 
measure value of expert $e_k$ for $f_i \to f_j$ at pair-factors level is 
$c_{ij}^{k,t}$, $\forall i, j, k$; then the consensus measure value of expert $e_k$ at 
expert level in this round can be calculated as follows. 

$$c^{k,t} = \sum_{i=1}^{L} \sum_{j=1}^{L} c_{ij}^{k,t} \Big/ L^2, \tag{15}$$

where $L^2$ denotes the quantity of factor pairs. 

The closer the value $c^{k,t}$ is to 0, the greater the conflict 
degree; the closer the value is to 1, the smaller the conflict. 
Obviously, $c^{k,t} = 0$ indicates complete conflict between expert 
$e_k$ and the group, and $c^{k,t} = 1$ indicates no conflict between them. 
If each expert's consensus measure $c^{k,t}$ is larger than the 
acceptable CL threshold value, indicating that the existing conflict 
degree can be accepted by all experts and group consensus is 
reached, then the soft CRP is finished and the current collected 
group IDR matrix $M^t = [m_{ij}^t]_{L \times L}$ is taken as a consensus 
result. Otherwise, if any expert's consensus measure $c^{k,t}$ is lower 
than the acceptable CL threshold value, indicating a strong 
conflict degree among experts, then the feedback mechanism 
is carried out to help the inconsistent experts adjust their 
assessments to enhance group consensus, and a new round 
($t + 1$) of group IDR matrix construction should be initiated. 

Note that expert weights may not stay fixed across 
rounds, because expert assessments may 
be changed when the acceptable CL has not been reached. 
Thus, in each new round, we need to recalculate expert 
weights based on the latest assessments. In order to prevent the 
collective assessments from failing to converge after several 
discussion rounds, we incorporate a maximum number of 
rounds ($t_{MAX}$) in the soft CRP. Once this maximum is 
reached, the feedback mechanism is no longer carried out and 
the current collected group IDR matrix is taken as the 
final result even if the acceptable CL has not been reached 
yet. 

Suppose the acceptable CL threshold values at expert level 
and pair-factors level are $\varepsilon_e$ and $\varepsilon_f$. Then the 
feedback mechanism can be divided into the following two 
steps. 

• Step 1: Experts whose consensus measure values at expert 
level are lower than the threshold value $\varepsilon_e$ in the $t$-th 
round are identified as follows. 

$$E_t = \{k \mid c^{k,t} < \varepsilon_e\}. \tag{16}$$

• Step 2: For the experts identified in step 1, the pairs of 
factors whose consensus measure values are lower than 
the threshold value $\varepsilon_f$ are identified 
as follows. 

$$F_t = \{k, i \to j \mid c_{ij}^{k,t} < \varepsilon_f \wedge k \in E_t\}. \tag{17}$$

When $E_t$ and $F_t$ have been identified, personal advice 
for the experts is generated by the moderator to reduce the 
conflict. Since the pair-factors level is the most basic 
level and contains the original assessments of experts, only 
the experts whose consensus measure values $c^{k,t}$ are lower 
than the threshold value $\varepsilon_e$ receive adjustment advice. 
After the identified experts have all finished their adjustments, a new 
round of group IDR matrix construction is carried out 
to determine the consensus situation. The above procedure 
is very similar to the Delphi method. 
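
The two identification steps can be sketched in Python as follows. The data are the first-round pair-factors consensus values reported in Table III of Section IV, with $\varepsilon_e = 0.55$ and $\varepsilon_f = 0.50$ as in that section. Two points are assumptions of this sketch: the expert-level value is taken as the mean over the ten assessed pairs (which reproduces Table IV), and it is rounded to two decimals before the comparison with $\varepsilon_e$, as the tabulated values are.

```python
PAIRS = ["f1->f2", "f1->f3", "f1->f4", "f1->f5", "f2->f3",
         "f2->f4", "f3->f4", "f4->f5", "f5->f2", "f5->f3"]

# Pair-factors consensus values of the first round (Table III).
C_PAIR = {
    "e1": [0.78, 0.65, 0.38, 0.53, 0.58, 0.40, 0.80, 0.29, 0.56, 0.53],
    "e2": [0.74, 0.61, 0.50, 0.53, 0.35, 0.42, 0.78, 0.79, 0.56, 0.79],
    "e3": [0.78, 0.68, 0.32, 0.57, 0.57, 0.20, 0.56, 0.45, 0.52, 0.55],
    "e4": [0.27, 0.57, 0.53, 0.68, 0.37, 0.35, 0.80, 0.38, 0.66, 0.70],
    "e5": [0.78, 0.67, 0.44, 0.57, 0.49, 0.67, 0.57, 0.77, 0.66, 0.60],
}
EPS_E, EPS_F = 0.55, 0.50

def expert_level(values):
    """Mean pair-level consensus, rounded to two decimals (assumption)."""
    return round(sum(values) / len(values), 2)

c_expert = {k: expert_level(v) for k, v in C_PAIR.items()}

# Step 1, as in (16): experts below the expert-level threshold.
E_t = [k for k, c in c_expert.items() if c < EPS_E]

# Step 2, as in (17): conflicting pairs of the identified experts only.
F_t = {k: [p for p, c in zip(PAIRS, C_PAIR[k]) if c < EPS_F] for k in E_t}

print(c_expert)  # {'e1': 0.55, 'e2': 0.61, 'e3': 0.52, 'e4': 0.53, 'e5': 0.62}
print(E_t)       # ['e3', 'e4']
print(F_t)
```

Experts $e_1$, $e_2$ and $e_5$ also have pairs below $\varepsilon_f$, but they receive no advice because their expert-level values are not below $\varepsilon_e$.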


C. DSmT-Based Group DEMATEL Method and Algorithm 


The construction of a group IDR matrix includes two main 
processes. The first is the collection process, which focuses on 
expert assessment extraction and fusion; the second is the 
soft consensus reaching process, which aims at reaching an 
acceptable CL among experts. After the above two processes, 
the final group IDR matrix $M$ can be constructed. Obviously, 
$M$ is different from the original IDR matrix $G$ as given in 
Definition 1 for two reasons. (1) $M$ is a consensus group IDR 
matrix that satisfies all experts, while $G$ does not consider 
consensus and satisfaction of experts. (2) $M$ is made up of BBA 
functions that are capable of reflecting local or global ignorance 
in the experts' minds, while $G$ is made up of crisp values. Note that 
the IDR matrix used in DEMATEL is required to contain 
exact influence degrees rather than BBA functions. Therefore, 
we apply the generalized pignistic probability as in Definition 
13, which is an extension of the original pignistic probability 
[67], to reassign the local and global ignorance in the BBA 
functions to singleton grade levels. Then, we calculate the 
expected value to derive the exact influence degrees. 


Definition 13 [68]. Suppose the frame of discernment in the 
DSmT framework is $\Theta = \{\theta_1, \ldots, \theta_Y\}$; then the generalized 
pignistic probability for all $A \in D^\Theta$ can be calculated by (18). 

$$P(A) = \sum_{X \in D^\Theta} \frac{|X \cap A|}{|X|} m(X), \tag{18}$$

where $|X|$ denotes the cardinal of proposition $X$. 

Taking $m_{ij}$ in the final group IDR matrix $M$ into (18), 
the transformation results can be obtained and recorded as 
$\{(\theta_1, P_{ij}^1), \ldots, (\theta_Y, P_{ij}^Y)\}$, where $P_{ij}^y$ denotes the probability 
of influence degree $\theta_y$, $y = 1, \ldots, Y$. Here $Y$ is equal 
to 3 because the frame of discernment is defined 
as $\Theta = \{\theta_1, \theta_2, \theta_3\} = \{0, 1, 2\}$ in this paper. Which influence 
grade level or influence degree is attached to the relationship 
$f_i \to f_j$? We follow two kinds of principles to solve this 
problem: (1) Highest probability principle. The grade level 
with the highest probability, $\tilde{g}_{ij} = \theta_y$, is chosen as the final 
influence grade level, where $\{\theta_y \mid P_{ij}^y = \max(P_{ij}^1, \ldots, P_{ij}^Y)\}$; 
(2) Expected value principle. The expected value $\tilde{g}_{ij} = 
\sum_{y=1}^{Y} \theta_y \times P_{ij}^y$, $\forall i, j$, is calculated as the final influence degree. 
According to the real situation, we obtain the final influence 
degree for $f_i \to f_j$ by using one of the two principles, and then 
the final group IDR matrix $\tilde{G} = [\tilde{g}_{ij}]_{L \times L}$ can be constructed. 
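
Both principles can be illustrated with a short Python sketch applied to the fused BBA of Example 5. The sketch assumes the classical power set, where the cardinal $|X|$ is simply the number of singletons in $X$; on the hyper-power set the DSm cardinal would be needed instead. The labels `t1`, `t2`, `t3` and the `GRADES` mapping to $\{0, 1, 2\}$ are illustrative names.

```python
FRAME = ["t1", "t2", "t3"]
GRADES = {"t1": 0, "t2": 1, "t3": 2}  # theta_1 = 0, theta_2 = 1, theta_3 = 2

def pignistic(m, frame=FRAME):
    """Pignistic probability (18) of each singleton, power-set case."""
    return {t: sum(mass * len(x & {t}) / len(x) for x, mass in m.items())
            for t in frame}

# Fused assessment obtained in Example 5.
m = {frozenset({"t1"}): 0.09,
     frozenset({"t2"}): 0.76,
     frozenset(FRAME): 0.15}

P = pignistic(m)

# (1) Highest probability principle.
best = max(P, key=P.get)
# (2) Expected value principle.
expected = sum(GRADES[t] * P[t] for t in FRAME)

print({t: round(p, 2) for t, p in P.items()})  # {'t1': 0.14, 't2': 0.81, 't3': 0.05}
print(best, round(expected, 2))                # t2 0.91
```

The global ignorance mass 0.15 is shared equally among the three grade levels, so the two principles give the grade level 1 and the influence degree 0.91, respectively.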

The process of DSmT-based group DEMATEL is illustrated 
in Fig. 2. As shown in Fig. 2, the complete steps of DSmT-based 
group DEMATEL can be summarized as follows. 

Figure 2. The process of proposed method. 

• Step 1: Define group DEMATEL problem and soft 
CRP parameters. Suppose the set of factors is $F = 
\{f_1, \ldots, f_L\}$, the set of experts is $E = \{e_1, \ldots, e_K\}$, 
the set of grade levels is $\Theta = \{0, 1, 2\}$, and the threshold 
value to filter out major factors is $\tau$. Then, the acceptable 
CL threshold values at expert level and pair-factors level 
are $\varepsilon_e$ and $\varepsilon_f$, the round counter is $t$ ($t$ is set to one at 
first), and the maximum number of rounds is $t_{MAX}$. A 
moderator is invited to participate in the group decision 
making process and is responsible for managing the whole 
process, including consensus measure value calculation, 
inconsistent expert identification, and checking the maximum 
number of rounds. 

• Step 2: Expert assessments extraction and fusion. Each expert 
is advised to indicate the influence degree to which he or 
she believes factor $f_i$ has an effect on factor $f_j$ (denoted 
by $f_i \to f_j$) under the framework of DSmT. The BBA 
functions assessed for each pair of factors $f_i \to f_j$ by 
expert $e_k$ are described as $b_{ij}^k$ (as in (6)), and make up an 
individual IDR matrix $B^k = [b_{ij}^k]_{L \times L}$. On the basis of 
expert assessments, expert weights $W = [w^1, \ldots, w^K]$ 
are calculated by Definitions 8 and 9. Initial assessments $b_{ij}^k$ are 
discounted by Shafer's discounting method with expert 
weight as in Definition 10 to obtain the discounted 
assessments $m_{ij}^k$. Then PCR5 is applied to fuse those 
BBA functions to obtain group assessments $m_{ij}$, making 
up the group IDR matrix $M = [m_{ij}]_{L \times L}$. 


• Step 3: Soft consensus reaching process. The moderator 
calculates the consensus measure values $c_{ij}^{k,t}$ and $c^{k,t}$ by 
means of expert and group IDR matrices to identify 
whether an acceptable CL is reached in the current round. 
If so, the soft CRP is finished and a consensus group 
IDR matrix $M$ has been obtained. Generalized pignistic 
probability is introduced to deal with global ignorance 
and local ignorance as in (18), the influence degree for 
$f_i \to f_j$ ($\forall i, j$) is determined by one of the two principles 
mentioned above to construct $\tilde{G} = [\tilde{g}_{ij}]_{L \times L}$, and 
the procedure proceeds to step 4. Otherwise, let $t = t + 1$ and carry 
out the feedback mechanism. The moderator identifies 
the experts who differ strongly from the group by (16) 
and provides advice for them by (17). Then, return to 
step 2 and follow the same two processes. Note that the 
moderator should also check whether the maximum number 
of rounds has been reached (that is, $t = t_{MAX}$) 
before carrying out the feedback mechanism. If so, the 
mechanism is stopped, the current group IDR 
matrix $M$ is taken as the final result, and the generalized 
pignistic probability is applied to obtain $\tilde{G} = [\tilde{g}_{ij}]_{L \times L}$. Then 
proceed to step 4. 


• Step 4: Calculate the normalized IDR matrix. To 
guarantee that the normalized IDR matrix raised to 
infinite power will converge to zero and that the 
total relation matrix can be smoothly obtained, 
$\tilde{g}^M = \max\big(\mu + \max_{1 \le i \le L} \sum_{j=1}^{L} \tilde{g}_{ij},\ \mu + \max_{1 \le j \le L} \sum_{i=1}^{L} \tilde{g}_{ij}\big)$ 
is introduced as shown in (1) to calculate the normalized IDR 
matrix $D = [d_{ij}]_{L \times L}$, where $d_{ij} = \tilde{g}_{ij} / \tilde{g}^M$, $\forall i, j$. 

• Step 5: Compute the total relation matrix. The total relation 
matrix is calculated by $A = D(I - D)^{-1}$. The influencing 
degree of $f_i$ is computed by $r_i = \sum_{j=1}^{L} a_{ij}$, the 
influenced degree of $f_i$ is computed by $c_i = \sum_{j=1}^{L} a_{ji}$, 
the prominence degree of $f_i$ is computed by $r_i + c_i$, and 
the net influence degree of $f_i$ is computed by $r_i - c_i$. 

• Step 6: Obtain the major factors with the threshold 
value. To reduce the complexity of a system to a 
manageable level, negligible factors should be filtered 
out. Only those factors whose prominence degrees are 
greater than or equal to the threshold value ($r_i + c_i \ge \tau$) should be 
chosen and considered in the complex system. Generally, 
the cause-effect relation diagram can be made for the 
major factors based on their prominence degrees and net 
influence degrees. 
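
Steps 4 to 6 can be sketched in pure Python as follows. The matrix `G` and the threshold `tau` are hypothetical values chosen only for illustration, and the total relation matrix is accumulated through the series $D + D^2 + D^3 + \cdots$, which equals $D(I - D)^{-1}$ whenever the powers of $D$ converge to zero; this series is a substitute for an explicit matrix inverse, not a procedure taken from the text.

```python
def normalize(G, mu=1e-5):
    """Step 4: divide by max(mu + largest row sum, mu + largest column sum)."""
    row_max = max(sum(row) for row in G)
    col_max = max(sum(col) for col in zip(*G))
    g_max = max(mu + row_max, mu + col_max)
    return [[g / g_max for g in row] for row in G]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def total_relation(D, tol=1e-12, max_terms=10000):
    """Step 5: A = D (I - D)^-1, accumulated as D + D^2 + D^3 + ..."""
    A = [row[:] for row in D]
    term = [row[:] for row in D]
    for _ in range(max_terms):
        term = matmul(term, D)
        A = [[a + t for a, t in zip(ra, rt)] for ra, rt in zip(A, term)]
        if max(abs(t) for row in term for t in row) < tol:
            break
    return A

def prominence_and_net(A):
    """Return the lists r_i + c_i and r_i - c_i."""
    r = [sum(row) for row in A]        # influence given (row sums)
    c = [sum(col) for col in zip(*A)]  # influence received (column sums)
    return [ri + ci for ri, ci in zip(r, c)], [ri - ci for ri, ci in zip(r, c)]

# Hypothetical crisp group IDR matrix and threshold (illustration only).
G = [[0.0, 2.0, 1.0],
     [0.0, 0.0, 2.0],
     [1.0, 0.0, 0.0]]
tau = 1.0

D = normalize(G)
A = total_relation(D)
prominence, net = prominence_and_net(A)

# Step 6: keep the factors whose prominence reaches the threshold.
major = [i + 1 for i, p in enumerate(prominence) if p >= tau]
```

Because every entry of $A$ is counted once as given and once as received influence, the net influence degrees always sum to zero, which is a convenient sanity check on an implementation.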

Through the analysis of the total relation of the factors by DEMATEL, 
a better understanding of the structural relation and 
an ideal way to solve complicated system problems can be 
obtained. The algorithm can be summarized to help the moderator 
manage the whole process. 

Algorithm 1 The algorithm for DSmT-based group DEMATEL method with reaching consensus 
Inputs: Set of factors $F = \{f_1, \ldots, f_L\}$, set of grade levels $\Theta = \{\theta_1, \theta_2, \theta_3\}$, set of experts $E = \{e_1, \ldots, e_K\}$, DEMATEL threshold value $\tau$, 
threshold values at expert level and pair-factors level $\varepsilon_e$ and $\varepsilon_f$, round counter $t$, accepted maximum number of rounds $t_{MAX}$. 
Outputs: Set of major factors $\bar{F}$. 
Begin 
  $t = 1$, $F_t = \{k, i \to j \mid \forall k, \forall i, \forall j\}$, $b_{ij}^{k,t} = \{(\Theta, 1)\}$ for $\forall i, j, k$ 
  Step 1: Collection process 
  For i = 1 to L 
    For j = 1 to L 
      If $i \neq j$ 
        For k = 1 to K 
          If $\{k, i \to j\} \in F_t$ Then 
            Give/modify the influence degree $b_{ij}^{k,t}$ by expert $e_k$ 
          Else 
            Retain the previous assessment of expert $e_k$ 
          EndIf 
        EndFor 
        Calculate expert weight $W_t = \{w^{k,t} \mid k = 1, \ldots, K\}$ by Eqs. (8)-(10) 
        Calculate $m_{ij}^{k,t}$ in Eq. (12) by discounting $b_{ij}^{k,t}$ with $w^{k,t}$ 
        Make fusion to get $m_{ij}^t = m_{ij}^{1,t} \oplus \cdots \oplus m_{ij}^{K,t}$ with Eq. (5) 
      Else 
        Let $m_{ij}^t = \{(\theta_1, 1)\}$ 
      EndIf 
    EndFor 
  EndFor 
  Get temporary group IDR matrix in the $t$-th round $M^t = [m_{ij}^t]_{L \times L}$ 

  Step 2: Soft consensus reaching process 
  For i = 1 to L 
    For j = 1 to L 
      Calculate consensus measure value at pair-factors level by $c_{ij}^{k,t} = 1 - \sqrt{\sum_{n=1}^{|D^\Theta|} (b_{ij}^{k,t}(\theta_n) - m_{ij}^t(\theta_n))^2 / 2}$ 
    EndFor 
  EndFor 
  Calculate consensus measure value at expert level by $c^{k,t} = \sum_{i,j} c_{ij}^{k,t} / L^2$ 
  Derive inconsistent expert set by Eq. (16) and derive $E_t = \{k \mid c^{k,t} < \varepsilon_e\}$ 
  Derive inconsistent pair of factors set by Eq. (17) and derive $F_t = \{k, i \to j \mid c_{ij}^{k,t} < \varepsilon_f \wedge k \in E_t\}$ 
  If $F_t \neq \emptyset$ and $t < t_{MAX}$ Then 
    For each $\{k, i \to j\} \in F_t$ 
      Give the advice that expert $e_k$ is suggested to make corrections for $b_{ij}^{k,t}$ 
    EndFor 
    Let $t = t + 1$ 
    Return to Step 1 
  Else 
    Get final BBA-formed group IDR matrix $M^t = [m_{ij}^t]_{L \times L}$ 
    Turn to Step 3 
  EndIf 

  Step 3: Group DEMATEL process 
  For i = 1 to L 
    For j = 1 to L 
      Take $m_{ij}^t$ into Eq. (18) and get its generalized pignistic probability $\{(\theta_1, P_{ij}^1), (\theta_2, P_{ij}^2), (\theta_3, P_{ij}^3)\}$ 
      Use the highest probability or the expected value principle to obtain the final influence degree $\tilde{g}_{ij}$ 
    EndFor 
  EndFor 
  Construct final group IDR matrix $\tilde{G} = [\tilde{g}_{ij}]_{L \times L}$ 
  Calculate normalized IDR matrix by $D = \tilde{G} / \max(\mu + \max_i \sum_j \tilde{g}_{ij},\ \mu + \max_j \sum_i \tilde{g}_{ij})$ 
  Calculate total-relation matrix by $A = (a_{ij})_{L \times L} = D(I - D)^{-1}$ 
  Initialize set of major factors by $\bar{F} = \emptyset$ 
  For i = 1 to L 
    Calculate the important degree of $f_i$ by $r_i + c_i = \sum_{j=1}^{L} a_{ij} + \sum_{j=1}^{L} a_{ji}$ 
    If $r_i + c_i \ge \tau$ 
      Add $f_i$ into the set of major factors by $\bar{F} = \bar{F} \cup \{f_i\}$ 
    EndIf 
  EndFor 
End 

Table I 
ASSESSMENTS OF EXPERT $e_1$ IN THE FIRST ROUND. 

Focal element | $f_1 \to f_2$ | $f_1 \to f_3$ | $f_1 \to f_4$ | $f_1 \to f_5$ | $f_2 \to f_3$ | $f_2 \to f_4$ | $f_3 \to f_4$ | $f_4 \to f_5$ | $f_5 \to f_2$ | $f_5 \to f_3$ 
$\theta_1$ | 0.00 | 0.50 | 0.00 | 0.00 | 0.00 | 0.00 | 0.50 | 0.00 | 0.00 | 0.00 
$\theta_2$ | 0.00 | 0.50 | 0.00 | 0.00 | 0.00 | 0.00 | 0.30 | 0.00 | 0.00 | 0.00 
$\theta_1 \cup \theta_2$ | 0.30 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.20 | 0.00 | 0.80 | 0.80 
$\theta_3$ | 0.70 | 0.00 | 0.00 | 0.00 | 0.50 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 
$\theta_1 \cup \theta_3$ | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.20 | 0.00 | 0.00 | 0.00 | 0.00 
$\theta_2 \cup \theta_3$ | 0.00 | 0.00 | 0.00 | 0.20 | 0.50 | 0.20 | 0.00 | 0.00 | 0.00 | 0.00 
$\theta_1 \cup \theta_2 \cup \theta_3$ | 0.00 | 0.00 | 1.00 | 0.80 | 0.00 | 0.60 | 0.00 | 1.00 | 0.20 | 0.20 


IV. NUMERICAL COMPARISON AND DISCUSSION 


In this section, we apply the proposed method and the fuzzy 
DEMATEL method to conduct a numerical simulation case, in 
which the key factors will be selected by two methods respec- 
tively. Afterwards, the results are compared and discussed. 


A. The Proposed Method 


• Step 1: Define DEMATEL problem and soft CRP parameters. 
Similar to the examples in the literature on DSmT 
[69], the group IDR matrix construction of the given 
example works on the classical power set $2^\Theta$, not on the 
hyper-power set $D^\Theta$. In the numerical simulation case, 
suppose the set of factors is $F = \{f_1, f_2, f_3, f_4, f_5\}$, the 
set of experts is $E = \{e_1, e_2, e_3, e_4, e_5\}$, the set of grade 
levels is $\Theta = \{\theta_1, \theta_2, \theta_3\} = \{0, 1, 2\}$, the threshold value 
is $\tau = 0.35$, the acceptable CL threshold value at 
expert level is $\varepsilon_e = 0.55$ and at pair-factors level is $\varepsilon_f = 
0.50$, the round counter is $t$, and the maximum number 
of rounds is $t_{MAX} = 5$. 


• Step 2: Expert assessments extraction and fusion. 
Each expert is advised to indicate the influence degree to 
which he or she believes factor $f_i$ has an effect on factor 
$f_j$ (denoted by $f_i \to f_j$) under the framework of DSmT. 
As an example, the BBA functions on each pair of factors 
$f_i \to f_j$ from expert $e_1$ are shown in Table I. Due to 
limited space, other experts' assessments are not given in 
this paper. 

On the basis of the experts' assessments, we calculate the 
similarity matrix by the Euclidean similarity function as 
given in Definition 8 and shown in (19). 

$$S = \begin{pmatrix} 1.00 & 0.20 & 0.16 & 0.13 & 0.18 \\ 0.20 & 1.00 & 0.15 & 0.17 & 0.19 \\ 0.16 & 0.15 & 1.00 & 0.14 & 0.18 \\ 0.13 & 0.17 & 0.14 & 1.00 & 0.18 \\ 0.18 & 0.19 & 0.18 & 0.18 & 1.00 \end{pmatrix}. \tag{19}$$

Hence, $Sup(e_1)^1 = 0.20 + 0.16 + 0.13 + 0.18 = 0.68$, 
$Sup(e_2)^1 = 0.72$, $Sup(e_3)^1 = 0.64$, $Sup(e_4)^1 = 0.62$, 
and $Sup(e_5)^1 = 0.73$; the expert weight is computed as 
$w^{1,1} = Sup(e_1)^1 / \sum_{k=1}^{5} Sup(e_k)^1 = 0.68/3.40 = 0.20$, 
$w^{2,1} = 0.21$, $w^{3,1} = 0.19$, $w^{4,1} = 0.18$, and $w^{5,1} = 0.22$; 
the discounting parameters are derived by $w^{1,1} = 
w^{1,1} / \max(w^{k,1} \mid k = 1, \ldots, 5) = 0.20/0.22 \approx 0.93$, 
$w^{2,1} = 0.98$, $w^{3,1} = 0.87$, $w^{4,1} = 0.82$, and $w^{5,1} = 1.00$. 
Taking $b_{ij}^{k,1}$ and $w^{k,1}$ into Shafer's discounting method 
(as in (11)), we derive the discounted assessments $m_{ij}^{k,1}$, 
$i, j = 1, \ldots, 5$, and $k = 1, 2, 3, 4, 5$. With the help of the 
MATLAB code for PCR5, we fuse the BBA functions 
and derive an overall result for the influence degrees 
between any two factors, as shown in Table II. 


• Step 3: Soft consensus reaching process. 

The moderator calculates the consensus measure values 
$c_{ij}^{k,1}$ and $c^{k,1}$ by means of $b_{ij}^{k,1}$ and $m_{ij}^1$ to identify 
whether an acceptable CL has been reached. Taking the 
expert and fusion assessment BBA functions into (14) and 
(15), we calculate the consensus measure values at two 
levels, as shown in Tables III and IV. The bold values 
in Table III indicate the pairs of factors that do not 
reach the set consensus level. The bold values 
in Table IV indicate the experts that do not reach the 
set consensus level. According to the selected threshold 
value $\varepsilon_e = 0.55$ and (16), we find that the acceptable 
CL is not reached in this round because $F_1 \neq \emptyset$. Then, 
the moderator lets $t = 1 + 1 = 2$, which is lower than 
$t_{MAX}$, and carries out the feedback mechanism. The 
order from expert level to pair-factors level should be 
followed to determine the inconsistent experts and the 
pairs of factors that need to be modified (those with values 
less than $\varepsilon_f = 0.50$). Tables III and IV show that expert 
$e_3$ greatly differs from the group results on the pairs of 
factors $f_1 \to f_4$, $f_2 \to f_4$, and $f_4 \to f_5$, and $e_4$ greatly 
differs from the group results on the pairs of factors 
$f_1 \to f_2$, $f_2 \to f_3$, $f_2 \to f_4$, and $f_4 \to f_5$ (see the 
underlined data in Tables III and IV). Experts $e_3$ and $e_4$ 
need to modify their assessments on the mentioned pairs 
of factors. Their adjusted assessments are shown in Tables V and VI. 

Because some of the expert assessments are changed, 
expert weights are recalculated in the second round to be 
$w^{1,2} = 0.19$, $w^{2,2} = 0.20$, $w^{3,2} = 0.20$, $w^{4,2} = 0.21$, 
and $w^{5,2} = 0.21$, and the corresponding discounting 
parameters are standardized as $w^{1,2} = 0.89$, $w^{2,2} = 0.93$, 
$w^{3,2} = 0.96$, $w^{4,2} = 1.00$, and $w^{5,2} = 1.00$. Taking $b_{ij}^{k,2}$ 
and $w^{k,2}$ into (11), we obtain the discounted assessments 
$m_{ij}^{k,2}$ ($\forall i, j$) and fuse these results by PCR5 to construct 
the group IDR matrix in the second round. The moderator 
calculates the new consensus measure values at two 
levels. Fusion results and consensus measures at expert 
level in the second round are shown in Tables VII and 
VIII. 

Table II 
FUSION RESULTS OF EXPERTS' ASSESSMENTS IN THE FIRST ROUND. 

Focal element | $f_1 \to f_2$ | $f_1 \to f_3$ | $f_1 \to f_4$ | $f_1 \to f_5$ | $f_2 \to f_3$ | $f_2 \to f_4$ | $f_3 \to f_4$ | $f_4 \to f_5$ | $f_5 \to f_2$ | $f_5 \to f_3$ 
$\theta_1$ | 0.00 | 0.22 | 0.00 | 0.13 | 0.00 | 0.05 | 0.44 | 0.00 | 0.07 | 0.05 
$\theta_2$ | 0.00 | 0.21 | 0.38 | 0.26 | 0.34 | 0.19 | 0.17 | 0.15 | 0.25 | 0.05 
$\theta_1 \cup \theta_2$ | 0.08 | 0.16 | 0.00 | 0.00 | 0.03 | 0.05 | 0.08 | 0.00 | 0.29 | 0.32 
$\theta_3$ | 0.66 | 0.22 | 0.19 | 0.27 | 0.20 | 0.56 | 0.12 | 0.56 | 0.22 | 0.42 
$\theta_1 \cup \theta_3$ | 0.02 | 0.06 | 0.00 | 0.00 | 0.00 | 0.05 | 0.16 | 0.06 | 0.01 | 0.06 
$\theta_2 \cup \theta_3$ | 0.02 | 0.00 | 0.17 | 0.07 | 0.20 | 0.04 | 0.00 | 0.05 | 0.07 | 0.04 
$\theta_1 \cup \theta_2 \cup \theta_3$ | 0.22 | 0.13 | 0.26 | 0.27 | 0.23 | 0.06 | 0.03 | 0.18 | 0.09 | 0.06 

Table III 
THE CONSENSUS MEASURE VALUES AT PAIR-FACTORS LEVEL IN THE FIRST ROUND. 

Expert | $f_1 \to f_2$ | $f_1 \to f_3$ | $f_1 \to f_4$ | $f_1 \to f_5$ | $f_2 \to f_3$ | $f_2 \to f_4$ | $f_3 \to f_4$ | $f_4 \to f_5$ | $f_5 \to f_2$ | $f_5 \to f_3$ 
$e_1$ | 0.78 | 0.65 | 0.38 | 0.53 | 0.58 | 0.40 | 0.80 | 0.29 | 0.56 | 0.53 
$e_2$ | 0.74 | 0.61 | 0.50 | 0.53 | 0.35 | 0.42 | 0.78 | 0.79 | 0.56 | 0.79 
$e_3$ | 0.78 | 0.68 | 0.32 | 0.57 | 0.57 | 0.20 | 0.56 | 0.45 | 0.52 | 0.55 
$e_4$ | 0.27 | 0.57 | 0.53 | 0.68 | 0.37 | 0.35 | 0.80 | 0.38 | 0.66 | 0.70 
$e_5$ | 0.78 | 0.67 | 0.44 | 0.57 | 0.49 | 0.67 | 0.57 | 0.77 | 0.66 | 0.60 

Table IV 
THE CONSENSUS MEASURE VALUES AT EXPERT LEVEL IN THE FIRST ROUND. 

 | $e_1$ | $e_2$ | $e_3$ | $e_4$ | $e_5$ 
$c^{k,1}$ | 0.55 | 0.61 | 0.52 | 0.53 | 0.62 
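
Table V 
ADJUSTED ASSESSMENTS OF EXPERT $e_3$ IN THE SECOND ROUND. 

Focal element | $f_1 \to f_2$ | $f_1 \to f_3$ | $f_1 \to f_4$ | $f_1 \to f_5$ | $f_2 \to f_3$ | $f_2 \to f_4$ | $f_3 \to f_4$ | $f_4 \to f_5$ | $f_5 \to f_2$ | $f_5 \to f_3$ 
$\theta_1$ | 0.00 | 0.00 | 0.00 | 0.50 | 0.00 | 0.04 | 0.00 | 0.00 | 0.00 | 0.00 
$\theta_2$ | 0.00 | 0.00 | 0.22 | 0.50 | 0.80 | 0.16 | 0.00 | 0.28 | 0.00 | 0.00 
$\theta_1 \cup \theta_2$ | 0.30 | 0.30 | 0.00 | 0.00 | 0.20 | 0.04 | 0.20 | 0.00 | 0.00 | 0.70 
$\theta_3$ | 0.70 | 0.30 | 0.37 | 0.00 | 0.00 | 0.45 | 0.40 | 0.28 | 0.00 | 0.00 
$\theta_1 \cup \theta_3$ | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.04 | 0.40 | 0.07 | 0.20 | 0.30 
$\theta_2 \cup \theta_3$ | 0.00 | 0.00 | 0.07 | 0.00 | 0.00 | 0.03 | 0.00 | 0.07 | 0.50 | 0.00 
$\theta_1 \cup \theta_2 \cup \theta_3$ | 0.00 | 0.40 | 0.34 | 0.00 | 0.00 | 0.24 | 0.00 | 0.30 | 0.30 | 0.00 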
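
Table VI 
ADJUSTED ASSESSMENTS OF EXPERT $e_4$ IN THE SECOND ROUND. 

Focal element | $f_1 \to f_2$ | $f_1 \to f_3$ | $f_1 \to f_4$ | $f_1 \to f_5$ | $f_2 \to f_3$ | $f_2 \to f_4$ | $f_3 \to f_4$ | $f_4 \to f_5$ | $f_5 \to f_2$ | $f_5 \to f_3$ 
$\theta_1$ | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.03 | 0.50 | 0.00 | 0.00 | 0.00 
$\theta_2$ | 0.00 | 0.00 | 0.00 | 0.50 | 0.22 | 0.14 | 0.30 | 0.08 | 0.00 | 0.00 
$\theta_1 \cup \theta_2$ | 0.06 | 0.30 | 0.00 | 0.00 | 0.02 | 0.02 | 0.20 | 0.00 | 0.40 | 0.00 
$\theta_3$ | 0.47 | 0.70 | 0.50 | 0.50 | 0.13 | 0.38 | 0.00 | 0.36 | 0.60 | 0.60 
$\theta_1 \cup \theta_3$ | 0.02 | 0.00 | 0.00 | 0.00 | 0.00 | 0.05 | 0.00 | 0.14 | 0.00 | 0.20 
$\theta_2 \cup \theta_3$ | 0.02 | 0.00 | 0.50 | 0.00 | 0.13 | 0.12 | 0.00 | 0.12 | 0.00 | 0.20 
$\theta_1 \cup \theta_2 \cup \theta_3$ | 0.43 | 0.00 | 0.00 | 0.00 | 0.50 | 0.26 | 0.00 | 0.30 | 0.00 | 0.00 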

Table VII 
FUSION RESULTS OF EXPERTS' ASSESSMENTS IN THE SECOND ROUND. 

Focal element | $f_1 \to f_2$ | $f_1 \to f_3$ | $f_1 \to f_4$ | $f_1 \to f_5$ | $f_2 \to f_3$ | $f_2 \to f_4$ | $f_3 \to f_4$ | $f_4 \to f_5$ | $f_5 \to f_2$ | $f_5 \to f_3$ 
$\theta_1$ | 0.00 | 0.20 | 0.00 | 0.12 | 0.00 | 0.02 | 0.43 | 0.00 | 0.06 | 0.06 
$\theta_2$ | 0.00 | 0.20 | 0.47 | 0.26 | 0.41 | 0.10 | 0.17 | 0.06 | 0.22 | 0.05 
$\theta_1 \cup \theta_2$ | 0.16 | 0.17 | 0.00 | 0.00 | 0.05 | 0.11 | 0.08 | 0.00 | 0.26 | 0.29 
$\theta_3$ | 0.69 | 0.26 | 0.13 | 0.23 | 0.20 | 0.51 | 0.14 | 0.74 | 0.27 | 0.42 
$\theta_1 \cup \theta_3$ | 0.04 | 0.06 | 0.00 | 0.00 | 0.00 | 0.05 | 0.17 | 0.04 | 0.02 | 0.07 
$\theta_2 \cup \theta_3$ | 0.04 | 0.00 | 0.16 | 0.06 | 0.26 | 0.08 | 0.00 | 0.02 | 0.08 | 0.05 
$\theta_1 \cup \theta_2 \cup \theta_3$ | 0.07 | 0.11 | 0.24 | 0.33 | 0.08 | 0.13 | 0.01 | 0.15 | 0.09 | 0.06 

Table VIII 
THE CONSENSUS MEASURE VALUES AT EXPERT LEVEL IN THE SECOND ROUND. 

 | $e_1$ | $e_2$ | $e_3$ | $e_4$ | $e_5$ 
$c^{k,2}$ | 0.55 | 0.63 | 0.66 | 0.68 | 0.62 

As shown in Table VIII, the acceptable CL has been 
reached in the second round, which means that the current 
group IDR matrix is a consensus result and can be 
applied to the group DEMATEL method. Then we derive the 
pignistic probability transformation for each element in the 
final BBA-formed group IDR matrix $M^2 = [m_{ij}]_{5 \times 5}$. 
Take the pair of factors $f_1 \to f_2$ as an example: 
$m_{12}(\theta_1) = m(\theta_1) + \frac{1}{2} m(\theta_1 \cup \theta_2) + \frac{1}{2} m(\theta_1 \cup \theta_3) + \frac{1}{3} m(\theta_1 \cup 
\theta_2 \cup \theta_3) = 0.12$, $m_{12}(\theta_2) = 0.12$, $m_{12}(\theta_3) = 0.76$, and 
$\tilde{g}_{12} = 0 \times 0.12 + 1 \times 0.12 + 2 \times 0.76 = 1.64$. After 
pignistic probability transformation, the final group IDR 
matrix is constructed as in (20). 

$$\tilde{G} = \begin{pmatrix} 0.00 & 1.64 & 0.99 & 1.22 & 1.18 \\ 0.00 & 0.00 & 1.33 & 1.53 & 0.00 \\ 0.00 & 0.00 & 0.00 & 0.67 & 0.00 \\ 0.00 & 0.00 & 0.00 & 0.00 & 1.76 \\ 0.00 & 1.12 & 1.25 & 0.00 & 0.00 \end{pmatrix} \tag{20}$$

• Step 4: Calculate the normalized IDR matrix. 
Let us consider $\mu = 0.00001$. We obtain 
$\tilde{g}^M = \max\big(\mu + \max_{1 \le i \le L} \sum_{j=1}^{L} \tilde{g}_{ij},\ \mu + \max_{1 \le j \le L} \sum_{i=1}^{L} \tilde{g}_{ij}\big) \approx 4.98$, 
and the normalized IDR matrix can be calculated by (1). 
The result is shown in (21). 

$$D = \begin{pmatrix} 0.00 & 0.33 & 0.20 & 0.24 & 0.23 \\ 0.00 & 0.00 & 0.27 & 0.31 & 0.00 \\ 0.00 & 0.00 & 0.00 & 0.13 & 0.00 \\ 0.00 & 0.00 & 0.00 & 0.00 & 0.35 \\ 0.00 & 0.22 & 0.25 & 0.00 & 0.00 \end{pmatrix} \tag{21}$$

• Step 5: Compute the total relation matrix. 
The total relation matrix can be calculated by (2). The 
result is shown in (22). 

$$A = \begin{pmatrix} 0.00 & 0.14 & 0.08 & 0.10 & 0.09 \\ 0.00 & 0.00 & 0.08 & 0.11 & 0.00 \\ 0.00 & 0.00 & 0.00 & 0.02 & 0.00 \\ 0.00 & 0.00 & 0.00 & 0.00 & 0.13 \\ 0.00 & 0.05 & 0.08 & 0.00 & 0.00 \end{pmatrix} \tag{22}$$

According to $A = [a_{ij}]_{5 \times 5}$ and (3), we derive the 
following parameters: the total influence $r_i = \sum_{j=1}^{5} a_{ij}$ 
given by the factor $f_i$ to other factors, the total influence 
$c_i = \sum_{j=1}^{5} a_{ji}$ received by the factor $f_i$ from other 
factors, the degree of the important role $r_i + c_i$, and the 
net influence $r_i - c_i$. These parameters are listed in Table 
IX. Additionally, we construct the cause-effect relation 
diagram of factors with the horizontal axis $r + c$ and 
the vertical axis $r - c$, as shown in Fig. 3. According 
to Definition 4 and Fig. 3, $f_1$ is known as a net causer, 
whereas $f_3$, $f_4$ and $f_5$ are net receivers. Because $r - c = 0$ 
for $f_2$, $f_2$ is neither a net causer nor a net receiver. 

• Step 6: Obtain major factors with the threshold value. 
Because of the threshold value $\tau = 0.35$, we select the 
factors whose degrees of the important role reach 
the threshold ($r_i + c_i \ge \tau$) as major factors. As a 
result, the major factors in the complex system are $\bar{F} = 
\{f_1, f_2, f_4, f_5\}$. 

We find that Algorithm 1 can be programmed easily. 
Thus, the proposed method in this study is valid and applicable 
for solving group DEMATEL problems. 

Table IX 
THE ATTRIBUTE PARAMETERS OF FACTORS. 

Factor | $r$ | $c$ | $r + c$ | $r - c$ 
$f_1$ | 0.41 | 0.00 | 0.41 | 0.41 
$f_2$ | 0.19 | 0.19 | 0.38 | 0.00 
$f_3$ | 0.02 | 0.24 | 0.26 | -0.22 
$f_4$ | 0.13 | 0.23 | 0.36 | -0.10 
$f_5$ | 0.13 | 0.22 | 0.35 | -0.09 

Figure 3. The cause-effect relation diagram derived by the proposed method. 


B. The Fuzzy DEMATEL Method 



The fuzzy DEMATEL method proposed by Wu and Lee 
[18] is a significant improvement of DEMATEL. 
Since the fuzzy DEMATEL method is not only a major 
extension of DEMATEL but also extracts information with 
fuzzy linguistic terms, we employ it here for a comparison 
with the proposed method. To ensure that the results of 
the two methods can be compared with each other, 
the set of factors $F = \{f_1, f_2, f_3, f_4, f_5\}$, the set of experts 
$E = \{e_1, e_2, e_3, e_4, e_5\}$ and the threshold value $\tau = 0.35$ 
are all the same as in Subsection IV-A. Besides, the inputs 
of the fuzzy DEMATEL method should be generated from the 
initial assessments given by experts in the proposed method 
to ensure comparability. According to the procedure of the 
fuzzy DEMATEL method, its computation process can be 
summarized as follows. 


• Step 1: Define DEMATEL problem and fuzzy linguistic 
scale. 
The DEMATEL problem is defined as in 
Subsection IV-A and the fuzzy linguistic scale is defined 
as in Table X, where $\{No, L, H\}$ are equal to $\{\theta_1, \theta_2, \theta_3\}$ 
as defined in this paper. 

Table X 
THE FUZZY LINGUISTIC SCALE. 

Linguistic terms | Triangular fuzzy numbers 
High influence (H) | (0.75, 1.00, 1.00) 
Low influence (L) | (0.25, 0.50, 0.75) 
No influence (No) | (0.00, 0.00, 0.25) 


• Step 2: Extract and fuse expert assessments.

To ensure comparability, the inputs of the fuzzy DEMATEL method
are generated from the initial assessments given by experts in
the proposed method. Because the assessments given by experts in
Subsection IV-A are in the form of BBA functions, the
transformation from BBA functions to the inputs of the fuzzy
DEMATEL method matters. It is reasonable to choose the grade
level with the highest probability as the expert assessment in
the fuzzy DEMATEL method. Following this idea, the inputs of the
fuzzy DEMATEL method are transformed from the experts'
assessments in Subsection IV-A. For example, the transformed
assessments of expert $e_1$ are shown in Table XI.

Table XI
THE TRANSFORMED ASSESSMENTS OF EXPERT $e_1$.

     | f1 | f2 | f3 | f4 | f5
  f1 | No | H  | L  | No | No
  f2 | No | No | H  | H  | No
  f3 | No | No | No | L  | No
  f4 | No | No | No | No | H
  f5 | No | No | L  | No | No

As introduced in fuzzy DEMATEL, the Converting Fuzzy data into
Crisp Scores (CFCS) defuzzification method is applied to
aggregate the assessments of the five experts. Due to limited
space, the detailed CFCS steps can be found in [18] and are not
repeated here. The IDR matrix $G' = [g'_{ij}]_{5 \times 5}$ is
produced as in (23).


$$G' = \begin{bmatrix}
0.04 & 0.19 & 0.87 & 0.19 & 0.50 \\
0.19 & 0.04 & 0.96 & 0.96 & 0.04 \\
0.04 & 0.04 & 0.04 & 0.41 & 0.19 \\
0.04 & 0.04 & 0.04 & 0.04 & 0.87 \\
0.04 & 0.50 & 0.59 & 0.04 & 0.04
\end{bmatrix}. \tag{23}$$


• Step 3: Calculate the normalized IDR matrix.

The normalizing method for the IDR matrix in fuzzy DEMATEL is the
same as in traditional DEMATEL. The normalized IDR matrix is
calculated by substituting (23) into (1) and is shown in (24).


$$D' = \begin{bmatrix}
0.02 & 0.08 & 0.35 & 0.08 & 0.20 \\
0.08 & 0.02 & 0.38 & 0.38 & 0.02 \\
0.02 & 0.02 & 0.02 & 0.16 & 0.08 \\
0.02 & 0.02 & 0.02 & 0.02 & 0.35 \\
0.02 & 0.20 & 0.24 & 0.02 & 0.02
\end{bmatrix}. \tag{24}$$


• Step 4: Compute the total-relation matrix.

The computing method for the total-relation matrix in fuzzy
DEMATEL is also the same as in traditional DEMATEL. The
total-relation matrix is calculated by substituting (24) into (2)
and is shown in (25).


$$A' = \begin{bmatrix}
0.02 & 0.01 & 0.18 & 0.02 & 0.07 \\
0.01 & 0.02 & 0.21 & 0.20 & 0.00 \\
0.00 & 0.00 & 0.02 & 0.03 & 0.01 \\
0.00 & 0.00 & 0.00 & 0.02 & 0.14 \\
0.00 & 0.05 & 0.09 & 0.00 & 0.02
\end{bmatrix}. \tag{25}$$


According to (3), we derive the parameters shown in Table XII and
the cause-effect relation diagram shown in Fig. 4. According to
Definition 4 and Fig. 4, $f_1$ and $f_2$ are net causers, while
$f_3$, $f_4$ and $f_5$ are net receivers.


Table XII
THE FACTORS' ATTRIBUTES PARAMETERS.

     | r'   | c'   | r' + c' | r' - c'
  f1 | 0.30 | 0.03 | 0.33    |  0.27
  f2 | 0.44 | 0.08 | 0.52    |  0.36
  f3 | 0.07 | 0.50 | 0.57    | -0.43
  f4 | 0.17 | 0.28 | 0.44    | -0.11
  f5 | 0.16 | 0.25 | 0.41    | -0.09
Figure 4. The cause-effect relation diagram derived by fuzzy DEMATEL
method.


• Step 5: Set a threshold value and obtain the major factors.

Given the threshold value $\tau = 0.35$, only the factors with
$r' + c' > \tau$ are chosen as the major factors. As a result,
the major factors in the complex system are

$F'' = \{f_2, f_3, f_4, f_5\}$.
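Steps 3 and 5 above can be checked with a short sketch. It assumes that the normalization of Eq. (1), which is not reproduced in this section, divides the IDR matrix by its largest row or column sum; this assumption reproduces the entries of (24) to two decimals. The function names are illustrative only, and the $(r', c')$ pairs are copied from Table XII.

```python
def normalize_idr(g):
    """Scale a direct-relation matrix by its largest row or column sum (assumed form of Eq. (1))."""
    n = len(g)
    row_max = max(sum(row) for row in g)
    col_max = max(sum(g[i][j] for i in range(n)) for j in range(n))
    scale = max(row_max, col_max)
    return [[x / scale for x in row] for row in g]


def major_factors(params, tau):
    """Keep the factors whose prominence r + c exceeds the threshold tau."""
    return [f for f, (r, c) in params.items() if r + c > tau]


def net_causers(params):
    """Factors with positive relation r - c."""
    return [f for f, (r, c) in params.items() if r - c > 0]


# IDR matrix G' from (23)
G = [
    [0.04, 0.19, 0.87, 0.19, 0.50],
    [0.19, 0.04, 0.96, 0.96, 0.04],
    [0.04, 0.04, 0.04, 0.41, 0.19],
    [0.04, 0.04, 0.04, 0.04, 0.87],
    [0.04, 0.50, 0.59, 0.04, 0.04],
]
D = [[round(x, 2) for x in row] for row in normalize_idr(G)]

# (r', c') per factor, copied from Table XII
TABLE_XII = {
    "f1": (0.30, 0.03),
    "f2": (0.44, 0.08),
    "f3": (0.07, 0.50),
    "f4": (0.17, 0.28),
    "f5": (0.16, 0.25),
}
major = major_factors(TABLE_XII, tau=0.35)
causers = net_causers(TABLE_XII)
print(D[0], major, causers)
```

With these inputs the sketch selects f2, f3, f4 and f5 as major factors and f1 and f2 as net causers, in line with the text.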
C. Discussion


The major factors in the complex system determined by the
proposed method are $F = \{f_1, f_2, f_4, f_5\}$, while those
determined by the fuzzy DEMATEL method are
$F'' = \{f_2, f_3, f_4, f_5\}$. The results of the two methods
differ. Which one is more reasonable? We discuss this question
from the following three aspects.

(1) Expert assessment extraction mechanism. The assessments of
experts are the fundamental inputs of DEMATEL, both in the
proposed method and in the fuzzy DEMATEL method. The proposed
method allows experts to give assessments with BBA functions,
while the fuzzy DEMATEL method uses the fuzzy linguistic scale to
express their assessments. The local or global ignorance in
experts' minds can be well reflected in the proposed method
(e.g., an assessment such as
$\{((\theta_1, \theta_2), 0.30), (\theta_3, 0.70)\}$), but such
ignorance cannot be included in the fuzzy DEMATEL method (it
seems that only the grade level of the fuzzy linguistic scale
with the highest probability may be used to express an
assessment, e.g., the single grade {L}). Consequently, we believe
that the proposed method is more feasible than the fuzzy DEMATEL
method with respect to the expert assessment extraction
mechanism.

(2) The importance of experts. As discussed in Subsection III-A,
the importance of experts is reflected by expert weights that are
calculated from the expert assessments. These weights express the
relative importance of experts in the group and have a
significant impact on the group DEMATEL decision results. If the
importance of experts is neglected in the decision-making
process, the decision results may lose effectiveness. The
proposed method calculates expert weights from expert assessments
with the aid of an evidence distance and uses Shafer's
discounting method to modify the subjective assessments. By
contrast, the importance of experts is not considered at any
stage of the fuzzy DEMATEL method. Consequently, we believe that
the proposed method is more accurate than the fuzzy DEMATEL
method in reflecting the importance of experts.

(3) Group consensus reaching. Group consensus reaching is a key
problem in the GDM field. Only when the assessments given by
experts reach an acceptable CL are the GDM results seen as
valuable and credible. In other words, GDM results that lack
consensus may be ineffective and have little reference value for
decision-making. From Subsection IV-B, we know that the fuzzy
DEMATEL method is a static method that does not take group
consensus reaching into consideration. In the proposed method, we
apply the soft CRP to the construction of the group IDR matrix,
with a feedback and modification mechanism, to help the group of
experts reach a high CL. Consequently, we believe that the
proposed method is more reasonable than fuzzy DEMATEL with
respect to reaching group consensus.


V. CONCLUSIONS 


In the present study, the DSmT is used to extract and 
fuse expert assessments, and a soft CRP is introduced to 
construct the group IDR matrix for the DEMATEL. The 
DSmT-based group DEMATEL method and the corresponding 
algorithm are proposed. Moreover, a numerical comparison is 
performed to discuss the applicability of the proposed method 
and algorithm. The main contributions of the present study 
can be summarized into three aspects. 

Firstly, an expert assessment extraction and fusion mechanism is
established on the basis of DSmT. Expert assessments on the
influence relations among factors are extracted as BBA functions,
which help experts to express uncertain and incomplete
assessments. The PCR5 rule of DSmT, which avoids the
counter-intuitive results of Dempster's combination rule, is used
to fuse the individual BBA functions discounted by Shafer's
discounting method.

Secondly, expert weights are defined and introduced to reflect
the importance of experts in a group. Following the principle of
pairwise comparisons, expert weights are calculated by the
Euclidean similarity function to reflect the experts' relative
importance in the group. Expert weights are not fixed during the
construction of the group IDR matrix and may vary dynamically
with the expert assessments in each round. Shafer's discounting
method is used to discount expert assessments in each round so as
to reflect the importance of experts in the group dynamically.
These steps help to obtain an accurate group IDR matrix for the
DEMATEL.

Thirdly, a soft CRP is established to construct the group IDR 
matrix with consensus and an algorithm is summarized for the 
group DSmT-based DEMATEL. The consensus measures at 
expert and pair-factors levels are defined to help establish the 
feedback/modification mechanism, based on which a soft CRP 
is established for the construction of consensus group IDR 
matrix, so that experts can reach a high CL. An algorithm 
for DSmT-based group DEMATEL method with reaching 
consensus is proposed to identify major factors in a complex 
system. The proposed algorithm can be programmed easily and 
is valid and applicable for solving group DEMATEL problems. 

The proposed method forms an expert assessment extraction 
mechanism on the basis of the BBA function. However, the 
belief degrees or probabilities in BBA functions given by 
experts may hardly be assessed with exact values in more 
complex situations. Therefore, investigating how to deal with 
the group DEMATEL with the interval BBA function may be 
a good direction for future research. 
Abstract: Belief function theory manages uncertain information
and offers useful combination rules for multi-sensor fusion.
However, when sensor readings are in conflict or even unreliable,
the quality of the fusion result is significantly affected.
Recently, many discounting approaches have been proposed to
combine unreliable sensor readings. The discounting factors
involved in these methods are often determined from a single
criterion, which is in general not sufficient to obtain a precise
assessment of the reliability degrees of the sources to combine.
This is why, in this work, we propose a novel discounting
combination approach in which the reliability factors are
obtained with a multi-criteria strategy. Our discounting
combination method includes two main steps. The first step
assesses the sensors' reliability with the belief function-based
technique for order preference by similarity to ideal solution
(BF-TOPSIS). The second step discounts and globally combines all
involved sensor readings according to their degree of reliability
with the proportional conflict redistribution no. 6 (PCR6) rule.
Several simulations and comprehensive comparisons with classical
approaches are given to show the efficiency of the proposed
method.


Keywords: Belief function theory, multi-sensor fusion, con- 
flict measure, multi-criteria, sensor reliability. 


I. INTRODUCTION 


In order to achieve the accurate and complete description 
of an environment, multi-sensor fusion technology is applied 
to combine data from multi-sources. In view of their good 
applicability, multi-sensor systems play an essential role in 
real-world applications which include wireless sensor net- 
works [1], [2], image processing [3], [4], target tracking [5], 
health-related areas [6], environmental monitoring [7] and so 
on [8], [9]. Nevertheless, depending on environmental and 
working conditions, like sensor failure, deterioration of energy 
supply, adverse weather conditions etc., the corresponding 
sensor readings can be incorrect, imprecise, conflicting or even 
unreliable so that it may yield a wrong decision. Thus, in 
order to avoid a degradation of the multi-sensor fusion system 
performances [10], the reliability degree of sensor reading 
needs to be estimated in the process of combination. 

Before evaluating the reliability degrees of sensors, the
imprecise or uncertain information in sensor readings should be
described mathematically. Many methods can be used to handle
uncertain information, such as maximum entropy [11], [12],
Bayesian theory [13]-[15] and Belief Functions (BF) theory
[16]-[18]. BF theory allows one to model uncertainty and fuse
sensor measurements [19], and in this paper we focus our
discussion on it. BF theory provides several classical
combination rules to fuse sensor readings. Among all available
combination rules, Dempster's rule, presented by Shafer in [20],
is the most well-known and is still used in many applications
even though it remains very controversial. Indeed, whether the
conflict between sensor readings is high or even low, a
counter-intuitive result may occur during the combination. To
circumvent this problem of Dempster's rule, the system designer
usually discounts the original sensor readings before applying
the combination [21]-[24].

In the discounting approach, the primary challenge is to
accurately estimate the reliability degrees of the sensors.
Several researchers [22], [23], [25] estimated the reliability
degree of each sensor reading according to a single criterion.
However, classical methods based on mono-criterion strategies are
not good enough to assess the reliability factor. Recently,
Frikha [26], [27] presented two multi-criteria strategies to
compute the discounting factors. In [26], the authors proposed a
novel method to evaluate the degree of imprecision of sensor
readings and the conflict between pieces of evidence according to
six criteria by using PROMETHEE II. Similarly, Frikha [27]
suggested another way to estimate the reliability of the sensors
based on the Analytic Hierarchy Process (AHP). Building on
Frikha's work, Sarabi-Jamab [28] also followed the multi-criteria
line and proposed a new selective multi-criteria method based on
AHP. Unlike the methods in [26] and [27], all involved criteria
are further evaluated and only the most discriminative criteria
are selected. Some appealing findings have been reported in [26],
[27] and [28] when compared to the mono-criterion approaches
currently available in the literature. However, all the
multi-criteria methods used in these references require a data
normalization step, which to some extent distorts the precise
evaluation of how much better or worse a sensor is with respect
to the others.
In this paper, a novel multi-criteria discounting combination
method is proposed. The discounting factors associated with
sensor readings are evaluated by using the Belief Function based
Technique for Order Preference by Similarity to Ideal Solution
(BF-TOPSIS) [29]. We take two classes of criteria into account
when calculating the weights of sensor readings. The first class
is the conflict between the sensors. The second class is the
imprecision of the information provided by each sensor. Once all
the weights are calculated, the involved sensor readings are
discounted and globally combined with the Proportional Conflict
Redistribution no. 6 (PCR6) rule. The main contributions of this
paper are summarized as follows:


• A novel multi-criteria discounting combination approach is
given. In our method, the recently proposed BF-TOPSIS is first
applied to evaluate the reliability of the involved sensors. We
also propose a new method to calculate the weights of the
criteria involved in BF-TOPSIS. Moreover, comprehensive
comparisons with classical discounting combination methods are
illustrated in detail;

• The global fusion with the PCR6 rule of Dezert-Smarandache
Theory (DSmT) is applied for combining sensor readings. Unlike
the conventional discounting combination approaches [26], [27]
and [28], which combine all pieces of evidence with Dempster's
rule, we use PCR6 to combine all discounted sensor readings. The
advantage of this new approach is that it yields reasonable
results, particularly when the sensor readings are in high
conflict.


This paper is organized as follows. We introduce the basic
concepts of BF theory in Section II. Section III describes the
discounting strategies in BF theory and the criteria applied in
our proposed method. In Section IV, our proposed discounting
combination rule based on BF-TOPSIS is described in detail. The
simulation results and discussions are given in Section V.
Finally, we conclude and give some perspectives in Section VI.


II. BASICS OF BELIEF FUNCTIONS 


A belief function assigns masses of belief to the subsets of the
Frame of Discernment (FoD). In general, a mass function $m(\cdot)$
is a mapping defined as follows [20], for $X \in 2^\Theta$:

$$m : 2^\Theta \to [0, 1], \qquad \sum_{X \in 2^\Theta} m(X) = 1, \tag{1}$$

$$m(\emptyset) = 0, \qquad m(X) \geq 0. \tag{2}$$


where $\Theta$ represents the FoD, which includes a set of $p$
hypotheses. The mapping $m(\cdot)$ is also called a Basic Belief
Assignment (BBA). When $m(X) > 0$, the element $X$ is called a
Focal Element (FE) of $m$. The set of focal elements of a BBA
$m(\cdot)$ is denoted $F(m)$.

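In BF theory, the combination of two independent Bodies of
Evidence (BoEs) by Dempster's rule is denoted $m_1 \oplus m_2$.
For all $X \in 2^\Theta$, $X \neq \emptyset$, the combined mass of
$X$ is given by [20]:

$$(m_1 \oplus m_2)(X) = \frac{1}{1 - K} \sum_{Y, Z \in 2^\Theta,\; Y \cap Z = X} m_1(Y)\, m_2(Z), \tag{3}$$

where $K$ represents the degree of conflict between $m_1$ and
$m_2$:

$$K = \sum_{Y, Z \in 2^\Theta,\; Y \cap Z = \emptyset} m_1(Y)\, m_2(Z). \tag{4}$$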

To palliate the drawbacks of Dempster's rule, Martin and Osswald
[30] proposed a very interesting combination rule: PCR6. Due to
its good performance, it is widely applied in recent
applications. The combination of two BBAs $m_1(\cdot)$ and
$m_2(\cdot)$ by the PCR6 rule is given as follows:
$m_{PCR6}(\emptyset) = 0$ and, for all
$X \in 2^\Theta \setminus \{\emptyset\}$,

$$m_{PCR6}(X) = m_{12}(X) + \sum_{Y \in 2^\Theta \setminus \{X\},\; X \cap Y = \emptyset} \left[ \frac{m_1(X)^2\, m_2(Y)}{m_1(X) + m_2(Y)} + \frac{m_2(X)^2\, m_1(Y)}{m_2(X) + m_1(Y)} \right], \tag{5}$$

where $m_{12}(X) = \sum_{Y, Z \in 2^\Theta,\; Y \cap Z = X} m_1(Y)\, m_2(Z)$
is the conjunctive operator, and the elements $X$ and $Y$ are
expressed in their disjunctive normal form. If the denominator
involved in a fraction is zero, then this fraction is discarded.
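The two-source rule of Eq. (5) can be sketched as follows, again assuming BBAs stored as dictionaries from `frozenset` focal elements to masses (an assumption of this illustration). Each conflicting product $m_1(Y) m_2(Z)$ is returned to $Y$ and $Z$ in proportion to the two masses involved, which is the same redistribution as the bracketed term of Eq. (5). The sketch covers two sources only; the general multi-source PCR6 formula of [30] is not implemented here.

```python
def pcr6_two_sources(m1, m2):
    """Two-source PCR6 (identical to PCR5 for two BBAs), per Eq. (5).

    BBAs are dicts mapping frozenset focal elements to positive masses.
    """
    fused = {}
    for y, a in m1.items():
        for z, b in m2.items():
            if a == 0.0 or b == 0.0:
                continue  # zero-mass entries contribute nothing
            inter = y & z
            if inter:
                # conjunctive part m12
                fused[inter] = fused.get(inter, 0.0) + a * b
            else:
                # conflicting product a*b is split proportionally to a and b
                fused[y] = fused.get(y, 0.0) + a * a * b / (a + b)
                fused[z] = fused.get(z, 0.0) + b * b * a / (a + b)
    return fused


# Toy example with two Bayesian BBAs (values chosen for illustration)
A, B = frozenset("A"), frozenset("B")
m1 = {A: 0.6, B: 0.4}
m2 = {A: 0.2, B: 0.8}
fused = pcr6_two_sources(m1, m2)
print(fused)
```

Unlike Dempster's rule, no global renormalization is needed: the redistributed masses already sum to one.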

We recall that the PCR6 formula for the combination of two BBAs
coincides with the PCR5 formula originally developed by
Smarandache and Dezert in [31]. The combination of more than two
BBAs altogether with the PCR5 and PCR6 fusion rules provides in
general different results. The choice of PCR6 over PCR5 was
justified at first by Martin and Osswald in [30] from a specific
application, and then theoretically by Smarandache and Dezert in
[32]. The general formula of PCR6 for combining more than two
BBAs is given in detail in [30] with examples.

To make a final decision in BF theory from a BBA $m(\cdot)$, Smets
[20] suggests transforming $m(\cdot)$ into a pignistic probability
by using the function BetP at the pignistic level. For all
$X \subseteq \Theta$, $BetP(\cdot)$ is defined as:

$$BetP(X) = \sum_{Y \in 2^\Theta,\; Y \neq \emptyset} \frac{|X \cap Y|}{|Y|}\, m(Y), \tag{6}$$

where $|Y|$ refers to the cardinality of a subset $Y$.
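As a small illustration of Eq. (6), the sketch below evaluates BetP for an arbitrary subset, assuming the same dictionary representation of a BBA as before (focal elements as `frozenset` keys). The BBA values are made up for the example.

```python
def betp(m, x):
    """Pignistic probability of subset x, following Eq. (6)."""
    return sum(len(x & y) / len(y) * mass for y, mass in m.items() if y)


# Toy BBA on the frame {a, b, c} (values chosen for illustration)
a, b, c = frozenset("a"), frozenset("b"), frozenset("c")
m = {a: 0.5, a | b: 0.3, a | b | c: 0.2}

p = {name: betp(m, s) for name, s in (("a", a), ("b", b), ("c", c))}
print(p)
```

Each focal element spreads its mass evenly over its elements, so the singleton values sum to one and the decision is typically taken on the largest of them.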


III. DISCOUNTING PROCEDURE AND ASSESSMENT 
CRITERIA 


A. Discounting Procedure 


The discounting operation is conducted by applying a discounting
factor $w$ to each sensor reading. First introduced by Shafer
[20], this factor $w$ is regarded as the reliability of the sensor
reading. In this paper, the discounting factors are determined by
a multi-criteria strategy: BF-TOPSIS. In general, the parameter
$w$ varies in the interval $[0, 1]$: the closer $w$ is to 1, the
greater the reliability of the sensor reading. The discounting
steps are given as follows, for all
$X \in 2^\Theta \setminus \{\Theta\}$:

$$m^w(X) = w \cdot m(X), \qquad m^w(\Theta) = w \cdot m(\Theta) + (1 - w), \tag{7}$$

where $X$ refers to an FE of $m(\cdot)$ and $w \in [0, 1]$.
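Eq. (7) translates almost directly into code. The sketch below assumes the dictionary representation of a BBA used for illustration in this text (focal elements as `frozenset` keys); the function name `discount` and the example values are hypothetical.

```python
def discount(m, w, frame):
    """Shafer's discounting, Eq. (7): scale every mass by w and move
    the remaining 1 - w to the whole frame (total ignorance)."""
    theta = frozenset(frame)
    out = {x: w * mass for x, mass in m.items() if x != theta}
    out[theta] = w * m.get(theta, 0.0) + (1.0 - w)
    return out


frame = {"t1", "t2", "t3"}
m = {frozenset({"t1"}): 0.8, frozenset(frame): 0.2}
md = discount(m, w=0.5, frame=frame)
print(md)
```

With w = 1 the BBA is unchanged, and with w = 0 it becomes the vacuous BBA, so an unreliable source has no influence on the subsequent combination.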
B. Assessment Criteria


1) Imprecision: The degree of non-specificity of a sensor reading
is also regarded as a degree of imprecision. With belief
functions, the imperfection of a BoE is mainly caused by two
factors: contradiction (strife: $C_1$) proposed by Vejnarova and
Klir [33], and imprecision (non-specificity: $C_2$) proposed by
Dubois and Prade [34]. The measure of contradiction (i.e. the
strife $C_1$) is defined as:

$$C_1(m) = St(m) = -\sum_{X \in F(m)} m(X) \log_2 \left( \sum_{Y \in F(m)} m(Y) \frac{|X \cap Y|}{|X|} \right), \tag{8}$$

where $|X \cap Y|$ and $|X|$ refer to the cardinalities of the
subsets $X \cap Y$ and $X$.

Also, the measure of non-specificity $C_2$ is defined as:

$$C_2(m) = I(m) = \sum_{X \in F(m)} m(X) \cdot \log_2(|X|), \tag{9}$$

where $|X|$ refers to the cardinality of a subset $X$.
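The two imprecision criteria can be sketched as follows, assuming a BBA stored as a dictionary of focal elements (`frozenset` keys with strictly positive masses). As a check, the example uses the BBA $m_1$ listed in Table I of Section V; the resulting values agree with the first column of the scoring matrix in Table II to four decimals.

```python
from math import log2


def strife(m):
    """Strife St(m) of Eq. (8); m maps focal elements to positive masses."""
    total = 0.0
    for x, mx in m.items():
        inner = sum(my * len(x & y) / len(x) for y, my in m.items())
        total -= mx * log2(inner)
    return total


def nonspecificity(m):
    """Non-specificity I(m) of Eq. (9)."""
    return sum(mx * log2(len(x)) for x, mx in m.items())


# BBA m1 from Table I (targets abbreviated t1, t2, t3)
theta = frozenset({"t1", "t2", "t3"})
m1 = {
    frozenset({"t1"}): 0.75,
    frozenset({"t2"}): 0.10,
    frozenset({"t3"}): 0.05,
    theta: 0.10,
}
print(round(strife(m1), 4), round(nonspecificity(m1), 4))
```

A Bayesian BBA (all focal elements singletons) has zero non-specificity, which is why $m_3$ scores 0 on $I(\cdot)$ in Table II.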


2) Conflict Between the BoEs: The second class of evaluation
criteria relates to conflicting information, which is usually
represented by $m_\oplus(\emptyset)$ and distance measures. Two
conflict measures are used in this paper: Shafer's weight of
conflict ($C_3$), and the interval distance between evidences
($C_4$):

$$C_3(m) = Conf(m) = \frac{1}{M - 1} \sum_{i=1}^{M-1} -\log_2\big(1 - K(m, m_i)\big), \tag{10}$$

where $K$ is the conflict between two BoEs $m(\cdot)$ and
$m_i(\cdot)$ calculated by Eq. (4) and $M$ is the number of sensor
readings.
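The following sketch evaluates Eq. (10) on the six BBAs of Table I, under the assumption that the sum runs over the $M - 1$ other readings and that the logarithm is taken in base 2. With these assumptions the values for $m_3$ and $m_6$ agree with the Conf row of Table II to four decimals. The helper names (`fs`, `conflict`, `conf_score`) are introduced only for this illustration.

```python
from math import log2


def fs(*names):
    """Shorthand for a focal element."""
    return frozenset(names)


# The six BBAs of Table I (targets abbreviated t1, t2, t3)
bbas = [
    {fs("t1"): 0.75, fs("t2"): 0.10, fs("t3"): 0.05, fs("t1", "t2", "t3"): 0.10},
    {fs("t1"): 0.40, fs("t2"): 0.20, fs("t3"): 0.10, fs("t1", "t2"): 0.30},
    {fs("t2"): 0.90, fs("t3"): 0.10},
    {fs("t1"): 0.35, fs("t2"): 0.15, fs("t3"): 0.25, fs("t1", "t2"): 0.20,
     fs("t1", "t2", "t3"): 0.05},
    {fs("t1"): 0.50, fs("t2"): 0.10, fs("t2", "t3"): 0.15,
     fs("t1", "t2", "t3"): 0.25},
    {fs("t1"): 0.05, fs("t2"): 0.10, fs("t1", "t2"): 0.30, fs("t1", "t3"): 0.20,
     fs("t2", "t3"): 0.10, fs("t1", "t2", "t3"): 0.25},
]


def conflict(ma, mb):
    """Degree of conflict K of Eq. (4)."""
    return sum(a * b for y, a in ma.items() for z, b in mb.items() if not (y & z))


def conf_score(k, readings):
    """Average Shafer weight of conflict of reading k against the others, Eq. (10)."""
    others = [m for i, m in enumerate(readings) if i != k]
    return sum(-log2(1.0 - conflict(readings[k], m)) for m in others) / len(others)


scores = [round(conf_score(k, bbas), 4) for k in range(len(bbas))]
print(scores)
```

The score of $m_3$ is the largest of the six, which reflects the fact that sensor 3 disagrees with most of the other sensors.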

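The criterion $C_4$ is based on the interval distance $d_{BI}$,
see [35], [36], that is:

$$C_4(m) = d_{BI}(m) = \frac{1}{M - 1} \sum_{i=1}^{M-1} d_{BI}(m, m_i) = \frac{1}{M - 1} \sum_{i=1}^{M-1} \sqrt{ n_c \sum_{X \in F(m)} \big[ d^I\big(BI(X), BI_i(X)\big) \big]^2 }, \tag{11}$$

where $n_c = 1/2^{p-1}$, $M$ is the number of sensor readings, $p$
is the number of elements in $\Theta$,
$BI(X) = [Bel(X), Pl(X)]$, $BI_i(X) = [Bel_i(X), Pl_i(X)]$, and

$$d^I\big([a, b], [a', b']\big) = \sqrt{ \left[ \frac{a + b}{2} - \frac{a' + b'}{2} \right]^2 + \frac{1}{3} \left[ \frac{b - a}{2} - \frac{b' - a'}{2} \right]^2 }.$$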

IV. NEW COMBINATION APPROACH BASED ON BF-TOPSIS 


A. Construction of Scoring Matrix 


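At first, there are $M$ sensors $A_i$, $i = 1, \ldots, M$, and
each sensor gives a corresponding reading $m_i$,
$i = 1, \ldots, M$, arranged as in Eq. (12):

$$\begin{array}{c|cccc}
 & X_1 & X_2 & \cdots & X_{2^{|\Theta|}} \\ \hline
A_1 & m_1(X_1) & m_1(X_2) & \cdots & m_1(X_{2^{|\Theta|}}) \\
A_2 & m_2(X_1) & m_2(X_2) & \cdots & m_2(X_{2^{|\Theta|}}) \\
\vdots & \vdots & \vdots & & \vdots \\
A_M & m_M(X_1) & m_M(X_2) & \cdots & m_M(X_{2^{|\Theta|}})
\end{array} \tag{12}$$

where $X \in 2^\Theta$.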

Then, one evaluates all sensors from the perspective of
imprecision ($C_1$ and $C_2$) and conflict ($C_3$ and $C_4$), and
constructs the following scoring matrix $S$ defined by Eq. (13):

$$S = \begin{array}{c|ccccc}
 & A_1 & A_2 & \cdots\; A_i \;\cdots & & A_M \\ \hline
C_1 & S_1(A_1) & S_1(A_2) & S_1(A_i) & \cdots & S_1(A_M) \\
C_2 & S_2(A_1) & S_2(A_2) & S_2(A_i) & \cdots & S_2(A_M) \\
C_3 & S_3(A_1) & S_3(A_2) & S_3(A_i) & \cdots & S_3(A_M) \\
C_4 & S_4(A_1) & S_4(A_2) & S_4(A_i) & \cdots & S_4(A_M)
\end{array} \tag{13}$$

Without loss of generality, we use the general symbol $C_j$,
$j = 1, \ldots, N$ (in this paper, $N = 4$) to represent one of
the four criteria in the following sections. That is: $j = 1$,
$C_1 \triangleq St(\cdot)$; $j = 2$, $C_2 \triangleq I(\cdot)$;
$j = 3$, $C_3 \triangleq Conf(\cdot)$; $j = 4$,
$C_4 \triangleq d_{BI}(\cdot)$.

B. Construction of BBAs for Multi-Criteria Decision Making 
(MCDM) Problems 


In traditional mono-criterion problems, the weights of sensors
$A_i$, $i = 1, \ldots, M$ are calculated according to a single
criterion. However, in MCDM problems, the direct weights
associated with different criteria can be very different.
Therefore, efficient fusion techniques must be developed in order
to provide a global evaluation that solves the MCDM problem. To
this end, the original BF-TOPSIS is used to Estimate the Ranking
Vector (ERV) from all the evidence that supports or refutes each
sensor, expressed as BBAs. First, the FoD is the set of sensors,
that is $A \triangleq \{A_1, A_2, \ldots, A_M\}$. The construction
of BBAs is based on the method presented in [29]:

$$m_j(A_i) \triangleq Bel_j(A_i), \tag{14}$$
$$m_j(\bar{A}_i) \triangleq Bel_j(\bar{A}_i) = 1 - Pl_j(A_i), \tag{15}$$
$$m_j(A_i \cup \bar{A}_i) \triangleq Pl_j(A_i) - Bel_j(A_i), \tag{16}$$

where $m_j(A_i)$ is the belief in favor of $A_i$ according to
criterion $C_j$, $m_j(\bar{A}_i)$ is the belief against $A_i$
according to $C_j$, and $m_j(A_i \cup \bar{A}_i)$ is the degree of
uncertainty about whether to support or refute $A_i$ based on
$C_j$.


$Bel_j(A_i)$, $Pl_j(A_i)$ and $Bel_j(\bar{A}_i)$ in Eqs. (15) and
(16) are defined as follows:

$$Sup_j(A_i) \triangleq \sum_{k \in \{1, \ldots, M\} \,|\, S_j(A_k) \leq S_j(A_i)} |S_j(A_i) - S_j(A_k)|, \tag{17}$$

$$Bel_j(A_i) \triangleq \frac{Sup_j(A_i)}{\max\big(Sup_j(A_1), Sup_j(A_2), \ldots, Sup_j(A_M)\big)}, \tag{18}$$

$$Inf_j(A_i) \triangleq -\sum_{k \in \{1, \ldots, M\} \,|\, S_j(A_k) \geq S_j(A_i)} |S_j(A_i) - S_j(A_k)|, \tag{19}$$

$$Bel_j(\bar{A}_i) \triangleq \frac{Inf_j(A_i)}{\min\big(Inf_j(A_1), Inf_j(A_2), \ldots, Inf_j(A_M)\big)}, \tag{20}$$

and

$$Pl_j(A_i) = 1 - Bel_j(\bar{A}_i). \tag{21}$$
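The sketch below implements Eqs. (17)-(21) for one criterion. One point is an assumption of this illustration: Eqs. (17) and (19) are written for scores where larger is better, whereas Section V states that for the four criteria of this paper a smaller score is better. The flag `smaller_is_better` (a hypothetical name) therefore reverses the comparisons; with the flag set, the St scores of Table II reproduce the St column of Table III up to rounding.

```python
def supports(scores, smaller_is_better=True):
    """Positive and negative supports Sup_j and Inf_j for one criterion.

    For cost-type scores (smaller is better) a sensor is supported by
    every sensor whose score is at least as large as its own."""
    sup, inf = [], []
    for si in scores:
        if smaller_is_better:
            worse = [sk for sk in scores if sk >= si]
            better = [sk for sk in scores if sk <= si]
        else:
            worse = [sk for sk in scores if sk <= si]
            better = [sk for sk in scores if sk >= si]
        sup.append(sum(abs(si - sk) for sk in worse))
        inf.append(-sum(abs(si - sk) for sk in better))
    return sup, inf


def beliefs(sup, inf):
    """Bel(A_i), Bel(not A_i) and Pl(A_i) following Eqs. (18), (20), (21)."""
    smax, imin = max(sup), min(inf)
    bel = [s / smax if smax > 0 else 0.0 for s in sup]
    bel_not = [v / imin if imin < 0 else 0.0 for v in inf]
    pl = [1.0 - b for b in bel_not]
    return bel, bel_not, pl


# Strife scores St(m_1), ..., St(m_6) from Table II
st_scores = [0.6771, 0.9591, 0.4690, 1.1508, 0.6959, 0.4721]
sup, inf = supports(st_scores)
bel, bel_not, pl = beliefs(sup, inf)
print([round(s, 4) for s in sup])
print([round(b, 4) for b in bel])
```

The sensor with the smallest strife ($m_3$) obtains $Bel = 1$ and the one with the largest strife ($m_4$) obtains $Bel(\bar{A}_i) = 1$, which is the behaviour described in the simulation section.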


C. Calculation of Criteria Weights 


In original BF-TOPSIS [29], the weights of criteria are 
often chosen subjectively which limits the applications of 
this new multi-criteria strategy in practice. In this paper, we 
automatically determine the importance of each criterion from 
the scoring matrix without manual intervention. 


1) Normalized Scoring Matrix According to the Max-Min Scaling:
Here, we first transform all values of $S_j(A_i)$ in $S$
(Eq. (13)) into the same measurement scale based on Max-Min
scaling.

Definition 1: The normalized scoring matrix is defined by

$$S'_j(A_i) = \frac{S_j(A_i) - A_j^{\min}}{A_j^{\max} - A_j^{\min}}, \tag{22}$$

where $i \in \{1, \ldots, M\}$, $j \in \{1, \ldots, N\}$, and
$A_j^{\max}$ and $A_j^{\min}$ refer to the maximum value and the
minimum value in $[S_j(A_1), S_j(A_2), \ldots, S_j(A_M)]$.

The aim of this Max-Min scaling is to transform the original
scores linearly so that all the elements of the scoring matrix
lie within the interval $[0, 1]$.


2) Pairwise Comparison Matrix for All Criteria: Based on
Eq. (22), we can calculate the pairwise comparison results for
all criteria according to the following definition:

Definition 2: The pairwise comparison matrix is defined by

$$pc_{h'h} = \frac{\sum_{i=1}^{M} \exp\big(S'_{h'}(A_i)\big)}{\sum_{i=1}^{M} \exp\big(S'_h(A_i)\big)}, \tag{23}$$

where $h' \in \{1, \ldots, N\}$ and $h \in \{1, \ldots, N\}$. In
this paper, the value of the parameter $N$ is 4, which is the
number of the involved criteria.


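Obviously, Eq. (23) is consistent because all pairs of criteria
satisfy $pc_{h'l} \times pc_{lh} = pc_{h'h}$.

Proof:

$$pc_{h'l} \times pc_{lh} = \frac{\sum_{i=1}^{M} \exp\big(S'_{h'}(A_i)\big)}{\sum_{i=1}^{M} \exp\big(S'_l(A_i)\big)} \times \frac{\sum_{i=1}^{M} \exp\big(S'_l(A_i)\big)}{\sum_{i=1}^{M} \exp\big(S'_h(A_i)\big)} = \frac{\sum_{i=1}^{M} \exp\big(S'_{h'}(A_i)\big)}{\sum_{i=1}^{M} \exp\big(S'_h(A_i)\big)} = pc_{h'h}.$$

End Proof.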


3) Weights Determination for All Criteria: Next, we can 
calculate the weight of each criterion v(C;) based on the 
following expressions: 


pie PCip 
(Cj) = “SE, (24) 
! PCih 
i (25) 
a PCih 


where 7 = 1,..., N and N = 4 in this paper; M is the number 
of sensor readings. 
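The chain from Eq. (22) to the criteria weights can be sketched as follows. Eqs. (22) and (23) are implemented as written above. The final step is an assumption of this illustration: the weight of each criterion is taken as its row sum in the pairwise comparison matrix, normalized so that the weights sum to one. Because the matrix of Eq. (23) is consistent, any standard normalization of this kind gives the same result. With the scores of Table II, this sketch gives about 0.2581 for $St(\cdot)$, which matches the first weight reported in Section V.

```python
from math import exp


def minmax(row):
    """Max-Min scaling of one criterion row, Eq. (22)."""
    lo, hi = min(row), max(row)
    return [(x - lo) / (hi - lo) for x in row]


def criteria_weights(scoring):
    """Pairwise comparison matrix of Eq. (23) and normalized weights.

    The normalization by row sums is an assumption of this sketch."""
    e = [sum(exp(x) for x in minmax(row)) for row in scoring]
    n = len(e)
    pc = [[e[a] / e[b] for b in range(n)] for a in range(n)]
    row_sums = [sum(r) for r in pc]
    total = sum(row_sums)
    return pc, [rs / total for rs in row_sums]


# Scoring matrix of Table II: rows St, I, Conf, d_BI; columns m1..m6
S = [
    [0.6771, 0.9591, 0.4690, 1.1508, 0.6959, 0.4721],
    [0.1585, 0.3000, 0.0000, 0.2792, 0.5462, 0.9962],
    [0.8813, 0.6374, 1.2641, 0.7562, 0.5317, 0.3225],
    [0.3785, 0.2586, 0.6115, 0.2682, 0.2622, 0.3127],
]
pc, v = criteria_weights(S)
print([round(x, 4) for x in v])
```

No manual tuning is involved: the weights depend only on how the normalized scores are spread within each criterion.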


D. Steps of BF-TOPSIS Algorithm 


• Step 1: From the scoring matrix $S$, compute the BBAs
$m_j(A_i)$, $m_j(\bar{A}_i)$ and $m_j(A_i \cup \bar{A}_i)$ using
Eqs. (15)-(16), and then construct
$vec_j(A_i) \triangleq [m_j(A_i), m_j(\bar{A}_i), m_j(A_i \cup \bar{A}_i)]$
for each sensor $A_i$ according to the criterion $C_j$;

• Step 2: For each sensor $A_i$, also construct the best ideal
BBA $vec^{best}(A_i) \triangleq [m^{best}(A_i), 0, 0] = [1, 0, 0]$
and the worst ideal BBA
$vec^{worst}(A_i) \triangleq [0, m^{worst}(\bar{A}_i), 0] = [0, 1, 0]$.
Then compute the belief interval distances [35], [36]
$d_{BI}(vec_j(A_i), vec^{best}(A_i))$ and
$d_{BI}(vec_j(A_i), vec^{worst}(A_i))$;

• Step 3: Compute the weighted average of the distances
$d_{BI}(vec_j(A_i), vec^{best}(A_i))$ and
$d_{BI}(vec_j(A_i), vec^{worst}(A_i))$ with the relative
importance weighting factor $v(C_j)$ of each criterion $C_j$,
that is

$$dis^{best}(A_i) \triangleq \sum_{j=1}^{N} v(C_j) \cdot d_{BI}\big(vec_j(A_i), vec^{best}(A_i)\big), \tag{26}$$

$$dis^{worst}(A_i) \triangleq \sum_{j=1}^{N} v(C_j) \cdot d_{BI}\big(vec_j(A_i), vec^{worst}(A_i)\big), \tag{27}$$

where $vec_j(A_i) \triangleq [m_j(A_i), m_j(\bar{A}_i), m_j(A_i \cup \bar{A}_i)]$,
$vec^{best}(A_i) \triangleq [1, 0, 0]$ and
$vec^{worst}(A_i) \triangleq [0, 1, 0]$.

• Step 4: The final weight $w$ of the sensor $A_i$ with respect
to the ideal best solution $A^{best}$ is then defined by

$$w(A_i) \triangleq \frac{dis^{worst}(A_i)}{dis^{worst}(A_i) + dis^{best}(A_i)}. \tag{28}$$
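Step 4 can be illustrated with the distances reported later in Table V. The sketch below only applies Eq. (28) to those tabulated values; the distances themselves are copied from the table, not recomputed, and the function name `closeness` is a choice made for this example.

```python
def closeness(dis_best, dis_worst):
    """Relative closeness to the ideal best solution, Eq. (28)."""
    return dis_worst / (dis_worst + dis_best)


# (dis_best, dis_worst) per sensor reading, copied from Table V
table_v = {
    "m1": (2.8450, 3.8938),
    "m2": (2.6323, 4.0353),
    "m3": (2.8284, 2.8284),
    "m4": (3.1726, 3.3888),
    "m5": (2.2802, 4.2988),
    "m6": (1.8371, 3.9732),
}
w = {name: closeness(*d) for name, d in table_v.items()}
for name, value in w.items():
    print(name, round(value, 4))
```

A reading that is equally far from the best and the worst ideal solutions, such as $m_3$ here, receives the weight 0.5, and the weight increases as the reading moves closer to the best ideal solution.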
[Flowchart: sensor readings $A_i$: $m_i(\cdot)$ with scores
$S_1(A_i), S_2(A_i), S_3(A_i), S_4(A_i)$; weights calculation of
sensor readings by BF-TOPSIS with objective weight determination;
discounting procedure; combination of sensor readings by
Proportional Conflict Redistribution rule no. 6 (PCR6); final
combination result.]

Figure 1. The framework of our proposed multi-criteria discounting combination method.


E. Discounting sensor readings and PCR6 combination 


• All the weights of the involved sensors form a weight vector,
denoted $Weight = \{w(A_1), w(A_2), \ldots, w(A_M)\}$ and based
on Eq. (28); all involved BBAs are then discounted with Eq. (7).

• PCR6 Combination Rule: According to the sensor readings of
$A_i$, $i \in \{1, \ldots, M\}$, and the PCR6 combination rule,
we can globally combine all the involved sensor readings with
their corresponding weights:
$m_{fusion} = PCR6\big(m_1^{w(A_1)}(\cdot), m_2^{w(A_2)}(\cdot), \ldots, m_M^{w(A_M)}(\cdot)\big)$.


F. A Proposed Combination Method and Fusion Process 


Because unreliable sensors often provide conflicting and
imprecise information, which may lead to counter-intuitive
results with traditional combination methods, the novel
multi-criteria discounting combination method is used. We
emphasize that the criteria used in this paper can be replaced by
any other criteria, according to the actual demand, in the
objective weight estimation.

The process of our multi-criteria discounting combination method
is illustrated in Fig. 1. To help readers reproduce the
combination method proposed in this paper, pseudo code is given
in Algorithm 1.


V. NUMERICAL SIMULATIONS AND DISCUSSIONS 


In this part, we make comprehensive comparisons between 
the proposed combination procedure and other classical com- 
bination rules. Also, several tests and comparative analysis 
are illustrated in details. Independent random runs of Monte 
Carlo simulations are generated to observe the appealing be- 
haviors of the proposed approach. All simulations results were 
obtained with MATLAB R2018a running with a hardware of 
Intel Core i7-5600U CPU at 2.60GHz and with 8G RAM. 


A. Target Recognition Context 


In this case, six sensors give their readings about the class of
the same target, as listed in Table I. The common FoD of these
sensors is $\Theta = \{\theta_1 \triangleq$ target 1,
$\theta_2 \triangleq$ target 2, $\theta_3 \triangleq$ target 3$\}$.
Among the BBAs given in Table I, we can notice that four of these
sensors (1, 2, 4 and 5) give the maximum belief to $\theta_1$. On
the contrary, sensor 3 assigns most

Algorithm 1: The Proposed Multi-Criteria Discounting Combination Method

Input : Sensor readings: A_1 = m_1(.), A_2 = m_2(.), ..., A_M = m_M(.).
Output: The fused final BBA m_Fusion(.).

for j = 1, ..., N do
    for i = 1, ..., M do
        S_j(A_i) = C_j(m_i(.));
    end
end
for j = 1, ..., N do
    for i = 1, ..., M do
        S'_j(A_i) = (S_j(A_i) - A_j^min) / (A_j^max - A_j^min);
    end
end
for j = 1, ..., N do
    for i = 1, ..., M do
        Sup_j(A_i) = sum over k in {1, ..., M} with S_j(A_k) <= S_j(A_i)
                     of |S_j(A_i) - S_j(A_k)|;
        Inf_j(A_i) = - sum over k in {1, ..., M} with S_j(A_k) >= S_j(A_i)
                     of |S_j(A_i) - S_j(A_k)|;
    end
end
for j = 1, ..., N do
    for i = 1, ..., M do
        m_j(A_i) = Bel_j(A_i);
        m_j(A_i U not-A_i) = Pl_j(A_i) - Bel_j(A_i);
    end
end
for j = 1, ..., N do
    compute v(C_j) from the pairwise comparison matrix, Eqs. (23)-(25);
end
for i = 1, ..., M do
    w(A_i) = dis^worst(A_i) / (dis^worst(A_i) + dis^best(A_i));
end
Discounting step: use Eq. (7) based on Weight;
Fusion step: m_Fusion = PCR6(m_1(.), ..., m_M(.));
return the fused BBA m_Fusion(.)
Table I
SENSORS' CORRESPONDING BBAS $m_i(\cdot)$.

                 | m1   | m2  | m3  | m4   | m5   | m6
  θ1             | 0.75 | 0.4 | 0   | 0.35 | 0.5  | 0.05
  θ2             | 0.1  | 0.2 | 0.9 | 0.15 | 0.1  | 0.1
  θ3             | 0.05 | 0.1 | 0.1 | 0.25 | 0    | 0
  {θ1, θ2}       | 0    | 0.3 | 0   | 0.2  | 0    | 0.3
  {θ1, θ3}       | 0    | 0   | 0   | 0    | 0    | 0.2
  {θ2, θ3}       | 0    | 0   | 0   | 0    | 0.15 | 0.1
  {θ1, θ2, θ3}   | 0.1  | 0   | 0   | 0.05 | 0.25 | 0.25


of its belief to $\theta_2$. Accordingly, sensor 3 is highly
conflicting with the four sensors mentioned above (1, 2, 4 and 5).

According to Eq. (13), we can evaluate each sensor reading $m_i$
by calculating $S_j(m_i)$, for all $i = 1, \ldots, 6$ and
$j = 1, \ldots, 4$. All results $S_j(m_i)$ are listed in the
scoring matrix (Table II).


Table II
SCORING MATRIX $S_j(m_i)$.

  S_j(m_i) | m1     | m2     | m3     | m4     | m5     | m6
  St(.)    | 0.6771 | 0.9591 | 0.4690 | 1.1508 | 0.6959 | 0.4721
  I(.)     | 0.1585 | 0.3000 | 0      | 0.2792 | 0.5462 | 0.9962
  Conf(.)  | 0.8813 | 0.6374 | 1.2641 | 0.7562 | 0.5317 | 0.3225
  d_BI(.)  | 0.3785 | 0.2586 | 0.6115 | 0.2682 | 0.2622 | 0.3127


As we can see in Table II, the rankings obtained with the
different criteria are quite different:

$St(\cdot)$: $m_3 \succ m_6 \succ m_1 \succ m_5 \succ m_2 \succ m_4$,

$I(\cdot)$: $m_3 \succ m_1 \succ m_4 \succ m_2 \succ m_5 \succ m_6$,

$Conf(\cdot)$: $m_6 \succ m_5 \succ m_2 \succ m_4 \succ m_1 \succ m_3$,

$d_{BI}(\cdot)$: $m_2 \succ m_5 \succ m_4 \succ m_6 \succ m_1 \succ m_3$.

This indicates that it is not appropriate to evaluate the
involved BBAs based on a single criterion, and that a robust
multi-criteria strategy is necessary. For this, we first
calculate the positive and negative evidence supports of all
sensor readings, which are given in Table III and Table IV.


Table III
EVIDENTIAL SUPPORTS $Sup_j(m_i)$.

  Sup_j(m_i) | St(.)  | I(.)   | Conf(.) | d_BI(.)
  m1         | 0.7746 | 1.4877 | 0.3828  | 0.2330
  m2         | 0.1917 | 0.9425 | 0.9893  | 0.5404
  m3         | 1.6101 | 2.2802 | 0       | 0
  m4         | 0      | 1.0047 | 0.6330  | 0.4980
  m5         | 0.7180 | 0.4500 | 1.4123  | 0.5221
  m6         | 1.5944 | 0      | 2.4581  | 0.3646


In Table III, $Sup_j(m_i)$ is called the positive support of
$m_i$ according to criterion $C_j$. The four criteria considered
here are such that the smaller the value, the better the BoE.
Thus, if the value of the BoE $m_i$ in Table II is small
according to $C_j$ (such as $m_3$ for $St(\cdot)$), its support
value is large: $Sup_1(m_3)$ is the largest support according to
$St(\cdot)$ in Table III.


Table IV
EVIDENTIAL SUPPORTS $Inf_j(m_i)$.

$Inf_j(m_i)$ | $St(\cdot)$ | $I(\cdot)$ | $Conf(\cdot)$ | $d_{BI}(\cdot)$
$m_1$ | -0.4131 | -0.1585 | -1.2774 | -0.4123
$m_2$ | -1.5223 | -0.4623 | -0.4206 | 0
$m_3$ | 0 | 0 | -3.1914 | -1.5773
$m_4$ | -2.4897 | -0.4000 | -0.7769 | -0.0157
$m_5$ | -0.4696 | -1.4472 | -0.2092 | -0.0036
$m_6$ | -0.0031 | -3.6972 | 0 | -0.1492
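
The supports in Tables III and IV can be recomputed from the scoring matrix of Table II. The sketch below is only an illustration under an assumption: it takes the usual BF-TOPSIS form in which, for a "smaller is better" criterion, the positive support of a reading sums its score advantages over worse readings and the negative support sums (with a minus sign) its disadvantages against better readings. The function name `supports` and the dictionary layout are hypothetical.

```python
# Scoring matrix copied from Table II: one list per criterion, columns m1..m6.
SCORES = {
    "St":   [0.6771, 0.9591, 0.4690, 1.1508, 0.6959, 0.4721],
    "I":    [0.1585, 0.3000, 0.0000, 0.2792, 0.5462, 0.9962],
    "Conf": [0.8813, 0.6374, 1.2641, 0.7562, 0.5317, 0.3225],
    "dBI":  [0.3785, 0.2586, 0.6115, 0.2682, 0.2622, 0.3127],
}


def supports(scores):
    """Return (sup, inf) for one criterion where smaller scores are better.

    Assumed definitions (not quoted from the paper):
      sup[i] =  sum over k of max(scores[k] - scores[i], 0)
      inf[i] = -sum over k of max(scores[i] - scores[k], 0)
    """
    sup = [sum(max(other - s, 0.0) for other in scores) for s in scores]
    inf = [-sum(max(s - other, 0.0) for other in scores) for s in scores]
    return sup, inf


sup_st, inf_st = supports(SCORES["St"])
sup_conf, inf_conf = supports(SCORES["Conf"])
```

Under this assumption the best reading for a criterion gets a zero negative support and the worst reading gets a zero positive support, which is the pattern visible in the two tables.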


According to Eqs. (15)-(16), the specific supporting BBAs
$m_s(\cdot)$ of all involved sensors are given in Fig. 2.

In Fig. 2 (a) and (b), we can see that $m_3$ (orange plot) receives
full support (because $m_s(m_3) = 1$) from the
perspective of the imprecision criteria ($St(\cdot)$ and $I(\cdot)$). This is
because $m_3$ is the only Bayesian BBA. On the contrary, since
$m_3$ is highly conflicting with the other BBAs, the support against
$m_3$ in Fig. 2 (c) and (d) is the largest (because $m_s(\bar{m}_3) = 1$)
according to the conflict measures ($Conf(\cdot)$ and $d_{BI}(\cdot)$). The
supporting degrees of $m_3$ under different criteria are totally
inconsistent, which directly indicates that it is difficult to
evaluate the reliability of sensor readings comprehensively
with a single criterion. In our proposed discounting
combination method, the reliability factors are obtained from
multiple criteria based on BF-TOPSIS.

According to Eqs. (22)-(25), the corresponding weights of
all involved criteria are:

$v(St) = 0.2581$,
$v(I) = 0.1519$,
$v(Conf) = 0.2581$,
$v(d_{BI}) = 0.1519$.


Then, all weights of the involved six sensor readings are 
given in Table V based on Eq. (28). 


Table V
WEIGHTS OF ALL SENSOR READINGS $w(m_i)$.

 | $dis^{best}(m_i)$ | $dis^{worst}(m_i)$ | $w(m_i)$
$m_1$ | 2.8450 | 3.8938 | 0.5778
$m_2$ | 2.6323 | 4.0353 | 0.6052
$m_3$ | 2.8284 | 2.8284 | 0.5000
$m_4$ | 3.1726 | 3.3888 | 0.5165
$m_5$ | 2.2802 | 4.2988 | 0.6534
$m_6$ | 1.8371 | 3.9732 | 0.6838


As we can see in Table V, $m_6$ receives the highest degree
of reliability and the highly conflicting BBA $m_3$ gets the
lowest reliability degree. It is worth mentioning that the four
weighting values of the involved criteria are automatically
calculated from the input sensor readings. To make a direct
comparison between manual (subjective) adjustment of the criteria
weights and the objective weighting used in BF-TOPSIS, we
first set the weights of $St(\cdot)$ and $I(\cdot)$ to 1 and the weights
of $Conf(\cdot)$ and $d_{BI}(\cdot)$ to 0. In this case the weight of $m_3$
increases to 1 in Fig. 3 (a) and, accordingly, the mass of $\theta_2$
becomes the largest in Fig. 3 (b).
When we subjectively modify the weights of the
criteria, the weights of all BoEs and
the masses of all FEs also change. Thus, the more criteria
are considered, the more weights need to be set, which makes
subjective determination of the criteria weighting factors more
difficult in BF-TOPSIS. In this new method, we do not set
the weights of the involved criteria in advance, and we can see
in Fig. 3 (a) that the weight of $m_3$ is calculated appropriately
(red plot) and the final decision is $\theta_1$. Based on the
weights calculated by this new method, we can use the PCR6 rule
to fuse all discounted sensor readings. Fig. 3 (b) illustrates the
corresponding belief masses of the considered FEs.

Figure 2. Construction of BBAs based on four distinct criteria. (a) FEs in supporting BBA. (b) FEs in supporting BBA. (c) FEs in supporting BBA. (d) FEs
in supporting BBA.

Figure 3. Comparison of subjective and objective weights of criteria for BF-TOPSIS. (a) Reliability degree of input sensor readings. (b) Combination results over the focal elements considered in the sensor readings. Legend: two subjective determinations of the criteria weights (two criteria set to 1 and the other two set to 0, and the reverse) and the objective weight determination.
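
The link between the two distance columns of Table V and the reliability weight can be sketched as follows. This is a minimal illustration that assumes Eq. (28) is the usual TOPSIS relative closeness, $w = dis^{worst} / (dis^{best} + dis^{worst})$; the function name `relative_closeness` and the dictionary `DISTANCES` are hypothetical.

```python
# (distance to best solution, distance to worst solution), copied from Table V.
DISTANCES = {
    "m1": (2.8450, 3.8938),
    "m2": (2.6323, 4.0353),
    "m3": (2.8284, 2.8284),
    "m4": (3.1726, 3.3888),
    "m5": (2.2802, 4.2988),
    "m6": (1.8371, 3.9732),
}


def relative_closeness(d_best, d_worst):
    """Assumed TOPSIS closeness: near 1 when far from the worst solution."""
    return d_worst / (d_best + d_worst)


weights = {name: relative_closeness(*d) for name, d in DISTANCES.items()}
most_reliable = max(weights, key=weights.get)
least_reliable = min(weights, key=weights.get)
```

A reading that is equally far from the best and the worst solutions, such as $m_3$, gets a weight of exactly 0.5 under this assumption.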


B. Fusion of Highly Conflicting Sensor Readings

Here we show that our multi-criteria discounting combination
method can fuse highly conflicting sensor readings and
give a reasonable belief mass distribution. We consider
two independent BBAs $m_1$ and $m_2$ defined over the
same FoD $\Theta = \{\theta_1, \theta_2, \theta_3\}$. We also assume that these
two BBAs are in high conflict and are given as follows:


$m_1(\{\theta_1\}) = 0.99, \quad m_1(\{\theta_3\}) = 0.01, \quad \ldots$


To see whether our approach can handle
highly conflicting problems and how it differs from other
classical approaches, we summarize the combination results
in Table VI.

We note that Dempster's rule directly leads to a
counter-intuitive result because $\theta_3$ receives the total mass
of belief under this classical rule. Martin's approach and
Jiang's approach assign most of the belief to the FE $\{\theta_1, \theta_2, \theta_3\}$,
and after the probabilistic transformation ($BetP(\cdot)$) these two
methods give the final decision $\theta_1$. Frikha's approaches in
[26], [27] and our approach can directly conclude
that $\theta_1$ is the final decision based on the principle of maximum
probability. In addition, we also compare the
computational time of each approach. Because the
multi-criteria methods need to process more fusion steps, they
take more time than the classical methods (Dempster's rule,
Martin's method and Jiang's method). However, in this paper,
the methods with lower computational complexity (such
as Dempster's rule) are not the optimal choice, because the
criterion for judging the fusion algorithms
here is whether they give reasonable and correct decisions in the face
of highly conflicting fusion problems.


C. Combination of Conflicting and Imperfect BBAs


Assume that three sensors 1, 2 and 3 provide three BBAs
$m_1$, $m_2$ and $m_3$ defined over the same FoD $\Theta = \{\theta_1, \theta_2, \theta_3\}$,
as given in Table VII.

As we can see, the element most supported by sensor 1 is $\theta_1$.
In contrast, it is $\theta_2$ for sensors 2 and 3.

From Table VIII, we can see that the weight of sensor 1 is
the lowest ($w(m_1) = 0.3109$, $w(m_2) = 0.6834$ and $w(m_3) =
0.6578$), which means that $m_1(\cdot)$ is the most imprecise and is
also in conflict with $m_2$ and $m_3$. This result is to some degree
consistent with Frikha's method [26], which also finds that
sensor 1 is much less important than sensors 2 and 3. So,
based on the weights obtained in Table VIII,
we can say that sensor 1 should not play an important role
in the combination result, which means that $m_1$ needs to be
discounted.

From the three given BBAs, we can see that $m_1$ gives
the most mass of belief to the FE $\theta_1$. At the same time, $m_2$
and $m_3$ considerably support $\theta_2$. However, $m_2$ and $m_3$ are
not in conflict with $m_1$ because these two BBAs also assign
appreciable masses to $\theta_1$. When we combine these three BBAs
with Dempster's rule, Jiang's approach or Martin's approach,
we can see in Table IX that the final combination results of these methods
are almost the same (the final decision is $\theta_1$).
This is because only one conflict criterion is applied
for weight calculation in Martin's method or Jiang's method.
As we can see in Table VIII, a conflict measure alone cannot
capture the difference between $m_1$ and $m_2$, $m_3$, which leads
to almost the same weights for these three BBAs. However, in
our proposed method based on BF-TOPSIS, multiple criteria
(imprecision and conflict measures) are incorporated into
the weight calculation, which gives a more comprehensive
evaluation. In Table VIII, $m_1$ receives the lowest weight
compared to $m_2$ and $m_3$, which means that $m_2$ and $m_3$ have
the most important effect on the final combination result. In
Table IX, we can see that our proposed method supports $\theta_2$
as the final decision, and this conclusion is consistent with the
other multi-criteria approaches (PROMETHEE II and AHP).


D. Random Runs Generated by Monte Carlo Simulations 


In this part, we run 50 Monte Carlo simulations and, in
each simulation, we generate 25 random BBAs over the same
FoD $\Theta = \{\theta_1, \theta_2, \theta_3\}$. The first six sensors are in favour of
$\{\theta_1\}$, the next eight are in favour of $\{\theta_3\}$, whereas the last
eleven sensors are again in favour of $\{\theta_1\}$. A mass function
focused on $\{\theta_1\}$ has four focal elements, $\{\theta_1\}$, $\{\theta_2\}$, $\{\theta_3\}$ and
$\{\theta_1, \theta_2\}$, with $m(\{\theta_1\}) = 0.45 + x$; $m(\{\theta_2\}) = 0.15 - y$;
$m(\{\theta_3\}) = 0.15 - x$; and $m(\{\theta_1, \theta_2\}) = 0.25 + y$. The
values $x$ and $y$ are randomly generated in the intervals [0.01,
0.15] and [0.01, 0.10], respectively, according to the uniform
distribution. The average values of the weights $w$ of
the involved sensors and of the pignistic probability $BetP(\cdot)$ are
computed over the 50 Monte Carlo simulations.
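
The generation of one random BBA focused on $\{\theta_1\}$ can be sketched as follows. This is only an illustration of the construction stated above; the function name, the string labels `"t1"`, `"t2"`, `"t3"` and the use of a seeded generator are assumptions made for the example.

```python
import random


def random_bba_focused_on_theta1(rng):
    """Draw one BBA with the four focal elements described in the text.

    x ~ U[0.01, 0.15] and y ~ U[0.01, 0.10]; the four masses always sum to 1
    because x and y are added to two masses and subtracted from two others.
    """
    x = rng.uniform(0.01, 0.15)
    y = rng.uniform(0.01, 0.10)
    return {
        frozenset({"t1"}): 0.45 + x,
        frozenset({"t2"}): 0.15 - y,
        frozenset({"t3"}): 0.15 - x,
        frozenset({"t1", "t2"}): 0.25 + y,
    }


rng = random.Random(0)  # fixed seed only to make the example repeatable
bba = random_bba_focused_on_theta1(rng)
```

Since $x \le 0.15$ and $y \le 0.10$, no mass becomes negative, and $\{\theta_1\}$ always carries at least 0.46 of the belief.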

Under the imperfection criteria, the weight of one BoE is
independent of the other BoEs. Therefore, in this case, the
evolution of the weight of each sensor as BBAs are added
sequentially is driven by the conflict criteria.

Figures 4 to 6 show the average reliability degrees for
each group of BoEs (the first six BoEs, the next eight BoEs and the last eleven
BoEs). In each step, one BoE is added, and the discounting
factors are calculated for all BoEs. The discounting factors
are obtained by averaging over the 50 Monte Carlo simulations.
In Fig. 4, we can observe that the weights of sensors 1-6
decrease as soon as the next eight BoEs are added. This
phenomenon is mainly because these eight sensors (sensors
7-14) are in conflict with the initial sensor readings. In Fig. 5,
at the beginning stage, when no other BoEs are in
conflict with the first six sensors, the weights of sensors 7-14
are relatively low. After that, their reliability
degrees tend to increase as more sensors supporting $\theta_3$
are added. Finally, the weights of sensors 7-14
decrease because the remaining BoEs (sensors
15-25) support $\theta_1$. It can be seen that the proposed method
can assign appropriate weights to each
sensor reading when conflicts occur. This feature
is extremely important in complex dynamic fusion problems.


Table VI
COMBINATION OF HIGHLY CONFLICTING BBAS.

Method | $\{\theta_1\}$ | $\{\theta_2\}$ | $\{\theta_3\}$ | $\{\theta_1,\theta_2\}$ | $\{\theta_1,\theta_3\}$ | $\{\theta_2,\theta_3\}$ | $\{\theta_1,\theta_2,\theta_3\}$ | Decision | Computational time
Dempster's rule | 0 | 0 | 1.0000 | 0 | 0 | 0 | 0 | $\theta_3$ | 35.972 ms
Martin's method [25] | 0.239 | 0.217 | 0.027 | 0 | 0 | 0 | 0.517 | $\theta_1$ | 49.421 ms
Jiang's method [37] | 0.049 | 0.044 | 0.005 | 0 | 0 | 0 | 0.902 | $\theta_1$ | 49.728 ms
Frikha's approach [26] | 0.984 | 0 | 0.016 | 0 | 0 | 0 | 0 | $\theta_1$ | 81.139 ms
Frikha's approach [27] | 0.9262 | 0.0738 | 0 | 0 | 0 | 0 | 0 | $\theta_1$ | 73.233 ms
Our proposed method | 0.5365 | 0.2089 | 0.0186 | 0 | 0 | 0 | 0.2360 | $\theta_1$ | 72.23 ms


Table VII
SENSORS' CORRESPONDING BBAS $m_i(\cdot)$.

 | $m_1$ | $m_2$ | $m_3$
$\theta_1$ | 0.38 | 0.3 | 0.28
$\theta_2$ | 0.15 | 0.4 | 0.42
$\theta_3$ | 0.15 | 0 | 0
$\{\theta_1,\theta_2\}$ | 0.15 | 0.3 | 0.3
$\{\theta_1,\theta_3\}$ | 0.03 | 0 | 0
$\{\theta_2,\theta_3\}$ | 0.03 | 0 | 0
$\{\theta_1,\theta_2,\theta_3\}$ | 0.03 | 0 | 0


Table VIII
RELIABILITY DEGREES.

$w$ | $m_1$ | $m_2$ | $m_3$
Martin's method [25] | 0.961 | 0.989 | 0.988
Jiang's method [37] | 0.822 | 0.854 | 0.848
Frikha's approach [26] | 0.643 | 1.0000 | 0.989
Frikha's approach [27] | 0.8019 | 0.9428 | 0.9407
Our proposed method | 0.3109 | 0.6834 | 0.6578


Table IX
COMBINATION OF $m_1$, $m_2$ AND $m_3$.

$m(\cdot)$ | $\{\theta_1\}$ | $\{\theta_2\}$ | $\{\theta_3\}$ | $\{\theta_1,\theta_2\}$ | $\{\theta_1,\theta_3\}$ | $\{\theta_2,\theta_3\}$ | $\{\theta_1,\theta_2,\theta_3\}$ | Decision | Computational time
Dempster's rule | 0.5028 | 0.4582 | 0 | 0.0400 | 0 | 0 | 0 | $\theta_1$ | 48.471 ms
Martin's method [25] | 0.491 | 0.462 | 0 | 0.047 | 0 | 0 | 0 | $\theta_1$ | 56.251 ms
Jiang's method [37] | 0.452 | 0.438 | 0.005 | 0.092 | 0.002 | 0.002 | 0.009 | $\theta_1$ | 56.274 ms
Frikha's approach [26] | 0.418 | 0.5000 | 0 | 0.082 | 0 | 0 | 0 | $\theta_2$ | 87.380 ms
Frikha's approach [27] | 0.3938 | 0.4923 | 0.0039 | 0.1090 | 0 | 0 | 0 | $\theta_2$ | 75.563 ms
Our proposed method | 0.2285 | 0.3090 | 0.0081 | 0.2060 | 0.0012 | 0.0012 | 0.2460 | $\theta_2$ | 73.832 ms


We also show the evolution of $BetP(\theta_i)$ for $i = 1, 2, 3$
using our proposed approach and the other methods
(especially the two multi-criteria methods) in Figures 7 to 9. As
shown in Fig. 7, at the beginning our method supports $\theta_1$;
then, as the number of conflicting BoEs starts to grow,
our approach reacts slowly. However, after receiving many
conflicting BoEs that support $\theta_3$, the value of $BetP(\theta_1)$
decreases rapidly, and the mass of belief of $\theta_3$ increases
when the second group is added, as shown in Fig. 9. The
corresponding reaction of our method when the second
group of BoEs is added can also be seen in Fig. 8. In contrast to our proposed
method, the value of $BetP(\theta_3)$ obtained by Dempster's rule
in Fig. 9 is always zero, which shows that Dempster's rule
cannot efficiently fuse highly conflicting BoEs.
The trends of the other multi-criteria methods
(Frikha's two methods in [26] and [27]) show that they can
deal with the conflicting fusion problem to a certain extent, but
they are not as sensitive as our proposed method.
For example, in Fig. 9, as the
number of BoEs supporting $\theta_3$ grows, $BetP(\theta_3)$ obtained by our
method is much larger than the values of $BetP(\theta_3)$ derived
from the two methods of [26] and [27].

As shown in Fig. 7, the convergence of $BetP(\theta_1)$
in our method is also significantly faster (when the number of BBAs
increases from the 15th to the 25th fusion step). It can be seen from
the Monte Carlo simulations that the proposed method can
adapt to the dynamic changes of the environment and give the
correct decisions.

Figure panel titles: (a) Average reliability degree of the first six sensors. (a) Average $BetP(\theta_1)$.


VI. CONCLUSION 


In this work, a novel multi-criteria discounting combination
rule has been proposed. The BF-TOPSIS approach
with an original objective criteria weighting method
has been used to evaluate the reliability of the involved sensor
readings. The procedure for assessing the reliability degree is
based on two categories of criteria. The first class is the degree
of imprecision (contradiction and imprecision) and the second
class is the conflict degree between sensor readings (conflict
and interval distance). After discounting the original sensor
readings, all involved BBAs are combined with the PCR6 fusion
rule for decision-making support.

Several simulations have been provided to illustrate the
applicability and efficiency of our method, and meaningful
comparisons were made with other classical approaches. Our
results and the analysis of the performance obtained show
that our approach is effective in dealing with conflict issues
because of the multi-criteria strategy adopted. Consequently, this
approach can help to reduce counter-intuitive behaviors and
the influence of biased readings.

In future work, we will focus on the impact of each criterion
on the combination results, and more criteria will be taken into
account. Also, more investigations will be done to explore the
difference in performance between the global fusion with PCR6
used in this paper and sequential fusion with PCR6. We plan to
analyze the performance of this new approach in real-world
scenarios using real data sets.
Abstract: Image registration is a crucial and fundamental
problem in image processing and computer vision, which aims
to align two or more images of the same scene acquired from
different views or at different times. In image registration,
since different keypoints (e.g., corners) or similarity measures
might lead to different registration results, the selection of
keypoint detection algorithms or similarity measures brings
uncertainty. These different keypoint detectors or similarity
measures have their own pros and cons and can be jointly used
to obtain a better registration result. In this paper, the uncertainty
caused by the selection of keypoint detector or similarity measure
is addressed using the theory of belief functions, and image
information at different levels is jointly used to achieve a more
accurate image registration. Experimental results and related
analyses show that our proposed algorithm can achieve more
precise image registration results compared to several prevailing
algorithms.


Keywords: image registration, evidential reasoning, belief 
functions, uncertainty. 


I. INTRODUCTION 


Image registration is a fundamental problem encountered
in image processing, e.g., image fusion [1] and image change
detection [2]. It refers to the alignment of two or more images
of the same scene taken at different times, from different
sensors, or from different viewpoints. Image registration plays
an increasingly important role in applications of surveillance
[3], remote sensing [4] and medical imaging [5].

For a collection of images to be registered, one is chosen
as the reference image and the others are selected as sensed
images. Image registration aligns each sensed image to the
reference image by finding the correspondence between all pixels
in the image pair and estimating the spatial transformation
from the sensed image to the reference image. In this paper, we
only consider image registration between two images, i.e.,
there is only one sensed image together with a given reference
image.

Current image registration techniques based on the image
domain can generally be divided into two categories [6]:
sparse methods and dense methods. There are also some methods
based on the transform domain, like the Fourier-Mellin
transformation method [7]. The transform-domain methods are
often used for image registration with the similarity transformation
model. In this paper, we focus on the image-domain
methods.

The sparse methods [8] extract and match salient features
from the reference image and the sensed image and then estimate
the spatial transformation between the two images based on
these matched features. Line features (e.g., edges) and point
features (corners, line intersections and centers of gravity of regions)
can all be used for image registration. Corner features are
the most commonly used features and can be manually selected or
automatically detected by Harris [9], FAST (Features from
Accelerated Segment Test) [10], SIFT (Scale-Invariant Feature
Transform) [11], SURF (Speeded-Up Robust Features) [12],
DAISY [13], ORB (Oriented FAST and Rotated BRIEF) [14],
KAZE [15], etc.

In contrast to the sparse methods, the dense methods [16]
do not detect features in the image pair but directly search for the
optimal spatial transformation that best matches all
the pixels in the image pair. Similarity (resp. dissimilarity)
measures are defined to quantify the dependency (resp.
independency) between the pair of images. Various similarity
and dissimilarity measures have been proposed [17], such
as RMSE (Root-Mean-Squared Error), PSNR (Peak Signal
to Noise Ratio), Spearman's Rho [18], NCC (Normalized
Cross-correlation Coefficient) and MI (Mutual Information). It
should be noted that dense methods based on RMSE or PSNR
cannot handle cases with illumination variation since these
two similarity/dissimilarity measures are very sensitive to
illumination changes.

Both the sparse methods and the dense methods involve
uncertainty. For the sparse methods, keypoints obtained
from different keypoint detectors describe different corner
features of the image. Therefore, image registrations based
on different keypoint detectors yield different spatial
transformations. For the dense methods, different similarity
(dissimilarity) measures quantify the difference between the
pair of images from different aspects, so that image
registrations based on different similarity (dissimilarity) measures
yield different spatial transformations. These different
spatial transformations have their own pros and cons,
and the selection of the spatial transformation (in effect, the selection of
the feature detector or similarity measure) brings
uncertainty.

To deal with the uncertainty caused by the particular selec- 
tion of feature detector or similarity (dissimilarity) measure, 
one feasible way is to combine these registration transfor- 
mations obtained from different feature detection methods or 
similarity measures to obtain a better registration result. The 
belief functions introduced in Dempster-Shafer Theory (DST) 
[19] of evidence offer a powerful theoretical tool for uncertainty
modeling and reasoning; therefore, we propose a fusion-based
image registration method using belief functions. In
this paper, the spatial transformations obtained from different 
feature detection algorithms or similarity measures compose 
the frame of discernment (FOD) and their uncertainties are 
modeled using belief functions. In uncertainty modeling, im- 
age information at different levels, i.e., image’s intensities, 
edges and phase angles, are jointly used to evaluate the beliefs 
about image transformations. Then, these uncertainties are 
further handled through the evidence combination of the above 
multiple information. The final registration result is obtained 
according to the combined evidence. 

This paper is an extension of our previous work in [20],
where the basic idea was briefly presented. The main added
values with respect to [20] are as follows. First, the transformation
model between the reference image and the sensed image is
more comprehensive. We used the similarity transformation model
in [20] but use the projective transformation model in this paper,
which is more general since all similarity transformations are
special cases of projective transformations. Second, the keypoints
used in the sparse approach in [20] were manually selected. To
reduce the subjective influence on the registration result, in this
paper the keypoints are generated by detection algorithms.
Accordingly, feature matching and mismatch removal are
added after the keypoint detection. Third, when modeling
uncertainties, one more information source, i.e., the image's phase
angle information, is considered in this work. Fourth, more
experiments and analyses are provided for performance
evaluation.

The rest of this paper is organized as follows. The basics of
image registration are introduced in Section II. The basics of
evidence theory are introduced in Section III. The proposed
image registration method is introduced in Section IV with
an emphasis on uncertainty modeling and handling. The evaluation
method is introduced in Section V. Experimental results of the
proposed method and other registration methods are presented
and compared in Section VI. Concluding remarks are given in
Section VII.


II. BASICS OF IMAGE REGISTRATION 


For two (or more) images of the same scene taken at different
times, from different sensors, or from different viewpoints,
one is chosen as the reference image ($R$) and the other one
is chosen as the sensed image ($S$). In this paper, we focus
on the projective transformation model between the reference
image and the sensed image, which is a commonly used model
in image registration [16]. Denote pixel coordinates in the
reference image $R$ as $(v, w)$ and their mapped counterparts
in the sensed image $S$ as $(g, h)$. The projective transformation
from $R$ to $S$ can be expressed in homogeneous
coordinates (which, unlike Cartesian coordinates, can express
translation as a matrix multiplication) as

$$[g \;\; h \;\; 1] = [v \;\; w \;\; 1]\, T = [v \;\; w \;\; 1] \begin{bmatrix} t_{11} & t_{12} & t_{13} \\ t_{21} & t_{22} & t_{23} \\ t_{31} & t_{32} & t_{33} \end{bmatrix}. \qquad (1)$$

The similarity transformation and affine transformation are 
important specializations of the projective transformation, as 
illustrated in Table I. 


Table I
PROJECTIVE TRANSFORMATION AND ITS TWO SPECIALIZATIONS.

Similarity (first two rows of $T$): $[\,s\cos\theta \;\; s\sin\theta \;\; 0\,]$ and $[\,-s\sin\theta \;\; s\cos\theta \;\; 0\,]$.

The purpose of image registration is to estimate the
transformation $T$ to align the sensed image $S$ with the reference
image $R$ by

$$[v' \;\; w' \;\; 1] = [g \;\; h \;\; 1]\, T, \qquad (2)$$

where $(v', w')$ and $(g, h)$ denote pixel coordinates in the registered
sensed image $S'$ and the sensed image $S$, respectively. Current
image registration techniques can generally be divided into two categories
[6]: sparse methods and dense methods.
The basics of these two methods are introduced below.


A. Sparse Image Registration and Its Uncertainty 


The feature detection and feature matching are two critical 
steps in the sparse methods. The flow chart of the sparse 
approach is illustrated in Figure 1, where each functional block 
is detailed in the sequel. 


1) Feature Detection: Corner features are the most commonly used
features in image registration due to their invariance to imaging
geometry [6]. Some early keypoint detectors, like Harris
and FAST, are very sensitive to image scale changes and
therefore perform poorly when the sensed image and the
reference image have different scales. The well-known
SIFT detector shows good robustness to illumination,
orientation and scale changes. Most scale-invariant detectors,
like SIFT, SURF, ORB and BRISK, detect and describe
features at different scale levels by building or approximating
the Gaussian scale space of the image. In a different way,
KAZE detects features in a nonlinear scale space built using
efficient additive operator splitting techniques and variable
conductance diffusion.


Figure 1. Flow chart of sparse approach. (Blocks: feature detection on the reference image R and on the sensed image, feature matching, estimated transformation, registered sensed image S'.)


Figure 2. Different keypoint pairs detected by different keypoint detectors. (a) BRISK, (b) KAZE, (c) SURF.


2) Feature Matching: To align the sensed image and the
reference image, the detected keypoints in the two images are
first matched by comparing their local features, which are characterized
by descriptors. Generally, if the descriptors of two keypoints
are similar, the two keypoints are likely to be a matched
pair. Given a keypoint $t$ in the reference image, there might
be a set of candidates in the sensed image having descriptors
similar to that of $t$. Among these candidates, the real counterpart of $t$
should have the smallest distance to $t$, and at the same time
this distance should be much smaller than the other candidates'
distances.
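
The matching criterion just described (the nearest candidate must also be clearly closer than the runner-up) is commonly implemented as a nearest-neighbour ratio test. The sketch below is a minimal pure-Python illustration under assumptions: descriptors are plain tuples of floats compared with the Euclidean distance, and the threshold `ratio=0.8` is an arbitrary illustrative value, not one taken from this paper.

```python
import math


def match_keypoints(ref_desc, sensed_desc, ratio=0.8):
    """Return (i, j) index pairs that pass the nearest-neighbour ratio test.

    A reference descriptor i is matched to its nearest sensed descriptor j only
    if that distance is smaller than `ratio` times the second-nearest distance.
    """
    matches = []
    for i, d_ref in enumerate(ref_desc):
        dists = sorted((math.dist(d_ref, d), j) for j, d in enumerate(sensed_desc))
        if len(dists) < 2:
            continue  # the ratio test needs at least two candidates
        (best, j), (second, _) = dists[0], dists[1]
        if best < ratio * second:
            matches.append((i, j))
    return matches


ref = [(0.0, 0.0), (5.0, 5.0)]
sensed = [(0.1, 0.0), (5.0, 5.1), (5.1, 5.0)]
pairs = match_keypoints(ref, sensed)
```

In this toy example the first reference descriptor has one clearly closest candidate and is matched, while the second has two equally close candidates and is rejected as ambiguous.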


The accuracy of the keypoints’ matching affects the ac- 
curacy of the transformation’s estimation. The mismatched 
keypoint pairs should be further removed before estimating 
the transformation. RANSAC (RANdom SAmple Consensus) 
[21] and MSAC (M-estimator SAmple and Consensus) [22] 
are often used to deal with this problem. A recent RANRESAC 
(RANdom RESAmple Consensus) [23] algorithm has been 
proposed to remove mismatched keypoint pairs for noisy 
image registration. Besides the accuracy of the keypoints’ 
matching, the distribution of matched pairs over the image 
space is another key factor to obtain a high-quality estimation 
of transformation. 


3) Transformation Estimation: With all the matched keypoint
pairs, the transformation matrix $T$ can be estimated using
Eq. (1). Since $T$ has eight degrees of freedom, four point
correspondences (with no three collinear) are needed to obtain
the unique solution of $T$ according to Cramer's rule.

Normally, there are more than four matched keypoint pairs,
and $T$ can be estimated using the least squares (LS)
fitting technique [6] by minimizing the sum of the
Euclidean distances between all the matched keypoints:

$$T = \arg\min_{T} \sum_{i} d\left(cor_i^{R}, cor_i^{S'}\right), \qquad (3)$$

where $cor_i^{R} = (v_i, w_i)$ represents the coordinate of the $i$-th
matched keypoint in the reference image and $cor_i^{S'} = (v'_i, w'_i)$
represents the coordinate of the $i$-th matched keypoint in the
registered sensed image transformed from the sensed image
using Eq. (2).
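
The minimal case of four correspondences can be worked out directly. The sketch below is an illustration under assumptions: it fixes $t_{33} = 1$, uses the row-vector convention of Eq. (1), and solves the resulting 8 x 8 linear system with plain Gaussian elimination instead of the least squares fit of Eq. (3). All function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def apply_projective(T, point):
    """[g h 1] ~ [v w 1] T with homogeneous division."""
    v, w = point
    z = v * T[0][2] + w * T[1][2] + T[2][2]
    return ((v * T[0][0] + w * T[1][0] + T[2][0]) / z,
            (v * T[0][1] + w * T[1][1] + T[2][1]) / z)


def estimate_projective(src, dst):
    """Estimate T (with t33 = 1) from exactly four correspondences src -> dst."""
    rows, rhs = [], []
    for (v, w), (g, h) in zip(src, dst):
        # Unknown order: t11, t21, t31, t12, t22, t32, t13, t23.
        rows.append([v, w, 1, 0, 0, 0, -g * v, -g * w])
        rhs.append(g)
        rows.append([0, 0, 0, v, w, 1, -h * v, -h * w])
        rhs.append(h)
    t11, t21, t31, t12, t22, t32, t13, t23 = solve_linear(rows, rhs)
    return [[t11, t12, t13], [t21, t22, t23], [t31, t32, 1.0]]


T_true = [[1.1, 0.1, 0.001], [-0.2, 0.9, 0.002], [5.0, -3.0, 1.0]]
src = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
dst = [apply_projective(T_true, p) for p in src]
T_est = estimate_projective(src, dst)
```

With more than four pairs the same two equations per correspondence give an over-determined system, which is where a least squares fit and a mismatch-removal step such as RANSAC come in.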


4) Uncertainty Encountered in Sparse Approach: Since
different keypoint detection algorithms detect different kinds
of corner features, the detected keypoints are usually different,
as shown in Figure 2.

Image registrations based on different matched keypoint
pairs in general yield different spatial transformations
to align two images, and each of these transformations has
its own pros and cons. Therefore, the selection of the keypoint
detection algorithm brings uncertainty to the
registration results.
Figure 3. Flow chart of dense approach.


B. Dense Image Registration and Its Uncertainty 


The dense image registration estimates the optimal
transformation $T$ by searching for the largest similarity (or the smallest
dissimilarity) between the reference image $R$ and the
registered sensed image $S' = T(S)$:

$$T = \arg\max_{T} Sim(R, T(S)), \qquad (4)$$

where $Sim$ is a chosen similarity measure (for a dissimilarity
measure, the maximization is replaced by a minimization).

The flow chart of the dense approach is illustrated in Figure 
3, where each functional block is detailed in the sequel. 

1) Similarity Measure: Various similarity (or dissimilarity)
measures have been proposed. Here we briefly introduce the
commonly used MI, NCC and PSNR measures.

1) MI (Mutual Information) measure:

The MI measure between images $A$ and $B$ is defined by

$$MI(A, B) = \sum_{a=0}^{255} \sum_{b=0}^{255} p_{AB}(a, b) \log \frac{p_{AB}(a, b)}{p_A(a)\, p_B(b)}, \qquad (5)$$

where $p_{AB}$ is the joint probability distribution function
(PDF) of images $A$ and $B$, and $p_A$ and $p_B$ are the
marginal PDFs of $A$ and $B$, respectively. $MI(A, B)$ is
larger when $A$ and $B$ are more similar.


2) NCC (Normalized Cross-Correlation) measure:

For given images $A$ and $B$ with size $M \times N$, the NCC
measure between them is

$$NCC(A, B) = \sum_{x=1}^{M} \sum_{y=1}^{N} \frac{(A(x, y) - \mu_A)(B(x, y) - \mu_B)}{\sigma_A \sigma_B}, \qquad (6)$$

where $A(x, y)$ and $B(x, y)$ are the pixel intensities in
images $A$ and $B$ at $(x, y)$, respectively; $\mu_A$ and $\mu_B$ are
the mean intensities of $A$ and $B$, respectively; $\sigma_A$ and
$\sigma_B$ are the standard deviations of the intensities of $A$ and $B$,
respectively. $NCC(A, B)$ is larger when $A$ and $B$ are
more similar.


3) PSNR (Peak Signal-to-Noise Ratio) measure:

The PSNR measure between images $A$ and $B$ is defined by

$$PSNR(A, B) = 10 \times \log_{10}\left(\frac{256^2}{MSE(A, B)}\right), \qquad (7)$$

where $MSE(A, B) = \frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} (A(x, y) - B(x, y))^2$.
$PSNR(A, B)$ is larger when $A$ and $B$ are more similar.
Since the PSNR measure is very sensitive to illumination
changes, it cannot be used for image registration when
there are illumination variations between image pairs.
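
The three measures can be compared on a toy image pair. The sketch below is a pure-Python illustration under stated assumptions: images are nested lists of integer intensities, MI uses the natural logarithm and empirical histograms, NCC is divided by the number of pixels so that it lies in $[-1, 1]$ (an assumed normalization), and the PSNR peak value is left as a parameter because conventions differ. All function names are hypothetical.

```python
import math
from collections import Counter


def _flatten(img):
    return [p for row in img for p in row]


def mutual_information(a, b):
    """MI from the empirical joint and marginal histograms (natural log)."""
    xs, ys = _flatten(a), _flatten(b)
    n = len(xs)
    joint, pa, pb = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log((c / n) / ((pa[x] / n) * (pb[y] / n)))
               for (x, y), c in joint.items())


def ncc(a, b):
    """Cross-correlation of mean-centred intensities, normalized to [-1, 1]."""
    xs, ys = _flatten(a), _flatten(b)
    n = len(xs)
    mu_a, mu_b = sum(xs) / n, sum(ys) / n
    sd_a = math.sqrt(sum((x - mu_a) ** 2 for x in xs) / n)
    sd_b = math.sqrt(sum((y - mu_b) ** 2 for y in ys) / n)
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(xs, ys)) / n
    return cov / (sd_a * sd_b)


def psnr(a, b, peak):
    """10 log10(peak^2 / MSE); infinite when the images are identical."""
    xs, ys = _flatten(a), _flatten(b)
    mse = sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


A = [[0, 50], [100, 200]]
B = [[10, 60], [110, 210]]  # same scene, uniformly brighter by 10 levels
```

For this pair NCC and MI are unchanged by the brightness shift, while PSNR drops from infinity to a finite value, which illustrates the sensitivity of PSNR to illumination changes noted above.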


2) Transformation Estimation: The estimation of the
transformation $T$, i.e., Eq. (4), is in general a non-convex problem, and
the global optimum is not easy to obtain [24]. Therefore,
advanced optimization methods [25] or intelligent optimization
approaches (like genetic or particle swarm algorithms)
are often used to estimate the optimal transformation $T$.


3) Uncertainty Encountered in Dense Approach: Since
different similarity (dissimilarity) measures compare two images
from different aspects, their calculated similarities
(dissimilarities) between the reference image and the registered sensed image
are different. Image registrations based on different measures
yield different spatial transformations to align two
images, each with its own pros and cons. Therefore,
the selection of the similarity (dissimilarity) measure brings
uncertainty to the registration results.

To deal with the uncertainty caused by the selection
of feature detection algorithms or similarity measures, one
feasible way is to combine the registration transformations
$(T_1, T_2, \ldots, T_Q)$ obtained from different feature detection
algorithms (or different similarity measures) to obtain a better
registration result. We propose an evidential reasoning [19]
based image registration algorithm to generate a combined
transformation from $T_1, T_2, \ldots, T_Q$ thanks to the ability
of belief functions to model and reason under uncertainty.
The basics of the theory of belief functions are recalled first below.


III. BASICS OF DEMPSTER-SHAFER EVIDENCE THEORY 


Dempster-Shafer evidence theory (DST) [19] is a theoretical
framework for uncertainty modeling and reasoning. In DST, elements
in the frame of discernment (FOD)
$\Theta = \{\theta_1, \theta_2, \ldots, \theta_Q\}$ are mutually exclusive and exhaustive.
The power set of $\Theta$, i.e., $2^\Theta$, is the set of all subsets of $\Theta$.
For example, if $\Theta = \{\theta_1, \theta_2, \theta_3\}$, then

$$2^\Theta = \{\emptyset, \{\theta_1\}, \{\theta_2\}, \{\theta_3\}, \{\theta_1, \theta_2\}, \{\theta_1, \theta_3\}, \{\theta_2, \theta_3\}, \{\theta_1, \theta_2, \theta_3\}\}.$$


The basic belief assignment (BBA, also called mass function)
is defined by a function $m : 2^\Theta \mapsto [0, 1]$, satisfying

$$m(\emptyset) = 0, \quad \text{and} \quad \sum_{A \subseteq \Theta} m(A) = 1, \tag{8}$$


where $m(A)$ depicts the evidence support to the proposition
$A$. $A$ is called a focal element when $m(A) > 0$. If there
is only one element in $A$, like $\{\theta_1\}$ and $\{\theta_2\}$, $A$ is called
the singleton element; if there are more than one element in
$A$, e.g., $\{\theta_1, \theta_2\}$ and $\{\theta_1, \theta_2, \theta_3\}$, $A$ is called the compound
element. The belief assigned to a compound element represents
the degree of ambiguity for the multiple elements.

The plausibility function ($Pl$) and belief function ($Bel$) are
defined as follows:

$$Pl(A) = \sum_{A \cap B \neq \emptyset} m(B), \tag{9}$$

$$Bel(A) = \sum_{B \subseteq A} m(B). \tag{10}$$
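Eqs. (9) and (10) can be sketched in a few lines of Python, assuming a BBA is stored as a dictionary mapping focal elements (frozensets of hypothesis labels) to masses. This representation and the labels `t1`, `t2`, `t3` are illustrative choices, not part of the paper.

```python
def bel(m, a):
    """Belief of A (Eq. (10)): total mass of focal elements included in A."""
    return sum(mass for b, mass in m.items() if b <= a)

def pl(m, a):
    """Plausibility of A (Eq. (9)): total mass of focal elements intersecting A."""
    return sum(mass for b, mass in m.items() if b & a)

# Hypothetical BBA on the frame {t1, t2, t3}.
m = {
    frozenset({"t1"}): 0.5,
    frozenset({"t2"}): 0.2,
    frozenset({"t1", "t2", "t3"}): 0.3,  # compound element (ambiguity)
}
a = frozenset({"t1", "t2"})
print(bel(m, a), pl(m, a))
```

For any proposition the belief never exceeds the plausibility, and the gap between them reflects the mass placed on compound elements.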


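Dempster’s combination rule [19] for combining two distinct
pieces of evidence is defined as

$$(m_1 \oplus m_2)(A) = \begin{cases} 0, & \text{if } A = \emptyset, \\ \dfrac{1}{1-K} \displaystyle\sum_{B \cap C = A} m_1(B)\, m_2(C), & \text{if } A \neq \emptyset. \end{cases} \tag{11}$$

Here, $K = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C)$ denotes the total conflict
or contradictory mass assignments.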
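A minimal sketch of Eq. (11), assuming BBAs are dictionaries from frozensets to masses (an illustrative representation, not the authors' implementation). The helper name `dempster` and the example masses are hypothetical.

```python
from itertools import product

def dempster(m1, m2):
    """Combine two BBAs with Dempster's rule; returns (combined BBA, conflict K)."""
    combined = {}
    conflict = 0.0
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            combined[inter] = combined.get(inter, 0.0) + mb * mc
        else:
            conflict += mb * mc  # mass falling on the empty set
    if 1.0 - conflict < 1e-12:
        raise ValueError("sources are in total conflict; rule undefined")
    return {a: v / (1.0 - conflict) for a, v in combined.items()}, conflict

# Hypothetical two-source example on the frame {A, B}.
A, B = frozenset({"A"}), frozenset({"B"})
AB = A | B
m1 = {A: 0.6, AB: 0.4}
m2 = {B: 0.7, AB: 0.3}
m12, K = dempster(m1, m2)
print(K, m12)
```

Here the only conflicting pair is ({A}, {B}), so K = 0.6 x 0.7 = 0.42 and the remaining masses are rescaled by 1/(1 - K).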

An alternative fusion rule PCR6 [26] for the combination
of two sources is defined as

$$m_{12}^{PCR6}(A) = m_{12}^{Conj}(A) + \sum_{Y \subseteq \Theta,\; A \cap Y = \emptyset} \left[ \frac{m_1(A)^2\, m_2(Y)}{m_1(A) + m_2(Y)} + \frac{m_2(A)^2\, m_1(Y)}{m_2(A) + m_1(Y)} \right], \tag{12}$$

where $m_{12}^{Conj}(A)$ is the conjunctive rule defined as

$$m_{12}^{Conj}(A) = \sum_{B \cap C = A} m_1(B)\, m_2(C). \tag{13}$$

General PCR6 formula for the combination of more than two
sources is given in [26].
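The two-source rule of Eqs. (12) and (13) can be sketched as follows, again assuming a dictionary-of-frozensets representation for BBAs. The function name `pcr6_two_sources` and the example masses are hypothetical; the general multi-source formula of [26] is not covered.

```python
from itertools import product

def pcr6_two_sources(m1, m2):
    """Two-source PCR6: conjunctive rule plus proportional conflict redistribution."""
    out = {}
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        prod = mb * mc
        if prod == 0.0:
            continue
        inter = b & c
        if inter:
            # Non-conflicting product: conjunctive term of Eq. (13).
            out[inter] = out.get(inter, 0.0) + prod
        else:
            # Conflicting product: give it back to b and c in proportion
            # to the masses mb and mc that generated the conflict.
            out[b] = out.get(b, 0.0) + mb * prod / (mb + mc)
            out[c] = out.get(c, 0.0) + mc * prod / (mb + mc)
    return out

# Hypothetical two-source example on the frame {A, B}.
A, B = frozenset({"A"}), frozenset({"B"})
AB = A | B
m1 = {A: 0.6, AB: 0.4}
m2 = {B: 0.7, AB: 0.3}
m_pcr6 = pcr6_two_sources(m1, m2)
print(m_pcr6)
```

Unlike Dempster's rule, no global normalization is applied: the conflicting mass 0.42 is split between {A} and {B} only, and the mass of the compound element stays at its conjunctive value.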

For a probabilistic decision-making, Smets defined the pignistic
probability transformation [27] to obtain the probability
measure BetP from a BBA

$$BetP(\theta_i) = \sum_{A \subseteq \Theta,\; \theta_i \in A} \frac{m(A)}{|A|}, \tag{14}$$

where $|A|$ is the cardinality of $A$. The decision can be made
by choosing the element in FOD whose BetP value is the
highest one and higher than a preset threshold. Other types of
probability transformation methods can be found in [26,28].
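Eq. (14) and the thresholded decision described above can be sketched as follows. The dictionary representation of the BBA, the labels and the threshold value 0.5 are assumptions made for the example only.

```python
def betp(m):
    """Pignistic transformation (Eq. (14)): split each mass evenly over its elements."""
    out = {}
    for a, mass in m.items():
        for theta in a:
            out[theta] = out.get(theta, 0.0) + mass / len(a)
    return out

def decide(m, threshold=0.5):
    """Return the hypothesis with the highest BetP if it exceeds the threshold, else None."""
    p = betp(m)
    best = max(p, key=p.get)
    return best if p[best] > threshold else None

# Hypothetical BBA on the frame {t1, t2, t3}.
m = {
    frozenset({"t1"}): 0.5,
    frozenset({"t2"}): 0.2,
    frozenset({"t1", "t2", "t3"}): 0.3,
}
p = betp(m)
print(p, decide(m))
```

The mass 0.3 on the whole frame is shared equally among the three hypotheses, so BetP sums to one like an ordinary probability distribution.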


IV. IMAGE REGISTRATION BASED ON EVIDENTIAL REASONING


To deal with the uncertainty caused by the choice of 
keypoint detectors in the sparse approach or the choice of 
similarity measure in the dense approach, we propose an image 
registration method based on evidential reasoning. Suppose 
that the spatial transformation between the reference image 
and sensed image is projective. Our purpose is to estimate 
the transformation matrix to align two images. Unlike the
prevailing methods, which estimate the transformation matrix from
a single keypoint detection method or a single similarity
(dissimilarity) measure, we estimate the transformation matrix by
jointly utilizing different keypoint detection methods or similarity
measures.

To use belief functions for image registration, one should
first define the frame of discernment (FOD). The FOD is
$\Theta = \{\theta_1, \theta_2, \ldots, \theta_Q\}$, where $Q$ is the number of
transformations obtained from different single feature detection
algorithms or different single similarity measures. We first model
the beliefs for every proposition $A \subseteq \Theta$ using BBAs. $A$ can
be a single transformation in the FOD or a set of transformations
in the FOD. One BBA depicts the support to each proposition
$A$ from one evidence source. The BBA allocations from
different evidence sources describe the uncertainty of the
transformations in the FOD. Next, the BBAs are combined to
generate the combined BBA $m_c$, depicting the fused support to
each proposition $A$. Then, the combined transformation $T_c$ is
generated from the combined BBA $m_c$. Finally, the registered
sensed image $S'_c$ is obtained by transforming the sensed image
using Eq. (2). During this process, resampling [29] is needed to
determine the intensity of each pixel in $S'_c$. Figure 4 illustrates
the flow chart of the proposed method. It should be noted
that the classical interpretation of belief function theory (BFT)
assumes that the final estimate lies in the FOD. In this work, we
relax this assumption, and the final transformation is a combination
of those in the FOD.


A. Uncertainty Modeling 


If the similarity between the reference image $R$ and the registered
sensed image $S^i$ is large, the corresponding transformation
$T_i$ is quite accurate and should be allocated large support
($S^i$ is transformed from sensed image $S$ by $T_i^{-1}$). Here we
use NCC (other similarity or dissimilarity measures, e.g., MI,
are also appropriate to quantify the similarity here) to measure
the similarity between $R$ and $S^i$:

$$NCC_i = \frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} \frac{(R(x, y) - \mu_R)(S^i(x, y) - \mu_{S^i})}{\sigma_R\, \sigma_{S^i}}, \tag{15}$$

where $\mu_R$ and $\mu_{S^i}$ are the mean intensities of $R$ and $S^i$,
respectively; $\sigma_R$ and $\sigma_{S^i}$ are the standard deviation intensities
of $R$ and $S^i$, respectively.
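A short NumPy sketch of the NCC of Eq. (15), assuming the 1/(MN) normalization and population standard deviations so that the value lies in [-1, 1]. The function name `ncc` and the random test image are hypothetical.

```python
import numpy as np

def ncc(r, s):
    """Normalized cross-correlation between two equally sized gray images."""
    r = np.asarray(r, dtype=np.float64)
    s = np.asarray(s, dtype=np.float64)
    denom = r.std() * s.std()  # population standard deviations (ddof=0)
    if denom == 0.0:
        raise ValueError("NCC is undefined for a constant image")
    return float(np.mean((r - r.mean()) * (s - s.mean())) / denom)

# Hypothetical data: NCC is invariant to affine intensity changes.
rng = np.random.default_rng(1)
img = rng.random((32, 32))
print(ncc(img, img), ncc(img, 2.0 * img + 3.0), ncc(img, -img))
```

Because the means are subtracted and the result is divided by both standard deviations, a gain or offset applied to one image does not change the score, in contrast to PSNR.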

Since multi-source information can help to reduce the uncertainty
through evidence combination, we use different levels
of image information to quantify the similarity between $R$ and
$S^i$. The similarity can be calculated from the gray images, edge
feature images or reconstructed images using phase angle as
shown in Figure 5. Their corresponding $NCC_i$ are denoted
as $NCC_i(G)$, $NCC_i(E)$ and $NCC_i(P)$, respectively. The edge
detection method used in Figure 5-b is the Canny detector [30].
More details of the image reconstruction from phase angle
information can be found in [29].


The value range of $NCC_i(\cdot)$ is $[-1, 1]$. According to our
experiments, most values of $NCC_i(\cdot)$ are larger than 0. Before
allocating BBAs, we first enlarge the differences of $NCC_i(\cdot)$
within $[0, 1]$ using function $y = e^{x-1}$, as illustrated in Figure
6. Each level of image information (gray images (G), edge
feature images (E) and reconstructed images using phase
angle (P)) can be viewed as one evidence source and their
corresponding $y = e^{NCC_i(\cdot)-1}$ can be used to assign beliefs
for transformation $T_i$:

$$m_G(T_i) = \frac{e^{NCC_i(G)-1}}{\sum_{j=1}^{Q} e^{NCC_j(G)-1}}, \quad m_E(T_i) = \frac{e^{NCC_i(E)-1}}{\sum_{j=1}^{Q} e^{NCC_j(E)-1}}, \quad m_P(T_i) = \frac{e^{NCC_i(P)-1}}{\sum_{j=1}^{Q} e^{NCC_j(P)-1}}. \tag{16}$$

Figure 4. Flow chart of the proposed image registration.

Figure 5. Image information at different levels. (a) Gray image. (b) Edge feature image. (c) Reconstructed image using phase angle.

Figure 6. The curve of function $e^{x-1}$.

B. Fusion-Based Registration 


After obtaining BBAs $m_G$, $m_E$ and $m_P$, we generate
the combined BBA $m_c$ using a combination rule denoted
symbolically with $\oplus$:

$$m_c(\cdot) = [m_G \oplus m_E \oplus m_P](\cdot). \tag{17}$$

$m_c(T_i)$ describes the combined evidence support to $T_i$ (a
3 × 3 matrix with 6 unknown parameters). The combined
transformation $T_c$ is computed by

$$T_c = \sum_{i=1}^{Q} m_c(T_i)\, T_i. \tag{18}$$

Finally, the registered sensed image $S'_c$ can be obtained
using Eq. (2) following the resampling.
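Eqs. (17) and (18) can be sketched as follows for the Bayesian BBAs of Eq. (16). The sketch relies on the fact that, for Bayesian BBAs defined on the same singletons, Dempster's rule reduces to a normalized element-wise product. The masses and the three 3 × 3 matrices below are hypothetical placeholders, not results from the paper.

```python
import numpy as np

def dempster_bayesian(*bbas):
    """Dempster's rule for Bayesian BBAs given as mass vectors over the same singletons."""
    prod = np.prod(np.asarray(bbas, dtype=np.float64), axis=0)
    total = prod.sum()  # equals 1 - K, with K the total conflict
    if total == 0.0:
        raise ValueError("sources are in total conflict")
    return prod / total

def combine_transforms(m_c, transforms):
    """Combined transformation of Eq. (18): mass-weighted sum of the candidates."""
    return np.tensordot(np.asarray(m_c), np.stack(transforms), axes=1)

# Hypothetical masses from the three evidence sources (G, E, P).
m_G = [0.40, 0.35, 0.25]
m_E = [0.45, 0.30, 0.25]
m_P = [0.38, 0.34, 0.28]
m_c = dempster_bayesian(m_G, m_E, m_P)

# Hypothetical candidate transformations T_1, T_2, T_3.
T1 = np.array([[1.00, 0.02, 0.0], [-0.02, 1.00, 0.0], [5.0, -3.0, 1.0]])
T2 = np.array([[0.98, 0.01, 0.0], [-0.01, 1.01, 0.0], [4.0, -2.0, 1.0]])
T3 = np.array([[1.02, 0.03, 0.0], [-0.03, 0.99, 0.0], [6.0, -4.0, 1.0]])
T_c = combine_transforms(m_c, [T1, T2, T3])
print(m_c, T_c, sep="\n")
```

Because the combined masses sum to one, each entry of the combined matrix is a convex combination of the corresponding entries of the candidates, so the result need not coincide with any single element of the FOD.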


V. EVALUATION OF IMAGE REGISTRATION 


Since the purpose of image registration is to align the
reference image $R$ and sensed image $S$ to a single coordinate
frame, one popular evaluation method for the registration result
is to quantify the difference (usually by the Root-Mean-Squared
Error (RMSE)) between $R$ and the registered sensed
image $S'_c$ [31,32]. However, since $S'_c$ is transformed from
the sensed image $S$, which may have less information than
$R$ ($S$ may be part of $R$ or have lower resolution than $R$
since $R$ and $S$ can be taken from different views or taken
by different cameras), the difference between $R$ and $S'_c$ could
be large even when the estimated transformation $T_c$ equals
the true transformation $T_{true}$ from the reference image $R$ to the
sensed image $S$, as shown in Figure 7. Therefore, this kind of
evaluation method is not accurate enough.


Figure 7. Relationship among $R$, $S$, $R'_c$, and $S'_c$.


Another popular evaluation method is to quantify the difference
between the reference image $R$ and image $R'_c$, which is
transformed from $R$ by the transformation matrix $T_{true} T_c^{-1}$
[16,33], as shown in Figure 7. The mapping relationship
between pixel at $(v, w)$ in image $R$ and pixel at $(v', w')$ in
image $R'_c$ satisfies

$$[v' \;\; w' \;\; 1] = [v \;\; w \;\; 1]\, T_{true}\, T_c^{-1}. \tag{19}$$

When the registration is absolutely accurate, $T_c = T_{true}$ and
$R'_c = R$.


In this paper, we evaluate the registration performance by
quantifying the difference between the reference image $R$ and
image $R'_c$ using AAID (average absolute intensity difference)
[16]:

$$AAID(R, R'_c) = \frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} |R(x, y) - R'_c(x, y)|. \tag{20}$$

$AAID(R, R'_c)$ is smaller when the registration result is better.
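Eq. (20) is a mean absolute difference over all pixels. A minimal NumPy sketch, with a hypothetical function name `aaid` and synthetic images:

```python
import numpy as np

def aaid(r, r_c):
    """Average absolute intensity difference between two equally sized images (Eq. (20))."""
    r = np.asarray(r, dtype=np.float64)
    r_c = np.asarray(r_c, dtype=np.float64)
    if r.shape != r_c.shape:
        raise ValueError("images must have the same size")
    return float(np.mean(np.abs(r - r_c)))

# Hypothetical data: a perfect registration gives AAID = 0.
rng = np.random.default_rng(2)
ref = rng.integers(0, 256, size=(40, 60)).astype(np.float64)
shifted = np.roll(ref, 2, axis=1)  # crude stand-in for a misregistered image
print(aaid(ref, ref), aaid(ref, shifted))
```

The conversion to floating point before subtraction avoids the wrap-around that would occur with unsigned 8-bit arrays.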


VI. EXPERIMENTS


To verify the performance of our newly proposed image
registration method, we provide experiments on noise-free
images and noisy images, respectively. Image registration
under the noisy condition is difficult since the noise pixels
make keypoint detection and matching harder and
reduce the accuracy of the similarity measures. For the sparse
method, experiment results based on BRISK [34], KAZE [15]
and SURF [12] feature detection algorithms are provided for
comparison. For the dense method, experiment results based
on MI, PSNR and NCC similarity measures are provided for
comparison. For the noisy image registration, the experiment
result of RANRESAC (a recently proposed method for noisy
image registration) [23] is also provided for comparison.


A. Sparse Image Registration Results 


We first do experiments on actual data to illustrate the
effectiveness of the proposed method. The reference image and
sensed image are taken from different cameras with different
views, as shown in Figure 8. BRISK, KAZE and SURF feature
detections are used for generating transformations $T_1$, $T_2$ and
$T_3$, respectively. When deriving combined BBAs in Eq. (17),
an alternative fusion rule PCR6, which is more robust than
Dempster’s rule [26], is also used for comparison.

The registered results of the proposed method are illustrated
in Figure 9. As Figure 9 shows, the proposed method successfully
aligns the sensed image to the reference image, illustrating
that the proposed method is effective for actual data.

Figure 8. Fence image pair. (a) Reference image. (b) Sensed image.

Figure 9. Registered results of the proposed methods for Fence image. (a) Dempster’s rule. (b) PCR6.

Figure 10. Boats image pair and Foosball image pair. (a) Boats reference image. (b) Boats sensed image. (c) Foosball reference image. (d) Foosball sensed image.

Figure 11. AAID evaluations of registration results for Boats image pair and Foosball image pair. (a) Boats. (b) Foosball.

Figure 12. Spatial partition of the AAID evaluation for Boats image. (a) Partition method. (b) BRISK. (c) KAZE. (d) SURF. (e) Proposed (Demp). (f) Proposed (PCR6).

Figure 13. Spatial partition of the AAID evaluation for Foosball image. (a) BRISK. (b) KAZE. (c) SURF. (d) Proposed (Demp). (e) Proposed (PCR6).

To quantify the accuracy of the registration results, the
actual transformation between the reference image and sensed
image is needed, so we do experiments on simulated images.
We first do experiments on the Boats image (the reference image
can be found at https://imagej.nih.gov/ij/images/boats.gif) and
the Foosball image (sample image from MATLAB), as shown
in Figure 10.


The AAID evaluations of these registration results for the Boats
image and the Foosball image are compared in Figure 11, where
Demp represents Dempster’s combination rule. According
to Figure 11, the proposed fusion-based method achieves a much
better registration result (smaller AAID) than the algorithms based
on BRISK, KAZE or SURF feature detections.

Furthermore, we also analyzed the spatial partition of the 
AAID evaluation for each result by evenly dividing the ref- 
erence image into 5 x 5 parts (as shown in Figure 12a) and 
calculating the AAID between the reference image and the 
registration result in each part. The AAID spatial partition 
results for Boats image and Foosball image are illustrated in 
Figures 12 and 13, respectively. For the Boats image, the AAID
of the BRISK and KAZE results varies significantly across
parts, while the SURF result is relatively uniform; the proposed
methods have low and similar AAID in most parts, while the
rightmost parts (parts 5, 10, 15, 20 and 25) have significantly
larger AAID. For the Foosball image, the AAID spatial partition
of all these results is uneven.
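The 5 × 5 spatial partition of the AAID can be sketched as below. The row-major numbering of the parts (so that parts 5, 10, 15, 20 and 25 form the rightmost column) is inferred from the text, and the function name `aaid_partition` and the synthetic images are hypothetical.

```python
import numpy as np

def aaid_partition(r, r_c, grid=5):
    """AAID of each cell of a grid x grid partition; entry [i, j] is part i*grid + j + 1."""
    diff = np.abs(np.asarray(r, dtype=np.float64) - np.asarray(r_c, dtype=np.float64))
    rows = np.array_split(diff, grid, axis=0)
    return np.array([[cell.mean() for cell in np.array_split(row, grid, axis=1)]
                     for row in rows])

# Hypothetical data: the error is concentrated in the rightmost fifth of the image.
ref = np.zeros((50, 50))
reg = np.zeros((50, 50))
reg[:, 40:] = 10.0
parts = aaid_partition(ref, reg)
print(parts)
```

`np.array_split` is used instead of `np.split` so that image sizes not divisible by 5 are still handled, with cells differing by at most one row or column.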

Then, we consider the noisy image registration and do
experiments on the West Concord image pair (sample image from
MATLAB) with zero-mean Gaussian noise (variance
0.01), as shown in Figure 14. The AAID evaluations for
these registration results are compared in Figure 15, where
the proposed fusion-based methods achieve better performance
(smaller AAID) than RANRESAC and the methods based on
BRISK, KAZE and SURF feature detections. The
spatial partition of the AAID evaluation for each result is
illustrated in Figure 16, where the KAZE result is the most
uneven one.


B. Dense Image Registration Results 


Since the optimization of dense registration is intractable
when the solution space has high dimensions, we simplify
the transformation model to a rigid transformation here. The
solution space for the rigid model only has three dimensions: one
for rotation and two for translations in horizontal and vertical
directions, respectively. We first provide experiments on the
Concord image and the Hestain image (sample images from
MATLAB) as shown in Figure 17, where the sensed image is
transformed from the reference image through a rotation ($\theta = 10°$
anticlockwise) and a translation of $(-10, 5)$, successively.

In the proposed dense approach, MI, PSNR and NCC
similarity measures are used for generating transformations
$T_1$, $T_2$ and $T_3$, respectively. The AAID evaluations of these
registration results for the Concord image and Hestain image
are compared in Figure 18, where the proposed fusion-based
methods achieve much better registration results (smaller
AAID) than algorithms based on MI, PSNR or NCC
similarities. The AAID spatial partition results for the
Concord image and Hestain image are illustrated in Figures 19
and 20, respectively. For these two images, the AAID results
of the proposed methods are smaller in the lower parts
than in the upper parts.

Figure 14. West Concord image pair. (a) Reference image. (b) Sensed image.

Figure 15. AAID evaluations of registration results for West Concord image pair.

Figure 16. Spatial partition of the AAID evaluation for West Concord image. (a) BRISK. (b) KAZE. (c) SURF. (d) RANRESAC. (e) Proposed (Demp). (f) Proposed (PCR6).

Figure 17. Concord image pair and Hestain image pair. (a) Concord reference image. (b) Concord sensed image. (c) Hestain reference image. (d) Hestain sensed image.

Figure 18. AAID evaluations of registration results for Concord image pair and Hestain image pair. (a) Concord. (b) Hestain.


Then, we consider the noisy image condition and run
experiments on the Lifting Body image pair (sample image from
MATLAB) with zero-mean Gaussian noise (variance
0.01), as shown in Figure 21. The sensed image is transformed
from the reference image through a rotation ($\theta = -10°$) and a
translation of $(-10, 5)$, successively.


The spatial partition of the AAID evaluation for each result 
is illustrated in Figure 22, and the AAID evaluations for 
these registration results are compared in Figure 23. From 
these two figures, the proposed fusion-based methods achieve 
better performance and the rightmost parts (parts 5, 10, 15, 
20 and 25) have larger AAID than other parts. 


According to all the experiments, the proposed fusion-based
methods achieve better registration results than the
prevailing ones (BRISK, KAZE, SURF, MI, PSNR and NCC).
For noisy image registration, the proposed methods also obtain
better performance than RANRESAC. This indicates that the
theory of belief functions can deal well with the uncertainty
brought by the selection of keypoint detection algorithms
or similarity measures, and that the joint use of different
keypoint detections or similarity measures is effective.
Furthermore, the experiments above show that
the choice of combination rule does not affect the registration
performance much.

Figure 20. Spatial partition of the AAID evaluation for Hestain image. (a) MI. (b) PSNR. (c) NCC. (d) Proposed (Demp). (e) Proposed (PCR6).


Figure 21. Lifting Body image pair. (a) Reference image. (b) Sensed image.


C. Computational Cost

The computational cost is an important criterion to evaluate
an algorithm. We counted the computational costs of the above
sparse algorithms and dense algorithms for Cameraman image
(Figure 5-a) on a Windows 10 Enterprise system equipped
with Intel Core i7-7700HQ CPU at 2.80 GHz and 16.00
GB RAM. The platform is MATLAB R2018a. The average
execution time comparisons for the sparse algorithms and
dense algorithms are provided in Tables II and III, respectively.
Each average execution time is calculated from 100 runs of
experiments.


Table II
AVERAGE EXECUTION TIME (IN S) COMPARISON FOR SPARSE
ALGORITHMS.

Method Noise-Free Images Noisy Images
BRISK 0.2847 [illegible]
KAZE [illegible] [illegible]
SURF 0.0431 0.0437
RANRESAC [illegible] [illegible]
Proposed (Demp) 0.3934 0.3933
Proposed (PCR6) 0.3938 0.3989


Figure 22. Spatial partition of the AAID evaluation for Lifting Body image. (a) MI. (b) PSNR. (c) NCC. (d) RANRESAC. (e) Proposed (Demp). (f) Proposed (PCR6).

Figure 23. AAID evaluations of registration results for Lifting Body image pair.


Table III
AVERAGE EXECUTION TIME (IN S) COMPARISON FOR DENSE
ALGORITHMS.

Method Noise-Free Images Noisy Images 
MI 16.7622 16.4789 
PSNR 12.8583 13.4214 
NCC 14.7187 15.1050 
Proposed (Demp) 17.0734 16.7945 
Proposed (PCR6) 17.0812 16.8729 


From Tables II and III, the dense algorithms need more
execution time than the sparse algorithms. Furthermore, the
proposed fusion-based method combines the registration
transformations generated from the three sparse methods (BRISK,
KAZE and SURF) or the three dense methods (PSNR, MI and
NCC). Since these three methods can be executed in parallel,
the execution time of the proposed fusion-based method
is longer than that of the most time-consuming one among the
three methods.


D. Discussion of BBA Generation 

The BBA generated in Eq. (16) is a Bayesian BBA, i.e., all
its focal elements are singletons. People in the community of
belief function theory may prefer to use compound focal
elements, which usually seems better than only using singletons
in Bayesian BBAs. We have also designed experiments
generating non-Bayesian BBAs for image registration using the
FCOWA-ER (Fuzzy-Cautious Ordered Weighted Averaging
with Evidential Reasoning) [35] method. In detail, when
multiple types of image information (intensities, edges and
phase angle) are simultaneously considered, image registration
can be viewed as a multi-criteria decision-making problem.
FCOWA-ER is a decision-making approach
under multi-criteria with uncertainty, and it generates
non-Bayesian BBAs using the α-cut method (the α-cut method used
in FCOWA-ER boils down to the Dubois and Prade allocation
[36] in this case) when modeling uncertainties. According
to the experimental results, non-Bayesian BBAs obtain
registration results similar to those of Bayesian BBAs. Since Bayesian
BBAs are easier to generate than non-Bayesian BBAs, we
recommend Bayesian BBAs for image registration and do not
provide the non-Bayesian BBA based method in this work.


VII. CONCLUSION 


In this paper, we proposed a new image registration
algorithm based on evidential reasoning. The uncertainty
encountered in image registration is taken into account and
modeled by belief functions. Image information at different
levels is jointly used to achieve a more effective registration.
Experimental results show that the proposed algorithm can
improve the precision of image registration.

The generation of BBAs is crucial in evidential reasoning,
and most methods are proposed based on applications. In
this paper, we generate BBAs from three different types of image
information, i.e., intensity, edge and phase angle. In future
work, other image information, such as texture features and
gradient features, will also be considered and jointly used in
image registration. Furthermore, we will attempt to apply
the proposed method to color image registration. Different
color channels of a color image provide different image
information and can be jointly used in image registration.
We will also focus on the comparison with the state-of-the-art
approaches based on convolutional neural networks (CNN).
Abstract: In this paper we present two applications of a
new Belief Function-based Inter-Criteria Analysis (BF-ICrA)
approach for the assessment of the degree of redundancy of criteria
involved in multi-criteria decision-making (MCDM) problems.
The BF-ICrA method makes it possible to simplify the original MCDM
problem by suppressing redundant criteria (if any) and thus to
reduce the complexity of the MCDM problem. This approach is
appealing for solving large MCDM problems whose solution
requires the fusion of many belief functions. We show how this
approach can be used in two distinct fields of applications: the
GPS surveying problem and the car selection problem.


Keywords: Inter-Criteria Analysis, ICrA-BF, MultiCriteria 
Decision Making, MCDM, belief functions. 


I. INTRODUCTION 


In a Multi-Criteria Decision-Making (MCDM) problem
we consider a set of alternatives (or objects)
$\mathbf{A} \triangleq \{A_1, A_2, \ldots, A_M\}$ ($M > 2$), and a set of criteria
$\mathbf{C} \triangleq \{C_1, C_2, \ldots, C_N\}$ ($N > 1$). We search for the best alternative
$A^\star$ given the available information expressed by an $M \times N$
score matrix (also called benefit or payoff matrix) $\mathbf{S} = [S_{ij} = C_j(A_i)]$,
and (possibly) the importance factor $w_j \in [0, 1]$ of
each criterion $C_j$ with $\sum_{j=1}^{N} w_j = 1$. The set of normalized
weighting factors is denoted by $\mathbf{w} = \{w_1, w_2, \ldots, w_N\}$.
Depending on the context of the MCDM problem, the score
$S_{ij}$ of each alternative $A_i$ with respect to each criterion $C_j$
can be interpreted either as a cost (i.e. an expense), or as
a reward (i.e. a benefit). By convention and without loss of
generality¹ we will always interpret the score as a reward
having monotonically increasing preference. Thus, the best
alternative $A_j^\star$ for a given criterion $C_j$ will be the one providing
the highest reward/benefit.

The MCDM problem is not easy to solve because the
scores are usually expressed in different (physical) units and
different scales. This necessitates a choice of score/data
normalization, which can yield rank reversal problems [1], [2]. Usually
no single alternative $A^\star$ is the best choice for all criteria,
so a compromise must be established to provide a reasonable
and acceptable solution of the MCDM problem for
decision-making support.

¹ because it suffices to multiply the score values by −1 to reverse the
preference ordering.

Many MCDM methods exist, see references in [3]. The most
popular methods are AHP² [4], ELECTRE³ [5], TOPSIS⁴ [6],
[7]. In 2016 and 2017, we developed BF-TOPSIS methods
[3], [8] based on Belief Functions (BF) to improve the original
TOPSIS approach, to avoid data normalization and to deal
with imprecise score values as well. It appears however
that the complexity of these new BF-TOPSIS methods can
become a bottleneck for their use in large MCDM problems
because of the fusion step of basic belief assignments required
for the implementation of BF-TOPSIS. That is why a
simplification of the MCDM problem (if possible) is very
welcome in order to save computational time and resources.
This is the motivation of the present work.

For this aim we propose a new Inter-Criteria Analysis
(ICrA) based on belief functions for identifying and estimating
the possible degree of agreement (i.e. redundancy) between
some criteria, derived from the data (score values). This makes it
possible to remove all redundant criteria of the original MCDM
problem and thus to solve a simplified (almost) equivalent MCDM
problem faster and at lower computational cost. ICrA was
originally developed by Atanassov et al. [9]-[11] based
on Intuitionistic Fuzzy Sets [12], and it has been applied in
different fields like medicine [13]-[15], optimization [16]-[20],
workforce planning [21], competitiveness analysis [22],
radar detection [23], ranking [24]-[27], etc. In this paper we
improve the ICrA approach thanks to belief functions introduced
by Shafer in [28] from original Dempster’s works [29]. We
will refer to it as the BF-ICrA method in the sequel.

After a short presentation of the basics of belief functions in
Section II, we present Atanassov’s ICrA method in Section
III and discuss its limitations. In Section IV we present the
new BF-ICrA approach based on a new construction of the Basic
Belief Assignment (BBA) matrix from the score matrix and
a new establishment of the Inter-Criteria belief matrix. In Section
V a method of simplification of MCDM using BF-ICrA is
proposed. Two distinct applications of BF-ICrA are presented
in Section VI, with concluding remarks in Section VII.

² Analytic Hierarchy Process
³ ELimination Et Choix Traduisant la REalité
⁴ Technique for Order Preference by Similarity to Ideal Solution


II. BASICS OF THE THEORY OF BELIEF FUNCTIONS 


To follow classical notations of the theory of belief functions,
also called Dempster-Shafer Theory (DST) [28], we
assume that the answer (i.e. the solution, or the decision to
take) of the problem under concern belongs to a known finite
discrete frame of discernment (FoD) $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$,
with $n > 1$, and where all elements of $\Theta$ are exclusive. The
set of all subsets of $\Theta$ (including the empty set $\emptyset$ and $\Theta$) is the
power-set of $\Theta$ denoted by $2^\Theta$. A BBA (or mass function)
associated with a given source of evidence is defined [28] as
the mapping $m(\cdot) : 2^\Theta \to [0, 1]$ satisfying $m(\emptyset) = 0$ and
$\sum_{A \in 2^\Theta} m(A) = 1$. The quantity $m(A)$ is called the mass of
$A$ committed by the source of evidence. Belief and plausibility
functions are usually interpreted respectively as lower and
upper bounds of an unknown (possibly subjective) probability
measure [29]. They are defined by⁵

$$Bel(A) \triangleq \sum_{B \subseteq A,\; B \in 2^\Theta} m(B), \quad \text{and} \quad Pl(A) \triangleq 1 - Bel(\bar{A}). \tag{1}$$

If $m(A) > 0$, $A$ is called a focal element of $m(\cdot)$. When all
focal elements are singletons then $m(\cdot)$ is called a Bayesian BBA
[28] and its corresponding $Bel(\cdot)$ function is homogeneous to
a probability measure. The vacuous BBA, or VBBA for short,
representing a totally ignorant source is defined as $m_v(\Theta) = 1$.
The main challenge of the decision-maker consists in combining
efficiently the possible multiple BBAs $m_s(\cdot)$ given by $s > 1$
distinct sources of evidence to obtain a global (combined)
BBA, and to make a final decision from it. Historically the
combination of BBAs is accomplished by Dempster’s rule
proposed by Shafer in DST. Because Dempster’s rule presents
several serious problems (insensitivity to the level of conflict
between sources in some cases, inconsistency with bounds of
conditional probabilities when used for belief conditioning,
dictatorial behavior, counter-intuitive results), many fusion
rules have been proposed in the literature as alternatives to
Dempster’s rule, see [30], Vol. 2 for a detailed list of fusion
rules. We will not detail here all the possible combination rules
but just mention that the Proportional Conflict Redistribution
rule no. 6 (PCR6) proposed by Martin and Osswald in [30]
(Vol. 3) is one of the most serious alternative rules for BBA
combination available so far.


III. ATANASSOV’S INTER-CRITERIA ANALYSIS (ICRA) 


Atanassov’s Inter-Criteria Analysis (ICrA) approach is
based on an $M \times N$ score matrix⁶ $\mathbf{S} = [S_{ij} = C_j(A_i), i =
1, \ldots, M, j = 1, \ldots, N]$, and intuitionistic fuzzy pairs [12]
including two membership functions $\mu(\cdot)$ and $\nu(\cdot)$.
Mathematically, an intuitionistic fuzzy set (IFS) $A$ is denoted by
$A = \{(x, \mu_A(x), \nu_A(x)) \,|\, x \in E\}$, where $E$ is the set of
possible values of $x$, $\mu_A(x) \in [0, 1]$ defines the membership of
$x$ to the set $A$, and $\nu_A(x) \in [0, 1]$ defines the non-membership
of $x$ to the set $A$, with the restriction $0 \le \mu_A(x) + \nu_A(x) \le 1$.
The ICrA method consists in building an $N \times N$ Inter-Criteria (IC)
matrix from the score matrix $\mathbf{S}$. The elements of the IC matrix
consist of all intuitionistic fuzzy pairs $(\mu_{jj'}, \nu_{jj'})$ whose
components express respectively the degree of agreement and
the degree of disagreement between criteria $C_j$ and $C_{j'}$, for
$j, j' \in \{1, 2, \ldots, N\}$. For a given column $j$ (i.e. criterion $C_j$),
it is always possible to compare with the $>$, $<$ and $=$ operators
all the scores $S_{ij}$ for $i = 1, 2, \ldots, M$ because the scores of
each column are expressed in the same unit. The construction of the
IC matrix can be used to search for relations between the criteria
because the method compares homogeneous data relative to
a same column. In [32] Atanassov prescribes to normalize the
score matrix before applying ICrA as follows

⁵ where the symbol ≜ means equal by definition.
⁶ called index matrix by Atanassov in [31].


$$S_{ij}^{norm} = \frac{S_{ij} - S_j^{min}}{S_j^{max} - S_j^{min}}, \tag{2}$$


if one wants to apply it in the dual manner for the search of 
InterObjects analysis (IObA). 

Because we focus on ICrA only, we don’t need to apply a
score matrix normalization because each column of the score
matrix represents the values of a same criterion for different
alternatives, and the criterion values are expressed with the
same unit (e.g. m, m², sec, kg, or €, etc).


A. Construction of Inter-Criteria matrix 


The construction of the $N \times N$ IC matrix, denoted⁷ $\mathbf{K}$, is
based on the pairwise comparisons between every two criteria
along all evaluated alternatives. Let $K^{\mu}_{jj'}$ be the number of
cases in which the inequalities $S_{ij} > S_{i'j}$ and $S_{ij'} > S_{i'j'}$
hold simultaneously, and let $K^{\nu}_{jj'}$ be the number of cases
in which the inequalities $S_{ij} > S_{i'j}$ and $S_{ij'} < S_{i'j'}$ hold
simultaneously. Because the total number of comparisons
between the alternatives is $M(M - 1)/2$, one always has
necessarily

$$0 \le K^{\mu}_{jj'} + K^{\nu}_{jj'} \le \frac{M(M - 1)}{2}, \tag{3}$$

or equivalently after the division by $\frac{M(M-1)}{2} > 0$

$$0 \le \frac{2 K^{\mu}_{jj'}}{M(M - 1)} + \frac{2 K^{\nu}_{jj'}}{M(M - 1)} \le 1. \tag{4}$$

This inequality permits to define the elements of the $N \times N$ IC
matrix $\mathbf{K} = [K_{jj'}]$ as intuitionistic fuzzy (IF) pairs $K_{jj'} =
(\mu_{jj'}, \nu_{jj'})$ where

$$\mu_{jj'} \triangleq \frac{2 K^{\mu}_{jj'}}{M(M - 1)} \quad \text{and} \quad \nu_{jj'} \triangleq \frac{2 K^{\nu}_{jj'}}{M(M - 1)}. \tag{5}$$

$\mu_{jj'}$ measures the degree of agreement between criteria $C_j$
and $C_{j'}$, and $\nu_{jj'}$ measures their degree of disagreement. By
construction the IC matrix $\mathbf{K}$ is always a symmetric matrix.


7 We use $K$ because it corresponds to the first letter of the word Kriterium,
meaning criterion in German. The letter $C$ is already in use.

The computation of $K^{\mu}_{jj'}$ and $K^{\nu}_{jj'}$ can be done explicitly
thanks to Atanassov's formulas [32]

$$K^{\mu}_{jj'} = \sum_{i=1}^{M-1} \sum_{i'=i+1}^{M} \big[ \mathrm{sgn}(S_{ij} - S_{i'j})\,\mathrm{sgn}(S_{ij'} - S_{i'j'}) + \mathrm{sgn}(S_{i'j} - S_{ij})\,\mathrm{sgn}(S_{i'j'} - S_{ij'}) \big], \tag{6}$$

and

$$K^{\nu}_{jj'} = \sum_{i=1}^{M-1} \sum_{i'=i+1}^{M} \big[ \mathrm{sgn}(S_{ij} - S_{i'j})\,\mathrm{sgn}(S_{i'j'} - S_{ij'}) + \mathrm{sgn}(S_{i'j} - S_{ij})\,\mathrm{sgn}(S_{ij'} - S_{i'j'}) \big], \tag{7}$$

where the signum function sgn(.) used by Atanassov is
defined as follows

$$\mathrm{sgn}(x) = \begin{cases} 1, & \text{if } x > 0, \\ 0, & \text{if } x \le 0. \end{cases} \tag{8}$$

Actually the values of $K^{\mu}_{jj'}$ and $K^{\nu}_{jj'}$ depend on the choice
of the sgn(x) function$^8$. That is why in [21], [33] the authors
propose different algorithms, implemented in Java in an
ICrA software, that yield different $K^{\mu}_{jj'}$ and $K^{\nu}_{jj'}$ values for
making the analysis and for reducing the dimension (complexity)
of the original MCDM problem.
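The counting behind the IF pair $(\mu_{jj'}, \nu_{jj'})$ can be sketched in a few lines of Python. This is only an illustration under the sgn convention of (8), where ties contribute to neither count; the function name `icra_pair` is hypothetical, and the data are the GSP1 and GSP2 columns of the score matrix reported in Table I.

```python
from itertools import combinations


def icra_pair(scores, j, jp):
    """Return the IF pair (mu, nu) for criteria columns j and jp.

    scores is an M x N list of rows (alternatives) by columns (criteria).
    """
    m = len(scores)
    k_mu = 0  # pairs of alternatives ranked the same way by both criteria
    k_nu = 0  # pairs of alternatives ranked in opposite ways
    for i, ip in combinations(range(m), 2):
        dj = scores[i][j] - scores[ip][j]
        djp = scores[i][jp] - scores[ip][jp]
        if (dj > 0 and djp > 0) or (dj < 0 and djp < 0):
            k_mu += 1
        elif (dj > 0 and djp < 0) or (dj < 0 and djp > 0):
            k_nu += 1
        # Ties contribute to neither count, as with sgn(x) = 0 for x <= 0.
    total = m * (m - 1) / 2
    return k_mu / total, k_nu / total


# Columns GSP1 and GSP2 of the score matrix (rows A1..A6 of Table I).
scores = [
    [899.00, 916.40],
    [898.00, 915.60],
    [898.33, 922.47],
    [898.50, 924.80],
    [899.40, 924.72],
    [899.50, 922.07],
]
mu, nu = icra_pair(scores, 0, 1)
print(round(mu, 2), round(nu, 2))
```

With these six alternatives there are 15 pairwise comparisons, 9 of which are concordant, which matches the value 0.60 reported for the pair (C1, C2) in the classical ICrA results of Section VI.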


B. Inter-criteria analysis 

Once the Inter-Criteria matrix $K = [K_{jj'}]$ of intuitionistic
fuzzy pairs is calculated, one needs to analyze it to decide
which criteria $C_j$ and $C_{j'}$ are in strong agreement (or positive
consonance), reflecting the correlation between $C_j$ and $C_{j'}$; in
strong disagreement (or negative consonance), reflecting the
non-correlation between $C_j$ and $C_{j'}$; or in dissonance, reflecting the
uncertainty situation where nothing can be said about the
non-correlation or the correlation between $C_j$ and $C_{j'}$. If one wants
to identify the set of criteria $C_{j'}$ for $j' \neq j$ that are strongly
correlated with $C_j$, then one can sort the $\mu_{jj'}$ values in descending
order to identify those in strong positive consonance with
$C_j$. In [25], [26], the authors propose a qualitative scale
to refine the levels of consonance and dissonance and to
help the decision-making procedure. A dual approach based
on the $\nu_{jj'}$ values can be used to determine the set of criteria
that are not correlated with $C_j$. Another approach [10],
[27] proposes to define two thresholds $\alpha, \beta \in [0; 1]$ for the
positive and negative consonance respectively, against which
the components $\mu_{jj'}$ and $\nu_{jj'}$ of $K_{jj'} = (\mu_{jj'}, \nu_{jj'})$ are
compared. The correlations between the criteria $C_j$ and $C_{j'}$
are called "positive consonance", "negative consonance" or
"dissonance" depending on their $\mu_{jj'}$ and $\nu_{jj'}$ values with
respect to the chosen thresholds $\alpha$ and $\beta$, see [22] for details.
More precisely, $C_j$ and $C_{j'}$ are in

• $(\alpha, \beta)$ positive consonance (i.e. correlated):
  if $\mu_{jj'} > \alpha$ and $\nu_{jj'} < \beta$.
• $(\alpha, \beta)$ negative consonance (i.e. not correlated):
  if $\mu_{jj'} < \beta$ and $\nu_{jj'} > \alpha$.
• $(\alpha, \beta)$ dissonance (i.e. full uncertainty): otherwise.


8 For instance, if we use sgn(x) = 1 if $x \ge 0$ and sgn(x) = 0 if $x < 0$,
we will obtain, in general, other $K^{\mu}_{jj'}$ and $K^{\nu}_{jj'}$ values.
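The three-way threshold rule above can be written as a small function. This is a minimal sketch; the default threshold values 0.75 and 0.25 are illustrative assumptions and are not prescribed by the text.

```python
def consonance(mu, nu, alpha=0.75, beta=0.25):
    """Classify an IF pair (mu, nu) with thresholds alpha and beta.

    The default thresholds are hypothetical; the text leaves their
    choice to the analyst.
    """
    if mu > alpha and nu < beta:
        return "positive consonance"
    if mu < beta and nu > alpha:
        return "negative consonance"
    return "dissonance"


print(consonance(0.87, 0.13))  # strongly agreeing pair
print(consonance(0.00, 1.00))  # strongly disagreeing pair
print(consonance(0.60, 0.40))  # neither condition holds
```

Changing `alpha` and `beta` changes the classification of borderline pairs, which is one reason the final result of the analysis depends on the threshold choice.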


At the beginning of ICrA development it was not very clear
how these intuitionistic fuzzy (IF) pairs $(\mu_{jj'}, \nu_{jj'})$ had to
be used, and that is why Atanassova [34], [35] proposed to
handle both components of the IF pair. For this, she interpreted
the pairs $(\mu_{jj'}, \nu_{jj'})$ as points located in the elementary TFU
triangle, where the point T of coordinates (1,0) represents the
maximal positive consonance (i.e. the true consonance), the
point F of coordinates (0,1) represents the maximal negative
consonance (i.e. the falsity), and the point U of coordinates
(0,0) represents the maximal dissonance (i.e. the uncertainty).
From this interpretation it becomes easy to identify the top
consonant IF pairs $(\mu_{jj'}, \nu_{jj'})$ that fall in the bottom right corner
of the TFU triangle, limited by the vertical line $x = \alpha$
and the horizontal line $y = \beta$. The consonant
IF pairs are then ranked according to their Euclidean distance
$d^{C}_{jj'}$ to the point T of coordinates (1,0), defined by

$$d^{C}_{jj'} = d\big((1,0), (\mu_{jj'}, \nu_{jj'})\big) = \sqrt{(1 - \mu_{jj'})^2 + \nu_{jj'}^2}. \tag{9}$$
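As an illustration of this ranking, the following sketch sorts a few IF pairs by their Euclidean distance to the point T = (1,0). The dictionary of pairs is a small hand-picked sample used only for the example.

```python
import math


def distance_to_T(mu, nu):
    """Euclidean distance between the IF pair (mu, nu) and T = (1, 0)."""
    return math.hypot(1.0 - mu, nu)


# Hypothetical sample of consonant IF pairs, keyed by criterion pair.
pairs = {
    ("C4", "C5"): (0.93, 0.07),
    ("C1", "C8"): (0.87, 0.13),
    ("C2", "C4"): (0.80, 0.20),
}

# Smaller distance means closer to maximal positive consonance.
ranked = sorted(pairs, key=lambda key: distance_to_T(*pairs[key]))
print(ranked)
```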


In the MCDM context only the criteria that are negatively
consonant (or uncorrelated) must be kept for solving the MCDM
problem and saving computational resources, because they have no (or
only very low) dependency on each other, so that each
uncorrelated criterion provides useful information. The set
of criteria that are positively consonant (if any), called the
consonant set, indicates a redundancy of information
between the criteria belonging to it in terms of decisional
behavior. Therefore all these positively consonant criteria must
be represented by only one representative criterion that will
be kept in the MCDM analysis to simplify the MCDM problem.
Also, all the criteria that are deemed strongly dissonant (if any)
could be taken out of the original MCDM problem because
they only introduce uncertainty in the decision-making.


C. General comments on ICrA 


Although appealing at first glance, the classical ICrA
approach calls for the following comments:


1) The IF values $\mu_{jj'}$ and $\nu_{jj'}$ can be easily interpreted
in the belief function framework. Indeed, the belief and
plausibility of (positive) consonance between criteria $C_j$
and $C_{j'}$ can be directly linked to the values $\mu_{jj'}$ and $\nu_{jj'}$
by taking $Bel_{jj'}(\theta) = \mu_{jj'}$ and $Pl_{jj'}(\theta) = 1 - \nu_{jj'}$.
Moreover $U_{jj'}(\theta) = Pl_{jj'}(\theta) - Bel_{jj'}(\theta) = 1 - \nu_{jj'} - \mu_{jj'}$
represents the dissonance (the uncertainty about
the correlation) of the criteria $C_j$ and $C_{j'}$. Here the
proposition $\theta$ means: the criteria $C_j$ and $C_{j'}$ are totally
positively consonant (i.e. totally correlated), and the
frame of discernment is defined as $\Theta = \{\theta, \bar{\theta}\}$, where
$\bar{\theta}$ means: the criteria $C_j$ and $C_{j'}$ are totally negatively
consonant (uncorrelated). From this, one can define any
BBA $m_{jj'}(\theta)$, $m_{jj'}(\bar{\theta})$ and $m_{jj'}(\theta \cup \bar{\theta})$ of $2^{\Theta}$ by

$$m_{jj'}(\theta) = \mu_{jj'}, \tag{10}$$
$$m_{jj'}(\bar{\theta}) = \nu_{jj'}, \tag{11}$$
$$m_{jj'}(\theta \cup \bar{\theta}) = 1 - \mu_{jj'} - \nu_{jj'}. \tag{12}$$

2) The construction of $\mu_{jj'}$ and $\nu_{jj'}$ proposed in the
classical ICrA is disputable because it is only based on
counting the valid ">" or "<" inequalities; it does
not exploit how much bigger or smaller the score values
are in each comparison done in the construction of the
Inter-Criteria Matrix $K$. Therefore the construction of
$\mu_{jj'}$ and $\nu_{jj'}$ is actually only a very crude method to
estimate IF pairs.

3) The construction of the Inter-Criteria Matrix $K$ is in fact
not unique, as reported in [33]. This will yield different
results in general.

4) The exploitation of the ICrA method depends on the
choice of the $\alpha$ and $\beta$ thresholds, which will impact the final
result.

5) The classical ICrA method cannot deal directly with
imprecise or missing score values.


IV. A NEW ICRA METHOD BASED ON BELIEF FUNCTIONS


We present in this section a new ICrA method, called BF- 
ICrA for short, based on belief functions that circumvents most 
of the aforementioned drawbacks of classical ICrA. Here we 
show how to get more precisely the Inter-Criteria Belief Matrix 
and how to exploit it for MCDM simplification. 


A. Construction of BBA matrix from the score matrix 


From any non-zero score matrix $S = [S_{ij}]$, we can construct
the $M \times N$ BBA matrix $\mathbf{M} = [m_{ij}(\cdot)]$ as follows

$$m_{ij}(A_i) = Bel_{ij}(A_i), \tag{13}$$
$$m_{ij}(\bar{A}_i) = Bel_{ij}(\bar{A}_i) = 1 - Pl_{ij}(A_i), \tag{14}$$
$$m_{ij}(A_i \cup \bar{A}_i) = Pl_{ij}(A_i) - Bel_{ij}(A_i). \tag{15}$$

Assuming $A^{j}_{max} \neq 0$ and $A^{j}_{min} \neq 0$, we take$^9$

$$Bel_{ij}(A_i) \triangleq Sup_j(A_i) / A^{j}_{max}, \tag{16}$$
$$Bel_{ij}(\bar{A}_i) \triangleq Inf_j(A_i) / A^{j}_{min}, \tag{17}$$

where $A^{j}_{max} \triangleq \max_i Sup_j(A_i)$ and $A^{j}_{min} \triangleq \min_i Inf_j(A_i)$,
and with

$$Sup_j(A_i) \triangleq \sum_{k \in \{1,\ldots,M\} \,|\, S_{kj} \le S_{ij}} |S_{ij} - S_{kj}|, \tag{18}$$
$$Inf_j(A_i) \triangleq - \sum_{k \in \{1,\ldots,M\} \,|\, S_{kj} \ge S_{ij}} |S_{ij} - S_{kj}|. \tag{19}$$

The entire justification of these formulas can be found in our
previous works [3]. For another criterion $C_{j'}$ and the $j'$-th
column of the score matrix we will obtain another set of
BBA values $m_{ij'}(\cdot)$. Applying this method to each column
of the score matrix, we are able to compute the BBA matrix
$\mathbf{M} = [m_{ij}(\cdot)]$, each component of which is in fact a triplet
$(m_{ij}(A_i), m_{ij}(\bar{A}_i), m_{ij}(A_i \cup \bar{A}_i))$ of BBA values in $[0, 1]$
such that $m_{ij}(A_i) + m_{ij}(\bar{A}_i) + m_{ij}(A_i \cup \bar{A}_i) = 1$ for all
$i = 1, \ldots, M$ and $j = 1, \ldots, N$.


9 If $A^{j}_{max} = 0$ then $Bel_{ij}(A_i) = 0$, and if $A^{j}_{min} = 0$ then $Pl_{ij}(A_i) = 1$.
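The construction of one column of the BBA matrix can be sketched as follows. This is an illustrative reading of (13)-(19), not the authors' code: the function name `bba_column` is hypothetical, and the ratio $Inf_j(A_i)/A^{j}_{min}$ of two non-positive numbers is computed on magnitudes to avoid negative zeros.

```python
def bba_column(col):
    """Build the BBA triplets (m(A_i), m(not A_i), m(A_i or not A_i))
    for one column of the score matrix (one criterion)."""
    # Sup_j(A_i): total margin of A_i over the alternatives it dominates.
    sup = [sum(abs(s - t) for t in col if t <= s) for s in col]
    # |Inf_j(A_i)|: total margin by which A_i is dominated.
    inf_mag = [sum(abs(s - t) for t in col if t >= s) for s in col]
    a_max = max(sup)          # A^j_max
    a_min_mag = max(inf_mag)  # |A^j_min|
    triplets = []
    for s_i, i_i in zip(sup, inf_mag):
        bel = s_i / a_max if a_max != 0 else 0.0
        # If A^j_min = 0 then Pl = 1, i.e. Bel(not A_i) = 0.
        bel_not = i_i / a_min_mag if a_min_mag != 0 else 0.0
        triplets.append((bel, bel_not, 1.0 - bel - bel_not))
    return triplets


# Toy column with three alternatives (larger is better).
for triplet in bba_column([10, 20, 30]):
    print(triplet)
```

For the toy column, the worst alternative gets all its mass on $\bar{A}_i$, the best one gets all its mass on $A_i$, and the middle one spreads its mass equally over the three focal elements.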


B. Construction of Inter-Criteria Matrix from BBA matrix 


The next step of the BF-ICrA approach is the construction of
the $N \times N$ Inter-Criteria Matrix $K = [K_{jj'}]$ from the $M \times N$
BBA matrix $\mathbf{M} = [m_{ij}(\cdot)]$, where the element $K_{jj'}$ corresponds
to the BBA $(m_{jj'}(\theta), m_{jj'}(\bar{\theta}), m_{jj'}(\theta \cup \bar{\theta}))$ about positive
consonance $\theta$, negative consonance $\bar{\theta}$ and uncertainty between
criteria $C_j$ and $C_{j'}$ respectively. The principle of construction
of the triplet $K_{jj'} = (m_{jj'}(\theta), m_{jj'}(\bar{\theta}), m_{jj'}(\theta \cup \bar{\theta}))$ is based
on two steps that will be detailed in the sequel:

• Step 1: For each alternative $A_i$, we first compute the BBA
  $(m^{i}_{jj'}(\theta), m^{i}_{jj'}(\bar{\theta}), m^{i}_{jj'}(\theta \cup \bar{\theta}))$ for any two criteria
  $j, j' \in \{1, 2, \ldots, N\}$.
• Step 2: The BBA $(m_{jj'}(\theta), m_{jj'}(\bar{\theta}), m_{jj'}(\theta \cup \bar{\theta}))$ is then
  obtained by the combination of the $M$ BBAs $m^{i}_{jj'}(\cdot)$.


We present the details of each step of the BF-ICrA method.

Step 1: Construction of BBA $m^{i}_{jj'}(\cdot)$

The mass of belief $m^{i}_{jj'}(\theta)$ represents the degree of agreement
between the BBAs $m_{ij}(\cdot)$ and $m_{ij'}(\cdot)$ for the alternative
$A_i$, and $m^{i}_{jj'}(\bar{\theta})$ represents the degree of disagreement between
$m_{ij}(\cdot)$ and $m_{ij'}(\cdot)$. The mass $m^{i}_{jj'}(\theta \cup \bar{\theta})$ is the degree
of uncertainty about the agreement (or disagreement) between
$m_{ij}(\cdot)$ and $m_{ij'}(\cdot)$ for the alternative $A_i$. The calculation of
$m^{i}_{jj'}(\theta)$ could be envisaged in several manners.

The first manner would consist in considering the degree of
conflict [28] $k^{i}_{jj'} \triangleq \sum_{X, Y \subseteq \Theta \,|\, X \cap Y = \emptyset} m_{ij}(X)\, m_{ij'}(Y)$ and
taking the Bayesian BBA $m^{i}_{jj'}(\theta) = 1 - k^{i}_{jj'}$, $m^{i}_{jj'}(\bar{\theta}) = k^{i}_{jj'}$
and $m^{i}_{jj'}(\theta \cup \bar{\theta}) = 0$. Instead of using Shafer's conflict,
the second manner would consist in using a normalized distance
$d^{i}_{jj'} = d(m_{ij}, m_{ij'})$ to measure the closeness between $m_{ij}(\cdot)$
and $m_{ij'}(\cdot)$, and then considering the Bayesian BBA modeling
defined by $m^{i}_{jj'}(\theta) = 1 - d^{i}_{jj'}$, $m^{i}_{jj'}(\bar{\theta}) = d^{i}_{jj'}$ and
$m^{i}_{jj'}(\theta \cup \bar{\theta}) = 0$. These two manners, however, are not very
satisfying because they always set to zero the degree of uncertainty
between the agreement and disagreement of the BBAs, and the second
manner also depends on the choice of the distance metric. So, we
propose a more appealing third manner of BBA modeling
of $m^{i}_{jj'}(\theta)$, $m^{i}_{jj'}(\bar{\theta})$ and $m^{i}_{jj'}(\theta \cup \bar{\theta})$. For this, we consider
two sources of evidence (SoE) indexed by $j$ and $j'$ providing
the BBAs $m_{ij}$ and $m_{ij'}$ defined on the simple FoD $\{A_i, \bar{A}_i\}$
and denoted $m_{ij} = [m_{ij}(A_i), m_{ij}(\bar{A}_i), m_{ij}(A_i \cup \bar{A}_i)]$ and
$m_{ij'} = [m_{ij'}(A_i), m_{ij'}(\bar{A}_i), m_{ij'}(A_i \cup \bar{A}_i)]$. We also denote
$\Theta = \{\theta, \bar{\theta}\}$ the FoD about the relative state of the two SoE,
where $\theta$ means that the two SoE agree, $\bar{\theta}$ means that they
disagree and $\theta \cup \bar{\theta}$ means that we don't know. Then the BBA
modeling is based on the following important remarks:


• Two SoE are in total agreement if both commit their
  maximum belief mass to the element $A_i$ or to the element
  $\bar{A}_i$. So they perfectly agree if $m_{ij}(A_i) = m_{ij'}(A_i) = 1$,
  or if $m_{ij}(\bar{A}_i) = m_{ij'}(\bar{A}_i) = 1$. Therefore the pure degree
  of agreement$^{10}$ between two sources is modeled by

$$m^{i}_{jj'}(\theta) = m_{ij}(A_i)\, m_{ij'}(A_i) + m_{ij}(\bar{A}_i)\, m_{ij'}(\bar{A}_i). \tag{20}$$

• Two SoE are in total disagreement if each one commits
  its maximum mass of belief to one element and the other
  to its opposite, that is if one has $m_{ij}(A_i) = 1$ and
  $m_{ij'}(\bar{A}_i) = 1$, or if $m_{ij}(\bar{A}_i) = 1$ and $m_{ij'}(A_i) = 1$.
  Hence the pure degree of disagreement$^{11}$ between two
  sources is modeled by

$$m^{i}_{jj'}(\bar{\theta}) = m_{ij}(A_i)\, m_{ij'}(\bar{A}_i) + m_{ij}(\bar{A}_i)\, m_{ij'}(A_i). \tag{21}$$

• All possible remaining products between components of
  $m_{ij}$ and $m_{ij'}$ reflect the part of uncertainty we have about
  the SoE (i.e. we don't know if they agree or disagree).
  Hence the degree of uncertainty between the two sources
  is modeled by

$$\begin{aligned} m^{i}_{jj'}(\theta \cup \bar{\theta}) ={} & m_{ij}(A_i)\, m_{ij'}(A_i \cup \bar{A}_i) \\ & + m_{ij}(\bar{A}_i)\, m_{ij'}(A_i \cup \bar{A}_i) \\ & + m_{ij}(A_i \cup \bar{A}_i)\, m_{ij'}(A_i) \\ & + m_{ij}(A_i \cup \bar{A}_i)\, m_{ij'}(\bar{A}_i) \\ & + m_{ij}(A_i \cup \bar{A}_i)\, m_{ij'}(A_i \cup \bar{A}_i). \end{aligned} \tag{22}$$
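The three remarks above translate directly into code. The sketch below assumes each BBA is stored as a triplet (mass on $A_i$, mass on $\bar{A}_i$, mass on $A_i \cup \bar{A}_i$); the function name `agreement_bba` is hypothetical.

```python
def agreement_bba(m1, m2):
    """Return (m(theta), m(not theta), m(theta or not theta)) for two
    BBAs given as triplets (m(A), m(not A), m(A or not A))."""
    a1, n1, u1 = m1
    a2, n2, u2 = m2
    agree = a1 * a2 + n1 * n2          # both sources back the same element
    disagree = a1 * n2 + n1 * a2       # the sources back opposite elements
    unknown = (a1 * u2 + n1 * u2       # all remaining products
               + u1 * a2 + u1 * n2 + u1 * u2)
    return agree, disagree, unknown


print(agreement_bba((1.0, 0.0, 0.0), (1.0, 0.0, 0.0)))  # total agreement
print(agreement_bba((1.0, 0.0, 0.0), (0.0, 1.0, 0.0)))  # total disagreement
print(agreement_bba((0.6, 0.3, 0.1), (0.5, 0.2, 0.3)))
```

Because the nine products exhaust the expansion of the product of the two mass sums, the three outputs always sum to one, and swapping the two arguments leaves the result unchanged.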


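By construction $m^{i}_{jj'}(\cdot) = m^{i}_{j'j}(\cdot)$, hence this BBA modeling
permits to build a set of $M$ symmetrical Inter-Criteria Belief
Matrices (ICBM) $K^{i} = [K^{i}_{jj'}]$ of dimension $N \times N$ relative
to each alternative $A_i$, whose components $K^{i}_{jj'}$ correspond to
the triplet of BBA values $m^{i}_{jj'} = (m^{i}_{jj'}(\theta), m^{i}_{jj'}(\bar{\theta}), m^{i}_{jj'}(\theta \cup \bar{\theta}))$
modeling the belief of agreement and of disagreement
between $C_j$ and $C_{j'}$ based on $A_i$. One has also$^{12}$
$m^{i}_{jj'}(\theta), m^{i}_{jj'}(\bar{\theta}), m^{i}_{jj'}(\theta \cup \bar{\theta}) \in [0, 1]$ and
$m^{i}_{jj'}(\theta) + m^{i}_{jj'}(\bar{\theta}) + m^{i}_{jj'}(\theta \cup \bar{\theta}) = 1$.
This BBA construction can be easily extended
for modeling the agreement, disagreement and uncertainty of
$n > 2$ criteria $C_{j_1}, \ldots, C_{j_n}$ altogether if needed by taking

$$m^{i}_{j_1 \ldots j_n}(\theta) = \prod_{k=1}^{n} m_{ij_k}(A_i) + \prod_{k=1}^{n} m_{ij_k}(\bar{A}_i),$$

$$m^{i}_{j_1 \ldots j_n}(\bar{\theta}) = \sum_{\substack{X_{j_1}, \ldots, X_{j_n} \in \{A_i, \bar{A}_i\} \\ X_{j_1} \cap \ldots \cap X_{j_n} = \emptyset}} \; \prod_{k=1}^{n} m_{ij_k}(X_{j_k}),$$

$$m^{i}_{j_1 \ldots j_n}(\theta \cup \bar{\theta}) = 1 - m^{i}_{j_1 \ldots j_n}(\theta) - m^{i}_{j_1 \ldots j_n}(\bar{\theta}).$$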
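A possible sketch of this extension to $n$ criteria is given below. It relies on one observation that is an inference rather than a statement of the text: when every $X_{j_k}$ is either $A_i$ or $\bar{A}_i$, the intersection is empty exactly when the choices are not all identical, so the disagreement mass equals the product of the committed masses minus the agreement mass. The function name `agreement_bba_n` is hypothetical.

```python
from math import prod


def agreement_bba_n(bbas):
    """Agreement BBA for n sources, each a triplet
    (m(A), m(not A), m(A or not A)) on the same alternative."""
    # All sources commit to A, or all commit to not A.
    agree = prod(m[0] for m in bbas) + prod(m[1] for m in bbas)
    # Sum over every choice in {A, not A}^n of the product of masses.
    committed = prod(m[0] + m[1] for m in bbas)
    # Non-constant choices have an empty intersection.
    disagree = committed - agree
    return agree, disagree, 1.0 - agree - disagree


print(agreement_bba_n([(0.6, 0.3, 0.1), (0.5, 0.2, 0.3)]))
print(agreement_bba_n([(0.6, 0.3, 0.1), (0.5, 0.2, 0.3), (0.7, 0.1, 0.2)]))
```

For $n = 2$ the result coincides with the pairwise formulas (20)-(22).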
Step 2: Construction of BBA $m_{jj'}(\cdot)$ (fusion step)

Once all the BBAs $m^{i}_{jj'}(\cdot)$ ($i = 1, \ldots, M$) are calculated,
one combines them to get the component $K_{jj'} =
(m_{jj'}(\theta), m_{jj'}(\bar{\theta}), m_{jj'}(\theta \cup \bar{\theta}))$ of the Inter-Criteria Belief
Matrix (ICBM) $K = [K_{jj'}]$. This fusion step can be done in many
ways depending on the combination rule chosen by the user. If
the number of alternatives $M$ is not too large, we recommend
combining the BBAs $m^{i}_{jj'}(\cdot)$ with the PCR6 fusion rule [30]
(Vol. 3) because of known deficiencies of Dempster's rule. If
$M$ is too large for PCR6 to run on a computer, we
can just use the simple averaging rule of combination in these
high-dimensional MCDM problems. Simple Matlab™ code for
the PCR6 rule can be found in [42] for convenience.


10 or positive consonance according to Atanassov's terminology.

11 or negative consonance according to Atanassov's terminology.

12 because $(m_{ij}(A_i) + m_{ij}(\bar{A}_i) + m_{ij}(A_i \cup \bar{A}_i))(m_{ij'}(A_i) + m_{ij'}(\bar{A}_i) +
m_{ij'}(A_i \cup \bar{A}_i)) = 1 \cdot 1 = 1$.
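For the high-dimensional case, the simple averaging rule mentioned above can be sketched as follows. This illustrates only the averaging alternative, not PCR6; the function name `average_fusion` is hypothetical.

```python
def average_fusion(bbas):
    """Average M BBAs given as triplets
    (m(theta), m(not theta), m(theta or not theta))."""
    m = len(bbas)
    return tuple(sum(b[k] for b in bbas) / m for k in range(3))


# Two alternatives: one shows agreement, the other disagreement.
print(average_fusion([(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]))
```

The cost of averaging grows linearly with the number of alternatives, which is why it remains usable when $M$ is large.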

The computational complexity of BF-ICrA is of course
higher than the complexity of ICrA because it makes a more
precise evaluation of local and global inter-criteria belief
matrices than the Intuitionistic Fuzzy inter-criteria matrices
of ICrA. The overall reduction of the computational burden
of the original MCDM problem thanks to BF-ICrA depends
highly on the problem under concern, on the complexity and cost
of evaluating each criterion involved in it, and on the number
of redundant criteria identified by the BF-ICrA method.


V. SIMPLIFICATION OF ORIGINAL MCDM THANKS TO 
BF-ICRA 


Once the global Inter-Criteria Belief Matrix $K = [K_{jj'} =
(m_{jj'}(\theta), m_{jj'}(\bar{\theta}), m_{jj'}(\theta \cup \bar{\theta}))]$ is calculated, we need to
identify and cluster the criteria that are in strong agreement, in
strong disagreement, and those on which we are uncertain. For
identifying the criteria that are in very strong agreement, we
evaluate the distance of each component of $K_{jj'}$ to the BBA
representing the best agreement state, characterized by the
specific BBA$^{13}$ $m_T(\theta) = 1$. With a similar approach we can
also identify, if we want, the criteria that are in very strong
disagreement using the distance of $m_{jj'}(\cdot)$ to the
BBA representing the best disagreement state, characterized by
the specific BBA $m_F(\bar{\theta}) = 1$. As an alternative to Jousselme's
distance [37], we use the $d_{BI}(\cdot,\cdot)$ distance based on belief
intervals [36] because it is a good method for measuring the
distance $d(m_1, m_2)$ between two BBAs$^{14}$ $m_1(\cdot)$ and $m_2(\cdot)$
over the same FoD. It is defined by

$$d_{BI}(m_1, m_2) \triangleq \sqrt{N_c \cdot \sum_{X \in 2^{\Theta}} d_W^2\big(BI_1(X), BI_2(X)\big)}, \tag{23}$$

where the belief intervals are defined by $BI_1(X) \triangleq
[Bel_1(X), Pl_1(X)]$ and $BI_2(X) \triangleq [Bel_2(X), Pl_2(X)]$ and
computed from $m_1(\cdot)$ and $m_2(\cdot)$ thanks to formula (1).
$d_W(BI_1(X), BI_2(X))$ is Wasserstein's distance between
intervals calculated by

$$d_W([a_1, b_1], [a_2, b_2]) \triangleq \sqrt{\left[\frac{a_1 + b_1}{2} - \frac{a_2 + b_2}{2}\right]^2 + \frac{1}{3}\left[\frac{b_1 - a_1}{2} - \frac{b_2 - a_2}{2}\right]^2}, \tag{24}$$

and $N_c = 1/2^{|\Theta| - 1}$ is a factor to get $d_{BI}(m_1, m_2) \in [0, 1]$.
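A minimal sketch of the belief-interval distance for the two-element frame $\Theta = \{\theta, \bar{\theta}\}$ is shown below. It assumes each BBA is a triplet (mass on $\theta$, mass on $\bar{\theta}$, mass on $\theta \cup \bar{\theta}$); the helper names are hypothetical, and the empty set is omitted from the sum because its belief interval is [0, 0] for every BBA. The two sample inputs are the BBAs for the pairs (C1, C1) and (C2, C6) taken from the GSP results of Section VI.

```python
import math


def wasserstein_interval(i1, i2):
    """Wasserstein distance between intervals [a1, b1] and [a2, b2]."""
    (a1, b1), (a2, b2) = i1, i2
    mid = (a1 + b1) / 2 - (a2 + b2) / 2
    half_width = (b1 - a1) / 2 - (b2 - a2) / 2
    return math.sqrt(mid ** 2 + half_width ** 2 / 3)


def belief_intervals(m):
    """[Bel, Pl] of each non-empty subset of the two-element frame."""
    t, f, u = m
    return {"theta": (t, t + u), "not_theta": (f, f + u), "Theta": (1.0, 1.0)}


def d_bi(m1, m2):
    """Belief-interval distance between two BBAs on a two-element frame."""
    n_c = 1 / 2 ** (2 - 1)  # normalization 1 / 2^(|Theta| - 1)
    bi1, bi2 = belief_intervals(m1), belief_intervals(m2)
    total = sum(wasserstein_interval(bi1[x], bi2[x]) ** 2 for x in bi1)
    return math.sqrt(n_c * total)


m_T = (1.0, 0.0, 0.0)  # full agreement
m_F = (0.0, 1.0, 0.0)  # full disagreement
print(round(d_bi((0.9098, 0.0207, 0.0695), m_T), 4))
print(round(d_bi((0.8341, 0.0513, 0.1146), m_T), 4))
```

The distance between the full agreement and full disagreement states equals one under this normalization, which is consistent with $d_{BI}(m_1, m_2) \in [0, 1]$.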


Because all criteria that are in strong agreement somehow
contain redundant (correlated) information and behave similarly
from the decision-making standpoint, we propose to simplify
the original MCDM problem by keeping in the MCDM only
criteria that are non-redundant. The remaining criteria can
eventually be weighted by their degree of importance, reflecting
the number of different criteria that are in agreement through
this BF-ICrA approach.


13 We use the index $T$ in the notation $m_T(\cdot)$ to indicate that the agreement is
true, and $F$ in $m_F(\cdot)$ to specify that the agreement is false.
14 Here $m_1(\cdot) = m_{jj'}(\cdot)$, and $m_2(\cdot) = m_T(\cdot)$ or $m_2(\cdot) = m_F(\cdot)$.

For instance, if one has a seven-criteria MCDM problem
and if criteria $C_1$, $C_2$ and $C_3$ are in strong agreement, we
will select only one remaining criterion among $\{C_1, C_2, C_3\}$
and give it a weight of $w_1 + w_2 + w_3$. Moreover, if $C_4$
and $C_5$ are also in strong agreement, we will select only one
remaining criterion among $\{C_4, C_5\}$ and give it a weight
of $w_4 + w_5$, and we will use the weight $w_6$ for $C_6$, and $w_7$
for $C_7$. Hence the original MCDM problem will reduce to
a simplified four-criteria MCDM problem that can be solved using
the BF-TOPSIS method already presented in detail in [3] and in
[8], or with AHP [4] if one prefers, or with any other
method that the system designer may prefer.
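The weight aggregation of this seven-criteria example can be sketched as follows. The equal initial weights and the choice of the smallest index as the representative of each cluster are assumptions made only for the illustration; the function name `simplify` is hypothetical.

```python
def simplify(weights, clusters):
    """Keep one representative per cluster of agreeing criteria and
    give it the sum of the weights of the cluster members."""
    reduced = {}
    for cluster in clusters:
        representative = min(cluster)  # assumption: smallest index is kept
        reduced[representative] = sum(weights[c] for c in cluster)
    return reduced


# Seven criteria with (hypothetical) equal weights.
weights = {j: 1.0 / 7.0 for j in range(1, 8)}
clusters = [[1, 2, 3], [4, 5], [6], [7]]
reduced = simplify(weights, clusters)
print(sorted(reduced))
print(reduced)
```

The total weight is preserved by construction, so the reduced problem can be passed to any MCDM method that expects normalized weights.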

The strategy for selecting the most representative criterion
among a set of redundant criteria is not unique and depends
mainly on the cost necessary (i.e. human efforts, data mining,
computational resources, etc.) for getting the values of the score
matrix of the problem under concern. The least costly criterion
may be a good option of selection.

In [38], we provided simple detailed examples for BF-ICrA
where we selected the representative criterion as being the one
with the smallest index. So in the aforementioned example the
simplified MCDM problem will reduce to an $M \times 4$ MCDM
problem involving only the four criteria $C_1$, $C_4$, $C_6$ and $C_7$.

The BF-ICrA method proposed in this work also allows, in
principle, a refined analysis (if necessary) based on the
IC matrices $K^{i}$ about the origin of disagreement between
criteria with respect to each alternative $A_i$, in order to identify
the potential inconsistencies in the original MCDM problem. This
aspect is not developed in this paper and has been left for
future investigations. It is worth mentioning that the analysis
of the number of redundant criteria versus time improvements,
which could be proposed as an effective measure of performance
of this approach, depends highly on the application under
consideration and on the difficulty (and cost) of getting the value
of each criterion.


VI. TWO APPLICATIONS OF BF-ICRA


In this section we present two applications of the BF-ICrA 
approach. The first one is for Global Positioning System (GPS) 
Surveying Problems (GSP) presented in [39], and the second 
one is for the car selection problem. 


A. Application of BF-ICrA for the GPS surveying problem 


GPS surveying is an NP-hard problem. For designing a Global
Positioning System surveying network, a given set of earth
points must be observed consecutively. The survey cost is the
sum of the distances to go from one point to another one.
This kind of problem is hard to solve with traditional
numerical methods. Here we use BF-ICrA to analyze an Ant
Colony Optimization (ACO) algorithm developed to provide
near-optimal solutions for the Global Positioning System
surveying problem.

GPS satellites continuously transmit radio signals to the
Earth while orbiting it. A receiver, with unknown position on
Earth, has to detect and convert the signals received from all of
the satellites into useful measurements. These measurements
allow a user to compute a three-dimensional coordinate
position: the location of the receiver. Any GPS observation is
proven to have biases; hence, in order to survey, an appropriate
combination of measurement processing strategies must
be used to minimize their effect on the positioning results.
Differencing data collected simultaneously from two or more
GPS receivers to several GPS satellites makes it possible to eliminate
or significantly reduce most of the biases. The GPS network
can be defined as a set of stations $(a_1, a_2, \ldots, a_n)$, which are
co-ordinated by placing receivers $(X_1, X_2, \ldots)$ on them to
determine sessions $(a_1a_2, a_1a_3, a_2a_3, \ldots)$ among them. The
problem is to search for the best order in which these sessions
can be organized to give the best schedule. Thus, the schedule
can be defined as a sequence of sessions to be observed
consecutively. The solution is represented by a linear graph
with weighted edges. The nodes represent the stations and
the edges represent the moving cost. The objective function
of the problem is the cost of the solution, which is the sum
of the costs (time) to move from one point to another one,
$C(V) = \sum C(a_i, a_j)$, where $a_ia_j$ is a session in solution $V$.
For example, if the number of points (stations) is 4, a possible
solution is $V = (a_1, a_3, a_2, a_4)$ and it can be represented by the
linear graph $a_1 \to a_3 \to a_2 \to a_4$. The moving costs are
as follows: $C(a_1, a_3)$, $C(a_3, a_2)$, $C(a_2, a_4)$. Thus the cost of
the solution is $C(V) = C(a_1, a_3) + C(a_3, a_2) + C(a_2, a_4)$. In
practice, one must determine how each GPS receiver should be moved
between the stations to be surveyed in an efficient manner, taking
into account important factors such as time, cost, etc. The
problem is to search for the best order, with respect to time,
in which these sessions can be observed to give the cheapest
schedule, i.e. to minimize $C(V)$. The initial data is a cost matrix,
which represents the cost (time, or distance) of moving a
receiver from one point to another. Solving such problems
(GSPs) to optimality requires a very high computational time.
Therefore, meta-heuristic methods are used to provide
near-optimal solutions for large networks within an acceptable amount
of computational effort. In this paper, we consider the Max-Min
Ant System (MMAS) meta-heuristic [40] and we present
it briefly in the next subsection.
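The objective function $C(V)$ can be illustrated with a short sketch. The cost matrix below is entirely hypothetical and only serves to evaluate the four-station example $V = (a_1, a_3, a_2, a_4)$; stations are indexed from zero in the code.

```python
def schedule_cost(cost, order):
    """Sum of the moving costs between consecutive stations in a schedule."""
    return sum(cost[a][b] for a, b in zip(order, order[1:]))


# Hypothetical symmetric cost matrix for stations a1..a4 (indices 0..3).
cost = [
    [0, 4, 2, 7],
    [4, 0, 3, 5],
    [2, 3, 0, 6],
    [7, 5, 6, 0],
]

# V = (a1, a3, a2, a4): C(a1,a3) + C(a3,a2) + C(a2,a4)
print(schedule_cost(cost, [0, 2, 1, 3]))
```

An exhaustive search over all orders grows factorially with the number of stations, which is why the text turns to a meta-heuristic for large networks.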

Real ants foraging for food lay down quantities of 
pheromone (chemical cues) marking the path that they follow. 
An isolated ant moves essentially at random but an ant 
encountering a previously laid pheromone will detect it and 
decide to follow it with high probability and thereby reinforce 
it with a further quantity of pheromone. The repetition of the
above mechanism represents the auto-catalytic behavior of a real
ant colony: the more the ants follow a trail, the more
attractive that trail becomes.

The ACO algorithm uses a colony of artificial ants that
behave as cooperative agents in a mathematical space where
they are allowed to search and reinforce pathways (solutions)
in order to find the optimal ones. The problem is represented
by a graph and the ants walk on the graph to construct
solutions. A solution is represented by a path in the graph. After
initialization of the pheromone trails, ants construct feasible
solutions, starting from random nodes, then the pheromone
trails are updated. At each step ants compute a set of feasible
moves and select the best one (according to some probabilistic
rules) to carry out the rest of the tour. The transition probability
$p_{ij}$ to choose the node $j$ when the current node is $i$ is based
on the heuristic information $\eta_{ij}$ and the pheromone trail level $\tau_{ij}$
of the move, where $i, j = 1, \ldots, n$:

$$p_{ij} = \frac{\tau_{ij}\, \eta_{ij}}{\sum_{k \in \text{Unused}} \tau_{ik}\, \eta_{ik}}. \tag{25}$$

The higher the value of the pheromone and the heuristic
information, the more profitable it is to select this move and resume
the search. In the beginning, the initial pheromone level is set
to a small positive constant value $\tau_0$ and then ants update this
value after completing the construction stage. ACO algorithms
adopt different criteria to update the pheromone level.

In our implementation we use the MAX-MIN Ant System
(MMAS) [40], [41], which is one of the best ant approaches.
The main idea in MMAS is to use a fixed upper bound $\tau_{max}$ and lower
bound $\tau_{min}$ on the pheromone trails. Thus the accumulation of a large
amount of pheromone on part of the possible movements, and the
repetition of the same solutions, are partially prevented. A main
feature of MMAS is that only one solution, the best one, is used
to update the pheromone trails.

The aim of using only one solution is to make solution
elements, which frequently occur in the best found solutions,
get large reinforcement. Pheromone trail update is given by:

$$\tau_{ij} \leftarrow \rho\, \tau_{ij} + \Delta\tau_{ij}, \tag{26}$$

where

$$\Delta\tau_{ij} = \begin{cases} 1 / C(V_{best}) & \text{if } (i, j) \in \text{best solution}, \\ 0 & \text{otherwise}, \end{cases}$$

and $V_{best}$ is the iteration best solution and $i, j = 1, \ldots, n$.


To avoid stagnation of the search, the range of possible
pheromone values on each movement is limited to an interval
$[\tau_{min}, \tau_{max}]$. $\tau_{max}$ is an asymptotic maximum of $\tau_{ij}$ and
$\tau_{max} = 1/((1 - \rho)\, C(V^*))$, while $\tau_{min} = 0.087\, \tau_{max}$. Here
$V^*$ is the optimal solution, but it is unknown, therefore we
use $V_{best}$ instead of $V^*$.
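The two ingredients described above, the transition probabilities and the bounded pheromone update, can be sketched as follows. This is only an illustration and not the implementation of [43]: it assumes that pheromone and heuristic information are combined by a plain product without exponents, that moves are directed edges, and that $V_{best}$ stands in for $V^*$ in the bounds. All function and variable names are hypothetical.

```python
def transition_probabilities(tau, eta, i, unused):
    """Probability of moving from node i to each node still unused."""
    weights = {j: tau[i][j] * eta[i][j] for j in unused}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


def mmas_update(tau, best_path, best_cost, rho):
    """Evaporate, reinforce the best solution, then clamp to the bounds."""
    tau_max = 1.0 / ((1.0 - rho) * best_cost)
    tau_min = 0.087 * tau_max
    best_edges = set(zip(best_path, best_path[1:]))
    n = len(tau)
    for i in range(n):
        for j in range(n):
            delta = 1.0 / best_cost if (i, j) in best_edges else 0.0
            updated = rho * tau[i][j] + delta
            tau[i][j] = min(tau_max, max(tau_min, updated))
    return tau


# Hypothetical 3-node instance; eta is one over the session cost.
tau = [[0.5] * 3 for _ in range(3)]
eta = [[0.0, 1.0, 0.5], [1.0, 0.0, 0.25], [0.5, 0.25, 0.0]]
p = transition_probabilities(tau, eta, 0, [1, 2])
mmas_update(tau, best_path=[0, 1, 2], best_cost=4.0, rho=0.5)
print(p)
print(tau)
```

Clamping after every update keeps all trails inside $[\tau_{min}, \tau_{max}]$, so no move ever becomes impossible to select and no trail dominates the search completely.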

When all ants have completed their solutions, the
pheromone level is updated by applying the global update
rule. Only the pheromone corresponding to the best found
solution is increased, in a manner similar to MMAS. The
global update rule is intended to provide a greater amount of
pheromone on the paths of the best solution. It is a kind of
intensification of the search around the best found solution.
We use heuristic information equal to one over the cost of
the session.

Here, we analyze the experimental results obtained using
the MMAS algorithm. For this, we use real data from the Malta and
Seychelles GPS networks, composed of 38 sessions and 71
sessions respectively and denoted GSP1 and GSP2. We also use 6
larger test problems, ranging from 100 to 443 sessions, denoted
GSP3, ..., GSP8. The results are obtained by performing 30
independent runs for every experiment. The details of our
MMAS implementation are given in [43]. So in our GSP
example we consider 8 GSP criteria $C_j = GSPj$, $j = 1, \ldots, 8$,
and six average costs as results $A_1, \ldots, A_6$, where $A_1$ is the
cost average for the first 5 runs, $A_2$ the cost average for the
first 10 runs, $A_3$ for the first 15 runs, ..., and finally $A_6$ for
all the 30 runs. Table I shows the values of averaged costs
obtained for this problem. It corresponds to the transpose of
the score matrix $S$.

Hence in this problem $M = 6$ and $N = 8$, and $S = [S_{ij}]$ is
a $6 \times 8$ score matrix. Based on the classical ICrA approach, one
gets the IC matrices$^{15}$ $K^{\mu}$ and $K^{\nu}$ given
in (27) and (28).

The element $K^{\mu}_{jj'}$ of matrix $K^{\mu}$ expresses the degree of
agreement between criteria $C_j = GSPj$ and $C_{j'} = GSPj'$,
whereas the element $K^{\nu}_{jj'}$ of matrix $K^{\nu}$ expresses the degree
of disagreement between $C_j = GSPj$ and $C_{j'} = GSPj'$.

Based on these results, one sees that the ACO algorithm performs
similarly for GSP2, GSP4, GSP5 and GSP8 because
they are all in high agreement. Indeed, the $\mu_{jj'}$ values for $j, j' \in
\{2, 4, 5, 8\}$ are quite high (greater than 70%). They are GPS
networks with different numbers of sessions, but they may have a
similar structure; therefore, the value of agreement is high. For
the other networks, we can conclude that they have a very different
structure.

What is worth noting is that there also appears a strong
agreement of GSP1 with GSP8 because $\mu_{18} = 0.87$. But
because GSP8 is also in strong agreement with GSP2, GSP4,
GSP5 and with GSP1, it is logically expected that GSP1
should also be in agreement with GSP2, GSP4 and GSP5, which
is unfortunately not the case based on this classical ICrA.
This example points out some inconsistency of the ICrA result
because of the too crude method of estimation of the degree
of agreement and disagreement between criteria based on IFS.

Now if we consider the same example with the same score
matrix $S$ (built from Table I), we obtain the IC
Belief matrices$^{16}$ $K(\theta)$ and $K(\bar{\theta})$ given by (29) and (30).
From the ICBM $K(\theta)$ and $K(\bar{\theta})$ we compute the matrix $D(\theta)$
of distances of $m_{jj'}(\cdot)$ to the full agreement state with BBA
$m_T(\theta) = 1$ based on the $d_{BI}(\cdot,\cdot)$ distance. We get the
distances to full agreement $D(\theta)$ given in (31).

The element $D_{jj'}(\theta)$ represents the agreement distance
between $C_j$ and $C_{j'}$, the lower the better. From the values of
the elements of the $D(\theta)$ matrix one sees clearly that ACO performs
similarly for GSP2, GSP4 and GSP5 because the distances $D_{24}$,
$D_{25}$ and $D_{45}$ are very small. We also see that GSP6 is
in good agreement with GSP4 and GSP5 but is relatively less
in agreement with GSP2 because $D_{26} = 0.1135$.


15 For presentation convenience and due to typesetting column width, we
decompose and present the IC matrix $K = [K_{jj'} = (K^{\mu}_{jj'}, K^{\nu}_{jj'})]$ into two
distinct matrices $K^{\mu} = [K^{\mu}_{jj'}]$ and $K^{\nu} = [K^{\nu}_{jj'}]$.

16 The ICBM $K = [K_{jj'}]$ is decomposed into the three matrices
$K(\theta) = [K^{\theta}_{jj'} = m_{jj'}(\theta)]$, $K(\bar{\theta}) = [K^{\bar{\theta}}_{jj'} = m_{jj'}(\bar{\theta})]$ and
$K(\theta \cup \bar{\theta}) = [K^{\theta \cup \bar{\theta}}_{jj'} = 1 - m_{jj'}(\theta) - m_{jj'}(\bar{\theta})]$.


               A1         A2         A3         A4         A5         A6
C1 = GSP1     899.00     898.00     898.33     898.50     899.40     899.50
C2 = GSP2     916.40     915.60     922.47     924.80     924.72     922.07
C3 = GSP3   41336.40   41052.40   40991.93   40935.90   40832.20   40910.60
C4 = GSP4    3244.80    3303.30    3327.00    3344.55    3345.60    3341.93
C5 = GSP5    1656.20    1660.80    1663.93    1664.95    1666.96    1665.90
C6 = GSP6    1673.60    1683.50    1690.73    1688.75    1690.24    1692.67
C7 = GSP7    3420.00    3430.70    3433.13    3426.85    3429.44    3428.57
C8 = GSP8    3758.20    3755.70    3758.73    3760.50    3760.80    3765.80
Table I
TRANSPOSE OF THE SCORE MATRIX $S = [S_{ij}]$ OF GSP PROBLEM.


$K^{\mu}$ =                                                              (27)
        C1     C2     C3     C4     C5     C6     C7     C8
C1      1      0.60   0.27   0.67   0.73   0.67   0.33   0.87
C2      0.60   1      0.27   0.80   0.73   0.53   0.47   0.73
C3      0.27   0.27   1      0.07   0      0.20   0.40   0.13
C4      0.67   0.80   0.07   1      0.93   0.73   0.53   0.80
C5      0.73   0.73   0      0.93   1      0.80   0.60   0.87
C6      0.67   0.53   0.20   0.73   0.80   1      0.67   0.80
C7      0.33   0.47   0.40   0.53   0.60   0.67   1      0.47
C8      0.87   0.73   0.13   0.80   0.87   0.80   0.47   1


$K^{\nu}$ =                                                              (28)
        C1     C2     C3     C4     C5     C6     C7     C8
C1      0      0.40   0.73   0.33   0.27   0.33   0.67   0.13
C2      0.40   0      0.73   0.20   0.27   0.47   0.53   0.27
C3      0.73   0.73   0      0.93   1      0.80   0.60   0.87
C4      0.33   0.20   0.93   0      0.07   0.27   0.47   0.20
C5      0.27   0.27   1      0.07   0      0.20   0.40   0.13
C6      0.33   0.47   0.80   0.27   0.20   0      0.33   0.20
C7      0.67   0.53   0.60   0.47   0.40   0.33   0      0.53
C8      0.13   0.27   0.87   0.20   0.13   0.20   0.53   0


$K(\theta)$ =                                                                            (29)
        C1       C2       C3       C4       C5       C6       C7       C8
C1      0.9098   0.6732   0.1791   0.5968   0.6106   0.5620   0.1659   0.7789
C2      0.6732   0.9546   0.0364   0.8983   0.8783   0.8341   0.5532   0.7016
C3      0.1791   0.0364   0.8722   0.0172   0.0154   0.0178   0.0366   0.1137
C4      0.5968   0.8983   0.0172   0.9552   0.9146   0.9163   0.7395   0.6092
C5      0.6106   0.8783   0.0154   0.9146   0.8917   0.8778   0.6922   0.6315
C6      0.5620   0.8341   0.0178   0.9163   0.8778   0.9060   0.7630   0.6441
C7      0.1659   0.5532   0.0366   0.7395   0.6922   0.7630   0.8587   0.2484
C8      0.7789   0.7016   0.1137   0.6092   0.6315   0.6441   0.2484   0.8508


$K(\bar{\theta})$ =                                                                      (30)
        C1       C2       C3       C4       C5       C6       C7       C8
C1      0.0207   0.1941   0.5385   0.2578   0.1757   0.2117   0.5335   0.0399
C2      0.1941   0.0166   0.8323   0.0486   0.0298   0.0513   0.1808   0.0682
C3      0.5385   0.8323   0.0117   0.9002   0.8754   0.8548   0.7062   0.5486
C4      0.2578   0.0486   0.9002   0.0187   0.0216   0.0204   0.0606   0.1193
C5      0.1757   0.0298   0.8754   0.0216   0.0170   0.0201   0.0558   0.0832
C6      0.2117   0.0513   0.8548   0.0204   0.0201   0.0154   0.0390   0.0726
C7      0.5335   0.1808   0.7062   0.0606   0.0558   0.0390   0.0110   0.3495
C8      0.0399   0.0682   0.5486   0.1193   0.0832   0.0726   0.3495   0.0100


$D(\theta)$ =                                                                            (31)
        C1       C2       C3       C4       C5       C6       C7       C8
C1      0.0590   0.2633   0.6845   0.3331   0.2892   0.3314   0.6893   0.1406
C2      0.2633   0.0321   0.8987   0.0767   0.0803   0.1135   0.3230   0.1950
C3      0.6845   0.8987   0.0774   0.9418   0.9306   0.9192   0.8381   0.7241
C4      0.3331   0.0767   0.9418   0.0326   0.0566   0.0552   0.1706   0.2668
C5      0.2892   0.0803   0.9306   0.0566   0.0679   0.0770   0.1958   0.2404
C6      0.3314   0.1135   0.9192   0.0552   0.0770   0.0592   0.1494   0.2293
C7      0.6893   0.3230   0.8381   0.1706   0.1958   0.1494   0.0849   0.5626
C8      0.1406   0.1950   0.7241   0.2668   0.2404   0.2293   0.5626   0.0892

As we see, this new BF-ICrA method does not exhibit the
inconsistency of the classical ICrA result, because
BF-ICrA provides a much better and more precise estimation
of the degrees of agreement and disagreement between criteria
for making the analysis, thanks to a proper belief function
modeling.


B. Application of BF-ICrA for the car selection problem 


Let’s consider another concrete problem related to car 
selection. Suppose one has a limited budget of 12000€ and 
one wants to buy a new car based on multiple criteria. A set of 
potential cars under 12K € that present interest with respect 
to some criteria is obtained initially from a search on the web. 
How to apply BF-ICrA to simplify the selection process, and 
how to make the final choice of the car to buy? 

Here we consider a set of ten small urban cars $\{A_1, A_2, \ldots, A_{10}\}$ as follows:

• $A_1$ = DACIA SANDERO SCe 75;
• $A_2$ = RENAULT CLIO TCe 75;
• $A_3$ = SUZUKI CELERIO 1.0 VVT Avantage;
• $A_4$ = FORD KA+ Ka+ 1.2 70 ch S&S Essential;
• $A_5$ = MITSUBISHI SPACE STAR 1.0 MIVEC 71;
• $A_6$ = KIA PICANTO 1.0 essence MPi 67 ch BVM5;
• $A_7$ = HYUNDAI I10 1.0 66 BVM5 Initia;
• $A_8$ = CITROEN C1 VTi 72 S&S Live;
• $A_9$ = TOYOTA AYGO 1.0 VVT-i x;
• $A_{10}$ = PEUGEOT 108 VTi 72ch S&S BVM5 Like.

We consider the following seventeen criteria related to price, dimensions, engine and consumption of the car for making the choice of the best car to buy:

• $C_1$ is the price (€);
• $C_2$ is the length (mm);
• $C_3$ is the height (mm);
• $C_4$ is the width without mirror (mm);
• $C_5$ is the wheelbase (mm);
• $C_6$ is the max loading volume (L);
• $C_7$ is the tank capacity (L);
• $C_8$ is the unloaded weight (Kg);
• $C_9$ is the cylinder volume (cm³);
• $C_{10}$ is the acceleration 0-100 Km/h (s);
• $C_{11}$ is the max speed (Km/h);
• $C_{12}$ is the power (Kw);
• $C_{13}$ is the horse power (hp);
• $C_{14}$ is the mixed consumption (L/100Km);
• $C_{15}$ is the extra-urban consumption (L/100Km);
• $C_{16}$ is the urban consumption (L/100Km);
• $C_{17}$ is the CO2 emission level (g/Km).

The score matrix $S = [S_{ij}]$ is built from information extracted from car-makers' technical characteristics available on the world wide web site$^{17}$. For the chosen cars, the corresponding original score matrix is given by (32).

$^{17}$ https://automobile.choisir.com/comparateur/voitures-neuves


For criteria $C_1$, $C_4$, $C_8$, and $C_{14}$ to $C_{17}$ we consider that smaller is better. For the other criteria larger is better. To make the preference order homogeneous in the score matrix, we multiply the values of columns $C_1$, $C_4$, $C_8$, and $C_{14}$ to $C_{17}$ by -1, so that our MCDM problem is described by a modified score matrix with a homogeneous preference order ("larger is better") for each column before applying the BF-ICrA method.
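The sign change described above can be sketched in a few lines of Python. This is only an illustration: the function name `homogenize` and the 0-based column set are hypothetical, and the set assumes that the cost-type criteria are C1, C4, C8 and C14 to C17. The two sample rows use the price, length, height and width of the first two cars of the score matrix.

```python
# Hypothetical 0-based indices of the "smaller is better" columns
# (assumed here to be C1, C4, C8 and C14..C17).
COST_COLUMNS = {0, 3, 7, 13, 14, 15, 16}


def homogenize(score_rows, cost_columns=COST_COLUMNS):
    """Return a copy of the score matrix where cost columns are multiplied by -1."""
    return [
        [-value if j in cost_columns else value for j, value in enumerate(row)]
        for row in score_rows
    ]


# First four criteria (price, length, height, width) of cars A1 and A2.
rows = [
    [7990, 4069, 1523, 1733],
    [10990, 4063, 1448, 1732],
]
flipped = homogenize(rows)
```

After this step every column can be read as "larger is better", which is the convention the BF-ICrA method expects.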

After applying the BF-ICrA method (with the PCR6 fusion rule in step 2) we obtain the following IC Belief matrices $K(\theta) = [m_{ij}(\theta)]$, $K(\bar{\theta}) = [m_{ij}(\bar{\theta})]$ and $K(\theta \cup \bar{\theta})^{18}$ given by (33) and (34).


From the ICBMs $K(\theta)$ and $K(\bar{\theta})$ we compute the matrix $D(\theta) = [D_{ij} = d_{BI}(m_{ij}, m_T)]$ of distances of the BBA $m_{ij}(\cdot)$ with respect to the full agreement state having BBA $m_T(\theta) = 1$, based on the $d_{BI}(\cdot)$ distance. We get the distances to full agreement given in (35).

The element $D_{ij}$ represents the agreement distance between $C_i$ and $C_j$, the lower the better.
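A minimal sketch of this distance computation is given below. It assumes that the distance is the Euclidean belief-interval distance, i.e. a Wasserstein distance between the [Bel, Pl] intervals of each element of the two-element frame, normalized by 1/2^(n-1) with n = 2. The function names are hypothetical. Under this assumption the sketch reproduces printed entries such as 0.1466 for the diagonal element of C10 (agreement mass 0.7484, disagreement mass 0.0045).

```python
import math


def interval_distance(a, b):
    """Wasserstein distance between two intervals given as (low, high)."""
    mid_diff = (a[0] + a[1]) / 2 - (b[0] + b[1]) / 2
    half_width_diff = (a[1] - a[0]) / 2 - (b[1] - b[0]) / 2
    return math.sqrt(mid_diff ** 2 + half_width_diff ** 2 / 3)


def distance_to_full_agreement(m_agree, m_disagree):
    """Belief-interval distance between m_ij and the full agreement BBA m_T."""
    m_unknown = 1.0 - m_agree - m_disagree
    # [Bel, Pl] intervals of "agreement" and "disagreement" under m_ij.
    bi_agree = (m_agree, m_agree + m_unknown)
    bi_disagree = (m_disagree, m_disagree + m_unknown)
    # Under m_T the intervals are [1, 1] and [0, 0]; the whole frame has
    # interval [1, 1] for both BBAs and contributes nothing.
    total = (interval_distance(bi_agree, (1.0, 1.0)) ** 2
             + interval_distance(bi_disagree, (0.0, 0.0)) ** 2)
    return math.sqrt(0.5 * total)  # normalization 1 / 2^(n-1) with n = 2


d_c10 = distance_to_full_agreement(0.7484, 0.0045)
d_c1_c2 = distance_to_full_agreement(0.6456, 0.0559)
```

A BBA that puts all its mass on agreement gives a distance of 0, and one that puts all its mass on disagreement gives a distance of 1.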

From the analysis of the elements $D_{ij}$ one sees clearly that criteria $C_{14}$, $C_{15}$, $C_{16}$ and $C_{17}$ are in very strong agreement and will behave very similarly for the preference ordering, which is not very surprising because they are all related to energy consumption. Hence only one of these four criteria needs to be used to simplify the MCDM car selection problem. We decide to keep only criterion $C_{16}$ (urban consumption) in the simplified MCDM problem because urban displacements will be the main use of the car. One sees clearly that $C_2$, $C_5$ and $C_7$ are also in very strong agreement, and so they will behave very similarly for the preference ordering. One decides to keep only the criterion $C_7$ (tank capacity), which we consider more important than criteria $C_2$ and $C_5$ because it is linked to the autonomy of the car. From BF-ICrA, one sees that tank capacity is linked with the dimensions of the car (mainly its length and wheelbase), which makes perfect sense. We can also note that criteria $C_{12}$ and $C_{13}$ are not too far apart since their distance is only 0.1403, and we can simplify the MCDM problem a bit more by taking only criterion $C_{12}$ (the power) instead of keeping both $C_{12}$ and $C_{13}$.

Thanks to BF-ICrA, we can simplify the original MCDM 
car selection problem by removing redundant criteria and 
keeping only those which bring useful information. So our 
simplified MCDM car selection problem is characterized by 
the 10 x 11 score matrix given in (36). 
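The search for redundant criteria can be illustrated by listing the pairs of criteria whose distance to full agreement is below a chosen threshold. The sketch below is not the authors' procedure: the helper `close_pairs` and the threshold 0.1 are hypothetical, and the small matrix only reuses the printed distances between length, wheelbase, loading volume and weight.

```python
def close_pairs(labels, dist, threshold):
    """List the criteria pairs whose distance does not exceed the threshold."""
    pairs = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if dist[i][j] <= threshold:
                pairs.append((labels[i], labels[j], dist[i][j]))
    # Closest (most redundant) pairs first.
    return sorted(pairs, key=lambda pair: pair[2])


labels = ["C2", "C5", "C6", "C8"]
dist = [
    [0.0709, 0.0827, 0.1471, 0.8414],
    [0.0827, 0.0805, 0.1436, 0.8146],
    [0.1471, 0.1436, 0.1165, 0.7520],
    [0.8414, 0.8146, 0.7520, 0.1171],
]
pairs = close_pairs(labels, dist, threshold=0.1)
```

With this threshold only the pair (C2, C5) is reported, which matches the observation that length and wheelbase carry nearly the same preference information. The choice of which criterion to keep in each group remains a decision of the analyst.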

[Equations (32) to (36): surviving entries of the matrices of the car selection example.]

Score matrix $S$ of (32), columns $C_1$ to $C_{10}$ (columns $C_1$ and $C_7$ are taken from the reduced matrix (36); "-" marks a missing value; columns $C_{11}$ to $C_{17}$ are missing):

     |  C1     C2    C3    C4    C5    C6    C7   C8    C9    C10
A1   |  7990   4069  1523  1733  2589  1200  -    969   998   14.2
A2   |  10990  4063  1448  1732  2589  1146  45   1188  898   12.3
A3   |  9790   3600  1530  1600  2425  1053  35   815   998   13.9
A4   |  10350  3941  1524  1774  2490  1029  42   1063  1198  14.6
A5   |  10990  3795  1505  1665  2450  910   -    865   999   16.7
A6   |  11000  3595  1485  1595  2400  1010  -    860   998   14.3
A7   |  11050  3665  1500  1660  2385  1046  40   1008  998   14.7
A8   |  11550  3466  1465  1615  2340  780   35   840   998   14
A9   |  11590  3465  1460  1615  2340  812   35   915   998   13.8
A10  |  11950  3475  1460  1615  2340  780   35   840   998   12.6

The reduced 10 x 11 score matrix (36) keeps the columns $C_1$, $C_3$, $C_4$, $C_6$, $C_7$, $C_8$, $C_9$, $C_{10}$, $C_{11}$, $C_{12}$ and $C_{16}$ of (32). The value of $C_8$ for $A_2$ reads 1188 in (32) and 1138 in (36).

$K(\theta)$ of (33), surviving columns:

     |  C2      C3      C4      C5      C6      C8      C9      C10     C12
C1   | 0.6456  0.6005  0.0722  0.6689  0.6518  0.1152  0.3728  0.3624  0.3273
C2   | 0.8905  0.4994  0.0281  0.8716  0.7718  0.0579  0.1022  0.2123  0.6069
C3   | 0.4994  0.8352  0.2913  0.4792  0.5196  0.4315  0.5934  0.5300  0.1676
C4   | 0.0281  0.2913  0.8523  0.0403  0.0899  0.7553  0.0688  0.1874  0.1048
C5   | 0.8716  0.4792  0.0403  0.8730  0.7741  0.0684  0.0588  0.1916  0.6167
C6   | 0.7718  0.5196  0.0899  0.7741  0.8063  0.0863  0.1126  0.2059  0.3777
C7   | 0.8635  0.4764  0.0402  0.8602  0.7533  0.0455  0.1989  0.1959  0.6660
C8   | 0.0579  0.4315  0.7553  0.0684  0.0863  0.8060  0.2495  0.3144  0.1262
C9   | 0.1022  0.5934  0.0688  0.0588  0.1126  0.2495  0.8901  0.6005  0.0187
C10  | 0.2123  0.5300  0.1874  0.1916  0.2059  0.3144  0.6005  0.7484  0.0447
C11  | 0.5406  0.1100  0.0941  0.5275  0.3050  0.0877  0.0252  0.1268  0.5442
C12  | 0.6069  0.1676  0.1048  0.6167  0.3777  0.1262  0.0187  0.0447  0.7845
C13  | 0.6062  0.2137  0.1066  0.6093  0.4096  0.1472  0.0211  0.0500  0.7665
C14  | 0.0572  0.2570  0.7690  0.0889  0.0528  0.7398  0.1409  0.1628  0.2940
C15  | 0.0411  0.1837  0.7849  0.0650  0.0493  0.7042  0.1061  0.1087  0.2388
C16  | 0.0746  0.3016  0.7521  0.1098  0.0528  0.7632  0.1684  0.2057  0.3387
C17  | 0.0760  0.2678  0.7520  0.1148  0.0572  0.7424  0.1092  0.1720  0.3693

$K(\bar{\theta})$ of (34), surviving columns (further surviving entries: $(C_1,C_1) = 0.0073$, $(C_1,C_7) = 0.0503$, $(C_1,C_{12}) = 0.1333$, $(C_7,C_{12}) = 0.1074$):

     |  C2      C3      C4      C5      C6      C8      C9      C10
C1   | 0.0559  0.0413  0.5165  0.0452  0.0334  0.3839  0.1473  0.0725
C2   | 0.0232  0.2781  0.8331  0.0262  0.0469  0.7364  0.2368  0.2329
C3   | 0.2781  0.0199  0.4166  0.2692  0.1566  0.2421  0.0132  0.0353
C4   | 0.8331  0.4166  0.0164  0.7815  0.6067  0.0324  0.3318  0.2199
C5   | 0.0262  0.2692  0.7815  0.0222  0.0407  0.6918  0.3119  0.2453
C6   | 0.0469  0.1566  0.6067  0.0407  0.0153  0.5777  0.3117  0.2007
C7   | 0.0527  0.3470  0.8499  0.0434  0.0724  0.8074  0.1993  0.2931
C8   | 0.7364  0.2421  0.0324  0.6918  0.5777  0.0166  0.0819  0.1206
C9   | 0.2368  0.0132  0.3318  0.3119  0.3117  0.0819  0.0004  0.0085
C10  | 0.2329  0.0353  0.2199  0.2453  0.2007  0.1206  0.0085  0.0045
C11  | 0.1259  0.5231  0.5259  0.1233  0.2177  0.5366  0.4128  0.2490
C12  | 0.1136  0.4722  0.5691  0.0963  0.1845  0.4927  0.4864  0.4253
C13  | 0.1168  0.4097  0.5472  0.1065  0.1727  0.4454  0.4643  0.4127
C14  | 0.8256  0.5242  0.0623  0.7609  0.7491  0.0566  0.2182  0.2898
C15  | 0.8285  0.5658  0.0396  0.7695  0.7309  0.0559  0.2972  0.3474
C16  | 0.8187  0.5041  0.0880  0.7530  0.7676  0.0569  0.1824  0.2535
C17  | 0.8251  0.5527  0.0972  0.7560  0.7671  0.0753  0.2371  0.3006

Distance matrix $D(\theta)$ of (35), surviving columns:

     |  C2      C3      C4      C5      C6      C8      C9      C10
C1   | 0.2225  0.2434  0.7318  0.2054  0.2114  0.6506  0.4113  0.3907
C2   | 0.0709  0.3946  0.9034  0.0827  0.1471  0.8414  0.5985  0.5349
C3   | 0.3946  0.1014  0.5689  0.4016  0.3319  0.4161  0.2387  0.2821
C4   | 0.9034  0.5689  0.0904  0.8721  0.7634  0.1515  0.6548  0.5438
C5   | 0.0827  0.4016  0.8721  0.0805  0.1436  0.8146  0.6524  0.5514
C6   | 0.1471  0.3319  0.7634  0.1436  0.1165  0.7520  0.6222  0.5261
C7   | 0.0977  0.4383  0.9054  0.0958  0.1673  0.8820  0.5295  0.5681
C8   | 0.8414  0.4161  0.1515  0.8146  0.7520  0.1171  0.4588  0.4349
C9   | 0.5985  0.2387  0.6548  0.6524  0.6222  0.4588  0.0636  0.2331
C10  | 0.5349  0.2821  0.5438  0.5514  0.5261  0.4349  0.2331  0.1466
C11  | 0.3081  0.7145  0.7242  0.3145  0.4767  0.7325  0.7125  0.5893
C12  | 0.2659  0.6605  0.7382  0.2537  0.4227  0.6920  0.7476  0.7070
C13  | 0.2675  0.6078  0.7272  0.2618  0.4001  0.6597  0.7367  0.6988
C14  | 0.8848  0.6368  0.1545  0.8372  0.8501  0.1689  0.5695  0.5852
C15  | 0.8945  0.6948  0.1370  0.8536  0.8432  0.1890  0.6200  0.6389
C16  | 0.8726  0.6039  0.1742  0.8225  0.8589  0.1558  0.5405  0.5466
C17  | 0.8750  0.6445  0.1780  0.8214  0.8565  0.1746  0.5947  0.5845

$^{18}$ The ICBM $K(\theta \cup \bar{\theta})$ is obtained from $K(\theta)$ and $K(\bar{\theta})$ by taking $K(\theta \cup \bar{\theta}) := [K_{ij} = 1 - m_{ij}(\theta) - m_{ij}(\bar{\theta})]$.

From this reduced score matrix, we can apply classical MCDM techniques to find the final preference order, make the final decision and select the car to buy. For this, one needs to define the importance $imp(C_j)$ of each criterion $C_j$ involved in the score matrix above. For simplicity, the importance of each criterion $C_j$ is expressed as a value in $\{1,2,3,4,5\}$, where 1 means the least important and 5 means the most important. In this car selection example we take $imp(C_1) = imp(C_{16}) = 5$, $imp(C_6) = imp(C_7) = 4$, $imp(C_{10}) = imp(C_{11}) = imp(C_{12}) = 3$, $imp(C_8) = imp(C_9) = 2$ and $imp(C_3) = imp(C_4) = 1$, which means that the price of the car and its urban consumption are the most important criteria for us, and its height and its width are the least important ones. From these importance values and after normalization, we get the following vector of relative weights of criteria

$$w = \left[\tfrac{5}{33}, \tfrac{1}{33}, \tfrac{1}{33}, \tfrac{4}{33}, \tfrac{4}{33}, \tfrac{2}{33}, \tfrac{2}{33}, \tfrac{3}{33}, \tfrac{3}{33}, \tfrac{3}{33}, \tfrac{5}{33}\right].$$
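The normalization of the importance values into relative weights can be checked with a short sketch. The dictionary below is an assumption about how the importance values map to the eleven retained criteria (read from the text in the order of the reduced score matrix); exact fractions are used so that the denominators stay visible.

```python
from fractions import Fraction

# Assumed importance of each retained criterion, on the 1..5 scale of the text.
importance = {
    "C1": 5, "C3": 1, "C4": 1, "C6": 4, "C7": 4, "C8": 2,
    "C9": 2, "C10": 3, "C11": 3, "C12": 3, "C16": 5,
}

total = sum(importance.values())
# Relative weight of a criterion = its importance divided by the total.
weights = {name: Fraction(value, total) for name, value in importance.items()}
```

The total of the importance values is 33, so the price and the urban consumption each receive the weight 5/33.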
When using different BF-TOPSIS methods [3], [8], we will 
obtain the following preference orders 
e with BF-TOPSIS1 method: Az > A 
As > Ag > Aio > Ag > Ag > Az 
e with BF-TOPSIS2 method: Az > A 
As > Ag > A1o > Ag > Ag > AZ 
e with BF-TOPSIS3 method: Az > A 
As > Aip > Ag > Ag > Ag > Az 
e with BF-TOPSIS4 method: Ag > A 
As > Ajo > Ag > Ag > Ag > AZ 
When using the classical AHP method [4], we obtain the following
preference order$^{19}$:


Ag > Ay > Ag > A7 > As > Ag > Ap > Ag > AZ > Alo 


From the results of the BF-TOPSIS methods and AHP (with double normalization of the score matrix), one sees that car $A_2$ (RENAULT CLIO TCe 75) is the best car to buy and car $A_1$ (DACIA SANDERO SCe 75) is the second best car to buy, whereas the worst one is $A_3$ (SUZUKI CELERIO 1.0 VVT Avantage) according to BF-TOPSIS, or $A_{10}$ according to AHP. Because the AHP and BF-TOPSIS methods are based on very different principles, it is not surprising that the preference order can change between the methods; what matters most from a decision-making standpoint is the stability of the order of the first best solutions. In this example, car $A_2$ is always the best selection with BF-TOPSIS or with the AHP method, based on the chosen criteria involved in this MCDM problem and their importance weights.


> Ay > Az > 
> Ay > Az > 


> Ay > Az > 


> Ay > Az > 


VII. CONCLUSION 


In this paper we have presented a new method called BF-ICrA which helps to simplify (when possible) Multi-Criteria Decision-Making problems based on inter-criteria analysis and belief functions. This method is in the spirit of Atanassov's method but proposes a better construction of the Inter-Criteria Matrix that fully exploits all the information of the score matrix, and a closeness measure of agreement between criteria based on belief interval distance. In fact, BF-ICrA proposes a more precise and refined method for estimating the degree of agreement and disagreement between criteria, which uses the whole information available in the


Here we did apply a two steps normalization of the score matrix. At 
first we normalize S according to (2) and in a second step each column 
is renormalized by dividing each element of the column by the sum of its 
elements. If we apply only first normalization step we obtain with AHP the 
preference order Ap > Ag > Ay > A7 > A5 > Ap > Ag > Ag > 
Aio > A3. 


data. This BF-ICrA approach could, in theory, also deal with imprecise or missing score values using the technique presented in [8]. We have shown two concrete applications of the BF-ICrA method. The first one, related to the GPS surveying problem, has been addressed in order to overcome the potential inconsistencies of the results generated by the classical ICrA method. Instances containing from 38 to 443 sessions have been solved using the MMAS algorithm, and we compared the performance of ACO algorithms applied to eight GPS networks. Our results show that ACO can provide fast near-optimal solutions for observing GPS networks, and could help to improve the services based on GPS networks. From this new Inter-Criteria Analysis we are able to identify some relations and dependences between the considered eight GSPs and the MMAS algorithm performance. In our second application, we have shown how a typical (not so simple) multi-criteria car selection problem can be addressed and solved by the BF-ICrA method coupled with BF-TOPSIS methods. This shows the usefulness and potential of this new technique for solving MCDM problems.
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Abstract—Multi-sensor fusion strategies have been widely 
applied in Human Activity Recognition (HAR) in Body Sensor 
Networks (BSNs). However, the sensory data collected by BSNs 
systems are often uncertain or even incomplete. Thus, designing a 
robust and intelligent sensor fusion strategy is necessary for high- 
quality activity recognition. In this paper, Dezert-Smarandache 
Theory (DSmT) is used to develop a novel sensor fusion strategy 
for HAR in BSNs, which can effectively improve the accuracy 
of recognition. Specifically, in the training stage, the Kernel 
Density Estimation (KDE) based models are first built and 
then precisely selected for each specific activity according to 
the proposed discriminative functions. After that, a structure 
of Basic Belief Assignment (BBA) can be constructed, using the 
relationship between the test data of unknown class and the 
selected KDE models of all considered types of activities. In order 
to deal with the conflict between the obtained BBAs, Proportional 
Conflict Redistribution-6 (PCR6) is applied to fuse the acquired 
BBAs. Moreover, the missing data of the involved sensors are 
addressed as ignorance in the framework of the DSmT without 
manual interpolation or intervention. Experimental studies on 
two real-world activity recognition datasets (The OPPORTU- 
NITY dataset; Daily and Sports Activity Dataset (DSAD)) were 
conducted, and the results showed the superiority of our proposed 
method over some state-of-the-art approaches proposed in the 
literature. 


Keywords: HAR, Multi-sensor fusion, Belief function theory, 
KDE, DSmT. 


I. INTRODUCTION 


Human Activity Recognition (HAR) has been intensively studied over the past decades and remains an active research area [1], [2], [3], [4]. HAR systems have enabled several practical applications, such as health monitoring [5], physical activity [6] and gesture detection. Recently, multi-sensor fusion has played an increasing role in the HAR field and many strategies have been proposed (see [7] for more references). Generally speaking, multi-sensor fusion strategies fall into three categories depending on the abstraction level used for data processing: data-level fusion [8], feature-level fusion [9] and decision-level fusion [10]. In decision-level fusion, the output is a unique decision obtained from the local decisions of multiple (homogeneous or heterogeneous) sensors. Fusion at this level has several advantages, such as saving communication bandwidth and allowing heterogeneous sensors to be combined. This paper therefore focuses on decision-level fusion. The two most commonly used approaches at this level are majority voting [11] and naive Bayes [12]. However, complex sensory data, especially when they are uncertain or even incomplete, make these two methods unsuitable for HAR. Two classical scenarios are described as follows:


[Figure 1, two panels. (a) Uncertain data collected by the right knee sensor in the OPPORTUNITY dataset. (b) Percentage of missing data collected by the right knee sensor in the OPPORTUNITY dataset (legend: Missing Data, Complete Data; one slice is labelled 91.21%).]

Figure 1. Uncertain and incomplete sensory data in OPPORTUNITY dataset.


1) Uncertain sensory data in the HAR problem. In order to discuss the uncertainty of sensory data intuitively, one of the sensors involved in the UCI OPPORTUNITY dataset [13], [14] was randomly selected, and parts of the original data of three activities derived from the chosen sensor are drawn in Fig.1(a). As we can see from Fig.1(a), some objects that are very close can sometimes truly originate from different classes. Such objects are really difficult to classify correctly into a particular class using the given information. In this case, we call the data uncertain when they can belong to different specific classes, with probability mass assignments to estimate;

2) Incomplete sensory data in the HAR problem. Missing data frequently occur during measurement in wearable-based activity recognition. As we can see in Fig.1(b), sensory data with an incomplete pattern occupy an important proportion of the OPPORTUNITY dataset, which cannot easily be neglected. The traditional ways to cope with feature vectors that include missing data are to interpolate the missing values or to delete the whole vector. However, interpolation or deletion is not a wise choice, since it may bring noise and information loss to the recognition system.


The aforementioned discussions motivate our study, where HAR in Body Sensor Networks (BSNs) is implemented based on belief function theory [15]. Belief functions make it possible to model uncertainty and to fuse Basic Belief Assignments (BBAs) built from the sensors' measurements. Within this theory, information fusion relies on a combination rule that allows the pieces of evidence (drawn from sensor readings) expressed in a common frame of discernment to be combined. Among all available combination rules, Dempster's rule, proposed by Shafer in Dempster-Shafer theory [15], is the most well-known rule and is still used in many applications even if it remains very controversial. Recently, Chen et al. [16] proposed a new method based on Dempster-Shafer theory to improve human action recognition by fusing a depth camera and inertial sensors. Although the recognition results reported in [16] are good, two key issues are ignored by the authors: 1) In Dempster-Shafer theory, the hypotheses considered are assumed to be exclusive. However, in HAR, the activities to be identified often fail to satisfy mutual exclusion. For example, the intersection between "Walking" and "Running" can be defined as "Standing" or as the intermediate transition state "Walking to Running" [17]; 2) Dempster's rule cannot handle high-conflict situations, and even very low-conflict situations in specific cases, which have been widely discussed in [18], [19].

To overcome these drawbacks of Dempster-Shafer theory, Dezert and Smarandache proposed the Dezert-Smarandache Theory (DSmT) [18] for multi-sensor fusion problems, with more reasonable assumptions and better combination rules, which is more appropriate for handling HAR problems. In this paper, a new use of DSmT is proposed to solve HAR issues thanks to a novel decision-level fusion strategy based on DSmT. Such DSmT-based HAR can be used in an online activity recognition system because of its higher recognition accuracy and lower recognition delay, which can meet the response speed required in real-time recognition systems (less than 200ms) [2]. Specifically, the main contributions of this work are summarized as follows:


• A novel DSmT-based fusion strategy for HAR in BSNs is proposed;

• Kernel Density Estimation (KDE) models are constructed based on the sensor readings, and the selected KDE models of all considered classes are applied to calculate BBAs in DSmT;

• The missing data in the original sensor readings are modeled by a vacuous BBA (i.e. the total ignorance source of evidence) in DSmT without any manual interpolation;

• The efficiency of our fusion system is demonstrated on two open activity recognition datasets.

This paper is organized as follows: Section II provides an 
inventory of the basic concepts of DSmT. Section III provides 
a description of the new proposed fusion method. Section IV 
includes the experimental results and discussions. The final 
section V contains a brief conclusion. 


II. BASICS OF DSMT 


In the DSmT framework, the BBAs are defined on the so-called hyper-power set (or Dedekind's lattice) denoted $D^\Theta \triangleq (\Theta, \cup, \cap)$, whose cardinalities follow Dedekind's numbers sequence, see [18], Vol.1 for details and examples. A (generalized) BBA, called a mass function, $m(\cdot)$ is defined by the mapping $D^\Theta \mapsto [0,1]$, verifying $m(\emptyset) = 0$ and $\sum_{A \in D^\Theta} m(A) = 1$.

To palliate the drawbacks of Dempster's rule, Martin et al. [20] proposed a very interesting combination rule: PCR6. Due to its good performance, it is widely applied in recent applications. We recall that the PCR6 formula for the combination of two BBAs coincides with the PCR5 formula originally developed by Smarandache and Dezert in [18]. The combination of two BBAs $m_1(\cdot)$ and $m_2(\cdot)$ by the PCR5 rule is given as follows: $m_{PCR5}(\emptyset) = 0$ and $\forall A \in D^\Theta \setminus \{\emptyset\}$

$$m_{PCR6}(A) = m_{PCR5}(A) = m_{12}(A) + \sum_{\substack{B \in D^\Theta \setminus \{A\} \\ A \cap B = \emptyset}} \left[ \frac{m_1(A)^2 m_2(B)}{m_1(A) + m_2(B)} + \frac{m_2(A)^2 m_1(B)}{m_2(A) + m_1(B)} \right], \quad (1)$$

where $m_{12}(A) = \sum_{B, C \in D^\Theta | B \cap C = A} m_1(B) m_2(C)$.

The combinations of more than two BBAs altogether with the PCR5 and with the PCR6 fusion rules provide in general different results. The choice of PCR6 with respect to PCR5 was justified at first by Martin and Osswald in [20] from a specific application, and then theoretically by Smarandache and Dezert in [21]. The general formula of PCR6 for combining more than two BBAs was given in detail in [20] with examples.
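The two-source rule of Eq.(1) can be sketched as follows. This is an illustration under simplifying assumptions: focal elements are represented as Python `frozenset` objects over exclusive hypotheses (Shafer's model rather than the full hyper-power set), and the function name `pcr5_two_sources` is hypothetical. Each conflicting product $m_1(A)m_2(B)$ is sent back to $A$ and $B$ proportionally to $m_1(A)$ and $m_2(B)$.

```python
from itertools import product


def pcr5_two_sources(m1, m2):
    """Combine two BBAs (dict: frozenset -> mass) with the two-source PCR5/PCR6 rule."""
    fused = {}
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            # Conjunctive consensus on the non-empty intersection.
            fused[intersection] = fused.get(intersection, 0.0) + mass_a * mass_b
        elif mass_a + mass_b > 0:
            # Conflicting pair: redistribute the product proportionally.
            fused[a] = fused.get(a, 0.0) + mass_a ** 2 * mass_b / (mass_a + mass_b)
            fused[b] = fused.get(b, 0.0) + mass_b ** 2 * mass_a / (mass_a + mass_b)
    return fused


A, B = frozenset({"A"}), frozenset({"B"})
m1 = {A: 0.6, B: 0.4}
m2 = {A: 0.2, B: 0.8}
fused = pcr5_two_sources(m1, m2)
```

In this toy example the total conflict 0.48 + 0.08 is redistributed entirely to A and B, so the fused masses still sum to one. Combining any BBA with the vacuous BBA returns that BBA unchanged in the two-source case.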


III. DSMT-BASED FUSION STRATEGY FOR HAR IN BSNS 
A. The Flow Chart of Our Proposed Method 


Before entering into the detailed presentation of our DSmT-based fusion strategy, we briefly introduce it through the flowchart of Fig.2 for convenience. Specifically, in the training stage, multiple KDE models are derived from the raw sensor readings so as to build the model pool. Then, the representative model is selected for a particular activity based on our proposed discriminative functions. After that, when the test sample comes, the corresponding BBA is calculated through each activity representative model. Finally, these BBAs are combined with the PCR6 rule, from which we make the final decisions.

Figure 2. DSmT-Based Fusion Strategy for HAR in BSNs.


B. Mathematical Definitions of Daily Activities in DSmT 


The goal of our work is to recognize human daily activities within a DSmT-based framework. Thus, the basic mathematical definitions of the activities of interest need to be given. We assume that the finite frame of discernment considered in our activity recognition problem is $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$. The corresponding hyper-power set of $\Theta$ is denoted $D^\Theta$. Singletons in $D^\Theta$ are used to represent simple daily activities such as $\theta_1 \triangleq$ Standing, $\theta_2 \triangleq$ Sitting, $\theta_3 \triangleq$ Lying and so on. Disjunctive focal elements in $D^\Theta$ represent coarse-grained activities. For example, $\theta_1 \cup \theta_2 \cup \theta_3 \triangleq$ Static Activity. Also, if $\theta_4 \triangleq$ Walking and $\theta_5 \triangleq$ Running, then $\theta_4 \cup \theta_5$ is regarded as Dynamic Activity. Following the same line of definition, $\theta_1 \cup \theta_2 \cup \cdots \cup \theta_n$ represents the whole unknown activity. Besides, the conjunctive focal elements in $D^\Theta$ can be used to stand for transition activities, like $\theta_1 \cap \theta_2 \triangleq$ Standing to Sitting or Sitting to Standing (because $\theta_1 \cap \theta_2 = \theta_2 \cap \theta_1$), and $\theta_2 \cap \theta_3 \triangleq$ Sitting to Lying or Lying to Sitting. In this paper, we only consider a restricted hyper-power set, which is denoted as $D^\Theta_{restricted} = \{\theta_1, \theta_2, \ldots, \theta_n, \theta_1 \cup \theta_2 \cup \cdots \cup \theta_n\}$. In $D^\Theta_{restricted}$, only two types of focal elements exist: one is the singleton, which represents a simple activity, and the other is $\theta_1 \cup \theta_2 \cup \cdots \cup \theta_n$, which represents the unknown activity. More complicated situations involving less restricted hyper-power sets will be discussed in our future work.


C. Training Model Stage 


In the training stage, the KDE model is employed to fit the sensor readings. The most suitable KDE model to distinguish a certain activity is then selected as the representative model of that activity. This training stage involves two main steps:

1) Construction of KDE Models: We assume that there are $M$ kinds of activities that need to be classified and that the original dataset collected from the wearable sensors is denoted as $x_{ij}$, $i = 1, \ldots, M$ and $j = 1, \ldots, N$. Here, $M$ represents the number of types of activities to be classified and $N$ is the number of sensors. Thus, based on Eq.(2), the KDE model of the specific activity is derived from the sensor readings by

$$f(x_{ij}) = \frac{1}{n h^{d}} \sum_{k=1}^{n} K\left(\frac{x - x_{ij}^{k}}{h}\right), \quad (2)$$

where $f(x_{ij})$ is the KDE model of $x_{ij}$, which represents the model of the $j$-th sensor for the $i$-th activity; $n$ is the number of readings $x_{ij}^{k}$ in $x_{ij}$; $K(\cdot)$ is the kernel function, which can be 'normal', 'epanechnikov', 'box' and 'triangle'; $h$ is the smoothing parameter (the bandwidth) of the KDE model. In this paper, the value of $h$ is the adaptive bandwidth selected by the method presented in [22]. The parameter $d$ is the dimension of $x_{ij}$.
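A minimal one-dimensional kernel density estimator is sketched below to make the construction concrete. It is not the authors' implementation: it assumes a Gaussian ('normal') kernel, scalar readings, and a fixed bandwidth `h` supplied by the caller instead of the adaptive bandwidth of [22]. The names `gaussian_kernel` and `kde` are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used as the 'normal' kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, samples, h, kernel=gaussian_kernel):
    """Evaluate a 1-D kernel density estimate at x with fixed bandwidth h."""
    n = len(samples)
    return sum(kernel((x - s) / h) for s in samples) / (n * h)


# Hypothetical readings of one sensor for one activity.
readings = [0.0, 1.0]
density_inside = kde(0.5, readings, h=0.3)
density_outside = kde(2.0, readings, h=0.3)
```

A test value lying among the training readings of an activity gets a higher density than one lying far from them, which is what the later BBA construction relies on.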


2) Selection of the Best Discriminative KDE Model for the Specific Activity: As we can see from Eq.(2), each activity can have $N$ KDE models, and we need to select the most discriminative KDE model in order to reduce the computational complexity and the interference between models. Once the unique KDE model for each activity is selected, one can easily determine a specific sensor to identify the activity, because there is a one-to-one correspondence between the KDE models and the wearable sensors. We propose two novel discriminant evaluation functions as follows:

For the specific activity $\theta_s$, $s \in \{1, \ldots, M\}$, the value of the Sum of Statistical Difference (SSD) of the $j$-th, $j = 1, \ldots, N$, KDE model is calculated as follows:

$$SSD_{\theta_s}(j) = [\Psi(f_{\theta_s,j}) - \Psi(f_{\theta_1,j})] + \cdots + [\Psi(f_{\theta_s,j}) - \Psi(f_{\theta_M,j})] = (M-1) \cdot \Psi(f_{\theta_s,j}) - \sum_{i=1, i \neq s}^{M} \Psi(f_{\theta_i,j}). \quad (3)$$
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Figure 3. Selection of the KDE Model for the Specific Activity Based on the Principle of SSD Function. 


In Eq.(3), $\theta_s$ is one of the specific activities among the $M$ considered activities; $j$ is the index of the $j$-th sensor; $\Psi(\cdot)$ calculates the statistical characteristic value of the derived distribution of the KDE model $f_{\theta_s,j}$. In this paper, $\Psi(\cdot)$ = Mean($\cdot$), that is the average value of the sensor readings.

The principle of selecting the KDE model based on SSD is quite simple: for the specific activity $\theta_s$, if the SSD value of the $j$-th, $j = 1, \ldots, N$, sensor is large, it means that this $j$-th KDE model of $\theta_s$ has a better discriminative ability. Here, a simple illustrative example was extracted from the OPPORTUNITY dataset experiment in Section IV to show the principle of SSD. As we can see in Fig.3, for the specific activity $\theta_1$, the value of $SSD_{\theta_1}(g_1) = (Mean(f_{\theta_1,g_1}) - Mean(f_{\theta_2,g_1})) + (Mean(f_{\theta_1,g_1}) - Mean(f_{\theta_3,g_1}))$ (Fig.3(a)) is larger than $SSD_{\theta_1}(g_2) = (Mean(f_{\theta_1,g_2}) - Mean(f_{\theta_2,g_2})) + (Mean(f_{\theta_1,g_2}) - Mean(f_{\theta_3,g_2}))$ (Fig.3(b)). Here, $g_1$ and $g_2$ represent the $g_1$ sensor and the $g_2$ sensor. It can be clearly seen in Fig.3 that the KDE model $f_{\theta_1,g_1}$ has a higher discriminative ability than the KDE model $f_{\theta_1,g_2}$ for activity $\theta_1$.
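The SSD criterion of Eq.(3) can be sketched with a table of mean values. The layout `psi[i][j]` (statistic of the KDE model of activity i at sensor j), the function names and the numbers are all hypothetical; they only illustrate how the sensor with the largest SSD is picked for a given activity.

```python
def ssd(psi, s, j):
    """Sum of Statistical Difference of sensor j for activity s, as in Eq.(3)."""
    m = len(psi)
    return (m - 1) * psi[s][j] - sum(psi[i][j] for i in range(m) if i != s)


def best_sensor(psi, s):
    """Index of the sensor whose KDE model has the largest SSD for activity s."""
    n_sensors = len(psi[0])
    return max(range(n_sensors), key=lambda j: ssd(psi, s, j))


# Hypothetical mean values: 3 activities (rows) observed by 2 sensors (columns).
psi = [
    [5.0, 2.0],
    [1.0, 1.5],
    [2.0, 1.8],
]
chosen = best_sensor(psi, s=0)
```

For the first activity, sensor 0 separates its mean from the means of the other activities much more than sensor 1, so it is the one selected.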

In order to measure the distances between the probability density functions of each pair of KDE models, another well-known choice is the Kullback-Leibler (KL) divergence defined by, see [23]:

$$Div_{KL}(f_{p_1} \| f_{p_2}) = \sum f_{p_1} \log\frac{f_{p_1}}{f_{p_2}}. \quad (4)$$

Here $f_{p_1}$ and $f_{p_2}$ are two discrete probability density functions. Similar to $Div_{KL}$, another well-known divergence is the Jensen-Shannon (JS) divergence defined by:

$$Div_{JS}(f_{p_1} \| f_{p_2}) = \frac{1}{2}\left[Div_{KL}(f_{p_1} \| f_{p_2}) + Div_{KL}(f_{p_2} \| f_{p_1})\right]. \quad (5)$$
Based on Eq.(4) and Eq.(5), another discriminative evaluation function, named Sum of Divergence Difference (SDD), is given to measure the discriminative ability between different KDE models: for the specific activity $\theta_s$, $s \in \{1, \ldots, M\}$, the value of SDD of the $j$-th, $j = 1, \ldots, N$, KDE model is calculated as follows:

$$SDD_{\theta_s}(j) = \sum_{i=1, i \neq s}^{M} \Upsilon(f_{\theta_s,j}, f_{\theta_i,j}). \quad (6)$$

In Eq.(6), $\theta_s$ is the specific class of daily activity; $\Upsilon(\cdot)$ represents the divergence function. In this paper, $\Upsilon(\cdot)$ is defined as KL (Eq.(4)) or JS (Eq.(5)). To keep the statements clear in the following sections, we directly use Mean($\cdot$) to indicate that the SSD criterion is applied for selecting KDE models in the process of activity recognition. Similarly, $Div_{KL}(\cdot)$ or $Div_{JS}(\cdot)$ means that SDD is applied and that $Div_{KL}(f_{p_1} \| f_{p_2})$ or $Div_{JS}(f_{p_1} \| f_{p_2})$ is used in the SDD criterion to measure the difference between two distributions. For the activities $\theta_1, \theta_2, \ldots, \theta_M$, the $M$ best discriminative KDE models $f_{\theta_i,g_i}$, $i = 1, \ldots, M$, can be selected and denoted as follows:

$$\begin{bmatrix} f_{\theta_1 g_1} & f_{\theta_2 g_1} & \cdots & f_{\theta_M g_1} \\ f_{\theta_1 g_2} & f_{\theta_2 g_2} & \cdots & f_{\theta_M g_2} \\ \vdots & \vdots & & \vdots \\ f_{\theta_1 g_M} & f_{\theta_2 g_M} & \cdots & f_{\theta_M g_M} \end{bmatrix} \quad (7)$$

and $g_1, g_2, \ldots, g_M \in [1, N]$. Each $g_i$, $i \in \{1, \ldots, M\}$, represents the selected wearable sensor number.
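The divergence-based criterion can be sketched as follows for discretized densities. The sketch assumes that each KDE model has been sampled on a common grid and normalized into a list of strictly positive probabilities. The function `sym_kl` implements the symmetrized form printed in Eq.(5), and all names and numbers are hypothetical.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence of Eq.(4) for discrete distributions p and q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def sym_kl(p, q):
    """Symmetrized divergence as printed in Eq.(5)."""
    return 0.5 * (kl(p, q) + kl(q, p))


def sdd(models, s, j, divergence=sym_kl):
    """Sum of Divergence Difference of sensor j for activity s, as in Eq.(6).

    models[i][j] is the discretized density of activity i at sensor j.
    """
    return sum(
        divergence(models[s][j], models[i][j])
        for i in range(len(models))
        if i != s
    )


# Hypothetical discretized densities: 2 activities, 2 sensors, 2 grid points.
models = [
    [[0.5, 0.5], [0.5, 0.5]],
    [[0.9, 0.1], [0.6, 0.4]],
]
best = max(range(2), key=lambda j: sdd(models, 0, j))
```

Sensor 0 is chosen for the first activity here because its two densities differ more than those seen by sensor 1.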


D. Testing Stage 


When the test sample becomes available, the corresponding BBA is calculated through each KDE model of each activity. Finally, we combine all related BBAs with the PCR6 rule and we make the final decision from the combined BBA.

1) BBAs Calculation: In this paper, the considered frame of discernment is $\Theta = \{\theta_1, \theta_2, \ldots, \theta_M\}$. Each focal element in $\Theta$ represents one kind of activity, and here we just consider the simplified $D^\Theta_{restricted} = \{\theta_1, \theta_2, \ldots, \theta_M, \theta_1 \cup \theta_2 \cup \cdots \cup \theta_M\}$. We consider a testing vector $x$ with unknown class and we want to identify the label of $x$ corresponding to the activity it belongs to. Next, we use the following equations to calculate the BBAs $(m_1(\cdot), m_2(\cdot), \ldots, m_M(\cdot))$:

$$m_1(\theta_1) = f_{\theta_1 g_1}(x(g_1)), \; \cdots, \; m_1(\theta_M) = f_{\theta_M g_1}(x(g_1));$$
$$m_2(\theta_1) = f_{\theta_1 g_2}(x(g_2)), \; \cdots, \; m_2(\theta_M) = f_{\theta_M g_2}(x(g_2));$$
$$\vdots$$
$$m_M(\theta_1) = f_{\theta_1 g_M}(x(g_M)), \; \cdots, \; m_M(\theta_M) = f_{\theta_M g_M}(x(g_M)).$$


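It is worth noting that when the value of one feature is missing, we directly assign "1" to $m(\theta_1 \cup \theta_2 \cup \cdots \cup \theta_M)$, which means that in this case we cannot obtain any valuable decision information. Besides, in order to make sure that the derived BBAs satisfy the normalization condition, the following normalization applies:

• If $m_i(\theta_1) + \cdots + m_i(\theta_M) < 1$, then $m_i(\theta_1 \cup \cdots \cup \theta_M) = 1 - (m_i(\theta_1) + \cdots + m_i(\theta_M))$;

• If $m_i(\theta_1) + \cdots + m_i(\theta_M) > 1$, then $m_i(\theta_k) = \dfrac{m_i(\theta_k)}{m_i(\theta_1) + \cdots + m_i(\theta_M)}$ for $k = 1, \ldots, M$, and $m_i(\theta_1 \cup \cdots \cup \theta_M) = 0$.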

2) Global Fusion with PCR6 and Decision Making: After
obtaining the $M$ BBAs, the PCR6 fusion rule is used to fuse
all these BBAs, which is denoted symbolically by

$$m_{fusion} = PCR6(m_1, m_2, \ldots, m_M). \quad (8)$$

Then the final decision on the predicted class of x can be made
as $\theta^* = \arg\max_{\theta_i} m_{fusion}(\theta_i)$, where $\theta_i$ is a focal element
of $D^\Theta_{restricted}$, based on the max of belief mass.

The DSmT-Based Activity Recognition technique is described
in Algorithm 1 for convenience.
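For illustration, the PCR6 combination and the max-of-mass decision can be sketched in Python by brute-force enumeration of all products of focal elements. The sketch assumes that each BBA is a dictionary mapping a `frozenset` of activity labels to its mass, and it follows the usual statement of PCR6: a product with non-empty intersection goes to that intersection, while a conflicting product is redistributed to the focal elements involved, proportionally to their masses. The names `pcr6` and `decide` are hypothetical, and the enumeration is only practical for a small number of sources.

```python
from itertools import product


def pcr6(bbas):
    """Fuse BBAs (dicts: frozenset -> mass) with the PCR6 rule by enumeration."""
    fused = {}
    focal_lists = [[(X, m) for X, m in bba.items() if m > 0] for bba in bbas]
    for combo in product(*focal_lists):
        prod_mass = 1.0
        inter = None
        for X, m in combo:
            prod_mass *= m
            inter = X if inter is None else inter & X
        if inter:
            # Non-empty intersection: conjunctive consensus part.
            fused[inter] = fused.get(inter, 0.0) + prod_mass
        else:
            # Conflict: each source gets back a share of the conflicting
            # product proportional to the mass it put into it.
            total = sum(m for _, m in combo)
            for X, m in combo:
                fused[X] = fused.get(X, 0.0) + prod_mass * m / total
    return fused


def decide(fused):
    """Return the focal element carrying the maximum fused mass."""
    return max(fused, key=fused.get)
```

A vacuous BBA (all mass on the union of all activities) never creates conflict in this scheme, which is how a missing sensor reading is kept from distorting the masses of the other sources.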


IV. PERFORMANCE EVALUATION 
A. Datasets 


The performance of the proposed DSmT-Based HAR was
evaluated on the following two open HAR datasets. The first
one is the UCI OPPORTUNITY dataset [13], [14]. The details
of this dataset can be found on the OPPORTUNITY UCI dataset page¹.
Three basic activities were classified: Walking, Sitting and
Lying. The other one is UCI DSAD². The details of the
DSAD can be found in [24]. In this dataset, five common
daily activities including Sitting, Standing, Lying, Walking
and Running were classified to prove the effectiveness of our
proposed method.


B. Measures of Performance 

As measures of the performance of our activity recognition
system, the classical Accuracy, Precision, Recall, and F1-score
[7] have been used. They are defined by

$$Accuracy = \frac{1}{n} \sum_{k=1}^{n} \frac{TP_k + TN_k}{TP_k + TN_k + FP_k + FN_k}, \quad (9)$$

¹http://archive.ics.uci.edu/ml/datasets/OPPORTUNITY+Activity+Recognition.
²http://archive.ics.uci.edu/ml/datasets/Daily+and+Sports+Activities.


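Algorithm 1: DSmT-Based HAR

Input: Sequential discrete data $x_{ij}$, $i = 1, \ldots, M$, $j = 1, \ldots, N$, $K$ = 'Normal'.
Output: The Predicted Class of Unknown data $x^*$.

Initialize: Cross Validation $(x_{ij}) \rightarrow X_{training}, X_{testing}$;
Training Stage:
for $i = 1, \ldots, M$ do
    for $j = 1, \ldots, N$ do
        $f_{ij}(x_{ij}) = \frac{1}{nh} \sum_{d=1}^{n} K\left(\frac{x - x^{d}}{h}\right)$;
    end
end
for $i = 1, \ldots, M$ do
    for $j = 1, \ldots, N$ do
        $SSD_{\theta_i}(j) = (M-1) \cdot \Psi(f_{\theta_i j}) - \sum_{q=1, q \neq i}^{M} \Psi(f_{\theta_q j})$;
        or
        $SDD_{\theta_i}(j) = \sum_{q=1, q \neq i}^{M} \Upsilon(f_{\theta_i j}, f_{\theta_q j})$;
    end
    $g_i = \max(SSD_{\theta_i})$ or $g_i = \max(SDD_{\theta_i})$;
end
$f_{matrix} = [f_{\theta_1 g_1}, \ldots, f_{\theta_M g_1}; f_{\theta_1 g_2}, \ldots, f_{\theta_M g_2}; \ldots; f_{\theta_1 g_M}, \ldots, f_{\theta_M g_M}]$;
Testing Stage:
$D^\Theta_{restricted} = \{\theta_1, \theta_2, \ldots, \theta_M, \theta_1 \cup \theta_2 \cup \cdots \cup \theta_M\}$;
for $i = 1, \ldots, M$ do
    $m_i(\theta_1) = f_{\theta_1 g_i}(x^*(g_i)), \ldots, m_i(\theta_M) = f_{\theta_M g_i}(x^*(g_i))$;
    if $m_i(\theta_1) + \cdots + m_i(\theta_M) < 1$ then
        $m_i(\theta_1 \cup \theta_2 \cup \cdots \cup \theta_M) = 1 - (m_i(\theta_1) + \cdots + m_i(\theta_M))$;
    else if $m_i(\theta_1) + \cdots + m_i(\theta_M) > 1$ then
        Normalization of BBAs $m_i(\theta_1), \ldots, m_i(\theta_M)$;
    end
end
Fusion Step: $m_{fusion} = PCR6(m_1(\cdot), \ldots, m_M(\cdot))$;
Decision Step: Take as decision the maximum of belief mass of focal elements, $\theta^* = \arg\max_{\theta_i} m_{fusion}(\theta_i)$;
return Predicted Class of $x^*$;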

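$$Precision = \frac{1}{n} \sum_{k=1}^{n} \frac{TP_k}{TP_k + FP_k}, \quad (10)$$

$$Recall = \frac{1}{n} \sum_{k=1}^{n} \frac{TP_k}{TP_k + FN_k}, \quad (11)$$

$$F1\text{-}score = \frac{1}{n} \sum_{k=1}^{n} \left( 2 \cdot \frac{precision_k \cdot recall_k}{precision_k + recall_k} \right), \quad (12)$$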


where $k$ denotes the class index and $n$ is the number of classes.
True Positives ($TP_k$): the number of correctly recognized
class examples; True Negatives ($TN_k$): the number of correctly
recognized examples that do not belong to the class;
False Positives ($FP_k$): examples that were incorrectly
assigned to the class; False Negatives ($FN_k$): class examples
that were not recognized as such.
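As a worked illustration of these measures, the following Python sketch computes them from a confusion matrix. It assumes that all four measures are averaged over the $n$ classes, with per-class counts taken one-versus-rest, and that a class with an empty denominator contributes zero. The function name `macro_metrics` is hypothetical.

```python
def macro_metrics(conf):
    """Class-averaged accuracy, precision, recall and F1-score.

    conf[a][p] is the number of samples of actual class a that were
    predicted as class p (square matrix, one row and column per class).
    """
    n = len(conf)
    total = sum(sum(row) for row in conf)
    acc = prec = rec = f1 = 0.0
    for k in range(n):
        tp = conf[k][k]
        fn = sum(conf[k]) - tp                        # class k, missed
        fp = sum(conf[a][k] for a in range(n)) - tp   # others, labelled k
        tn = total - tp - fn - fp
        p_k = tp / (tp + fp) if tp + fp else 0.0
        r_k = tp / (tp + fn) if tp + fn else 0.0
        acc += (tp + tn) / total
        prec += p_k
        rec += r_k
        f1 += 2 * p_k * r_k / (p_k + r_k) if p_k + r_k else 0.0
    return {
        "accuracy": acc / n,
        "precision": prec / n,
        "recall": rec / n,
        "f1": f1 / n,
    }
```

For a two-class matrix `[[3, 1], [1, 3]]`, each class has three true positives, one false positive and one false negative, so all four measures evaluate to 0.75.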


C. Results on UCI OPPORTUNITY dataset 

1) Effectiveness of the Selection of $\Psi(\cdot)$ and $\Upsilon(\cdot)$ in Eq.(3)
and Eq.(6): The selections of $\Psi(\cdot)$ in SSD and $\Upsilon(\cdot)$ in SDD
were quite crucial to the representative KDE models for all
involved activities. Thus, the relevant comparisons about the
recognition rates were given in Fig.4 when $\Psi(\cdot)$ and $\Upsilon(\cdot)$
were set to (1) $\Psi(\cdot) = Mean(\cdot)$, (2) $\Upsilon_1(\cdot) = Div_{KL}(\cdot)$, (3)
$\Upsilon_2(\cdot) = Div_{JS}(\cdot)$, respectively. As we can see in Fig.4, our
proposed method based on these three discriminative functions³
distinguished the three mentioned activities in the OPPORTUNITY
dataset (four subjects) very well, which indirectly proved
the effectiveness of $Mean(\cdot)$, $Div_{KL}$, $Div_{JS}$ in measuring
the difference between the distributions of activities. Besides,
all three generated models had the highest recognition
accuracy on Subject 1. However, the sensors selected by
each function were quite different, and the corresponding
sensors were listed in Table I. It can be found that
the sensitivity of sensors to different daily activities varied,
and was influenced by their locations of deployment. Sensors
located on the arm, such as left lower arm, right hand and left wrist,
were more likely to identify "Walking", but sensors located on
the back or shoes had higher recognition rates for "Lying" than
other sensors. This indicates that it is not feasible or
wise to rely on a single sensor deployed in a single location
to identify various kinds of activities [26]. This is also our
motivation to use a multi-sensor fusion strategy based on DSmT
to solve activity recognition problems.

Table I
THE SELECTED SENSORS IN OPPORTUNITY DATASET BASED ON $Mean(\cdot)$, $Div_{KL}(\cdot)$, $Div_{JS}(\cdot)$.

| Subject | SSD, $\Psi = Mean$: Walking | SSD, $\Psi = Mean$: Sitting | SSD, $\Psi = Mean$: Lying | SDD, $\Upsilon_1 = Div_{KL}$: Walking | SDD, $\Upsilon_1 = Div_{KL}$: Sitting | SDD, $\Upsilon_1 = Div_{KL}$: Lying | SDD, $\Upsilon_2 = Div_{JS}$: Walking | SDD, $\Upsilon_2 = Div_{JS}$: Sitting | SDD, $\Upsilon_2 = Div_{JS}$: Lying |
|---|---|---|---|---|---|---|---|---|---|
| Subject 1 | LLA-accX | RLA-accX | Back-magX | LLA-magX | RKN-accZ | Back-magZ | LWR-accY | RKN-accZ | LShoe-accZ |
| Subject 2 | LLA-accX | RLA-accX | LShoe-accZ | LLA-magX | HIP-accY | Back-magZ | RKN-accY | RKN-accZ | Back-magX |
| Subject 3 | RH-accY | LLA-magX | RShoe-accZ | Back-magX | Back-magZ | Back-accZ | RKN-accY | Back-magZ | RShoe-accY |
| Subject 4 | LWR-accY | RH-accY | Back-magX | Back-magZ | LUA-accY | Back-accZ | LUA-accY | LUA-accY | Back-accX |

*According to [25], each triaxial (x, y, z) sensor unit has 3 degrees of freedom. The sensor abbreviations used in this table are: Left Lower Arm
(LLA); Right Lower Arm (RLA); Right Knee (RKN); Left Wrist (LWR); Left Shoe (LShoe); Hips (HIP); Right Hand (RH); Right Shoe (RShoe); Left Upper
Arm (LUA); accelerometer X-axis (accX); magnetometer Z-axis (magZ). More details about the OPPORTUNITY dataset can be found in [25].

2) Recognition Rate versus Training Percentage: In this
experiment, we varied the percentage of the training set and
investigated the relationship between the training percentage
and the classification accuracy of our proposed method on the
OPPORTUNITY dataset. It is worth mentioning that the
discriminative function chosen here was SSD (Eq.(3)) with
$\Psi(\cdot) = Mean(\cdot)$. Since our experiments were conducted
based on the ten-fold cross validation method, it is convenient for
us to test the relationship between recognition rate and training
percentage. According to the principle of ten-fold cross validation,
the original dataset was first randomly divided into ten
equal parts. Then, in the first experiment, we treated
10% of the data as the training set and the remaining 90%
as the testing set; in the second experiment,
20% of the data were used for training and the remaining 80%
for testing, and so on, until the last experiment, in which we
used 90% of the data for training and the last 10% for
testing. Besides, in order to further observe the performance
of the proposed method, we divided the original data into
100 equal parts on the basis of one-hundred-fold cross-validation.
Then one of the equal parts was randomly selected as the
training set (1%) and the remaining parts (99%) were regarded
as the testing set. The average accuracy rates of all these ten
experiments were shown in Fig.5, which showed that even if
there were few training samples, the model proposed in this
paper still gave high recognition accuracy.

³As we introduced in Definition 1 and Definition 2, $\Psi(\cdot)$ means that SSD
(Eq.(3)) is used to choose the best KDE models, and $\Upsilon_1(\cdot)$, $\Upsilon_2(\cdot)$ mean that
SDD (Eq.(6)) is applied in our activity recognition model.

Figure 4. Effectiveness of the selection of SSD ($\Psi(\cdot)$) and SDD ($\Upsilon(\cdot)$) in
OPPORTUNITY dataset.

Figure 5. Classification accuracy vs. training percentage for the OPPORTUNITY
dataset (SSD: $\Psi(\cdot) = Mean(\cdot)$).

Figure 7. Confusion matrices of Four Subjects in OPPORTUNITY dataset.
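The evaluation protocol above can be sketched in a few lines of Python. The sketch assumes a random shuffle followed by a cut at the requested training fraction; `evaluate` is a hypothetical callback that trains on the first argument, tests on the second and returns an accuracy, and the fixed seed is only there to make the illustration repeatable.

```python
import random


def split_by_percentage(samples, train_fraction, seed=0):
    """Shuffle the samples and cut them into (training, testing) parts."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = max(1, round(train_fraction * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]


def accuracy_curve(samples, evaluate,
                   fractions=(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)):
    """Map each training fraction to the accuracy returned by evaluate."""
    return {f: evaluate(*split_by_percentage(samples, f)) for f in fractions}
```

Calling `split_by_percentage(samples, 0.01)` gives the 1% training and 99% testing split mentioned in the text.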


3) Comparison Between Base Classifiers and Fused Classifiers
in OPPORTUNITY dataset: In order to analyze in depth
the relationship between the base classifiers and the fused
classifier in our proposed model, detailed comparisons were
given in Fig.6. Based on the results presented in Fig.4,
the discriminative function chosen here was SSD (Eq.(3))
with $\Psi(\cdot) = Mean(\cdot)$. In Fig.6, the x-axis represents the
KDE model corresponding to the selected sensor, the y-axis
represents the number of correctly classified test samples,
the value above each bar represents the recognition rate
corresponding to each KDE model, and the
solid line at the top of the histogram represents the total
number of test samples. As we can see from Fig.6: (1) the
recognition accuracy of the fused model was significantly
improved compared with that of the base classifiers; (2) the
performance of the base classifiers was obviously different.
Among the mentioned base classifiers, RH-accY in subject 4
had the lowest rate (56.9885%) and LWR-accY, also in subject
4, had the highest rate (88.7390%). The main reason for the
performance difference among the base classifiers is that we
looked for the relative best KDE model for each specific activity
based on our proposed SSD or SDD, not the absolute best
KDE model for all activities. More concretely, in subject 1,
the specific KDE model corresponding to LLA-accX had the
best classification only for Walking; the specific KDE model
corresponding to RLA-accX had the best classification only for
Sitting; and the specific KDE model corresponding to Back-magX
had the best classification only for Lying. In this way,
we could effectively guarantee the degree of diversity among
base classifiers, which is really important for ensemble fusion
[11].


4) Comparisons with State-of-the-art Approaches Based on
Monte-Carlo Simulation: In this part, we further gave the
confusion matrices (Fig.7) of the four subjects in the OPPORTUNITY
dataset based on our proposed method. It is worth noting that
in the confusion matrices of subjects 2-4, there existed a special
label "UNKNOWN" which was quite different from the
three mentioned activities: Walking, Sitting and Lying. This
"UNKNOWN" label occurred in our DSmT-based method
because of missing values in the original sensor readings. When
the current sensor reading was NULL or missing, the
maximum belief mass ("1") was assigned to the focal element
$\Theta$, which meant that at the current time we really did not know
the actual class. Modeling missing or NULL information is
a feature of our proposed method, which is
quite different from the traditional supplementation of NULL
or missing information by interpolation. In this way, our
proposed method can reduce the risk of misjudgment without
making any changes to the original data. Besides, we
repeated 50 experiments and recorded the recognition rates of
all four subjects in Table II. Among the mentioned classical
approaches, the performance of k-Nearest Neighbours and
Nearest Centroid Classifier was heavily affected by the number
of k closest samples and by the centroid of each class, respectively. These
two principles of classification had difficulty working well
when there existed uncertain data in the HAR problem. Linear
discriminant analysis and quadratic discriminant analysis, based
on the assumption that the features are normally distributed,
are obviously unsuitable in HAR problems. Extreme learning
machine has been successfully applied for the task of HAR.
For extreme learning machine, the sigmoid activation function
was utilized and the number of hidden nodes was set to 100.
However, due to the randomness of the algorithm, the results
of extreme learning machine were unstable and had a wide
variability. As we can observe in Table II, our method gave
the highest activity recognition accuracy in subject 1, subject 2
and subject 4, and Ensemble-Extreme Learning Machine
(Majority Voting) gave the highest recognition accuracy in
subject 3. In addition to the comparison of classification
accuracy, we also showed the testing time for each individual
sample of our proposed method in Table II. Our method was
running in MATLAB R2018b on an Intel Quad
Core i5-4670 CPU at 3.4 GHz with 16 GB of RAM. As shown in
Table II, our proposed method had the lowest average testing
time per sample (9.5436 ms, against 10.3426 ms to 29.5384 ms
for the other listed methods). The low recognition delay
of our method was mainly because, in the testing phase, only
the data of the selected sensors in the testing sample participate in
the BBA calculation. The low recognition delay also showed
its potential for application in online activity recognition
systems, because such real-time activity recognition often
requires predictions to be updated 1-5 times per second [2].

Table II
COMPARISON WITH STATE-OF-THE-ART RESULTS ON UCI OPPORTUNITY DATASET.

| Reported Methods | Accuracy, Subject 1 | Accuracy, Subject 2 | Accuracy, Subject 3 | Accuracy, Subject 4 | Average Computational Cost |
|---|---|---|---|---|---|
| Extreme Learning Machine [27] | 0.7056±0.1123 | 0.7126±0.0687 | 0.6587±0.0295 | 0.7154±0.1414 | 13.6175 ms |
| Linear Discriminant Analysis [28] | 0.7859±0.0246 | 0.8147±0.0274 | 0.7346±0.0318 | 0.7913±0.0419 | 11.0537 ms |
| Nearest Centroid Classifier [14] | 0.8305±0.0312 | 0.8718±0.0289 | 0.7647±0.0185 | 0.8185±0.0152 | 10.3426 ms |
| K-Nearest Neighbours (k = 5) [14] | 0.8995±0.0015 | 0.8516±0.0101 | 0.8383±0.0291 | 0.8516±0.0091 | 11.6340 ms |
| Quadratic Discriminant Analysis [14] | 0.9143±0.0076 | 0.8517±0.0078 | 0.8562±0.0218 | 0.8216±0.0214 | 13.5754 ms |
| Naive Bayes [12] | 0.8742±0.0015 | 0.8401±0.0053 | 0.8210±0.0315 | 0.8517±0.0091 | 15.7027 ms |
| Ensemble-Extreme Learning Machine (Majority Voting) [11] | 0.9142±0.0098 | 0.8843±0.0144 | 0.8714±0.0156 | 0.8830±0.0144 | 29.5384 ms |
| New Method (HAR DSmT-based) | 0.9714±0.0014 | 0.8869±0.0026 | 0.8439±0.0199 | 0.9262±0.0025 | - |
| Computational Testing Time For Each Individual Sample | 8.6545 ms | 14.2733 ms | 7.5581 ms | 7.6887 ms | 9.5436 ms |

Table III
THE SELECTED SENSORS IN DSAD BASED ON $Div_{JS}(\cdot)$ (SDD: $\Upsilon_2(\cdot) = Div_{JS}(\cdot)$).

| Subject | Sitting | Standing | Lying | Walking | Running |
|---|---|---|---|---|---|
| Person 1 | RA-zgyro | LA-?mag | LA-?acc | LA-?mag | RA-zacc |
| Person 2 | RL-zacc | RA-?mag | RL-yacc | LA-?mag | T-?gyro |
| Person 3 | T-yacc | T-? | RA-yacc | RA-zmag | LA-ymag |
| Person 4 | LL-zacc | RL-?acc | RA-yacc | RA-?mag | LA-?mag |
| Person 5 | LL-?mag | LL-zmag | RL-yacc | LA-?mag | LA-zmag |
| Person 6 | RL-?mag | RL-?acc | T-? | RA-zacc | T-?mag |
| Person 7 | RL-yacc | LL-zacc | T-zmag | RA-zmag | T-? |
| Person 8 | RL-zacc | LA-?acc | LA-zacc | T-?mag | LA-ymag |


Figure 8. Effectiveness of the selection of SSD ($\Psi(\cdot)$) and SDD ($\Upsilon(\cdot)$) in
DSAD.


D. Results on UCI DSAD 


1) Effectiveness of the Selection of $\Psi(\cdot)$ and $\Upsilon(\cdot)$ in Eq.(3)
and Eq.(6): Similar to the discussions on the OPPORTUNITY
dataset, we also gave the performance comparisons between
the selections of $\Psi(\cdot)$ and $\Upsilon(\cdot)$ in DSAD. First, the
comparisons of recognition accuracy with different evaluation
criteria were shown in Fig.8 when $\Psi(\cdot)$ and $\Upsilon(\cdot)$ were set
to (1) $\Psi(\cdot) = Mean(\cdot)$, (2) $\Upsilon_1(\cdot) = Div_{KL}(\cdot)$, (3) $\Upsilon_2(\cdot) =
Div_{JS}(\cdot)$, respectively. Different from the phenomenon in
Fig.4, our proposed method based on $Div_{KL}(\cdot)$ and $Div_{JS}(\cdot)$
could give higher recognition accuracy in DSAD. Due to
the robust performance of our proposed method based on
$\Upsilon(\cdot) = Div_{JS}(\cdot)$ in DSAD, in the following experiments, the
discriminative function $Div_{JS}$ was applied in Eq.(6). Besides,
the sensors selected by $Div_{JS}$ were also listed in Table III.
It can be found that the sensitivity of sensors to different
daily activities varied, and was influenced by their locations of
deployment and the types of sensors. In Table III, T: Torso;
RA: Right Arm; LA: Left Arm; RL: Right Leg; LL:
Left Leg; x,y,zacc: x,y,z accelerometers; x,y,zmag:
x,y,z magnetometers; x,y,zgyro: x,y,z gyroscopes.


[Plot: accuracy vs. training percentage for DSAD with SDD, $\Upsilon(\cdot) = Div_{JS}(\cdot)$; one curve per person (Persons 1 to 8); x-axis: training percentage; y-axis: classification accuracy.]

Figure 9. Classification accuracy vs. training percentage for DSAD.


2) Recognition Rate versus Training Percentage: In this
part, we also varied the percentage of the training set and
investigated the relationship between the training percentage and
the classification accuracy of our proposed method on DSAD.
Similar to the experiments on the OPPORTUNITY dataset, here
we also conducted ten independent experiments. The average
accuracy rates of all ten experiments can be seen in Fig.9.
From these results, we could draw the same conclusion
as for the OPPORTUNITY dataset, i.e. classification accuracies for
DSAD could reach a high level without a large amount of
training samples.

3) Comparison Between Base Classifiers and Fused Classifiers
in DSAD: Similar to the experiments on the OPPORTUNITY
dataset, we also analyzed the relationship between the base
classifiers and the fused classifier in DSAD, which was shown in
Fig.10. As we can see from Fig.10: (1) when the classification
differences between base classifiers were quite obvious,
the final performance of the fused model could be substantially
improved. For example, in person 4, the range of classification
accuracy of all base classifiers was [RA-yacc: 74.0080%,
LL-zacc: 93.6386%] and the final rate of the fused model was
98.6185%; (2) on the contrary, when the performances of
the base classifiers were close, the performance of the final
fused model was not substantially improved. For example, in
person 7, all five base classifiers had similar recognition rates
(86.6024%, 91.9036%, 87.2459%, 91.9036%, 91.9036%) and
the performance of the final fused model was 92.4498%. These
two groups of phenomena further verified the rationality of the
modeling strategy proposed in this paper: each base KDE model
was selected only for a specific activity, which did guarantee
the diversity between base models.

4) Comparison with State-of-the-art Approaches Based on
Monte-Carlo Simulation: In this part, we further gave the
confusion matrices (Fig.11) of the eight persons in DSAD
based on our proposed method. As we can see in Fig.11,
our method had a high recognition rate in identifying the
activities of all mentioned persons. Besides, we further
repeated 50 experiments and compared the DSmT-based method
with the other traditional methods from the references in Table IV. All
parameters involved in the mentioned state-of-the-art models
were consistent with those mentioned in the literature, which
were not listed in detail here. For k-Nearest Neighbours, the
performance of this method changed for different values of k.
A value of k = 5 gave the best results; therefore the accuracy
of the k-Nearest Neighbours algorithm was provided for k = 5
in Table IV. For support vector machine, following the
one-versus-the-rest method, each type of activity was assumed as
the first class and the remaining 4 activity types were grouped
into the second class. The overall accuracy rate of support
vector machine was calculated as 87.6%. Besides, we also
conducted a performance comparison between our technique
and differential Recurrent Neural Networks (the related source
codes for dRNN could be downloaded from [29]). As shown
in Table IV, our proposed method with the DSmT-based fusion
strategy could achieve higher accuracy than the traditional
approaches. Although SVM and dRNN are powerful models
for classification, they were not able to properly combine
the characteristics of multiple sensors; conversely, the DSmT-based
approach was especially designed to effectively fuse
the information from multi-sensor readings, which proved to
be very effective for HAR in BSNs. Besides, we also showed
the testing time for each individual sample of our proposed
method in Table IV. Results showed that DSmT-based HAR
required less testing time per sample (17.0964 ms) than the other
listed methods (19.4653 ms to 50.9993 ms).

Table IV
COMPARISON WITH STATE-OF-THE-ART RESULTS ON UCI DSAD.

| Reported Methods | Accuracy | Computational Cost |
|---|---|---|
| Artificial Neural Networks [24] | 0.743 | 23.2442 ms |
| Bayesian Decision Making [24] | 0.758 | 27.4170 ms |
| K-Nearest Neighbours [24] | 0.860 | 20.2664 ms |
| Support Vector Machines [24] | 0.876 | 25.9724 ms |
| differential Recurrent Neural Networks [29] | 0.8956 | 50.9993 ms |
| pFTA-Learn + K-Nearest Neighbors [30] | 0.9018 | 19.4653 ms |
| New Method (HAR DSmT-based) | 0.9515 | 17.0964 ms |


V. CONCLUSION 


In this paper, we addressed the challenge of the HAR problem
in BSNs from the perspective of a multi-sensor fusion strategy
and exploited a DSmT-based fusion strategy. In this
novel fusion strategy, two points are worth mentioning:
1) unlike traditional fusion strategies, not all sensor readings
were used for modeling and fusing; only the selected
representative sensors were finally fused; 2) the BBA of each test
sample was constructed according to KDE models. Besides,
the vacuous BBA was directly given when a test sample had
an incomplete pattern. Extensive performance evaluations on two
wearable sensor-based HAR datasets (OPPORTUNITY dataset
and DSAD) demonstrated that the proposed approach
outperformed the compared state-of-the-art methods in accuracy
in all cases except Subject 3 of the OPPORTUNITY dataset. In our future
work, we will explore the performance of the proposed method
in complex activity recognition. In this work, our proposed
DSmT-based model was trained and tested offline.
In our future research works, we will investigate and test
how such a new model can be applied to an online activity
recognition system in real time.

Figure 10. Comparisons Between Base Classifiers and Fused Classifiers in DSAD.

Figure 11. Confusion matrices of 8 Persons in DSAD. Panels (a) to (h): Person 1 to Person 8; rows: actual class; columns: predicted class (Sitting, Standing, Lying, Walking, Running).

Abstract—The paper presents a study on the human learning
process during the classification of stimuli defined by motion
and color visual cues and their combination. Because the
classification dimension and the features that define each category are
uncertain, we modeled the learning curves using Bayesian
inference, more precisely the Normalized Conjunctive Consensus
rule, and also on the basis of the more efficient probabilistic
Proportional Conflict Redistribution rule no.5 (pPCR5) defined
within Dezert-Smarandache Theory (DSmT) of plausible and
paradoxical reasoning. Our goal is to study how well these rules
consistently model both individual and group human
behaviour during the learning of the associations between the
stimuli and the responses in categorization tasks varying by the
amount of relevant stimulus information. The effect of age on
this process is also evaluated.


Keywords: Vision, Human Perception, Classification, Color
cue, Motion cue, Cues Combination, DSmT, probabilistic
Proportional Conflict Redistribution rule no.5 (pPCR5), Normalized
Conjunctive Consensus (NCC) rule.


I. INTRODUCTION 


In everyday activities, humans often have to classify objects 
and events in different categories. The process of classification 
requires the acquisition of the common characteristics of the 
members of a category. Depending on the category structure, 
three different ways are assumed to be employed in classifi- 
cation [1]: rule-based, incremental learning, or memorization 
of all exemplars. Rule-based classification is supposed to 
involve sequential hypothesis testing to uncover the rule of 
categorization. The incremental learning is supposed to be 
related to finding the category boundaries in cases when the 
stimulus categories are defined by more than one feature 
and no simple rule describes the category membership. It 
involves forming associations between a set of features and 
the responses. The third way to find the category assignments
is by memorizing the responses associated with each
combination of stimulus features.

When the stimuli for categorization are multidimensional
and not all features are relevant for their classification, an
important question is how humans find out the proper
stimulus characteristics for category membership. To answer this
question, [2] tested whether a normative strategy based on
probabilistic inference could describe the process of category
learning. Their modeling data imply that decision making
based on Bayesian inference is computationally too demanding
and that humans use suboptimal strategies in the process of
categorization.


In the present study, we used multidimensional visual
stimuli that were divided into categories by rules of different
complexity. Changing the classification rule changes
the amount of irrelevant information. We will compare
human cue combination performance in an arbitrary (unstructured)
classification task with modeled combination performance
based on particular fusion rules. In the present study we
will apply and compare the performances of the following
fusion rules: the Normalized Conjunctive Consensus (NCC),
and the probabilistic Proportional Conflict Redistribution rule
no.5 (pPCR5) defined within DSmT, to model the human
process of cue integration. We will focus on how
age influences the process of classification, as the experimental
evidence implies that various brain structures and processes are
involved in the different categorization tasks [3] and that they
change differently with ageing [4].


This paper is organized as follows. In Section II we briefly
present the principles of the fusion rules applied to model
human cue integration in a classification task. Section III
is devoted to the experimental strategy, methods, procedures,
stimuli, and subjects participating in the experiments. The
results obtained are described and analysed in Section IV. In
Section V the performance of the fusion rules is presented, and in Section
VI the trends are illustrated. Conclusions are made in Section
VII.

II. FUSION RULES FOR MODELLING VISUAL CUE
COMBINATION


Various fusion rules exist in the literature to deal with 
uncertain data. They are based on different mathematical 
models and on different methods for transferring the conflict- 
ing mass onto the meaningful hypotheses about the problem 
under consideration. In this paper, we use the Normalized 
Conjunctive Consensus (NCC) rule and the Probabilistic Pro- 
portional Conflict Redistribution rule no.5 (pPCR5), defined 
within Dezert-Smarandache Theory (DSmT) of Plausible and 
Paradoxical Reasoning [5]. Both these rules are described in 
detail in [6]. 
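For orientation, the two rules can be sketched in Python for the case of two sources. The sketch assumes Bayesian belief assignments, i.e. probability vectors over mutually exclusive hypotheses, and takes pPCR5 to be the standard two-source PCR5 formula applied to such vectors; the exact definitions used in the study are those given in [6]. The function names are hypothetical.

```python
def ncc(p1, p2):
    """Normalized conjunctive consensus of two probability vectors."""
    joint = [a * b for a, b in zip(p1, p2)]
    agreement = sum(joint)
    if agreement == 0:
        raise ValueError("sources are in total conflict")
    return [j / agreement for j in joint]


def pcr5_two_sources(p1, p2):
    """Two-source PCR5 on exclusive hypotheses (assumed form of pPCR5).

    Each partial conflict p1[i] * p2[j] with i != j is split between
    hypotheses i and j proportionally to the two masses involved.
    """
    n = len(p1)
    fused = [p1[i] * p2[i] for i in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if p1[i] + p2[j] > 0:
                fused[i] += p1[i] ** 2 * p2[j] / (p1[i] + p2[j])
            if p2[i] + p1[j] > 0:
                fused[i] += p2[i] ** 2 * p1[j] / (p2[i] + p1[j])
    return fused
```

For `p1 = [0.6, 0.4]` and `p2 = [0.2, 0.8]`, NCC renormalizes the agreeing products 0.12 and 0.32, whereas PCR5 keeps them and hands the conflicting mass back to the hypotheses that generated it; both results sum to one.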


III. EXPERIMENTS 


Three experiments were performed. 


A. Stimuli 


The stimuli were dynamic patterns that differed by the
motion direction, the spatial distribution, the shape, and the color
of the moving elements. The moving elements were either
spheres or cubes. Two conditions were simulated: in one
of them the elements were positioned on a plane, in the other
they were randomly positioned in depth. The simulated motion
could have 4 different directions: to the left, to the right,
forward, or backward. As a result, eight different moving
patterns were generated: movement to the left among a cloud
of elements, movement to the right among a cloud of elements,
movement forward among a cloud of elements, movement
backward among a cloud of elements, horizontal translation to
the left, horizontal translation to the right, movement forward
towards a plane, and movement backward from a plane. The
moving elements, spheres or cubes, could have one of 4
colors: red, blue, green, or yellow. Of all possible combinations
of movements, shape, and color of elements (64 in total: 8
movements x 4 colors x 2 shapes of moving elements) we
randomly selected 16 combinations. The characteristics of the
chosen stimuli are given in Table I.
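The counting argument can be checked with a short Python sketch that enumerates the full design and draws 16 combinations at random. The label strings, the function names and the fixed seed are illustrative assumptions; the stimuli actually used in the study are those listed in Table I.

```python
import itertools
import random

DISPOSITIONS = ("cloud", "plane")
DIRECTIONS = ("left", "right", "forward", "backward")
COLORS = ("red", "blue", "green", "yellow")
SHAPES = ("sphere", "cube")


def all_combinations():
    """Enumerate every (disposition, direction, color, shape) combination."""
    # 2 dispositions x 4 directions give the 8 movement patterns.
    movements = list(itertools.product(DISPOSITIONS, DIRECTIONS))
    return [
        (disposition, direction, color, shape)
        for (disposition, direction), color, shape
        in itertools.product(movements, COLORS, SHAPES)
    ]


def pick_stimuli(k=16, seed=0):
    """Draw k distinct combinations at random from the full design."""
    return random.Random(seed).sample(all_combinations(), k)
```

`len(all_combinations())` is 8 x 4 x 2 = 64, and `pick_stimuli()` returns 16 distinct combinations.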


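Table I
CHARACTERISTICS OF THE STIMULI USED IN THE STUDY.

| Number of stimulus | Disposition of elements | Motion direction | Color | Shape of elements |
|---|---|---|---|---|
| 1 | | forward | | |
| 2 | | backward | | |
| 3 | | backward | | |
| 4 | | backward | | |
| 5 | | right | | |
| 6 | | right | | |
| 7 | | left | | |
| 8 | | left | | |
| 9 | | forward | | |
| 10 | | forward | | |
| 11 | | forward | | |
| 12 | | backward | | |
| 13 | | right | | |
| 14 | | right | | |
| 15 | | left | | |
| 16 | | left | | |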


B. Experimental conditions 


Three experiments were performed. They differed by the
classification rule used to separate the stimuli into two
categories. In Experiment 1 the stimuli were divided arbitrarily
by the movement type that resulted from the disposition of
the elements and the direction of motion, whereas the shape
and the color of the elements were irrelevant. In Experiment
2, the stimuli were divided randomly into two categories
based on their color, whereas the elements' spatial disposition,
motion direction, and the shape of elements were irrelevant.
In Experiment 3, the stimuli were randomly divided into two
groups based on the combination of the motion direction,
elements' disposition, and color. As in Experiments 1 and
2, the shape of the elements was irrelevant. Table II presents
the separation of the elements into two categories for the three
experiments.


Table II 
CHARACTERISTICS OF THE STIMULI USED FOR DIVIDING THE STIMULI IN 
CATEGORIES IN EXPERIMENTS 1-3. 


Experiment 3 
cloud, 


cloud, cloud, red blue cloud, cloud, 
right left right, left, 
yellow | green wall, cloud, 
forward left, left, 
red green 


cloud, 


green blue 
backward 
wall, cloud, 


wall, 


right left 


wall, 
forward 


wall, 
backward forward, 

green 

cloud, 

backward, 
red 
cloud, 
backward, 
green 
cloud, 
backward, 
yellow 
wall, 
backward, 
blue 


forward, 
yellow 


As is evident, the classification of the stimuli in Experiment 
3 could be done either by trying to find the combination of 
the stimulus characteristics, or as a rule-with-exception, as all 
cloud stimuli except one are in Category 2, and all wall stimuli 
except one are in Category 1. 


C. Experimental Procedure 


Before each experiment, the calibration of an eye-movement 
recording device was performed. In addition to the standard 
calibration, each experimental session started with a sequential 
presentation of a dot at different positions (center, left, left 
corner, up, right corner, right, down) 10 degrees from the 
screen center. The dot changed position after 1.5 sec. 

Each experiment started with the sequential presentation of 
all stimuli. At the end of the stimulus sequence, all stimuli 
were presented again in a different order and the subject had 
to describe the stimulus characteristics: shape and color of 
elements, the direction of motion, and disposition of elements. 
This preliminary session aimed to acquaint the subjects with the 
stimulus set and the stimulus characteristics. The experimental 
session consisted of 128 stimuli (8 repetitions of each 
stimulus in random order). 

Before the presentation of each stimulus, a warning signal 
was given. The stimulus duration was 1 sec. During stimulus 
presentation, a fixation dot with a diameter of 0.5 deg. of 
arc was shown in the middle of the screen. Five hundred 
milliseconds after stimulus disappearance, two figures, a 
triangle and a star (with a size of approximately 4 x 4 deg. 
of arc), appeared at 10 degrees to the left or right of the 
fixation point. On every trial, the position of these figures 
was randomly selected. The subject's task was to decide which 
stimuli correspond to the star and which to the triangle. 
Subjects were required to keep fixation at the fixation point 
during stimulus presentation and then to make a saccade to the 
selected figure. They had to press the left mouse button if the 
selected figure was to the left of the fixation point, and the 
right mouse button if it was to the right. In the case of a 
correct choice, a high tone was played, whereas in the case of 
an incorrect choice a low tone was played. The subjects were 
told that at the beginning they could only guess, but that 
during the progression of the experiment, by trial and error, 
they would be able to find the proper classification of the 
stimuli into categories. 


D. Method 


The order of the experiments was counterbalanced between 
the subjects. The number of experimental sessions depended 
on the subject's performance: if the number of correct 
responses was low, the participant started a new session 
after a short break. However, even if the performance 
of a subject was still not good, no subject participated in 
more than three experimental sessions. The experiments were 
separated by 3 to 7 days to avoid interference from previously 
learned categorization. 


E. Apparatus 


The stimuli were presented on a black background with a 
custom program written in Python with OpenGL. They were 
presented on a 21" Dell Trinitron computer screen operated at 
a refresh rate of 60 Hz and a resolution of 1280 x 1024 pixels, 
with an Nvidia Quadro 900XGL graphic board. The stimulus 
observation was binocular from a distance of 57 cm. 

The eye movements of the participants were recorded by the 
Jazz novo eye tracking system (Ober Consulting Sp. z o.o.) [7]. 


F. Subjects 


17 young subjects, aged 18-38 years (median = 22 years) 
and 17 elderly subjects, aged 63-75 years (median = 67 years) 
took part in the study. 


IV. PERFORMANCE EVALUATION OF AGE-RELATED 
OBSERVERS GROUPS 


The experimental goal of our study is to characterize 
human decision making in a classification task influenced 
by: 

• motion information only, 

• color information only, 

• combined motion and color information. 

As the stimuli were randomly assigned to different cate- 
gories based on their visual characteristics, the test Subjects 
have to find the correct association between the stimuli and 
the outcome by trial-and-error. While in the classification 
of objects or events in categories a generalization of their 
characteristics is needed, in arbitrary categorization a specific 
representation of the stimuli is required. As the stimuli in all 
experiments were the same, one possibility is that irrespective 
of the categorization rule, the participants will represent them 
in working memory by all cues. In this case, their performance 
will be similar in all conditions and the memory load will 
be equivalent. Also, if unstructured categorization is based 
on procedural memory [8], the number of features used 
to categorize the stimuli would be irrelevant. However, the 
experiments in our study could be also characterized as rule- 
based with rules of varying complexity that change the amount 
of irrelevant information. A more efficient way to classify 
the stimuli is to represent them in memory only by the cues 
determined by the categorization rule ignoring the irrelevant 
stimulus characteristics. 

An example of the performance of an arbitrarily chosen 
test person in the experiments is shown in figure 1. It represents 
the proportion of correct responses in blocks of 16 trials. 
This information is processed and analyzed to draw conclusions 
about the characteristics used for classification in different 
categories. 
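
As an illustration of how such a learning curve can be obtained from raw responses, the following minimal sketch computes the proportion of correct responses per block of 16 trials. The function name `block_proportions` and the 0/1 coding of the responses are assumptions made for this example; the authors' actual processing code is not given in the paper.

```python
def block_proportions(correct, block_size=16):
    """Return the proportion of correct responses in consecutive blocks.

    `correct` is a sequence of 0/1 (or False/True) values, one per trial,
    in the order the trials were run (an assumed data layout).
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if len(correct) % block_size != 0:
        raise ValueError("number of trials must be a multiple of block_size")
    return [
        sum(correct[start:start + block_size]) / block_size
        for start in range(0, len(correct), block_size)
    ]


# Toy session of 32 trials: 8 correct in the first block, 12 in the second.
responses = [1] * 8 + [0] * 8 + [1] * 12 + [0] * 4
curve = block_proportions(responses)
print(curve)
```

With 128 trials per session and 16 trials per block, one session contributes 8 points to the learning curve.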


[Figure 1 plot. Title: Observations of old person No. 5. Legend: cat1 - Motion, cat2 - Color, cat3 - Combined. Vertical axis: part of successful answers. Horizontal axis: number of trials.] 


Figure 1. Observations from an occasional test person. 


The question is whether people base their responses on a 
single source of information or on a combined one, and also 
which type of information is more informative in the decision 
process and corresponds best to the rule used for separating 
the stimuli into categories. The participants belong to two 
age groups: Young and Old. Hence, the influence of human age 
on the assessment of the decision will also be evaluated. The 
evaluation is made on the basis of experimental learning 
curves, obtained for all experimental categories and for each 
subject in both age groups. The learning curve represents the 
change in the correct responses with some measure of the 
experience gained, i.e. the number of trials. 


Figure 2. Raw observations (blue), averaged over subjects (red). 

Figure 3. Experimental learning curves for the 3 categorization rules. 


Figure 2 presents the learning curves of all young subjects 
in the case when the stimuli were divided arbitrarily into two 
groups based on the combination of the color and motion of 
the stimulus elements. The figure clearly shows the large in- 
dividual differences in task performance. It also demonstrates 
that as the number of blocks increases, the performance of the group 
improves. Figure 3 represents the averaged learning curves for 
all subjects in the young group. It implies that the performance 
gradually increases and the rate of increase is different for the 
different categorization rules. 


A. Evaluation of the perception in Young observers group 


The comparison of the performance in the motion and 
color conditions shows that in the Young group only 6 out 
of 17 observers have the best performance for the motion 
condition, 9 observers effectively utilized the color information, 
showing the best performance in this case, and only 1 observer 
shows the best performance in the combined condition (for two 
other observers the performance in the combined condition 
is equivalent to that in a single-cue case). For one observer 
the learning performance is equivalent in both single-cue 
conditions. The cumulative curve representing the distribution 
of the average correct responses of the young subjects (on the 
basis of the 17 subjects in the group) is shown in Fig. 4. 


Figure 4. Learning curves of the averaged young subject. 


B. Evaluation of the perception in Old observers group 


The comparison of the performance in the motion and color 
conditions shows that in the Old group 9 out of 17 observers 
have the best performance for the motion condition, and 6 
observers show better performance using the color information. 
For 2 out of 17 observers the learning curves for both motion 
and color information could be considered as equivalent. The 
performance of the averaged Old test person, on the basis of 
the 17 subjects in the group, is shown in Fig. 5. Here again, 
the best performance of the single-cue category is confirmed, 
though the difference between the three conditions is not 
significant. 

It can be summarized that the participants in each group 
learn best the association between the stimuli and the response 
when the categorization rule is based on a single cue. This 
implies that the memory representation of the stimulus char- 
acteristics is determined by the categorization rule and the 
participants are able to ignore the stimulus features irrelevant 
to the category membership. 

It is interesting to compare young and old test persons for 
the same conditions. We put together the learning curves of 
young and old subjects in figure 6 for the categories Motion 
and Color, and category Combined in figure 7. 

Figure 5. Learning curves of an averaged old subject. 


[Figure 6 plot. Title: Comparison Young and Old. Legend entries recovered: Motion-Y, Motion-O. Vertical axis: proportion of correct responses. Horizontal axis: block of trials.] 

Figure 6. Learning curves of Young and Old for Color and Motion categories. 

Figure 7. Learning curves of Young and Old for Combined category. 


V. pPCR5 AND NCC RULES PERFORMANCE FOR 
PREDICTING HUMAN'S WAY OF MOTION AND COLOR 
COMBINATION IN DECISION MAKING 


The main question here is which fusion rule, pPCR5 or 
NCC, used to combine the available motion and color information 
predicts human cue integration more adequately when deciding 
the stimulus category. 

Figure 8. Learning curves for experimental categories and mathematically 
obtained NCC result for the averaged young subject. 

Figure 9. Learning curves for experimental categories and mathematically 
obtained NCC result for the averaged old subject. 


In order to answer this question, we need to make a 
comparison between experimentally obtained and predicted 
(via pPCR5 and NCC rules) learning curves for combined 
categories (motion and color), for the two age groups. 

In Figures 8 and 9, the results of mathematical modeling by 
NCC are shown, and in Figures 10 and 11 the results of applying 
pPCR5 are presented. 

In Figures 12 and 13, the comparison of the empirical results 
and both mathematical methods is given. 

The results of the mathematical modeling based on both 
rules predict better performance than observed in the experi- 
mental data. This conclusion concerns the averaged learning 
curves for the two groups. However, due to the large individual 
differences in each group, the average learning curve might 
not be representative of group performance. In section VI we 
present a different approach to describe the learning in the two 
groups and apply the same mathematical modeling to it. 

Figure 10. Learning curves for experimental categories and mathematically 
obtained PCR5 result for the averaged young subject. 

Figure 11. Learning curves for experimental categories and mathematically 
obtained PCR5 result for the averaged old subject. 


Figure 12. Learning curves for empirical and mathematically modelled cases 
for young. 

Figure 13. Learning curves for empirical and mathematically modelled cases 
for old. 


Another comparison between the methods is provided on 
the basis of the goodness-of-fit test [9], which is an important 
application of the chi-squared criterion: 

$$\chi^2 = \sum_{j=1}^{J} \frac{(O_j - E_j)^2}{E_j},$$ 

where $\chi^2$ is an index of the agreement between the 
observed (O)/experimental and the expected (E)/predicted via a 
particular fusion rule sample values of the learning curve. For our 
case, $J = 24$ represents the number of independent observations. 
The critical value of the test for $\nu = J - 1 = 23$ degrees 
of freedom at the assumed probability $p = 0.1$ is $\chi^2 = 32.0$ 
[9]. 
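
The goodness-of-fit computation can be sketched as follows. This is a minimal illustration, assuming the standard Pearson statistic and the critical value 32.0 quoted in the text; the function names and the toy curves are hypothetical, and the scale in which the observed and expected learning-curve values are expressed (proportions, percentages, or counts) is not stated in the paper, so the numbers below are purely illustrative.

```python
CRITICAL_VALUE = 32.0  # value quoted in the text for 23 degrees of freedom, p = 0.1


def chi_squared(observed, expected):
    """Pearson chi-squared index between observed and expected curve samples."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e == 0 for e in expected):
        raise ValueError("expected values must be non-zero")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def is_adequate(observed, expected, critical=CRITICAL_VALUE):
    """A fusion-rule prediction is judged adequate if the index stays below the critical value."""
    return chi_squared(observed, expected) <= critical


# Toy example (hypothetical data): two blocks, values in percent of correct responses.
observed = [50.0, 60.0]
expected = [50.0, 50.0]
print(chi_squared(observed, expected), is_adequate(observed, expected))
```

For real data the two sequences would each hold the J = 24 block values of an experimental curve and of the curve predicted by pPCR5 or NCC.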

The respective results are given in Table III for the young 
group and in Table IV for the old persons' group¹. 

In general, the results show that both fusion rules, NCC 
and pPCR5, succeed in adequately predicting human performance 
for the two age groups. Only for one subject in Table IV 
is the NCC modeling not adequate, because its NCC error 
is bigger than the defined critical value $\chi^2 = 32.0$. Thus, 
contrary to the case of the average learning curves where the 
NCC and pPCR5 predict better performance than obtained 
experimentally, both rules can describe well the individual 
learning curves in both age groups. 


VI. COMMON TRENDS OF AGE-RELATED OBSERVER 
GROUPS 


The goal here is to find the common trend, concerning the 
performance of the two groups. For this purpose, we consider 
each group as a set of different sources of evidence, associated 
with each person in the group. That way the young group 
consists of 17 (young subjects) sources of evidence, which 
should be combined all together via pPCR5 and NCC fusion 
rules. 


¹ *** means missing information from the test person. 

Table III 
CHI-SQUARED VALUES FOR YOUNG SUBJECTS. 


(Motion and Color) pPCR5 | (Motion and Color) NCC 
25.4366 


0.0747 
0.8255 
0.5054 
28.2272 
1.2497 
23.9470 


0.3935 


8.5851 
0.5281 
16.0195 
14.5225 

1.0158 
11.4884 
22.0764 
17.7547 
28.0757 


Table IV 
CHI-SQUARED VALUES FOR OLD SUBJECTS. 


(Motion and Color) pPCR5 | (Motion and Color) NCC 
0.3587 
Seo 


0.7524 
13.1982 
6.2634 
8.0005 
0.9172 
0.0441 
10.0382 
0.0471 
11.3722 
2.3003 
2.4354 
24.7161 
ae ok 
0.8704 
6.5457 


15.2725 
38.7541 
31.1429 
9.5761 
seek ok ok ok 
9.2390 
21.1361 


The combined individual behaviours in a particular group 
are estimated, revealing its intrinsic behaviour as a whole and 
reducing uncertainties associated with individual performances. 
All the tested subjects in the age groups are considered as 
independent and equally reliable sources of information because each 
subject provides his/her learning curve, associated with the 
motion and color condition, and should be taken into account 
with equal weight to derive these trends. Our goal is to find 
out which combination rule (pPCR5 or NCC) is able to 
model correctly and adequately such human age-contingent 
group trends in the process of decision making. The results 
obtained for experimental and estimated (via the fusion rules) 
trends, concerning the groups' performance in cue combination, 
are presented in Figures 14 and 15. It can be seen that the 
learning curves obtained by the pPCR5 fusion rule in the two 
figures, for young and old test persons, are closer to the 
experimentally defined target curve. 

The pPCR5 fusion rule predicts the human model of decision 
making more correctly than the NCC rule, utilizing all the 
available information (Motion and Color), even in case of 
conflict. NCC based trends are very sensitive to the sources 
(different subjects' learning curves) with the bigger means, 
neglecting that way part of the available information and acting 
as an amplifier of the information by reducing the variances. 

Figure 14. Trends for Young Subjects Group. 

Figure 15. Trends for Old Subjects Group. 

VII. DISCUSSION 


This paper presented a study on the human classification 
of stimuli defined by motion and color visual cues and their 
combination. The influence of human age on this process was 
evaluated. The results obtained show age-related differences 
in the performance of the subjects in estimating the human 
classification based on both single- and multi-cue classification 
rules. 

Our experimentally obtained data for young observers sug- 
gest a smaller effect of the motion information, while for 
the older observers the color information has less effect. 
Hence, the learning performance differs depending on the 
categorization rule and the age of the participants. All 
age-related groups have difficulty dividing the stimuli into groups 
based on combined (motion and color) information. This 
finding implies that this condition presents a greater memory 
load to all observers than the single cue conditions. 

In the classification task of multi-cue stimuli, there is 
uncertainty about which dimension and which feature along 
this dimension determine the correct response. Hence, the 
observers have to determine not only the classification rule 
that specifies the categorization dimension, but also which 
exemplars that differ by this dimension fall in one or the other 
category. In contrast to the previous studies [2, 10, 11] testing 
Bayesian inference in classification studies and learning, we 
do not model the explicit performance of each subject or group 
based on the available cues. Instead, our approach resembles the 
analysis of cue combination in perception studies where the 
proportion of correct responses is related to stimulus strength. 
Thus, in our analyses, the experience gained during the task 
performance is regarded similarly to stimulus characteristics 
in detection or discrimination perceptual tasks. We performed 
a comparison between experimentally obtained and predicted 
(via pPCR5 and NCC rules) learning curves for combined 
condition (motion and color), for the two age groups and 
applied the goodness-of-fit test, one important application of 
chi-squared criteria, to evaluate the correspondence of the 
experimental and the model data. The results suggest that the 
predictions of the models outperform human performance for 
both age groups. This finding differs from our previous results 
[6] on cue combination in evaluating the heading direction 
from texture and motion cues. However, it coincides with the 
conclusion in [2] that human subjects perform suboptimally 
in categorization tasks. 

Both the NCC and the pPCR5 rules predict well the individual 
learning curves, with a slight advantage of the pPCR5 
rule as it fits well all the learning curves. This finding provides 
evidence for the relevance of our approach for analysis of the 
learning curves. 

We evaluated the common trend in the performance of the 
two age groups by considering each group member as an 
independent source of information. The obtained trends are 
better described by the pPCR5 rule than by the NCC rule. 
The best fit of the group behavior by the PCR5 rule is due to 
its property of utilizing all the available information even in the 
case of conflict between the individual learning curves. It is 
an appropriate characteristic of the group data as it preserves 
the idiosyncrasies in the performance of each individual and 
hence, represents effectively the process of decision making 
in classification tasks for different age groups. 


VIII. ACKNOWLEDGMENT 


The reported work is a part of and was supported by the 
project DN02/3/2016 “Modelling of voluntary saccadic eye 
movements during decision making” funded by the Bulgarian 
Science Fund. 


REFERENCES 


[1] F.G. Ashby, V.V. Valentin, Multiple systems of perceptual category 
learning: theory and cognitive tests, In Handbook of categorization in 
cognitive science, 2nd Ed. (H. Cohen & C. Lefebvre Editors), pp. 157-188, 
Elsevier, New York, 2017. 

[2] R.C. Wilson, Y. Niv, Inferring relevance in a changing world, Frontiers 
in human neuroscience, Vol. 5, 189, 2012. 

[3] D. Shohamy, C.E. Myers, J. Kalanithi, M.A. Gluck, Basal ganglia and 
dopamine contributions to probabilistic category learning, Neuroscience 
and biobehavioral reviews, 32(2), pp. 219-236, 2008. 

[4] P.B. Kalra, J. Gabrieli, A.S. Finn, Evidence of stable individual 
differences in implicit learning, Cognition, 190, pp. 199-211, 2019. 

[5] F. Smarandache, J. Dezert (Editors), Advances and applications of DSmT 
for information fusion, Vols. 1-4, American Research Press, 2004-2015. 

[6] A. Tchamova, J. Dezert, P. Konstantinova, N. Bocheva, B. Genova, 
M. Stefanova, Human Heading Perception Based on Form and Motion 
Combination, 2018 IEEE International Conference on INnovations in 
Intelligent SysTems and Applications (INISTA 2018), 3-5 July 2018. 

[7] http://www.ober-consulting.com/9/lang/1 

[8] C.A. Seger, C.M. Cincotta, The roles of the caudate nucleus in human 
classification learning, J. of Neurosci., Vol. 25, pp. 2941-2951, 2005. 

[9] J. Matre, G. Gilbreath, Statistics for Business and Economics, 3rd 
Edition, 1987. 

[10] A.J. Yu, P. Dayan, Uncertainty, neuromodulation, and attention, Neuron 
46, pp. 681-692, 2005. 

[11] S.J. Gershman, J.D. Cohen, Y. Niv, Learning to selectively attend, in 
32nd Annual Conference of the Cognitive Science Society, Portland, 
2010. 




Improvement of Proportional Conflict Redistribution 
Fusion Rules for Levee Characterization 


Théo Dezertᵃ, Jean Dezertᵇ 


ᵃ IFSTTAR, GERS, GeoEND, F-44344 Bouguenais, France. 
ᵇ The French Aerospace Lab, ONERA/DTIS, Palaiseau, France. 


Emails: theo.dezert@univ-eiffel.fr, jean.dezert@onera.fr 


Originally published as: T. Dezert, J. Dezert, Improvement of Proportional Conflict Redistribution Fusion 
Rules for Levee Characterization, in Proc. of ESREL 2021 Int. Conference, Angers, France, September 
19-23, 2021, and reprinted with permission. 


Abstract—Levee security assessment is a complex expert assess- 
ment process based on several heterogeneous data. In our previ- 
ous research works, we applied information fusion techniques to 
characterize flood protection levees. We used the proportional 
conflict redistribution rule no. 6 (PCR6) proposed in the DSmT 
(Dezert-Smarandache Theory) framework to combine data from 
geotechnical and geophysical investigation methods. However, 
in some cases, this rule can generate unsatisfactory results. 
Indeed, the uncertainty between several hypotheses (lithological 
materials) is overestimated after the fusion process, which is 
ultimately detrimental to decision making. This result occurs 
because the PCR6 rule does not preserve the neutrality of the 
vacuous belief assignment, which can be judged as being a 
counter-intuitive behavior. To overcome this problem we present 
an improved rule that preserves the neutrality of vacuous belief 
assignments in the fusion process. Hence, the redistribution of the 
partial conflict masses using this new rule does not overestimate 
the masses associated with partial uncertainties. To illustrate 
the use of this new fusion rule in a levee characterization 
problem, we simulate data acquisition. Two geophysical 
investigation campaigns (electrical resistivity tomography and 
multi-channel analysis of surface waves methods) and a geotechnical 
acquisition campaign (core drillings with particle size analysis) 
are numerically simulated on an earthen structure. The objective 
is to compare and discuss the fusion results obtained using this 
new rule with respect to the methodology based on the original 
PCR6 rule as well as to demonstrate the enhancement of the 
levee characterization. 


Keywords: belief functions, levee, cross-disciplinary ap- 
proach, natural hazards, fusion rules, risk management, pro- 
portional conflict redistribution rule. 


I. INTRODUCTION 


This work addresses the problem of levee characterization 
for flood protection. These hydraulic works are mostly 
old and heterogeneous, and their rupture can lead to disastrous 
consequences such as human, economic and environmental 
losses. Since many different materials and construction 
methods exist, each flood protection embankment is unique, and the 
nature of its structure goes hand in hand with its environment 
[1]. The structures are more or less subject to breakage in weak 
areas under specific loads. Reducing the risk of levee rupture 
requires improving their diagnosis and therefore enhancing 
their characterization. First, it relies on technical 
surveys able to determine whether specific pathologies that could 
lead to failure mechanisms are present in the levee structure. 
Methodologies for the evaluation of these structures usually 
include geotechnical and geophysical investigation methods. 
Geophysical methods are mainly non-intrusive and provide 
physical information on large volumes of subsoil but with 
potentially significant uncertainties. Geotechnical investigation 
methods, on the other hand, are intrusive and provide 
information that is spatially more localized but also more precise. These 
two sets of methods are complementary. Information fusion is 
a helpful technique to combine geotechnical and geophysical 
data in a complex processing for the levee security assessment 
based on several heterogeneous data. The processing of the 
data from geophysical and geotechnical investigation methods 
and their fusion, taking into account their imperfections and 
associated spatial distributions, is an essential issue for the 
evaluation of earthen levees. A cross-disciplinary fusion ap- 
proach for the characterization of lithological materials within 
the structures has recently been proposed in the mathematical 
framework of belief functions [2]. 

In this paper, we present a flawed behavior of PCR6 
combination rule attributed to the non neutrality of the vacuous 
BBA (Basic Belief Assignment), and we propose an improve- 
ment to this rule (PCR6*) in order to ensure the neutrality 
property of the vacuous BBA. This improvement helps in 
reducing the level of uncertainty in fusion results by discarding 
ignorant sources for each partial conflict. To demonstrate the 
pertinence and advantages of PCR6* over PCR6, we compare 
the obtained results for i) a simple numerical example and for 
ii) the fusion of simulated geophysical and geotechnical data 
on an earthen levee. 


II. BELIEF FUNCTIONS 


Based on preliminary works done in [3], [4], Shafer has 
introduced the belief functions (BF) in [5] to model epistemic 
uncertainty, to reason about uncertainty and to combine 
uncertain information. The theory of belief functions is also known 
as Dempster-Shafer Theory (DST) in the literature. We assume 
that the answer¹ of the problem under concern belongs to a 
known (or given) finite discrete frame of discernment (FoD) 
$\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$, with $n > 1$, and where all elements of 
$\Theta$ are exhaustive and exclusive². The set of all subsets of $\Theta$ 

¹ i.e. the solution, or the decision to take. 
² This is so-called Shafer's model of FoD [6]. 




(including empty set $\emptyset$, and $\Theta$) is the power-set of $\Theta$ denoted 
by $2^\Theta$. The number of elements (i.e. the cardinality) of $2^\Theta$ is 
$2^{|\Theta|}$. In this section we recall the main definitions related to 
BF and briefly introduce the conjunctive and Dempster-Shafer 
rules of combination. 


A. Main Definitions 


A (normal) basic belief assignment (BBA) associated with 
a given source of evidence is a mapping $m(\cdot) : 2^\Theta \to [0,1]$ 
satisfying $m(\emptyset) = 0$ and $\sum_{A \in 2^\Theta} m(A) = 1$. The real number 
$m(A)$ is called the mass of $A$ committed by the source of 
evidence. The subset $A \in 2^\Theta$ is called a focal element (FE) 
of the BBA $m(\cdot)$ if and only if $m(A) > 0$. The set of all 
the focal elements of a BBA $m(\cdot)$ is denoted $\mathcal{F}_\Theta(m) = \{X \in 
2^\Theta | m(X) > 0\}$. The set $\mathcal{F}_\Theta(m)$ has at least one focal element, 
and at most $2^{|\Theta|} - 1$ focal elements because one has always 
$m(\emptyset) = 0$ by the definition of a normal BBA - see [5]. Belief 
and plausibility functions are respectively defined from $m(\cdot)$ 
by 

$$Bel(A) = \sum_{X \in 2^\Theta | X \subseteq A} m(X), \qquad (1)$$ 

$$Pl(A) = \sum_{X \in 2^\Theta | A \cap X \neq \emptyset} m(X) = 1 - Bel(\bar{A}), \qquad (2)$$ 

where $\bar{A}$ represents the complement of $A$ in $\Theta$, that is $\bar{A} \triangleq \Theta - 
\{A\} = \{X | X \in \Theta \text{ and } X \notin A\}$. The symbol $\triangleq$ means equal 
by definition and the minus symbol denotes the set difference 
operator - see [7], [8]. 

$Bel(A)$ and $Pl(A)$ are usually interpreted respectively as 
lower and upper bounds of an unknown (subjective) probability 
measure $P(A)$. The width $Pl(A) - Bel(A)$ of the belief 
interval $[Bel(A), Pl(A)]$ is usually called the uncertainty on 
$A$ but it represents in fact the imprecision on the probability 
of $A$ granted by the source of evidence. When all the focal 
elements of a BBA $m(\cdot)$ are singletons this BBA is called 
a Bayesian BBA and its corresponding $Bel(\cdot)$ function is 
equal to $Pl(\cdot)$ and they are homogeneous to a (subjective) 
probability measure $P(\cdot)$. The vacuous BBA (VBBA for short) 
representing a totally ignorant source is defined as $m_v(\Theta) = 1$. 
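
To make these definitions concrete, here is a minimal Python sketch of the belief and plausibility functions. It assumes (as an illustrative choice, not the authors' implementation) that a BBA is stored as a dictionary mapping `frozenset` focal elements to their masses; the frame `theta` and the mass values are hypothetical.

```python
def bel(m, A):
    """Belief of A: total mass of the focal elements included in A."""
    return sum(mass for X, mass in m.items() if X <= A)


def pl(m, A):
    """Plausibility of A: total mass of the focal elements intersecting A."""
    return sum(mass for X, mass in m.items() if X & A)


# Hypothetical frame of discernment and BBA (masses sum to one).
theta = frozenset({"a", "b", "c"})
m = {
    frozenset({"a"}): 0.5,
    frozenset({"a", "b"}): 0.3,
    theta: 0.2,  # mass left on the whole frame (partial ignorance)
}

A = frozenset({"a", "b"})
print("Bel(A) =", bel(m, A), " Pl(A) =", pl(m, A))
# Duality check: Pl(A) should equal 1 - Bel(complement of A).
print("1 - Bel(not A) =", 1 - bel(m, theta - A))

# The vacuous BBA gives Bel = 0 and Pl = 1 for any proper non-empty subset.
m_v = {theta: 1.0}
print(bel(m_v, A), pl(m_v, A))
```

The width Pl(A) - Bel(A) printed by such a script is the imprecision interval discussed above; for the vacuous BBA it is maximal.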


B. Conjunctive Combination Rule 

We consider S > 2 distinct reliable sources of evidence 
characterized by their BBA m,(-) (s = 1,...,S) defined on 
the same frame of discernment 0%. Their conjunctive fusion, 
denoted Conj(m1, m2, ..., mg), corresponds to a (non proper) 
BBA defined for all A € 2° by 


SS Ip mi%i,), (3) 


where X; = (X;,,Xj,,-.-,Xjs) is a possible S-uple of 
focal elements, where 7; € {1,...,Fi}, jo © {1,..., Fo}, 


.., and jg € {1,...,Fs}. The element X,, is the focal 

3For notation simplicity, we omit © lower index in the notations of sets 
of focal elements Fo(m1), ..., Fe(mg), and their cardinalities are simply 
written as Fi, ..., and Fg. 


element of the BBA m,(-) that makes the i-th component of 
the j-th S-uple X,;. The set F(m,...,mg) is the set of all 
possible S-uples. The cardinality of the set F(mj,,...,mg) is 


noted F for convenience. The total conflicting mass, denoted 
my’ g(0), is given by 


Ss 
> Wecee: (4) 


XjEF(m),..-; mg)t=1 
Xj1N-.0X56=0 


This fusion rule is commutative and associative, and
the vacuous BBA $m_v$ has a neutral impact, that is
$\mathrm{Conj}(m_1, m_2, \ldots, m_S, m_v) = \mathrm{Conj}(m_1, m_2, \ldots, m_S)$. Its
main drawback is that it does not generate a proper BBA
because $m^{\mathrm{Conj}}_{1,2,\ldots,S}(\emptyset) > 0$ in general. Because the empty set $\emptyset$
is the absorbing element for the conjunctive operation, this rule
generates a mass $m^{\mathrm{Conj}}_{1,2,\ldots,S}(\emptyset)$ that quickly tends to one after only a few
steps of sequential fusion of the sources, which is
not very useful for decision-making support. The main interest
of this rule is its ability to identify the partial conflicts and
to provide a measure of the total level of conflict $m^{\mathrm{Conj}}_{1,2,\ldots,S}(\emptyset)$
between the sources, which can be used to manage (select or
discard) the sources in the fusion process if one prefers; see
[2] for instance.
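
A minimal Python sketch of formulas (3) and (4) is given below for illustration. BBAs are assumed to be stored as dictionaries mapping focal elements (frozensets) to masses; the function name `conjunctive` and the two small BBAs are hypothetical and chosen only to make the behavior visible.

```python
from itertools import product
from math import prod


def conjunctive(*bbas):
    """Conjunctive fusion (3). The mass stored on frozenset() is the total conflict (4)."""
    fused = {}
    # Enumerate every S-uple of focal elements, one component per source.
    for combo in product(*(bba.items() for bba in bbas)):
        inter = frozenset.intersection(*(X for X, _ in combo))
        fused[inter] = fused.get(inter, 0.0) + prod(mass for _, mass in combo)
    return fused


theta = frozenset("AB")
m1 = {frozenset("A"): 0.6, theta: 0.4}
m2 = {frozenset("B"): 0.7, theta: 0.3}
vacuous = {theta: 1.0}

c2 = conjunctive(m1, m2)                 # conflict 0.6 * 0.7 = 0.42
c3 = conjunctive(m1, m2, vacuous)        # the vacuous BBA changes nothing
c4 = conjunctive(m1, m2, m1, m2)         # the conflict grows with repeated fusion
```

In this toy case the mass on the empty set rises from 0.42 with two sources to 0.7644 with four, which illustrates why the unnormalized rule is of limited use for decision-making after several fusion steps.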


C. Dempster-Shafer Combination Rule 


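Dempster-Shafer (DS) rule of combination is the emblematic
rule of combination proposed by Shafer in his Mathematical
Theory of Evidence (see [5]), which is based on Dempster's
early works (see [3], [4]). DS rule is nothing but the normalized
version of the conjunctive rule. Hence, DS combination
is defined by $m^{\mathrm{DS}}_{1,2,\ldots,S}(\emptyset) = 0$ and, for all $A \in 2^\Theta \setminus \{\emptyset\}$, by

$$m^{\mathrm{DS}}_{1,2,\ldots,S}(A) = \frac{m^{\mathrm{Conj}}_{1,2,\ldots,S}(A)}{1 - m^{\mathrm{Conj}}_{1,2,\ldots,S}(\emptyset)}. \tag{5}$$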


DS fusion rule is commutative and associative, and the
vacuous BBA $m_v$ also has a neutral impact for this rule, but
its justification and behavior have been disputed over the years
through many counter-examples involving high or low conflicting
sources (from both theoretical and practical standpoints), as
reported in [9], [10], [11]. In our applications, which are related
to risk assessment and safety, we prefer not to use DS
rule because of these serious problems. Actually, many
alternative rules of combination exist⁴, and among them we
focus on the rule based on the proportional
conflict redistribution no. 6 (PCR6) principle (see [6], Vol. 3
for details), which is presented in the next section.
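
For illustration, formula (5) can be sketched in Python as the conjunctive fusion followed by a normalization. The dictionary representation of BBAs, the function name `dempster_shafer` and the example masses are assumptions of this sketch.

```python
from itertools import product
from math import prod


def dempster_shafer(*bbas):
    """DS rule (5): conjunctive fusion normalized by one minus the total conflict."""
    conj = {}
    for combo in product(*(bba.items() for bba in bbas)):
        inter = frozenset.intersection(*(X for X, _ in combo))
        conj[inter] = conj.get(inter, 0.0) + prod(mass for _, mass in combo)
    conflict = conj.pop(frozenset(), 0.0)      # total conflicting mass (4)
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict: DS rule is undefined")
    return {X: mass / (1.0 - conflict) for X, mass in conj.items()}


theta = frozenset("AB")
m1 = {frozenset("A"): 0.6, theta: 0.4}
m2 = {frozenset("B"): 0.7, theta: 0.3}
vacuous = {theta: 1.0}

ds12 = dempster_shafer(m1, m2)
ds12v = dempster_shafer(m1, m2, vacuous)       # same result: neutral vacuous BBA
```

Here the conflict 0.42 is removed and the remaining masses are rescaled by 1/0.58, whatever the origin of the conflict; this blind rescaling is what the proportional redistribution rules replace.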


III. PCR6 COMBINATION RULE 
A. PCR6 General Formula 
The PCR6 rule of combination has been proposed in [12],
[13] as an interesting alternative to the original PCR rule of
combination no. 5 (PCR5) proposed in [14], [15]. The PCR6
rule coincides with the PCR5 rule when one combines only
two sources (i.e. two BBAs defined on the same FoD). The
difference between PCR5 and PCR6 rules lies in the way the
proportional conflict redistribution is done as soon as three
(or more) sources are involved in the fusion. For notation
convenience, we define

$$\pi_j(X_{j_1} \cap X_{j_2} \cap \ldots \cap X_{j_S}) \triangleq \prod_{i=1}^{S} m_i(X_{j_i}). \tag{6}$$

If $X_{j_1} \cap X_{j_2} \cap \ldots \cap X_{j_S} = \emptyset$, then we use the more concise
notation $\pi_j(\emptyset)$ instead of $\pi_j(X_{j_1} \cap X_{j_2} \cap \ldots \cap X_{j_S})$, and $\pi_j(\emptyset)$
is called a conflicting mass product.

⁴ See [6], Vol. 2 for a detailed list of many fusion rules.

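The PCR6 fusion of $S \ge 2$ BBAs is obtained by
$m^{\mathrm{PCR6}}_{1,2,\ldots,S}(\emptyset) = 0$, and for all $A \in 2^\Theta \setminus \{\emptyset\}$ by

$$m^{\mathrm{PCR6}}_{1,2,\ldots,S}(A) = m^{\mathrm{Conj}}_{1,2,\ldots,S}(A) + \sum_{j \in \{1,\ldots,F\} \,|\, A \in \mathbf{X}_j \wedge \pi_j(\emptyset)} \left[ \Big( \sum_{i \in \{1,\ldots,S\} | X_{j_i} = A} m_i(X_{j_i}) \Big) \cdot \frac{\pi_j(\emptyset)}{\sum_{X \in \mathcal{X}_j} \Big( \sum_{i \in \{1,\ldots,S\} | X_{j_i} = X} m_i(X_{j_i}) \Big)} \right], \tag{7}$$

where $\mathcal{X}_j$ is the set of all distinct components of the S-uple $\mathbf{X}_j$
and $\wedge$ is the logical conjunction⁵.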

We use this general PCR6 formula because it is easier
to implement and to improve than the original formula given
in [12] and in [13]. The PCR6 rule is quasi-associative, and it
offers a more refined conflict redistribution than DS rule, but
it is more complex and it does not preserve the neutrality of
the vacuous BBA. PCR6 is simpler to implement than PCR5.
When S > 2, PCR6 is better than PCR5 for decision-making,
as shown in [12]. Matlab™ codes of PCR5 and PCR6 fusion
rules can be found in [6], [16], and also from the BFAS⁶
repository. The PCR5 formula can be obtained from the PCR6
formula by just replacing the two summation operators on
$i \in \{1, \ldots, S\} \,|\, X_{j_i} = A$ appearing in (7) by the two product
operators on $i \in \{1, \ldots, S\} \,|\, X_{j_i} = A$, that is

$$\sum_{i \in \{1,\ldots,S\} | X_{j_i} = A} \;\longrightarrow\; \prod_{i \in \{1,\ldots,S\} | X_{j_i} = A}$$


B. Drawback of PCR6 Rule 


The PCR6 (resp. PCR5) rule of combination is not associative,
which means that the fusion of the BBAs must be
done using the general formula (7) if one has more than two
BBAs to combine, which is not very convenient. Therefore,
the sequential PCR6 (resp. PCR5) combination of S > 2
BBAs is not in general equal to the global PCR6 (resp.
PCR5) fusion of the S BBAs altogether, because the order of
the combination of the sources does matter in the sequential
combination. Moreover, the PCR6 (resp. PCR5) rule can
become computationally intractable for combining a large
number of sources and for working with large FoDs. This is a
well-known limitation of this rule, but this is the price to pay
to get better results than with DS rule. Aside from the complexity of
this rule, it is worth mentioning that the neutral impact property
of the vacuous BBA $m_v$ is lost in general when considering
the PCR6 (or PCR5) combination of S > 2 BBAs altogether,
because of the proportional conflict redistribution principles
used in PCR6 (resp. PCR5) rule. The non-neutral impact of
the vacuous BBA is clearly a drawback, because it is naturally
expected that the vacuous BBA must not impact the fusion
result, since it brings no useful information to exploit.
Also, a BBA that is close to the
vacuous BBA should not have a strong impact on the fusion
result because it brings only very little valuable information.
This can be seen as a flaw in the behavior of PCR6 (resp.
PCR5) rule of combination. To emphasize this flaw clearly,
we give in Example 1 a case where the mass committed
to some partial uncertainties can increase more than necessary
with PCR6 rule if we have a BBA which is close (or equal)
to the vacuous BBA, which is detrimental to the quality of
the fusion result and to decision-making (because the result is
more uncertain than it should be, and consequently the decision
is more difficult to make).

⁵ i.e. $x \wedge y$ means that conditions x and y are both true.

⁶ Belief Functions and Applications Society, see https://www.bfasociety.org/.


Example 1: consider $\Theta = \{A, B, C, D, E\}$ and the three
BBAs listed in Table I.


TABLE I
THE THREE BBAS TO COMBINE.

Focal elements                $m_1(\cdot)$   $m_2(\cdot)$   $m_3(\cdot)$
$B$                           0.05           0.05           0
$A \cup B$                    0.65           0.05           0
$C \cup D$                    0.05           0.50           0
$A \cup B \cup C \cup D$      0.15           0.05           0
$E$                           0.10           0.35           0.01
$\Theta$                      0              0              0.99


Here $m_3(\cdot)$ is not equal to the vacuous BBA, but it is
very close to the vacuous BBA because $m_3(\Theta)$ is close to
one. The results⁷ of the fusion $\mathrm{PCR6}(m_1, m_2)$ and of the fusion
$\mathrm{PCR6}(m_1, m_2, m_3)$ are given in Table II.


TABLE II
$m^{\mathrm{PCR6}}_{1,2}(\cdot)$ AND $m^{\mathrm{PCR6}}_{1,2,3}(\cdot)$ RESULTS.

Focal elements                $m^{\mathrm{PCR6}}_{1,2}(\cdot)$   $m^{\mathrm{PCR6}}_{1,2,3}(\cdot)$
$B$                           0.054877       0.048939
$A \cup B$                    0.406987       0.247656
$C \cup D$                    0.312886       0.204005
$A \cup B \cup C \cup D$      0.024917       0.013439
$E$                           0.200333       0.101731
$\Theta$                      0              0.384230


⁷ The numerical values have been rounded to their sixth digit.

One sees that combining the BBAs $m_1$, $m_2$ with the BBA
$m_3$ (where $m_3$ is close to the vacuous BBA, and is therefore
almost non-informative) generates a big increase of the
belief of the uncertainty in the resulting BBA. This behavior
is clearly counter-intuitive because if the source is almost vacuous,
only a small degradation of the uncertainty is expected,
and in the limit case when $m_3$ is the vacuous BBA no impact
of $m_3$ on the fusion result should occur. Because of this flawed
behavior, we propose in the next section an improvement of
PCR6 rule (called PCR6+ fusion rule) in order to preserve the
neutrality of the vacuous BBA.
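
The behavior discussed in Example 1 can be reproduced with a short Python sketch of the general PCR6 formula. This is not the Matlab code referenced by the authors; the function name `pcr6`, the helper alias `fs`, and the dictionary representation of BBAs are assumptions of this sketch. Each conflicting product is redistributed to the components of its S-uple proportionally to the masses committed to them by the sources.

```python
from itertools import product
from math import prod

fs = frozenset  # short alias, e.g. fs("AB") stands for A u B


def pcr6(*bbas):
    """PCR6 fusion of BBAs given as dicts {frozenset: mass}."""
    fused = {}
    for combo in product(*(bba.items() for bba in bbas)):
        pi = prod(mass for _, mass in combo)       # product of the S masses
        if pi == 0.0:
            continue
        inter = frozenset.intersection(*(X for X, _ in combo))
        if inter:                                  # conjunctive consensus part
            fused[inter] = fused.get(inter, 0.0) + pi
        else:                                      # conflicting mass product
            total = sum(mass for _, mass in combo)
            for X, mass in combo:                  # proportional redistribution
                fused[X] = fused.get(X, 0.0) + mass * pi / total
    return fused


theta = fs("ABCDE")
m1 = {fs("B"): 0.05, fs("AB"): 0.65, fs("CD"): 0.05, fs("ABCD"): 0.15, fs("E"): 0.10}
m2 = {fs("B"): 0.05, fs("AB"): 0.05, fs("CD"): 0.50, fs("ABCD"): 0.05, fs("E"): 0.35}
m3 = {fs("E"): 0.01, theta: 0.99}

r12 = pcr6(m1, m2)
r123 = pcr6(m1, m2, m3)
print(round(r12[fs("B")], 6), round(r123[theta], 6))
```

Under these assumptions the sketch attributes about 0.0549 to B when fusing the first two BBAs and about 0.384 to the whole frame when the almost vacuous third BBA is added, in agreement with the values reported for this example.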


IV. IMPROVEMENT OF PCR6 RULE 


The very simple and basic idea to improve the PCR6
conflict redistribution principle is to discard the elements that
contain the other elements involved in the conflicting mass product
$\pi_j(\emptyset)$ calculation. Indeed, the elements discarded are regarded
as non-informative and not useful for making the conflict
redistribution. To illustrate this point clearly, let us consider
again Example 1 and the conflicting product

$$\pi_{16}(\emptyset) = m_1(A \cup B)\, m_2(C \cup D)\, m_3(\Theta).$$

With PCR6, the redistribution of $\pi_{16}(\emptyset)$ follows

$$\frac{x_{16}(A \cup B)}{m_1(A \cup B)} = \frac{x_{16}(C \cup D)}{m_2(C \cup D)} = \frac{x_{16}(\Theta)}{m_3(\Theta)} = \frac{\pi_{16}(\emptyset)}{m_1(A \cup B) + m_2(C \cup D) + m_3(\Theta)},$$

which is not very efficient because $\Theta$ is not the source of
conflict in this case, since $A \cup B \subseteq \Theta$ and $C \cup D \subseteq \Theta$.
The conflict exists only because $(A \cup B) \cap (C \cup D) = \emptyset$.
In the improved version of PCR6 rule, denoted PCR6+, the
conflicting product $\pi_{16}(\emptyset)$ will be redistributed only to $A \cup B$
and to $C \cup D$ but not to $\Theta$. With PCR6+ rule we will make
the new (simpler) redistribution of $\pi_{16}(\emptyset)$ according to

$$\frac{x_{16}(A \cup B)}{m_1(A \cup B)} = \frac{x_{16}(C \cup D)}{m_2(C \cup D)} = \frac{\pi_{16}(\emptyset)}{m_1(A \cup B) + m_2(C \cup D)}.$$

A. PCR6+ General Formula


The general expression of PCR6+ (and also PCR5+) is
presented in detail, with many examples and Matlab™ codes,
in [17]. Here, due to space limitation, we just recall its
expression for convenience. Actually, PCR6+ fusion rule is the
proper modification of PCR6 formula (7) taking into account
the selection of focal elements on which the proportional
redistribution must apply thanks to the value of their keeping
index. More precisely, the PCR6+ fusion of $S \ge 2$ BBAs is
obtained by $m^{\mathrm{PCR6}^+}_{1,2,\ldots,S}(\emptyset) = 0$, and for all $A \in 2^\Theta \setminus \{\emptyset\}$ by

$$m^{\mathrm{PCR6}^+}_{1,2,\ldots,S}(A) = m^{\mathrm{Conj}}_{1,2,\ldots,S}(A) + \sum_{j \in \{1,\ldots,F\} \,|\, A \in \mathbf{X}_j \wedge \pi_j(\emptyset)} \left[ \Big( \kappa_j(A) \sum_{i \in \{1,\ldots,S\} | X_{j_i} = A} m_i(X_{j_i}) \Big) \cdot \frac{\pi_j(\emptyset)}{\sum_{X \in \mathcal{X}_j} \Big( \kappa_j(X) \sum_{i \in \{1,\ldots,S\} | X_{j_i} = X} m_i(X_{j_i}) \Big)} \right], \tag{8}$$


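where $\kappa_j(A)$ and $\kappa_j(X)$ are respectively the keeping indexes
of elements A and X involved in the conflicting product $\pi_j(\emptyset)$,
that are calculated by the formula

$$\kappa_j(X_{j_i}) \triangleq 1 - \prod_{\substack{X_{l'}, X_l \in \mathcal{X}_j \,|\, X_{l'} \neq X_l \\ |X_{j_i}| \le |X_l| \\ |X_{l'}| \le |X_l|}} \delta_j(X_{l'}, X_l). \tag{9}$$

$\mathcal{X}_j = \{X_1, \ldots, X_{s_j}, s_j \le S\}$ is the set of all distinct
components of the S-uple $\mathbf{X}_j$ related with the conflicting product
$\pi_j(\emptyset)$. The term $\delta_j(X_{l'}, X_l)$ is the binary containing indicator
of $X_{l'}$ with respect to $X_l \in \mathcal{X}_j$ that characterizes if $X_l$
contains (includes) $X_{l'}$ in the wide sense, or not. More precisely,
$\delta_j(X_{l'}, X_l)$ is defined by

$$\delta_j(X_{l'}, X_l) \triangleq \begin{cases} 1 & \text{if } X_{l'} \subseteq X_l, \\ 0 & \text{if } X_{l'} \nsubseteq X_l. \end{cases} \tag{10}$$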

The value $\kappa_j(X_{j_i}) = 1$ stipulates that the focal element
$X_{j_i} \in \mathcal{X}_j$ must receive some proportional redistribution from
the conflicting mass $\pi_j(\emptyset)$, and $\kappa_j(X_{j_i}) = 0$ indicates that
$X_{j_i} \in \mathcal{X}_j$ will not be involved in the proportional
redistribution of $\pi_j(\emptyset)$.

Note that $\kappa_j(\Theta) = 0$ if $\Theta \in \mathcal{X}_j$ because $\Theta$ always includes
all other focal elements of $\mathcal{X}_j$ and $\Theta$ has the highest cardinality.
For a given FoD and a given number of BBAs to combine,
it is always possible to calculate off-line the values of the
keeping indexes of focal elements for all combinations leading
to conflicting products $\pi_j(\emptyset) > 0$. We can verify that formula
(8) is consistent with PCR6 formula (7) when all keeping
indexes are equal to one. The fusion rule (8) is commutative
and non-associative, and the vacuous BBA $m_v$ has a neutral
impact in PCR6+ rule (see proof in [17]).
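
The keeping index can be sketched in a few lines of Python, reading its definition as follows: an element is discarded when it, and every distinct component of the S-uple at least as large as it, contains all the other distinct components of smaller or equal cardinality. The function name `keeping_index` and the frozenset representation are assumptions of this sketch; the reference implementation is the one of [17].

```python
def keeping_index(X, distinct):
    """Keeping index of X among the distinct components of a conflicting S-uple.

    Returns 1 if X must take part in the redistribution, 0 if it is discarded.
    """
    for Xl in distinct:
        if len(Xl) < len(X):
            continue                       # only containers at least as large as X
        for Xlp in distinct:
            if Xlp != Xl and len(Xlp) <= len(Xl) and not Xlp <= Xl:
                return 1                   # one containing indicator is 0: keep X
    return 0                               # all containing indicators are 1: discard X


theta = frozenset("ABCDE")
ab, cd = frozenset("AB"), frozenset("CD")

# Distinct components of the conflicting product m1(A u B) m2(C u D) m3(Theta).
components = {ab, cd, theta}
kappa = {X: keeping_index(X, components) for X in components}

# A nested case: A u B and Theta both contain the conflicting singletons A and B.
a, b = frozenset("A"), frozenset("B")
nested = {a, b, ab, theta}
kappa_nested = {X: keeping_index(X, nested) for X in nested}
```

For the first set of components only the whole frame is discarded; in the nested case both A ∪ B and the whole frame are discarded, so that the conflict is shared between A and B only.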


B. Example 1 Revisited with PCR6+


Consider Example 1 with the three BBAs given in
Table I. If we combine the BBAs $m_1$ and $m_2$, we have
$\mathrm{PCR6}^+(m_1, m_2) = \mathrm{PCR6}(m_1, m_2)$ because these rules coincide
when combining two BBAs. If we make the PCR6+
fusion of the three BBAs altogether we obtain different results,
which is normal, because for S > 2 one has in general
$\mathrm{PCR6}^+(m_1, \ldots, m_S) \neq \mathrm{PCR6}(m_1, \ldots, m_S)$. For this example
we get the results shown in Table III.

We can verify that the result obtained by PCR6+ fusion
rule is more judicious than with PCR6 rule because the fusion
of the almost vacuous BBA $m_3(\cdot)$ has very little impact
on the fusion result, as we intuitively expect. This is because
the PCR6+ combination rule discards the ignorant (or almost
ignorant) information. With $m^{\mathrm{PCR6}^+}_{1,2,3}(\cdot)$, the largest mass is
allocated to $A \cup B$ as with⁸ $m^{\mathrm{PCR6}^+}_{1,2}(\cdot)$, and contrariwise to
$m^{\mathrm{PCR6}}_{1,2,3}(\cdot)$ when using the PCR6 fusion rule (see results in
Table II).


TABLE III
$m^{\mathrm{PCR6}^+}_{1,2}(\cdot)$ AND $m^{\mathrm{PCR6}^+}_{1,2,3}(\cdot)$ RESULTS.

Focal elements                $m^{\mathrm{PCR6}^+}_{1,2}(\cdot)$   $m^{\mathrm{PCR6}^+}_{1,2,3}(\cdot)$
$B$                           0.054877       0.054485
$A \cup B$                    0.406987       0.407174
$C \cup D$                    0.312886       0.312660
$A \cup B \cup C \cup D$      0.024917       0.025232
$E$                           0.200333       0.200449
$\Theta$                      0              0
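
For readers who wish to experiment with this example, the following Python sketch implements the PCR6+ rule by adding the keeping index to the proportional redistribution. It is only an illustrative reading of the general formula, not the Matlab code of [17]; the names `pcr6_plus`, `keeping_index` and `fs`, as well as the dictionary representation of BBAs, are assumptions of this sketch.

```python
from itertools import product
from math import prod

fs = frozenset  # short alias, e.g. fs("AB") stands for A u B


def keeping_index(X, distinct):
    """1 if X takes part in the redistribution of a conflict, 0 if it is discarded."""
    for Xl in distinct:
        if len(Xl) >= len(X):
            for Xlp in distinct:
                if Xlp != Xl and len(Xlp) <= len(Xl) and not Xlp <= Xl:
                    return 1
    return 0


def pcr6_plus(*bbas):
    """PCR6+ fusion of BBAs given as dicts {frozenset: mass}."""
    fused = {}
    for combo in product(*(bba.items() for bba in bbas)):
        pi = prod(mass for _, mass in combo)
        if pi == 0.0:
            continue
        sets = [X for X, _ in combo]
        inter = frozenset.intersection(*sets)
        if inter:                                   # conjunctive consensus part
            fused[inter] = fused.get(inter, 0.0) + pi
            continue
        distinct = set(sets)
        kappa = {X: keeping_index(X, distinct) for X in distinct}
        total = sum(kappa[X] * mass for X, mass in combo)
        for X, mass in combo:                       # redistribute to kept elements only
            if kappa[X]:
                fused[X] = fused.get(X, 0.0) + mass * pi / total
    return fused


theta = fs("ABCDE")
m1 = {fs("B"): 0.05, fs("AB"): 0.65, fs("CD"): 0.05, fs("ABCD"): 0.15, fs("E"): 0.10}
m2 = {fs("B"): 0.05, fs("AB"): 0.05, fs("CD"): 0.50, fs("ABCD"): 0.05, fs("E"): 0.35}
m3 = {fs("E"): 0.01, theta: 0.99}
vacuous = {theta: 1.0}

r12 = pcr6_plus(m1, m2)
r12v = pcr6_plus(m1, m2, vacuous)    # identical to r12: the vacuous BBA is neutral
r123 = pcr6_plus(m1, m2, m3)         # almost identical to r12
```

In this sketch the whole frame never receives any redistributed mass, so adding the vacuous BBA leaves the two-source result unchanged, and adding the almost vacuous $m_3$ only perturbs it slightly.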


V. APPLICATION TO LEVEE CHARACTERIZATION 


We now present the advantages of the new PCR6+ rule
on a numerical case study representing a levee
section. To do so, we use the geophysical and geotechnical
information fusion methodology introduced in [2].


A. Model and Information Sources 


Figure 1 displays the structure of the levee, the location
of the different layers and the representation of the studied levee
section.

[Figure 1 legend: borehole position; MASW profile; ERT profile; electric artifact; materials: clays, sands. Horizontal axis: distance (m), from 50 to 450.]

Fig. 1. a) Levee with position of investigation methods and b) materials in
the section of interest.


The area is a lengthwise (parallel to the river) vertical
section composed of two lithological materials: i) compact
clays (C hypothesis) and ii) soft sands (S hypothesis). The
sands are present over 6 meters thick on the first 125 meters
of the section and over 10 meters thick after. Clayey materials
are positioned below. A small electrically conductive anomaly
is located near the surface in the center of the model. Thus,
the FoD is defined such that $\Theta = \{C, S, O\}$. As required by
the fusion method, O is an additional hypothesis standing for
any other material different from the two known ones. For this
case study, two geophysical methods are used: the Electrical
Resistivity Tomography (ERT) and the Multichannel Analysis
of Surface Waves (MASW). Two geotechnical boreholes
providing information on the lithology are also considered in
this study.

⁸ We recall that one always has $m^{\mathrm{PCR6}^+}_{1,2}(\cdot) = m^{\mathrm{PCR6}}_{1,2}(\cdot)$.


B. BBA Distribution for Each Source 


1) Electrical Resistivity Tomography: The basic principle
of DC-resistivity methods consists in injecting an electric
current of known intensity [A] by means of two "current"
electrodes and measuring a voltage [V] between two "potential"
electrodes. Such measurements are acquired for several
positions of the current and the potential electrodes. Apparent
resistivity values can then be computed and inverted to
reconstruct a complete section of electrical resistivity [Ω.m].
From these electrical resistivity data, the fusion methodology
[2] yields the BBA distribution depicted in Figure 2. The
ERT characterization is disturbed by the conductive electrical
artifact. Thus, clays are locally characterized in the center of
the section while we know that sands are actually present.
Also, the interface between clays and sands is not correctly
defined.

[Figure 2 legend: clays; sands; clays ∪ sands.]

Fig. 2. a) Material with highest mass (from electrical resistivity data) and b)
their mass values.


2) Multichannel Analysis of Surface Waves: The MASW
method consists in studying the dispersion of surface waves
(waveform deformation) to determine the shear wave velocity
[m.s⁻¹]. A seismic source is generated at various locations
and geophones are aligned on the ground surface to record the
arrival times of the seismic waves. The use of this method comprises
three stages: (i) the data acquisition, (ii) the determination of
the Rayleigh dispersion curve, and (iii) the inversion process
with the determination of the shear velocities. In this work, the
seismic acquisition is carried out from x = 212 m to x = 428
m. From the shear wave velocity data, the associated BBA
distribution is displayed in Figure 3. The MASW characterization
is not disturbed by the electrical artifact. Thus, the
method correctly characterizes the two lithological materials
as well as the lithological interface position. However, $\Theta$ is
characterized in most of the section (in black, Figure 3.a),
where no data is available.

[Figure 3 legend: clays; sands.]

Fig. 3. a) Material with highest mass (from shear wave velocity data) and b)
their mass values.


3) Core Drillings: Two core drillings with particle size
analysis are simulated at x = 80 m and x = 350 m from
the surface to 20 m depth. From the simulated geotechnical
data, the associated BBA distribution is displayed in Figure
4. The lithological materials are correctly characterized, but an
important area of uncertainty remains between 6 and 10 m
depth. Indeed, since two different materials are identified in
the two boreholes at such depths, the section is poorly defined
between them [2].

[Figure 4 legend: Θ; clays; sands.]

Fig. 4. a) Material with highest mass (from two borehole data) and b) their
mass values.


C. PCR6 and PCR6+ Fusion Results


The fusion results using PCR6 and PCR6+ rules are
respectively depicted in Figures 5.a-b and Figures 5.c-d. These
results highlight the lack of characterization at the center of the
model using PCR6 rule (in the red boxes, Figure 5.a). Indeed,
$\Theta$ is characterized there, while PCR6+ rule correctly
characterizes sands. For PCR6, this area is difficult to define
since the ERT suggests the presence of clays, the MASW
suggests the presence of sands and the geotechnical source
of information is ignorant. However, PCR6+ rule manages to
allocate the conflicting masses to the individual hypotheses
instead of $\Theta$. Furthermore, the global belief mass values are
greater with PCR6+ rule (Figure 5.d) than with PCR6 (Figure
5.b). This improvement in the results could be valuable in
the context of an investigation campaign on a real earthen
structure. Indeed, knowing the nature of the materials as well
as their location is crucial to achieve a good diagnosis and
limit the risk of breakage. Since many investigation methods
can be ignorant or partially ignorant in the context of levee
characterization, this new combination rule would be of great
operational interest to give credit to the most informative
source and to avoid uncharacterized areas inside the earthen
structure.

[Figure 5 legend: clays; sands; clays ∪ sands; Θ. Horizontal axis: distance (m), from 50 to 450. Color scale: mass values from 0 to 1.]

Fig. 5. Material with highest mass (from ERT, MASW and core drillings),
using PCR6 (a) and PCR6+ (c) rules, with area of interest in red box. b) and
d) mass values associated with the hypothesis depicted in a) and c).


VI. CONCLUSIONS 


In this work, after having introduced the belief functions as
well as the conjunctive, DS and PCR6 rules of combination, we
presented the flawed behavior of PCR6 rule. We then described
improvements to correct this behavior, introducing a new
PCR6+ rule. The computation of a keeping index, making
it possible to discard ignorant information sources for the
calculation of each partial conflict, was detailed. This keeping
index has been integrated into the original formulation of
PCR6 in order to ensure the neutrality property of the vacuous
BBA. The interest of such a combination rule has finally been
demonstrated on a numerical levee section
with simulated geophysical and geotechnical acquisitions. As
a perspective, we wish to apply this new PCR6+
rule to risk analysis issues, fusing data acquired from real
investigation campaigns.


Abstract: The objective of this paper is to present a general
methodology for storm risk assessment and prediction based on
several physical criteria, using the belief functions framework
to deal with conflicting meteorological information. For this, we
adapt the Soft ELECTRE TRI (SET) approach to this storm
context and we show how to use it on the outputs of an atmospheric
forecast model, which give an estimate of the state of the atmosphere
at a future time. This work could also serve as a benchmark
for other methods dealing with multi-criteria decision-making
(MCDM) support and conflicting information fusion.
Keywords: storm risk assessment, information fusion, belief
functions, decision-making, Soft ELECTRE TRI.


I. INTRODUCTION 


In the context of storm prediction, many sources of observations
of the atmosphere may be used. The aim of storm
risk assessment is to exploit as well as possible some of
these available data to evaluate the risk of thunderstorm at
a given location in the surveillance area under concern in
the near future. Each type of data is associated with a given
source of information called a criterion in our context. In the
present paper, the data used come from a numerical
weather prediction model. These kinds of models
simulate the evolution of the state of the atmosphere by solving
dynamical and thermodynamical equations, by including data
assimilation of observations of the atmosphere (from satellites,
rawinsondes or buoys, for instance) and by adding physical
parametrizations for unresolved processes such as convection. The
outputs of the Global Forecast System (GFS), developed by
the National Centers for Environmental Prediction (NCEP), have been
used for our study [1]. The estimation of the storm risk level is of
prime importance for many applications (aeronautical safety,
air traffic management, ...). In this work we present a general
methodology showing how to use belief functions [2] coupled
with the Soft ELECTRE TRI (SET) outranking method [3] to
manage efficiently the conflicting sources of information in a
multi-criteria decision-making context.

This paper is organized as follows. After a short presentation
of the Soft ELECTRE TRI outranking method in Section
II for Multi-Criteria Decision-Making (MCDM) support, we
introduce the storm risk assessment problem in Section
III, and we show how it fits well with the Soft ELECTRE
TRI framework. We also provide an example of our storm
assessment methodology based on a data set coming from the
atmospheric forecast supplied by GFS, and we show the
performance of the SET approach with respect to the “ground truth”
obtained by the World Wide Lightning Location Network
(WWLLN) [4]. Conclusions are given in Section IV with some
perspectives.


II. SOFT ELECTRE TRI FOR MCDM


A. A short presentation of SET 


The Soft ELECTRE TRI method (SET) proposed in [3] is
an evolution of the ELECTRE TRI (ET)¹ method proposed
by Roy in [5] for making the outranking of alternatives with
respect to profiles of categories. The SET method is based on
belief functions calculus [2] (see appendix) and improves the
classical ET method because it does not require an arbitrary
choice of λ-cut strategy for making the outranking of alternatives
with respect to profiles of categories, nor an ad-hoc
choice of decisional attitude for making the final assignment.
Actually, the SET method solves the assignment problem in
a soft manner. Fig. 1 shows the general MCDM problem
that can be addressed by the SET method. More precisely,
SET solves an assignment problem in complex situations
where one (or several) given alternative has to be assigned to
predetermined categories based on multiple criteria values.
Each criterion $G_j$ ($j = 1, \ldots, n_g$) is evaluated quantitatively.
Each profile is defined by the green points limiting the bounds
of each category with respect to each criterion. The red chain
represents a “multi-criteria value” (i.e. an alternative a) that
one wants to assign to a predefined category.


[Figure 1 axis label: criteria.]

Figure 1: How to assign a category to an alternative?

¹ The acronym ELECTRE stands for “ELimination Et Choix Traduisant la
REalité” (Elimination and Choice Expressing the Reality) [7], [8].

In the context of our storm risk assessment application, each
category $C_h$ corresponds to a level of risk (low, moderate,
strong, very strong, or extreme), and it is defined by its profile
lower and upper bounds denoted respectively by $b_{h-1}$ and
$b_h$ (see vertical green plots in Fig. 1). The profile bounds
define ad-hoc categories for the values of each criterion $G_j$
corresponding to a meteorological parameter, which is either a
direct measure of a meteorological parameter or an estimation
of the parameter resulting from a sophisticated meteorological
forecast model. These meteorological parameters (i.e. criteria)
will be described in detail in Section III. Any alternative
corresponds to a multi-criteria value $a = (a_1, \ldots, a_j, \ldots, a_{n_g})$
whose component $a_j$ is nothing but the instance of the parameter
(i.e. criterion) $G_j$ for this alternative a. Each alternative
a is associated with a (2D) cell of the area of interest on the
surface of the Earth.

The SET method takes into account the weight of
importance of each criterion entering this assignment problem
and gives a soft (i.e. probabilistic) assignment solution
to commit any multi-criteria value a to a category. More
precisely, SET calculates the probability that a chosen alternative
a belongs to a predetermined category $C_h$ based on all
the information available (the criteria values, the importances of
the criteria, and the bounds of each predetermined category).


B. SET principle 


We present briefly the principle and the steps of the Soft
ELECTRE TRI (SET) outranking method developed originally
in [3]. SET makes a soft assignment of $n_a \ge 1$ alternatives
$a_i$ in predefined ordered categories $C_h$ ($h = 1, \ldots, n_h$)
according to criteria² measures $g_j(\cdot)$, $j \in J = \{1, \ldots, n_g\}$.
Each category $C_h$ is delimited by the set of its lower and
upper limits $b_{h-1}$ and $b_h$ with respect to each criterion $G_j$
measured by $g_j(\cdot)$. By convention, $b_0^j \le b_1^j \le \ldots \le b_{n_h}^j$.
$b_0 = (b_0^1, \ldots, b_0^j, \ldots, b_0^{n_g})$ is the lower (minimal) profile
bound and $b_{n_h} = (b_{n_h}^1, \ldots, b_{n_h}^j, \ldots, b_{n_h}^{n_g})$ is the upper (maximal)
profile bound. The overall profile $b_h$ is defined by
$(g_1(b_h), g_2(b_h), \ldots, g_{n_g}(b_h))$, and it is represented by the
vertical plot joining the green dots in Fig. 1.

The outranking relations used in SET are based on the calculation
of partial concordance and discordance indices, from
which global concordance and credibility indices are derived
based on Basic Belief Assignment (BBA) modeling [2] and on
an advanced fusion technique based on the Proportional Conflict
Redistribution rule no. 6 (PCR6) [9]-[11]. A soft assignment
of each alternative $a_i$ to a predetermined category is obtained
by the calculation of the probabilized outranking relations,
from which a final hard assignment can be drawn (if needed)
for some action. In the storm risk assessment context, an action
could for instance be the broadcast of an alert message to the
air traffic management organisms or airports.

The Soft ELECTRE TRI method requires the following four
steps:


² In our context, a criterion is a meteorological parameter.


SET-Step 1: Calculation of partial (local) concordance indices
$c_j(a_i, b_h)$, partial discordance indices $d_j(a_i, b_h)$, and also
partial uncertainty indices $u_j(a_i, b_h)$ between an alternative $a_i$
and a profile $b_h$ thanks to a smooth sigmoidal model [12]. The
partial indices are encapsulated in BBAs $m^j_{ih}(\cdot)$ for alternative
$a_i$ versus profile $b_h$ (i.e. $a_i$ vs. $b_h$) as follows:

$$\begin{cases} c_j(a_i, b_h) \triangleq m^j_{ih}(c) & \text{(local concordance)} \\ d_j(a_i, b_h) \triangleq m^j_{ih}(\bar{c}) & \text{(local discordance)} \\ u_j(a_i, b_h) \triangleq m^j_{ih}(c \cup \bar{c}) & \text{(local uncertainty)} \end{cases} \tag{1}$$

where $m^j_{ih}(\cdot)$ is a Basic Belief Assignment (BBA) defined on
the frame of discernment $\Theta = \{c, \bar{c}\}$, where c means that the
alternative $a_i$ is concordant (i.e. it agrees) with the assertion
“$a_i$ is at least as good as profile $b_h$”, and $\bar{c}$ means that the
alternative $a_i$ is opposed to this assertion (i.e. it is discordant,
or it disagrees with this assertion). For each criterion $G_j$,
a BBA $m^j_{ih}$ defined on the power-set of $\Theta$ is obtained by
the fusion of two simple BBAs $m_1(\cdot)$ and $m_2(\cdot)$ based on
sigmoid models; see [12] for justification and
details. Similarly, we also compute partial (local) concordance
indices $c_j(b_h, a_i)$, partial discordance indices $d_j(b_h, a_i)$, and
partial uncertainty indices $u_j(b_h, a_i)$ between a profile $b_h$ and
an alternative $a_i$. These partial indices are encapsulated in BBAs
$m^j_{hi}(\cdot)$ for profile $b_h$ versus alternative $a_i$ (i.e. $b_h$ vs. $a_i$).


SET-Step 2: Calculation of the global (overall) concordance
indices $c(a_i, b_h)$, $c(b_h, a_i)$, discordance indices $d(a_i, b_h)$,
$d(b_h, a_i)$, and uncertainty indices $u(a_i, b_h)$, $u(b_h, a_i)$ by the
fusion of local indices. More precisely, one must calculate

$$\begin{cases} m_{ih}(\cdot) = [m^1_{ih} \oplus m^2_{ih} \oplus \ldots \oplus m^{n_g}_{ih}](\cdot) \\ m_{hi}(\cdot) = [m^1_{hi} \oplus m^2_{hi} \oplus \ldots \oplus m^{n_g}_{hi}](\cdot) \end{cases} \tag{2}$$

where $\oplus$ denotes symbolically a chosen fusion operator.
To take into account the weight of importance $w_j \in [0, 1]$
of each criterion $G_j$, we propose two fusion methods:


1) Fusion method 1: we use the weighted averaging (WA)
fusion rule because it is a very simple rule, and it can
be processed very quickly. This is of prime importance
in our storm risk assessment context because one can
have millions of cells (depending on the cell resolution we
want to work with) in a wide surveillance area.

2) Fusion method 2: we use a more sophisticated PCR6
fusion rule adapted with importance discounting,
presented in detail in [13], if more computational power
is available³.
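
The first fusion method can be sketched as follows in Python. The sketch assumes that the importance weights are normalized by their sum before averaging (the text only states that each weight lies in [0, 1]); the function name `weighted_average`, the string labels of the focal elements and the numerical values are hypothetical.

```python
def weighted_average(bbas, weights):
    """Weighted-average fusion of BBAs (dicts) with weights normalized to sum to one."""
    total = sum(weights)
    fused = {}
    for bba, w in zip(bbas, weights):
        for X, mass in bba.items():
            fused[X] = fused.get(X, 0.0) + (w / total) * mass
    return fused


# Local BBAs of two criteria on the frame {c, not_c}; "theta" stands for c u not_c.
m_1 = {"c": 0.7, "not_c": 0.1, "theta": 0.2}
m_2 = {"c": 0.2, "not_c": 0.6, "theta": 0.2}

m_global = weighted_average([m_1, m_2], weights=[0.75, 0.25])
```

Because the cost is linear in the number of criteria and focal elements, this rule remains tractable when millions of cells must be processed, which is the motivation given for this first method.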


Once the BBAs $m_{ih}(\cdot)$ and $m_{hi}(\cdot)$ are obtained, the global
indices are defined by

$$\begin{cases} c(a_i, b_h) \triangleq m_{ih}(c)\, \alpha(a_i, b_h) \\ d(a_i, b_h) \triangleq m_{ih}(\bar{c})\, \beta(a_i, b_h) \\ u(a_i, b_h) \triangleq 1 - c(a_i, b_h) - d(a_i, b_h). \end{cases} \tag{3}$$

The discounting factors $\alpha(a_i, b_h)$ and $\beta(a_i, b_h)$ in (3) are
defined in [3]. They are not given here due to space
limitations. $c(b_h, a_i)$, $d(b_h, a_i)$ and $u(b_h, a_i)$ are similarly
computed using the dual formula of (3).

³ Due to the complexity of this fusion rule and its computational burden, only
problems of relatively small dimensions, say for $n_g \le 6$, can be addressed
by this second method.

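The belief and plausibility of the outranking propositions
X = “$a_i \succ b_h$” ($a_i$ outranks $b_h$), and Y = “$b_h \succ a_i$” ($b_h$
outranks $a_i$) are then given by

$$\mathrm{Bel}(X) = c(a_i, b_h) \quad \text{and} \quad \mathrm{Bel}(Y) = c(b_h, a_i) \tag{4}$$
$$\mathrm{Pl}(X) = 1 - d(a_i, b_h) \quad \text{and} \quad \mathrm{Pl}(Y) = 1 - d(b_h, a_i) \tag{5}$$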


SET-Step 3: Calculation of the probabilized outranking relations.
In SET-Step 2 we have characterized the outrankings
X = “$a_i \succ b_h$” and Y = “$b_h \succ a_i$” by their imprecise
probabilities $P(X) \in [\mathrm{Bel}(X); \mathrm{Pl}(X)]$ and $P(Y) \in
[\mathrm{Bel}(Y); \mathrm{Pl}(Y)]$. Solving the outranking problem consists in
choosing (deciding) if finally X dominates Y (in such a case
we must decide X as being the valid outranking), or if
Y dominates X (in such a case we decide Y as being the
valid outranking). This hard assignment problem is difficult
in general because $P(X) \in [\mathrm{Bel}(X); \mathrm{Pl}(X)]$ and $P(Y) \in
[\mathrm{Bel}(Y); \mathrm{Pl}(Y)]$, and these belief intervals can partially overlap.
Fortunately, a soft (probabilized) outranking solution is
possible by computing the probability that X dominates Y
(or that Y dominates X) by assuming a uniform distribution of
the unknown probabilities between their lower and upper bounds.
To get the probabilized outrankings, we have to calculate
$P_{X \succ Y} \triangleq P(P(X) > P(Y))$ and $P_{Y \succ X} \triangleq P(P(Y) >
P(X))$, which are given by the ratio of two polygonal areas, or
can be estimated using sampling techniques, as explained and
illustrated in [3]. The probabilities of outrankings are denoted
$P_{ih} \triangleq P_{X \succ Y}$ where X ≜ “$a_i \succ b_h$” and Y ≜ “$b_h \succ a_i$”.
Reciprocally, we denote $P_{hi} \triangleq P_{Y \succ X} = 1 - P_{ih}$. This
probabilization of outrankings is directly obtained by this Step
3 of SET, and thus eliminates the arbitrary λ-cut strategy used
in the classical ELECTRE TRI method.
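
As an illustration of this step, the probability that P(X) exceeds P(Y) can be obtained in closed form if one additionally assumes that the two unknown probabilities are independent and uniformly distributed on their belief intervals, which is equivalent to a ratio of areas within the rectangle [Bel(X), Pl(X)] × [Bel(Y), Pl(Y)]. The independence assumption, the function name `prob_greater` and the numerical intervals below are choices made for this sketch; the construction actually used in SET is the one described in [3].

```python
def prob_greater(bel_x, pl_x, bel_y, pl_y):
    """P(U > V) for independent U ~ Uniform[bel_x, pl_x] and V ~ Uniform[bel_y, pl_y]."""

    def cdf_integral(t):
        # Integral of the cumulative distribution function of V up to t.
        if t <= bel_y:
            return 0.0
        if t >= pl_y:
            return (pl_y - bel_y) / 2.0 + (t - pl_y)
        return (t - bel_y) ** 2 / (2.0 * (pl_y - bel_y))

    if pl_x == bel_x:                      # U reduces to a single point
        if pl_y == bel_y:                  # both reduce to points: count a tie as 1/2
            return 0.5 * ((bel_x > bel_y) + (bel_x >= bel_y))
        return min(1.0, max(0.0, (bel_x - bel_y) / (pl_y - bel_y)))
    # Average of the CDF of V over the interval of U.
    return (cdf_integral(pl_x) - cdf_integral(bel_x)) / (pl_x - bel_x)


p_ih = prob_greater(0.3, 0.8, 0.5, 0.7)    # overlapping belief intervals
p_hi = prob_greater(0.5, 0.7, 0.3, 0.8)    # the reciprocal outranking
```

With these two overlapping intervals the sketch gives 0.4 and 0.6, which sum to one as expected for reciprocal outranking probabilities.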


SET-Step 4: Final soft assignment of $a_i$ into a category
$C_h$. From the probabilized outrankings obtained in SET-Step
3, we can directly make the soft assignment of alternatives
$a_i$ to categories $C_h$ defined by their profiles $b_h$. This is
easily obtained by the combinatorics of all possible sequences
of outrankings, taking into account their probabilities $P_{ih}$,
to calculate all the assignment probabilities $P(a_i \to C_h)$.
Moreover, this soft assignment mechanism also provides the
probability $\delta_i \triangleq P(a_i \to \emptyset)$ reflecting the impossibility of
making a coherent outranking. This soft assignment procedure
of the SET method does not require an arbitrary choice of
decisional attitude, unlike what is proposed in the classical
ET method. A simple detailed example of this SET-Step 4 is
given in [3] for convenience.


III. APPLICATION OF SET TO STORM RISK ASSESSMENT 


A. Surveillance zone and data set 


We apply the SET method briefly presented in the previous
section to the storm risk assessment problem. For this, we consider
in this study five meteorological parameters (i.e. criteria)
drawn from GFS (Global Forecast System) open data available
on the web [1]. We have used GFS data for the 9th of May 2016
at 3h UTC. The GFS data used in this study are available
in [22]. The wide surveillance area covers the Atlantic Ocean,
spanning [−1, 70.5] degrees in latitude and [−100, 10.5] degrees
in longitude. We have 21592 cells of size 0.5 × 0.5 deg²
to evaluate. Each cell corresponds to an alternative $a_i$ that
must be assigned to a storm risk category $C_h$ by the SET
method. Table I shows the five ($n_g = 5$) meteorological
parameters (i.e. criteria) used in our study, their units, their
preference ordering, and their qualitative importance chosen
for this problem.


Criteria    | Units | Preference ordering | Importance weight
G1 = PConv  | kg/m² | increasing          | very high
G2 = LI     | °K    | decreasing          | very high
G3 = CAPE   | J/kg  | increasing          | high
G4 = DivB   | s⁻¹   | decreasing          | low
G5 = DivS   | s⁻¹   | increasing          | low

Table I: Criteria used for SET method.


where

• PConv is the 3-h accumulated precipitation induced by convective process (in kg/m²) [24];
• LI is the lifted index which characterizes the instability of the atmosphere (in °K). This parameter, developed by Galway [25], is the gap between the environmental temperature and the temperature of a parcel lifted dry-adiabatically to saturation then moist-adiabatically to 500 hPa;
• CAPE is the convective available potential energy (in J/kg). This parameter is the potential energy available to the parcel to lift up beyond the level of free convection. Like the lifted index, this parameter relies on the difference between the environmental temperature and the temperature of a parcel lifted adiabatically [26];
• DivB is the low-level wind divergence if there are convective clouds in the cell (in s⁻¹). This parameter is derived from the horizontal wind component and the pressure level of the bottom of the convective cloud [27];
• DivS is the divergence of the wind above the top of the convective clouds (in s⁻¹). Indeed, an isolated storm cloud is associated with low-level wind convergence and divergence near the top of the cloud [27].


PConv has an increasing preference order, which means that the larger the PConv value, the higher the storm risk. LI has a decreasing preference order, meaning that the lower the LI value, the higher the storm risk. To work with quantitative importance weights, we need to transform the qualitative labels (low, high, very high) into numerical values. For this we use the following mapping: Low importance ↦ 1, Moderate importance ↦ 2, High importance ↦ 3 and Very high importance ↦ 4. This mapping is quite ad hoc, and could be adapted to better reflect the subjective interpretation of the importance level expressed by the expert who provides these qualitative importance factors. This mapping is specific to the fusion system design. After normalization of the numerical importance factors, we get the following normalized weights of importance of each criterion⁴: $w_1 = 4/12$, $w_2 = 4/12$, $w_3 = 2/12$, $w_4 = 1/12$, and $w_5 = 1/12$.

In our study, we consider five ($n_h = 5$) levels of risk defined qualitatively as: Low, Moderate, Strong, Very strong and Extreme, which respectively correspond quantitatively to the levels 1, 2, 3, 4 and 5 that will be shown in the following figures. The profile bounds for each category of risk and each criterion are given in Table II.


Criteria \ Bounds of categories | b1 | b2   | b3   | b4
G1 = PConv                      | 0  | 1.5  | 7    | 10
G2 = LI                         | 0  | -2   | -6   | -10
G3 = CAPE                       | 0  | 1000 | 2000 | 4000
G4 = DivB                       | 0  | -0.1 | -0.5 | -1
G5 = DivS                       | 0  | 0.1  | 0.5  | 1

Table II: Bounds of categories of risk.


For convenience, Figure 2 presents the flow chart of the proposed method related to our storm application.

[Flow chart]
Input: value of the i-th cell for the five criteria, $a_i = [a_1, \ldots, a_5]$.
SET-Step 1 (for $a_i$ vs. $b_h$ and for $b_h$ vs. $a_i$): local concordances $c_j(a_i, b_h) \triangleq m^j_{ih}(c)$ and $c_j(b_h, a_i) \triangleq m^j_{hi}(c)$; local discordances $d_j(a_i, b_h) \triangleq m^j_{ih}(\bar{c})$ and $d_j(b_h, a_i) \triangleq m^j_{hi}(\bar{c})$; local uncertainties $u_j(a_i, b_h) \triangleq m^j_{ih}(c \cup \bar{c})$ and $u_j(b_h, a_i) \triangleq m^j_{hi}(c \cup \bar{c})$.
SET-Step 2: fusion of $m^j_{ih}$, $j = 1, \ldots, 5$, as $m_{ih} = m^1_{ih} \oplus \ldots \oplus m^5_{ih}$, and fusion of $m^j_{hi}$, $j = 1, \ldots, 5$, as $m_{hi} = m^1_{hi} \oplus \ldots \oplus m^5_{hi}$, giving the global concordances, global discordances and global uncertainties for $a_i$ vs. $b_h$ and for $b_h$ vs. $a_i$.
Beliefs and plausibilities of the outrankings X = "$a_i > b_h$" and Y = "$b_h > a_i$": $Bel(X) = c(a_i, b_h)$, $Pl(X) = 1 - d(a_i, b_h)$, $Bel(Y) = c(b_h, a_i)$, $Pl(Y) = 1 - d(b_h, a_i)$.
SET-Step 3: geometrical calculation of the probabilized outrankings from the intervals $[Bel(X), Pl(X)]$ and $[Bel(Y), Pl(Y)]$, giving $P_{ih} = P_{X>Y}$ and $P_{hi} = P_{Y>X} = 1 - P_{ih}$.
SET-Step 4: calculation of the probabilities $P(a_i \to C_h)$ of assignment of $a_i$ to a category $C_h$.
SET output values: decision $\hat{C} = \arg\max_{C_h} P(a_i \to C_h)$ and decision probability $P(a_i \to \hat{C})$.

Figure 2: Flow chart of the method related to the application.


The bounds for the LI and CAPE criteria are those defined by Wesoleck in [23]. The bounds for the PConv criterion have been deduced from the different thresholds used to distinguish light, moderate and heavy rainfall [28] and adapted to our geographical area. Heavy rain, corresponding to accumulated precipitation above 10-30 mm/h, is associated with severe storm risk [29]. The bounds for low-level convergence wind and high-level divergence wind have been chosen larger than the usual threshold [30], because the most important information is the sign of the divergence combined with the presence of convective cloud in the cell. Hence, if the PConv value for the cell under analysis is greater than $b_4^{j=1} = 20$, then the storm risk for this cell is considered as extreme (risk=5). If the PConv value belongs to $(b_3^{j=1}, b_4^{j=1}] = (10, 20]$ then the storm risk for this cell is considered as very high (risk=4), etc. If the LI value is lower than $b_4^{j=2} = -10$ then the storm risk for this cell is considered as extreme (risk=5), but if the LI value is between $b_4^{j=2} = -10$ and $b_3^{j=2} = -6$ the storm risk for this cell is considered only as very high (risk=4).

⁴ In this work and for simplicity, the importance factor $w_j$ of each criterion $G_j$ used to make the importance discounting of BBAs in the SET method is chosen independently of the profile bound values.

Figures 3-7 show the risk levels (1, 2, 3, 4 and 5) of the 21592 cells covering the Atlantic ocean surveillance area for each criterion considered separately. The dark blue cells with value -1 correspond to ground cells, which are not taken into account in this study. One clearly sees the difference of risks drawn from the five meteorological parameters and the conflicting information between these five maps of storm risks, which illustrate the input data we have to process by the SET method to get the global risk assessment.


Figure 3: Storm risk levels based on PConv criterion. 0 means no risk; 1, low level of risk; 2, moderate level of risk; 3, strong level of risk; 4, very strong level of risk; and 5, extreme risk. Risks are not calculated over land, where the value is fixed to -1 (dark blue).

Figure 4: Storm risk levels based on LI criterion.

Figure 5: Storm risk levels based on CAPE criterion.

Figure 6: Storm risk levels based on DivB criterion.

Figure 7: Storm risk levels based on DivS criterion.

[Maps of Figures 3-7: axes are longitude (deg) and latitude (deg).]


To estimate the variability (randomness) of GFS data in each cell, we estimate for each category $C_h$ of risk the probability $P(C_h)$ by counting the number of criteria associated with $C_h$, divided by $n_g = 5$. This level of randomness is characterized by Shannon's entropy. Hence, for the cell #i, if we have the probability measure $p_i = (p_1, p_2, \ldots, p_{n_h})$, the normalized Shannon entropy⁵ is given by

$$H(p_i) = -\frac{1}{H_{\max}} \sum_{h=1}^{n_h} p_h \log_2 p_h \qquad (6)$$

where $H_{\max} = \log_2 n_h$ is the entropy of the uniform pmf. $H(p_i) = 0$ when all meteorological parameters agree with the same storm risk category, and $H(p_i)$ is maximum if $p_i$ is the uniform pmf. One defines the mean entropy $\bar{H}$ of the GFS data by averaging the entropy values of the $N = 21592$ cells of the surveillance area by

$$\bar{H} = \frac{1}{N} \sum_{i=1}^{N} H(p_i) \qquad (7)$$

⁵ One takes $p_h \log_2 p_h = 0$ if $p_h = 0$.
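
As an illustration of the two entropy formulas, here is a minimal sketch that computes the normalized Shannon entropy of one cell from the risk levels given by its criteria, and the mean entropy over several cells. The normalization by $\log_2$ of the number of categories is an assumption consistent with the entropy being maximal (equal to 1) for the uniform pmf; the function names and the sample cells are hypothetical.

```python
import math
from collections import Counter


def normalized_entropy(risk_levels, n_categories=5):
    """Normalized Shannon entropy of the risk levels given by the criteria.

    Each category probability is the fraction of criteria voting for it.
    Categories with zero probability are simply absent from the counter,
    which implements the convention p*log2(p) = 0 when p = 0.
    """
    counts = Counter(risk_levels)
    n = len(risk_levels)
    h = 0.0
    for count in counts.values():
        p = count / n
        h -= p * math.log2(p)
    return h / math.log2(n_categories)


def mean_entropy(cells, n_categories=5):
    """Average of the normalized entropies over all cells."""
    return sum(normalized_entropy(c, n_categories) for c in cells) / len(cells)


# Hypothetical cells: full agreement, partial conflict, total conflict.
cells = [[3, 3, 3, 3, 3], [1, 1, 2, 2, 5], [1, 2, 3, 4, 5]]
h_bar = mean_entropy(cells)
```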


Figure 8 shows the normalized entropies of the meteorological GFS data we have used in this study. The mean entropy of these GFS data is $\bar{H} = 0.2989$, and only 32% of the data are totally in agreement on the same risk level (shown in green in Fig. 8). As we observe in this figure, most of the data are conflicting because the entropy values are much bigger than zero. The high level of randomness of these data justifies a sophisticated MCDM method able to deal efficiently with conflicting sources of information. This motivates the use of the SET approach proposed in this work.


Figure 8: Normalized Shannon entropy of GFS data (based on 5 criteria).


B. Results based on the weighted averaging rule


Figure 9 shows the storm risk levels based on the weighted average⁶ of the risk levels shown in Figures 3-7, with the weights of importance $w_1 = 4/12$, $w_2 = 4/12$, $w_3 = 2/12$, $w_4 = 1/12$, and $w_5 = 1/12$.


Figure 9: Storm risk levels based on weighted average of risks.

Based on this simple fusion rule, one observes that the strong (and higher) risks of thunderstorms are located mainly in the intertropical convergence zone (around the equatorial line), on the Caribbean Sea, and off the coast of Portugal. However, this fusion method does not provide a measure of the trustworthiness (confidence) of the result, and it does not manage precisely the level of conflict between the different sets of data.

⁶ For representation convenience and comparison with SET results, the risk values of Fig. 9 have been rounded to their closest integer value.


C. Results based on the SET approach 


Figure 10 shows the map of storm risk levels based on the weighted averaging fusion rule used in SET-Step 2, whereas Figure 11 shows the result when the PCR6 fusion rule⁷ is used in SET-Step 2. These two resulting maps of risk levels can be interpreted as the SET-combination of the maps shown in Figures 3-7, taking into account the importance and contradiction of the five meteorological criteria.


Figure 10: Storm risk levels based on SET (averaging rule).

Figure 11: Storm risk levels based on SET (PCR6 rule).

The confidence in the resulting storm risk maps of Figures 10 and 11 is shown in Figures 12 and 13, respectively.

Figure 12: Confidence in decision (SET with averaging rule).

Figure 13: Confidence in decision (SET with PCR6 rule).

⁷ With importance discounting of BBAs, as explained in [13].


D. Performance analysis


To measure the performance of our method of storm risk assessment we need to compare our SET results with some ground truth. For this, we consider as ground truth the locations of lightning strokes supplied by the World Wide Lightning Location Network (WWLLN) [4]. WWLLN archival data are copyrighted by the University of Washington and are available to the public at nominal cost. For a given date, at a time T, all cells where stroke impacts have been detected by the WWLLN network in the time interval [T - 1h30, T + 1h30] have been tagged. These data consist of the $N_L = 223$ locations of all the lightnings detected on May 9th, 2016 in the time interval [1h30, 4h30] UTC, which are shown as red dots in Fig. 14.


Figure 14: WWLLN lightning detections.


The performance of the method is evaluated by estimating the detection probability $P_d = P(\hat{C} > C_1 | d_W = 1)$ of lightnings, and the false alarm probability $P_{fa} = P(\hat{C} > C_1 | d_W = 0)$, where $\hat{C}$ denotes the decision (i.e. the category of the risk) taken for a chosen confidence threshold, $d_W = 1$ indicates that a lightning flash has been detected by the WWLLN for the cell under analysis, and $d_W = 0$ indicates no detection. These probabilities are empirically estimated by

$$P_d = P(\hat{C} > C_1 | d_W = 1) \approx \frac{n(\hat{C} > C_1, d_W = 1)}{n(d_W = 1)} \qquad (8)$$

$$P_{fa} = P(\hat{C} > C_1 | d_W = 0) \approx \frac{n(\hat{C} > C_1, d_W = 0)}{n(d_W = 0)} \qquad (9)$$

where $n(\hat{C} > C_1, d_W = 1)$ is the number of cells for which the joint event $\hat{C} > C_1$ and $d_W = 1$ has occurred, and $n(d_W = 1)$ is the number of cells for which one has got a WWLLN detection $d_W = 1$. Similarly, $n(\hat{C} > C_1, d_W = 0)$ is the number of cells where events $\hat{C} > C_1$ and $d_W = 0$ have occurred, and $n(d_W = 0)$ is the number of cells having no WWLLN detection (i.e. $d_W = 0$).
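
The two empirical estimators can be sketched by simple counting over the cells. In this illustration the decisions are integer risk levels (1 to 5), a detection is a boolean flag per cell, and an alarm is any decision above the lowest category; the function name `detection_and_false_alarm` and the sample data are hypothetical.

```python
def detection_and_false_alarm(decisions, detections, lowest_category=1):
    """Return (P_d, P_fa) estimated by counting cells.

    decisions  : risk category decided for each cell (integers)
    detections : True if a lightning was detected in the cell
    """
    n_det = n_hit = n_nodet = n_false = 0
    for category, detected in zip(decisions, detections):
        alarm = category > lowest_category
        if detected:
            n_det += 1
            n_hit += alarm      # alarm raised where a lightning was detected
        else:
            n_nodet += 1
            n_false += alarm    # alarm raised without any detection
    if n_det == 0 or n_nodet == 0:
        raise ValueError("need cells both with and without detection")
    return n_hit / n_det, n_false / n_nodet


# Six hypothetical cells.
decisions = [3, 1, 2, 1, 1, 4]
detections = [True, True, False, False, False, True]
p_d, p_fa = detection_and_false_alarm(decisions, detections)
```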

Table III gives the estimates of the detection probability $P_d$ of lightnings and of the false alarm probability $P_{fa}$ obtained by the three methods tested, based on the WWLLN set of detections.

Methods                 | P_d    | P_fa
Weighted Averaging Rule | 0.9775 | 0.3601
SET with averaging rule | 0.9462 | 0.2945
SET with PCR6 rule      | 0.9507 | 0.2954

Table III: $P_d$ and $P_{fa}$ performances.


Our results show that the SET approach (with the averaging rule, or with the PCR6 rule) provides interesting results because it identifies and predicts areas with a high risk of storm that are coherent with the real locations of lightnings detected by the WWLLN. One sees that the direct weighted averaging fusion of the risk maps of the five criteria shown in Fig. 9 produces notably more false alarms than the SET method, for only a small increase in detection probability. In this work there is no clear advantage in using the PCR6 rule over the weighted averaging rule in Step 2 of the SET method, because the performances in terms of probability of detection and false alarms are very close. In terms of computational time, the direct weighted averaging fusion of the risk maps is the fastest method, taking a few seconds⁸ with MatLab (R2018a version) running on a MacBookPro laptop computer (2.8 GHz Intel Core i7). The second fastest method is the SET method using the weighted averaging rule, taking 1 min 12 s, and the slowest (and most complicated) method is the SET method based on the PCR6 rule, which takes approximately 26 min to produce the results. One important advantage of the SET method (aside from its aforementioned performances) is its ability to provide the confidence map of the solutions obtained by SET (i.e. the predicted risk levels), as shown in Fig. 12 and 13. These confidence maps are useful to identify areas of risk where the confidences are low, and thus where the solutions are very uncertain if some important decision must be taken based on them (for instance the diverting of the flight of an aircraft, etc.). Such a useful confidence map cannot be drawn from the direct weighted averaging fusion of the risk maps. Due to space constraints, we did not include the decision inconsistency maps⁹ (i.e. the maps of the probabilities $P(a_i \to \emptyset)$ reflecting the impossibility of making a coherent outranking, see SET-Step 4), but these maps obtained by SET (with weighted averaging or with the PCR6 rule) actually reveal only very few such cells, located mainly west of Panama. This very small number of cells yielding decision inconsistency indicates that SET has provided solutions in good decision-making conditions in general.

⁸ Once the five maps of risks have been computed.


IV. CONCLUSIONS 


In this paper we have presented an application of belief functions for storm prediction based on multi-criteria analysis and the Soft Electre Tri (SET) methodology. We have shown that SET notably reduces the false alarm rate with respect to a simple weighted averaging fusion method without sacrificing much of the detection of lightnings, and provides the confidence map of the solutions obtained. The SET method based on the PCR6 rule of combination performs well, but it has a high computational burden which prevents its use in quasi-real-time applications and in multi-criteria problems involving many more criteria. This work, we hope, could serve as a benchmark problem for testing many MCDM methods in the future. More investigations are currently under way to apply this type of new methodology using more meteorological criteria on other types of real data sets, with more refined parameter settings, which will be reported if possible in a forthcoming publication.
⁹ They have been included with the data set files in [22] for convenience.


APPENDIX

Basic definitions of belief functions


In this appendix we provide the basics of belief functions (BF) introduced by Shafer [2] to model epistemic uncertainty and to reason about uncertainty. We assume that the answer to the problem under concern belongs to a known finite discrete frame of discernment (FoD) $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$, with $n > 1$, and where all elements of $\Theta$ are exhaustive and exclusive. The set of all subsets of $\Theta$ (including the empty set $\emptyset$, and $\Theta$) is the power-set of $\Theta$ denoted by $2^\Theta$. The number of elements (i.e. the cardinality) of $2^\Theta$ is $2^{|\Theta|}$. A (normal) basic belief assignment (BBA) associated with a given source of evidence is a mapping $m(\cdot) : 2^\Theta \to [0, 1]$ satisfying $m(\emptyset) = 0$ and $\sum_{A \in 2^\Theta} m(A) = 1$. The number $m(A)$ is called the mass of A committed by the source of evidence. The subset $A \in 2^\Theta$ is called a focal element (FE) of the BBA $m(\cdot)$ if and only if $m(A) > 0$. The set of all the focal elements of the BBA $m(\cdot)$ is denoted by $\mathcal{F}_\Theta(m) = \{X \in 2^\Theta | m(X) > 0\}$. The belief of A, denoted $Bel(A)$, and the plausibility of A, denoted $Pl(A)$, are usually interpreted respectively as lower and upper bounds of an unknown (subjective) probability measure $P(A)$. They are respectively defined for any $A \in 2^\Theta$ from the BBA $m(\cdot)$ by

$$Bel(A) = \sum_{X \in 2^\Theta | X \subseteq A} m(X) \qquad (10)$$

and

$$Pl(A) = \sum_{X \in 2^\Theta | A \cap X \neq \emptyset} m(X) = 1 - Bel(\bar{A}) \qquad (11)$$

where $\bar{A}$ represents the complement of A in $\Theta$, that is $\bar{A} \triangleq \Theta - \{A\} = \{X | X \in \Theta \text{ and } X \notin A\}$. The symbol $\triangleq$ means equal by definition, and the minus symbol denotes the set difference operator. The vacuous BBA (VBBA for short) representing a totally ignorant source is defined as $m_v(\Theta) = 1$.
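
The definitions of belief and plausibility translate directly into code when subsets of the frame are represented as `frozenset` objects. The sketch below is only an illustration; the three-element frame, the mass values and the function names `bel` and `pl` are hypothetical.

```python
def bel(m, a):
    """Belief of a: sum of the masses of the non-empty subsets of a."""
    return sum(mass for x, mass in m.items() if x and x <= a)


def pl(m, a):
    """Plausibility of a: sum of the masses of the sets intersecting a."""
    return sum(mass for x, mass in m.items() if x & a)


# Hypothetical frame and normal BBA (masses sum to one, no mass on the empty set).
theta = frozenset({"t1", "t2", "t3"})
m = {
    frozenset({"t1"}): 0.5,
    frozenset({"t1", "t2"}): 0.3,
    theta: 0.2,
}

a = frozenset({"t2"})
complement = theta - a
interval = (bel(m, a), pl(m, a))  # lower and upper probability bounds of a
```

For this BBA the interval of $\{t_2\}$ is [0, 0.5], and the plausibility equals one minus the belief of the complement, as in the second definition.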


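PCR6 rule of combination


The PCR6 rule proposed in [10], [11] is an interesting alternative to the original PCR rule of combination no. 5 (PCR5) proposed in [9], [19]. PCR6 and PCR5 rules coincide if we combine only two BBAs defined on the same FoD. The PCR6 fusion of $S > 2$ BBAs is obtained by $m^{PCR6}_{1,2,\ldots,S}(\emptyset) = 0$, and for all $A \in 2^\Theta \setminus \{\emptyset\}$ by

$$m^{PCR6}_{1,2,\ldots,S}(A) = m^{Conj}_{1,2,\ldots,S}(A) + \sum_{j \in \{1,\ldots,\mathcal{F}\} | A \in \mathbf{X}_j \wedge \pi_j(\emptyset)} \left[ \left( \sum_{i \in \{1,\ldots,S\} | X_{j_i} = A} m_i(X_{j_i}) \right) \cdot \frac{\pi_j(\emptyset)}{\sum_{X \in \mathbf{X}_j} \left( \sum_{i \in \{1,\ldots,S\} | X_{j_i} = X} m_i(X_{j_i}) \right)} \right] \qquad (12)$$

where $m^{Conj}_{1,2,\ldots,S}(A) = \sum_{X_{j_1} \cap \ldots \cap X_{j_S} = A} \prod_{i=1}^{S} m_i(X_{j_i})$, with $X_{j_1} \in \mathcal{F}_\Theta(m_1), \ldots, X_{j_S} \in \mathcal{F}_\Theta(m_S)$, is the conjunctive fusion rule, and where $\pi_j(X_{j_1} \cap X_{j_2} \cap \ldots \cap X_{j_S}) \triangleq \prod_{i=1}^{S} m_i(X_{j_i})$, and $\pi_j(\emptyset)$ in (12) is the concise notation of $\pi_j(X_{j_1} \cap X_{j_2} \cap \ldots \cap X_{j_S})$ when $X_{j_1} \cap X_{j_2} \cap \ldots \cap X_{j_S} = \emptyset$. The symbol $\wedge$ is the logical conjunction operator, $x \wedge y$ meaning that conditions x and y must both be satisfied. The PCR6 rule is quasi-associative and offers a more precise conflict redistribution than the DS rule, but it requires a higher computational burden. However, PCR6 does not preserve the neutrality of the vacuous BBA. PCR6 is simpler to implement than PCR5. Very basic Matlab™ codes of the PCR5 and PCR6 rules can be found in [13], [20], and also in the BFAS (Belief Functions and Applications Society) repository [21].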
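
For illustration only, the PCR6 redistribution principle can be sketched in a few lines of Python. This is not the Matlab code referenced above: it is a brute-force sketch that enumerates every tuple of focal elements (one per source), adds the product of masses to the intersection when it is non-empty, and otherwise redistributes that product to the elements of the tuple proportionally to their masses. The function name `pcr6` and the two BBAs are hypothetical.

```python
from itertools import product
from math import prod


def pcr6(bbas):
    """Brute-force PCR6 fusion of BBAs given as dicts {frozenset: mass}.

    The number of enumerated tuples is the product of the numbers of
    focal elements, hence the exponential cost with many sources.
    """
    focal = [[(x, w) for x, w in m.items() if w > 0] for m in bbas]
    fused = {}
    for combo in product(*focal):
        masses = [w for _, w in combo]
        p = prod(masses)  # mass product of this tuple of focal elements
        inter = frozenset.intersection(*(x for x, _ in combo))
        if inter:
            # Non-conflicting tuple: conjunctive consensus.
            fused[inter] = fused.get(inter, 0.0) + p
        else:
            # Conflicting tuple: proportional redistribution of p.
            total = sum(masses)
            for x, w in combo:
                fused[x] = fused.get(x, 0.0) + w * p / total
    return fused


# Two hypothetical conflicting sources on the frame {A, B}.
A, B = frozenset({"A"}), frozenset({"B"})
m1 = {A: 0.6, B: 0.4}
m2 = {A: 0.2, B: 0.8}
fused = pcr6([m1, m2])
```

With only two sources this sketch coincides with PCR5; the fused masses still sum to one because every mass product is either kept on the intersection or fully redistributed.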

Abstract: Data association has become a pertinent task to interpret the perceived environment for mobile robots such as
autonomous vehicles. It consists in assigning the sensor detections 
to the known objects in order to update the obstacles map 
surrounding the vehicle. Dezert-Smarandache Theory (DSmT) 
provides a mathematical framework for reasoning with imperfect 
data like sensor’s detections. In DSmT, data are quantified by 
belief functions and combined by the Proportional Conflict Redis- 
tribution (PCR6) rule in order to obtain the fusion of evidences 
to make a decision. However, this combination rule has an 
exponential complexity and that is why DSmT is rarely used for 
real-time applications. This paper proposes a new evidential data 
association based on DSmT techniques. The proposed approach 
focuses on the significant pieces of information when combining 
and removes unreliable and useless information. Consequently, 
the complexity is reduced without degrading substantially the 
decision-making. The paper proposes also a new simple decision- 
making algorithm based on a global optimization procedure. Ex- 
perimental results obtained on a well-known KITTI dataset show 
that this new approach reduces significantly the computation time 
while preserving the association accuracy. Consequently, the new 
proposed approach makes DSmT framework applicable for real- 
time applications for autonomous vehicle perception. 


Keywords: Data Association, Belief Functions, Dezert- 
Smarandache Theory, Proportional Conflict Redistribution 6, 
Dezert-Smarandache Probability. 


I. INTRODUCTION 


Multi-Target Tracking (MTT) is a fundamental system to 
interpret the perceived environment of mobile robots such 
as autonomous vehicles [1], [2]. These cars require precise 
knowledge of their surrounding environment in order to ensure 
safe and comfortable driving [3]-[5]. The MTT system esti- 
mates the status of detected objects surrounding the vehicle at 
different times by single or multiple sensors. Data Association 
is a central problem in MTT which assigns targets to the 
predicted tracks in order to update their status. Targets refer 
to the detected objects at the current time and tracks refer 
to the known objects in the scene. A dynamic environment, 
like the road environment, makes the object association more 
difficult because of the appearance/disappearance of objects in 
the perceived scene. 

Usually, the assignment problem is solved using probability theory. Several methods have been proposed, such as the well-known Global Nearest Neighbour (GNN) method and the Joint Probabilistic Data Association Filter (JPDAF) [6]-[8]. GNN provides the optimal pairing by minimizing the global distance between detections and known objects. JPDAF is based on a weighted linear combination of all detections to estimate the status of known objects. More details about these methods can be found in [8]-[10].

Recently, belief function theory has also been used to cope with the association problem [4], [11]. This theory, also called Dempster-Shafer Theory (DST) [12], [13], makes it possible to reason about uncertainty thanks to belief functions, which are often interpreted as lower and upper bounds of unknown probability measures. In fact, sensor detections can be inaccurate and incomplete. DST models this imperfect information through a distribution of belief masses which quantify the confidence granted. Thereafter, these masses are combined by Dempster's rule to make decisions. Because Dempster's rule has been used and promoted by Shafer in his mathematical theory of evidence, it is also often denoted as the DS rule in the literature.

Rombaut in [14] formalizes the association problem by DST 
to reconstruct the environment of intelligent vehicles. This ap- 
proach measures the confidence of the association hypotheses 
between perceived and known obstacles by combining belief 
masses using DS rule. This approach is extended in [11], [15] 
to track vehicles where the association process is based on 
the Transferable Belief Model (TBM) [16]. This latter is a 
subjective and non-probabilistic interpretation of the Belief 
theory. In TBM, the decision-making is based on the pig- 
nistic probabilities derived from the belief quantities. Several 
alternative probabilistic transformations have been proposed 
in the literature. Our previous work [17] evaluates some of 
them on real-data in the context of the DST framework. 
In [11], the decision is performed by maximizing the joint pignistic probability. However, this probability is computed for all possible associations, so the computation time grows exponentially with the number of objects. To tackle this problem, the decision is made by selecting associations corresponding to local maxima of pignistic probabilities [4], [18]. More recently, Denœux et al. [19] express the DS rule in terms of contour functions and plausibility functions, which reduces the complexity and makes this approach applicable for real-time applications.

All the aforementioned approaches use Dempster's rule, which exhibits counter-intuitive behavior, especially in high and low conflicting situations [20], [21]. In fact, the DS rule redistributes the conflicting mass on all elements, which can cause the loss of information specificity and then generate unacceptable results. In addition, serious mistakes have been shown in the logical fundamentals of the DST framework [22]-[24]. To overcome those drawbacks, a more sophisticated rule has been proposed and defined in the framework of Dezert-Smarandache Theory (DSmT) [21]. Based on the Proportional Conflict Redistribution (PCR) process, the PCR6 rule preserves the information specificity by transferring the conflicting mass only to the elements involved in the conflict and proportionally to their individual masses. However, PCR6 has an exponential complexity, and that is why it is rarely used for real-time applications.

In this paper, we propose a new evidential data association 
based on the DSmT framework. The first contribution is 
to reduce the complexity of the combination step based on 
PCR6 rule developed originally in the framework of Dezert- 
Smarandache Theory. The proposed approach focuses on 
the significant pieces of information when combining and 
removes unreliable and useless information. Consequently, 
the complexity is reduced without degrading substantially 
the decision-making. The second contribution is to propose 
a new simple decision-making algorithm based on a global 
optimization. Experimental results obtained on a well-known 
intelligent transportation systems dataset show the benefits of 
this new approach in terms of computation time reduction and 
association accuracy. 

The rest of this paper is organized as follows. Section II presents a few basics of DSmT. Section III details the new proposed evidential data association approach, and its experimental validation is presented in Section IV. Finally, Section V concludes this paper.


II. FUNDAMENTALS OF DSMT


In the Belief theory context, a problem is modelled by a finite set of hypotheses $H_i$ likely to be the solutions, called the Frame of Discernment (FoD). In the general DSmT framework, the elements of the FoD do not need to be mutually exclusive as in the DST framework, but in the particular context of the application presented in this paper, we work with Shafer's model of the FoD where all elements of the FoD are mutually exclusive and exhaustive, that is:

$$\Theta = \bigcup_{i} \{H_i\} \text{ with } H_i \cap H_j = \emptyset \ (i \neq j) \qquad (1)$$

where the $H_i$ are denoted as singletons, the lowest pieces of discernible knowledge in the FoD.


A. Basic Belief Assignment 


A basic belief assignment (bba), or mass function, associated with a given source is defined as a function $m : 2^\Theta \to [0,1]$ satisfying:

$$m(\emptyset) = 0 \quad \text{and} \quad \sum_{A \in 2^\Theta} m(A) = 1 \qquad (2)$$

where $m(A)$ is the mass of belief that supports A. The source is totally ignorant if $m(\Theta) = 1$, in which case the bba is called the vacuous bba. If $m(A) > 0$, A is called a focal element of the bba $m(\cdot)$. Thus $\mathcal{F}(m) = \{A \in 2^\Theta \mid m(A) > 0\}$ defines the set of focal elements.

Figure 1. Illustration of the refinement function ρ [11].


B. Vacuous Extension 


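Some sources of information may be expressed on different but related FoDs. However, in order to combine them, it is necessary to work with the same common frame. For that, a finer FoD can be defined [13]. Let $\Omega$ be a finer frame than $\Theta$, where every element of $\Theta$ is mapped into one or more elements of $\Omega$ (cf. Fig. 1). The refinement function $\rho$ maps a proposition A from $2^\Theta$ to $2^\Omega$ according to:

$$\begin{cases} \{\rho(\{\theta\}), \theta \in \Theta\} \text{ is a partition of } \Omega \\ \forall A \subseteq \Theta, \ \rho(A) = \bigcup_{\theta \in A} \rho(\{\theta\}) \end{cases} \qquad (3)$$

The vacuous extension $m^{\Theta \uparrow \Omega}$ defines the bba on $\Omega$ from the bba $m^\Theta$ defined on $\Theta$ and the refinement $\rho$:

$$m^{\Theta \uparrow \Omega}(B) = \begin{cases} m^\Theta(A), & \text{if } \exists A \subseteq \Theta, \ B = \rho(A) \\ 0, & \text{otherwise.} \end{cases} \qquad (4)$$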

C. Belief Combination 


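The belief combination consists in merging the measures of evidence $m_i^\Theta$ of M distinct sources $S_i$, defined on the same frame $\Theta$, into a new distribution of evidence. For that, the Proportional Conflict Redistribution rule 6 (PCR6) has been proposed in [25] and theoretically justified in [21]. In fact, the PCR6 rule overcomes the drawbacks of Dempster's rule [13] by redistributing proportionally the partial conflict only on the elements involved in this conflict. The formula of PCR6 is defined by $m_{PCR6}(\emptyset) = 0$ and $\forall A \in 2^\Theta \setminus \{\emptyset\}$ by [26], [27]:

$$m_{PCR6}(A) = m_{Conj}(A) + \sum_{j \in \{1,\ldots,\mathcal{F}\} | A \in \mathbf{A}_j \wedge \pi_j(\emptyset)} \left[ \left( \sum_{i \in \{1,\ldots,M\} | A_{j_i} = A} m_i^\Theta(A_{j_i}) \right) \cdot \frac{\pi_j(\emptyset)}{\sum_{X \in \mathbf{A}_j} \left( \sum_{i \in \{1,\ldots,M\} | A_{j_i} = X} m_i^\Theta(A_{j_i}) \right)} \right] \qquad (5)$$

where $\wedge$ is the logical conjunction¹ and $\mathbf{A}_j$ is a possible M-uple of focal elements with $A_{j_i} \in \mathcal{F}(m_i^\Theta)$, that is $\mathbf{A}_j = (A_{j_1}, A_{j_2}, \ldots, A_{j_M})$. $\mathcal{F}$ is the cardinality of $\mathcal{F}(m_1^\Theta, \ldots, m_M^\Theta)$, which is the set of all possible M-uples. Moreover, $\pi_j(A_{j_1} \cap A_{j_2} \cap \ldots \cap A_{j_M}) = \prod_{i=1}^{M} m_i^\Theta(A_{j_i})$, and $\pi_j(\emptyset) = \pi_j(A_{j_1} \cap A_{j_2} \cap \ldots \cap A_{j_M})$ defines the conflicting mass product of $\mathbf{A}_j$ if $A_{j_1} \cap A_{j_2} \cap \ldots \cap A_{j_M} = \emptyset$. The conjunctive rule $m_{Conj}$ is given by:

$$m_{Conj}(A) = \sum_{A_{j_1} \cap \ldots \cap A_{j_M} = A} \ \prod_{i=1}^{M} m_i^\Theta(A_{j_i}) \qquad (6)$$

¹ i.e. $x \wedge y$ means that conditions x and y are both true.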
D. Probabilistic Transformation 


Decision-making consists of selecting a solution among all possible hypotheses. Usually, the decision must be made among the elements of the frame. However, the belief combination also generates masses for disjunctive propositions. Therefore, it is necessary to redistribute the masses of these unions on the elements of $\Theta$ in order to make a decision. For that, the Dezert-Smarandache Probability (DSmP) transformation is defined [28], where $DSmP(\emptyset) = 0$ and $\forall A \in 2^\Theta \setminus \{\emptyset\}$:

$$DSmP(A) = \sum_{Y \in 2^\Theta} \frac{\sum_{Z \subseteq A \cap Y, \ C(Z)=1} m(Z) + \epsilon \, C(A \cap Y)}{\sum_{Z \subseteq Y, \ C(Z)=1} m(Z) + \epsilon \, C(Y)} \, m(Y) \qquad (7)$$

where $\epsilon \geq 0$ is used to adjust the effect of the element's cardinality $C(\cdot)$ in the proportional redistribution. In addition, $\epsilon$ makes it possible to compute DSmP when encountering zero masses. Typically, $\epsilon = 0.001$, because with a smaller $\epsilon$ the Probabilistic Information Content (PIC) [29] is higher. The PIC indicates the level of knowledge available to make a correct decision. PIC = 0 indicates that no knowledge exists to make a correct decision.
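
As an illustration of the DSmP transformation, the sketch below computes the DSmP value of every singleton of the frame under Shafer's model, with subsets stored as `frozenset` objects. It relies on the usual form of the DSmP formula, in which each mass $m(Y)$ is shared among the singletons of Y proportionally to their own masses plus $\epsilon$. The function name `dsmp` and the example bba are hypothetical.

```python
def dsmp(m, theta, eps=0.001):
    """DSmP of each singleton of theta from a bba {frozenset: mass}.

    Requires eps > 0 whenever some focal element contains only
    singletons of zero mass (otherwise a denominator vanishes).
    """
    result = {}
    for t in theta:
        a = frozenset({t})
        value = 0.0
        for y, mass_y in m.items():
            if not (a & y):
                continue  # empty intersection: the numerator is zero
            # The intersection of a and y is the singleton a, of cardinality 1.
            numerator = m.get(a, 0.0) + eps
            denominator = sum(m.get(frozenset({z}), 0.0) for z in y) + eps * len(y)
            value += mass_y * numerator / denominator
        result[t] = value
    return result


# Hypothetical bba on the frame {a, b}, with mass on the ignorance a ∪ b.
theta = frozenset({"a", "b"})
m = {frozenset({"a"}): 0.3, frozenset({"b"}): 0.1, theta: 0.6}
p = dsmp(m, theta)
decision = max(p, key=p.get)
```

The resulting values form a probability distribution over the singletons, and most of the mass of the ignorance goes to the singleton that already has the larger mass.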


III. DATA ASSOCIATION USING DSMT 


Four steps are needed to solve the data association problem: modeling, estimation, combining, and decision-making. However, the PCR6 combination rule has an exponential complexity, which makes it unappealing for real-time applications. This is why, in this paper, only the k most significant sources are combined (with k smaller than the original number of sources available). Thereafter, a simple global optimization is used to make association decisions.


A. Data Modelling 


Let us consider n detected objects at time t and m known
objects at previous time t − 1. In this context, data association
aims at matching the n detected objects $X_i$ to the m known
ones $Y_j$ under certain conditions:


• multiple associations are not accepted: a detected object
is associated with at most one known object and
vice versa,

• multiple new objects can appear,

• multiple known objects can disappear.


The distances between the attributes of objects (position,
velocity, etc.) are considered as pieces of evidence. For a given
distance, its belief is expressed on the elementary FoD
$\theta_{i,j} = \{yes_{(i,j)}, no_{(i,j)}\}$, which models the relevance of the
association between $X_i$ and $Y_j$. Therefore, three bba masses
are constructed for each pair of objects $(X_i, Y_j)$:

• $m^{\theta_{i,j}}(yes_{(i,j)})$: degree of belief that $X_i$ is associated
with $Y_j$,

• $m^{\theta_{i,j}}(no_{(i,j)})$: degree of belief that $X_i$ is not associated
with $Y_j$,

• $m^{\theta_{i,j}}(\theta_{i,j})$: represents the ignorance.


B. Belief Estimation



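The estimation of belief masses is related to the considered
application. The most suitable model for data association
applications [30] is the non-antagonist model [14], [15], defined by:


$$m^{\theta_{i,j}}(yes_{(i,j)}) = \begin{cases} 0, & I_{i,j} \in [0, \tau] \\ \Phi_1(I_{i,j}), & I_{i,j} \in [\tau, 1] \end{cases} \quad (8)$$

$$m^{\theta_{i,j}}(no_{(i,j)}) = \begin{cases} \Phi_2(I_{i,j}), & I_{i,j} \in [0, \tau] \\ 0, & I_{i,j} \in [\tau, 1] \end{cases} \quad (9)$$

$$m^{\theta_{i,j}}(\theta_{i,j}) = 1 - m^{\theta_{i,j}}(yes_{(i,j)}) - m^{\theta_{i,j}}(no_{(i,j)})$$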


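where $I_{i,j} \in [0, 1]$ is an index of similarity between $X_i$ and $Y_j$.
$\Phi_1(\cdot)$ and $\Phi_2(\cdot)$ are two cosine functions defined as follows:


$$\Phi_1(I_{i,j}) = \frac{\alpha}{2} \left[ 1 - \cos\left( \pi \, \frac{I_{i,j} - \tau}{1 - \tau} \right) \right] \quad (10)$$

$$\Phi_2(I_{i,j}) = \frac{\alpha}{2} \left[ 1 + \cos\left( \pi \, \frac{I_{i,j}}{\tau} \right) \right] \quad (11)$$


where $0 < \alpha < 1$ is the reliability factor of the data source
and $0 < \tau < 1$ represents the impartiality of the association
process.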

C. k-Significant sources combination 


Before decision-making, sources should be combined, which
is possible only if they are expressed on the same FoD. Hence, to
determine which known object is associated with the detected object $X_i$, a new
FoD $\Theta_{i,\cdot}$ is defined (12). This new frame is composed of
the m possible $X_i$-to-$Y_j$ associations denoted $Y_{(i,j)}$ and the
appearance hypothesis of object $X_i$ denoted by $Y_{(i,*)}$:


$$\Theta_{i,\cdot} = \{ Y_{(i,1)}, Y_{(i,2)}, \ldots, Y_{(i,m)}, Y_{(i,*)} \} \quad (12)$$


Therefore, $\Theta_{i,\cdot}$ is a refinement frame of the previous FoDs
$\theta_{i,j}$ in which the belief is initially expressed (Cf. Fig. 2). Based
on a vacuous extension (3), the initial belief functions $m^{\theta_{i,j}}$ are
expressed on $\Theta_{i,\cdot}$ as follows:


$$\begin{aligned} m_j^{\Theta_{i,\cdot}}(Y_{(i,j)}) &= m^{\theta_{i,j}}(yes_{(i,j)}) \\ m_j^{\Theta_{i,\cdot}}(\bar{Y}_{(i,j)}) &= m^{\theta_{i,j}}(no_{(i,j)}) \\ m_j^{\Theta_{i,\cdot}}(\Theta_{i,\cdot}) &= m^{\theta_{i,j}}(\theta_{i,j}) \end{aligned} \quad (13)$$


where $\bar{Y}_{(i,j)}$ represents the hypothesis "$X_i$ is not associated
to $Y_j$", which corresponds to the union of all
association hypotheses except $Y_{(i,j)}$, i.e. $\bar{Y}_{(i,j)} = \{ Y_{(i,1)}, \ldots, Y_{(i,j-1)}, Y_{(i,j+1)}, \ldots, Y_{(i,m)}, Y_{(i,*)} \}$. It should be
noted that no information is initially considered on $Y_{(i,*)}$. This
information appears during the combination step.


Figure 2. The refinement frames of $\theta_{i,j}$: $\Theta_{i,\cdot}$.

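
The transfer of one elementary bba to the common frame can be sketched as follows. The hypotheses are encoded as the integers 1 to m plus the string `"*"` for the appearance hypothesis; this encoding and the function name `refine` are illustrative assumptions.

```python
def refine(yes, no, ignorance, j, m):
    """Express the bba of the pair (X_i, Y_j) on the common frame of X_i.

    Returns a dict mapping frozenset focal elements to masses, where the
    frame contains the hypotheses 1..m and "*" (appearance of X_i).
    """
    frame = frozenset(list(range(1, m + 1)) + ["*"])
    associated = frozenset([j])
    return {
        associated: yes,         # "X_i is associated with Y_j"
        frame - associated: no,  # every other hypothesis, "*" included
        frame: ignorance,        # total ignorance
    }


if __name__ == "__main__":
    print(refine(0.45, 0.35, 0.20, j=1, m=4))
```

Note that the "no" mass is committed to a set that contains the appearance hypothesis, which is how belief can reach that hypothesis during combination.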

Once the sources are expressed on the same frame, the
bbas are combined with the PCR6 rule. However, combining
all sources is time-consuming and can reach
an exponential complexity when the number of sources is
large. To overcome this drawback, this paper proposes
a new method to reduce the combination complexity without
sacrificing too much of the decision quality.

The proposed approach selects only the information whose
belief is among the top k highest masses. Formally, for each object $X_i$,
the initial masses on association hypotheses are sorted:


$$b_1 \geq b_2 \geq \ldots \geq b_z \geq \ldots \geq b_m, \quad b_z = m^{\theta_{i,j}}(yes_{(i,j)}), \quad z, j \in \{1, \ldots, m\} \quad (14)$$


where $b_1$ is the highest mass of belief, so the source that generated
it is the most significant for matching $X_i$. On the other hand, the
least important source is the one that generates the lowest belief
$b_m$.

Now, only the k most significant sources are selected for
combination. Therefore, for each $X_i$ assignment, $\Theta_{i,\cdot}$
is defined as follows:


$$\Theta_{i,\cdot} = \{ Y_{(i,j)} \,|\, b_z \geq b_k, \; Y_{(i,*)} \} \quad (15)$$


with $z \in \{1, \ldots, m\}$ and $k < m$. Consequently, $\Theta_{i,\cdot}$
contains only the most relevant hypotheses and ignores the others
($b_z < b_k$). This simple selection procedure reduces the
computational complexity of the combination process.

If $b_k = 0$, $b_{k-1}$ is used to select the significant sources. In
the case where no $b_z > 0$, the object $X_i$ is considered as
an appearance and is associated directly to $Y_{(i,*)}$. Thereafter, the
initial mass functions $m^{\theta_{i,j}}(\cdot)$ are transferred to $\Theta_{i,\cdot}$
by the refinement defined in (13) and the PCR6 rule of
combination (5) is applied.
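
The selection step can be sketched in a few lines of Python. The sketch interprets the fallback rule as "use the k-th highest strictly positive mass as threshold, or the smallest positive mass when fewer than k are positive"; the function name `select_significant` and the dictionary input are illustrative.

```python
def select_significant(yes_masses, k):
    """Return the indices j of the known objects kept for the detected object.

    yes_masses : dict mapping j to the mass supporting the association
    k          : number of significant sources to keep
    An empty list means that no source supports any association, so the
    detected object is directly declared an appearance.
    """
    ranked = sorted(yes_masses.items(), key=lambda item: item[1], reverse=True)
    positive = [(j, mass) for j, mass in ranked if mass > 0.0]
    if not positive:
        return []
    # Fall back to the smallest positive mass when the k-th one is zero.
    threshold = positive[min(k, len(positive)) - 1][1]
    return [j for j, mass in positive if mass >= threshold]


if __name__ == "__main__":
    # "yes" masses of the detected object X_2 in the illustrative example.
    print(select_significant({1: 0.00, 2: 0.32, 3: 0.47, 4: 0.75}, k=2))
```

Sources tied with the k-th mass are all kept, so the reduced frame can occasionally contain more than k association hypotheses.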


D. Decision-Making 


The assignment decision is based on the $DSmP_{i,\cdot}$ matrix,
which is the probabilistic approximation of the combined
masses. Table I presents the $DSmP_{i,\cdot}$ of the detected-to-known
object associations. Each line defines the association
probabilities of the detected object $X_i$ with all known ones
$Y_j$. $DSmP_{i,\cdot}(Y_{(i,*)})$ defines the appearance probability of $X_i$.
It is useful to note that multiple objects can appear or disappear.

Different decision-making strategies have been proposed
according to the desired objectives [11], [18]. There are two
approaches depending on the type of optimization: global
or local. The first approach selects the "best" associations
by optimizing a global cost function [31], [32]. The Joint Pignistic
Probability (JPP) $BetP_{1,\ldots,n}$ is defined as the cost function
in [11]:


$$BetP_{1,\ldots,n} = BetP_{1,\cdot}(Y_{(1,j_1)}) \times \ldots \times BetP_{n,\cdot}(Y_{(n,j_n)}) \quad (16)$$


with $j_i \in \{1, 2, \ldots, m, *\}$. Among all possible solutions for
the detected-to-known association, the best is the one maximizing
$BetP_{1,\ldots,n}$. However, when the number of possible associations
is large, this optimization generates a high computational
complexity. To cope with this drawback, another
approach consists of solving the assignment problem by a
local optimization. The Local Pignistic Probability (LPP) [18]
makes the association decisions according to local maxima
of the pignistic matrix ($BetP_{i,\cdot}$). The LPP method performs
a successive selection of n local maxima while respecting
the association constraints (Cf. Section III-A). However, local
optimization is considered as a sub-optimal solution.


Figure 3. Scenario showing 5 detected objects (triangle) and 4 known objects
(circle).


In this paper, a new simple global optimization is proposed.
Firstly, the last column ($Y_{(i,*)}$) of the DSmP
matrix is removed in order to select the "best" associations by
using the well-known Munkres algorithm [33]. The complexity
of this algorithm is only $O(n^3)$ [33]. Secondly,
for each selected association $Y_{(i,j)}$, if $DSmP_{i,\cdot}(Y_{(i,j)}) <
DSmP_{i,\cdot}(Y_{(i,*)})$, the association $Y_{(i,j)}$ is removed and the
object $X_i$ is considered to be a new object ($Y_{(i,*)}$).
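
The two-stage decision can be sketched as follows. To stay self-contained, the sketch replaces the Munkres algorithm with an exhaustive search over permutations, which returns the same optimum but is only usable for a handful of objects; maximizing the sum of the selected probabilities is an assumption about the cost function. Discarded hypotheses are encoded with a probability of 0.0, and all names are illustrative.

```python
from itertools import permutations


def associate(prob, appearance):
    """Return {i: j or None} for each detected object i (0-based indices).

    prob[i][j]    : probability that detected object i matches known object j
    appearance[i] : probability that detected object i is a new object
    """
    n = len(prob)
    m = len(prob[0]) if n else 0
    width = max(n, m)
    # Pad with zero-probability dummy columns so every row can be assigned.
    padded = [list(row) + [0.0] * (width - m) for row in prob]
    best_score, best_columns = -1.0, ()
    # Stage 1: global assignment (exhaustive stand-in for Munkres).
    for columns in permutations(range(width), n):
        score = sum(padded[i][c] for i, c in enumerate(columns))
        if score > best_score:
            best_score, best_columns = score, columns
    # Stage 2: reject associations less likely than the appearance.
    decisions = {}
    for i, c in enumerate(best_columns):
        keep = c < m and padded[i][c] >= appearance[i]
        decisions[i] = c if keep else None
    return decisions


if __name__ == "__main__":
    # Probabilities of Table III for X_1..X_4 ("-" entries set to 0.0).
    prob = [[0.40, 0.45, 0.00, 0.00],
            [0.00, 0.00, 0.27, 0.66],
            [0.94, 0.00, 0.00, 0.00],
            [0.00, 0.00, 0.56, 0.00]]
    print(associate(prob, [0.15, 0.07, 0.06, 0.44]))
```

A value of `None` means that the detected object is declared an appearance.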


IV. ILLUSTRATIVE EXAMPLE 


Let us consider the simulated example presented in Fig. 3.
The scenario shows 5 detected objects and 4 known objects.
By observing the corresponding initial bbas presented in Table II,
one can already assume some associations. For instance,
with $m^{\theta_{3,1}}(yes_{(3,1)}) = 0.85$ and $m^{\theta_{2,4}}(yes_{(2,4)}) = 0.75$, $X_3$
and $X_2$ are most likely to be associated respectively to $Y_1$
and $Y_4$. As for the detected object $X_5$, no source supports its
association with a known object, so it can be an appearance.


Table I
DSmP PROBABILITIES OF DETECTED-TO-KNOWN OBJECT ASSOCIATIONS.

$DSmP_{1,\cdot}(\cdot)$   $DSmP_{1,\cdot}(Y_{(1,1)})$   ...   $DSmP_{1,\cdot}(Y_{(1,m)})$   $DSmP_{1,\cdot}(Y_{(1,*)})$
$DSmP_{2,\cdot}(\cdot)$   $DSmP_{2,\cdot}(Y_{(2,1)})$   ...   $DSmP_{2,\cdot}(Y_{(2,m)})$   $DSmP_{2,\cdot}(Y_{(2,*)})$
...
$DSmP_{n,\cdot}(\cdot)$   $DSmP_{n,\cdot}(Y_{(n,1)})$   ...   $DSmP_{n,\cdot}(Y_{(n,m)})$   $DSmP_{n,\cdot}(Y_{(n,*)})$


Table II
INITIAL MASS FUNCTIONS FOR THE SCENARIO IN FIG. 3.

Source      $m^{\theta_{i,j}}(yes_{(i,j)})$   $m^{\theta_{i,j}}(no_{(i,j)})$   $m^{\theta_{i,j}}(\theta_{i,j})$
$S_{1,1}$   0.45   0.35   0.20
$S_{1,2}$   0.48   0.32   0.20
$S_{1,3}$   0.00   0.95   0.05
$S_{1,4}$   0.00   0.99   0.01
$S_{2,1}$   0.00   0.99   0.01
$S_{2,2}$   0.32   0.58   0.10
$S_{2,3}$   0.47   0.43   0.10
$S_{2,4}$   0.75   0.15   0.10
$S_{3,1}$   0.85   0.05   0.10
$S_{3,2}$   0.00   0.90   0.10
$S_{3,3}$   0.00   0.90   0.10
$S_{3,4}$   0.00   0.99   0.01
$S_{4,1}$   0.00   0.99   0.01
$S_{4,2}$   0.00   0.90   0.10
$S_{4,3}$   0.50   0.40   0.10
$S_{4,4}$   0.00   0.99   0.01
$S_{5,1}$   0.00   0.90   0.10
$S_{5,2}$   0.00   0.85   0.15
$S_{5,3}$   0.00   0.90   0.10
$S_{5,4}$   0.00   0.90   0.10


Therefore, is it possible to make decisions by combining
only some of the information? To answer this question, the proposed method is
applied with k = 2. The selected information for the detected-to-known
associations is represented by (17):


$$\begin{aligned} \Theta_{1,\cdot} &= \{ Y_{(1,1)}, Y_{(1,2)}, Y_{(1,*)} \} \\ \Theta_{2,\cdot} &= \{ Y_{(2,3)}, Y_{(2,4)}, Y_{(2,*)} \} \\ \Theta_{3,\cdot} &= \{ Y_{(3,1)}, Y_{(3,*)} \} \\ \Theta_{4,\cdot} &= \{ Y_{(4,3)}, Y_{(4,*)} \} \\ &\text{direct decision: } X_5 \text{ appears.} \end{aligned} \quad (17)$$


Regarding the association of $X_1$, the two highest belief
masses (0.48 and 0.45) are respectively related to
the $Y_{(1,2)}$ and $Y_{(1,1)}$ hypotheses, which makes them relevant
for decision-making. Thus, we work with the frame
$\Theta_{1,\cdot} = \{ Y_{(1,1)}, Y_{(1,2)}, Y_{(1,*)} \}$ instead of the set of all hypotheses
$\{ Y_{(1,1)}, Y_{(1,2)}, Y_{(1,3)}, Y_{(1,4)}, Y_{(1,*)} \}$, which decreases
the complexity of combination. In the same way, $\Theta_{2,\cdot} =
\{ Y_{(2,3)}, Y_{(2,4)}, Y_{(2,*)} \}$ because the highest beliefs (0.75 and
0.47) are related to the $Y_{(2,4)}$ and $Y_{(2,3)}$ hypotheses. In this
case, $Y_{(2,1)}$ and $Y_{(2,2)}$ are ignored because their beliefs are
less significant than those of $Y_{(2,3)}$ and $Y_{(2,4)}$ (0.75 > 0.47 >
0.32 > 0.00). For $X_3$ and $X_4$, there is only one piece
of information with a non-null belief for their association.
Therefore, $\Theta_{3,\cdot} = \{ Y_{(3,1)}, Y_{(3,*)} \}$ and $\Theta_{4,\cdot} = \{ Y_{(4,3)}, Y_{(4,*)} \}$.
Concerning $X_5$, no source supports its association, so $X_5$
is a new detected object, which means an appearance $Y_{(5,*)}$. In
this case, the decision is made directly without combination.
Consequently, the cardinality of each $\Theta_{i,\cdot}$ (17) is reduced,
which means less computation time when combining.


To make decisions, the selected information is combined
by (5) and transformed to DSmP probabilities by (7). Table III
presents $DSmP_{i,\cdot}(\cdot)$ based on the two most significant
mass functions. The dimension of each $DSmP_{i,\cdot}$ vector
is smaller than usual (Cf. Table IV) and corresponds to the
number of relevant associations. In this context, the complexity
of decision-making can be reduced too. In addition, it can be
observed that the proposed approach preserves the relevant
association probabilities. Therefore, the same decisions (18)
are made through Tables III and IV.


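Table III
$DSmP_{i,\cdot}$ BASED ON 2-SIGNIFICANT MASS FUNCTIONS.

$\Theta_{i,\cdot}$    $Y_{(i,1)}$   $Y_{(i,2)}$   $Y_{(i,3)}$   $Y_{(i,4)}$   $Y_{(i,*)}$
$DSmP_{1,\cdot}$      0.40   0.45   -      -      0.15
$DSmP_{2,\cdot}$      -      -      0.27   0.66   0.07
$DSmP_{3,\cdot}$      0.94   -      -      -      0.06
$DSmP_{4,\cdot}$      -      -      0.56   -      0.44


Table IV
$DSmP_{i,\cdot}$ BASED ON ALL MASS FUNCTIONS.

$\Theta_{i,\cdot}$    $Y_{(i,1)}$   $Y_{(i,2)}$   $Y_{(i,3)}$   $Y_{(i,4)}$   $Y_{(i,*)}$
$DSmP_{2,\cdot}$      0.00   0.11   0.22   0.61   0.06
$DSmP_{3,\cdot}$      0.95   0.00   0.00   0.00   0.05
$DSmP_{4,\cdot}$      0.00   0.00   0.56   0.00   0.44
$DSmP_{5,\cdot}$      0.00   0.00   0.00   0.00   1.00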




Table V
KITTI IMAGE SEQUENCE CHARACTERISTICS.

                           Seq. 2   Seq. 4   Seq. 6   Seq. 7   Seq. 8   Seq. 13   Seq. 14   Seq. 16   Seq. 18   Seq. 19   Seq. 20
Number of frames           233      314      270      800      390      340       106       209       339       1059      837
Number of associations     668      545      474      2083     492      617       744       1872      1130      4968      4673
Max vehicle speed (km/h)   43       56       33       34       62       26        35        0         55        21        54
Min vehicle speed (km/h)   0        20       0        1        38       8         1         0         0         0         0
Speed < 30 km/h (%)        66       15       93       75       0        100       87        100       66        100       51
Speed > 30 km/h (%)        34       85       7        25       100      0         13        0         34        0         49

Table VI
COMPUTATION TIME (ms) OF THE COMBINATION STEP FOR 24 FRAMES
CONTAINING (n, m) OBJECTS.

(n, m)      All sources   4-sig. src.   3-sig. src.   2-sig. src.
(4, 4)      1.33          1.49          0.60          0.39
(7, 7)      > 0.1 s       2.27          0.92          0.59
(10, 10)    > 5 s         3.54          1.35          0.89
(13, 13)    ≈ 4 min       5.28          2.20          1.33


$$\begin{aligned} X_1 &\rightarrow Y_2 \\ X_2 &\rightarrow Y_4 \\ X_3 &\rightarrow Y_1 \\ X_4 &\rightarrow Y_3 \\ X_5 &\text{ appears.} \end{aligned} \quad (18)$$


V. EXPERIMENTAL RESULTS


This section evaluates the proposed approach on real data
coming from the well-known KITTI dataset [34]. First, the
dataset description is presented, followed by the experimental
setting. Secondly, the obtained results are analyzed and
commented. Note that this evaluation focuses only on data
association, so no tracking is done.


A. Datasets


The KITTI vision dataset provides data recorded from
different sensors mounted on a moving vehicle on urban
roads [34]. It contains camera images, laser scans, and
GPS/IMU data. The dataset also includes object labels classified
in 8 categories. For this evaluation, only image data have
been used, where detections are defined by 2D bounding box
tracklets. Four object classes have been considered: pedestrian,
cyclist, car, and van. Table V presents a part of these sequences
according to their different road contexts and the number of
detections. On some sequences, the vehicle is mainly moving at a
speed below 30 km/h, which is common in urban areas, e.g.
sequences 6, 13, 14, and 19. Sequence 16 was recorded when
the vehicle stopped at a crosswalk, i.e. speed = 0 km/h. On
other sequences, the vehicle was moving at a speed sometimes
exceeding 50 km/h, e.g. sequences 4 and 8. Fig. 4 illustrates
the number of objects per image and their proportion on
each of the sequences, where more than 30000 associations
have been evaluated. To the best of our knowledge, no study
has been evaluated on so many real data. These data cover
different road scenarios containing various objects, as shown in
Fig. 5.


B. Experimental Setting


The matching process is based on the distance between
object attributes. In this work, only the 2D position in the image
plane is considered as a piece of evidence. Thus, the distance
$d_{i,j}$ is defined as follows:


$$d_{i,j} = 0.5 \times \left( d^{left}_{i,j} + d^{right}_{i,j} \right) \quad (19)$$


where $d^{left}_{i,j}$ ($d^{right}_{i,j}$) is the Euclidean distance between the top-left
(bottom-right) points of the bounding boxes of objects $X_i$
and $Y_j$, as illustrated in Fig. 6.
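
A bounding-box distance of this kind can be sketched as the mean of the two corner-to-corner Euclidean distances. The box layout `(x_left, y_top, x_right, y_bottom)` in pixels and the function name are assumptions made for illustration; the mapping from this distance to the similarity index in [0, 1] is not specified here and is therefore left out.

```python
import math


def bbox_distance(box_a, box_b):
    """Mean of the top-left and bottom-right corner distances of two boxes.

    Each box is a tuple (x_left, y_top, x_right, y_bottom) in pixels.
    """
    d_left = math.hypot(box_a[0] - box_b[0], box_a[1] - box_b[1])
    d_right = math.hypot(box_a[2] - box_b[2], box_a[3] - box_b[3])
    return 0.5 * (d_left + d_right)


if __name__ == "__main__":
    detected = (0.0, 0.0, 10.0, 10.0)
    known = (3.0, 4.0, 13.0, 14.0)
    print(bbox_distance(detected, known))
```

Using both corners makes the distance sensitive to changes of box size as well as to translations.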

The critical parameters to estimate belief masses are $\alpha =
0.9$ and $\tau = 0.5$, with $\epsilon = 0.001$ for the DSmP transformation. The
proposed approach is written in C++ and runs on an Intel Core
i7 2.20 GHz with 8 GB RAM.


C. Results and Analysis 


The performance of the k-significant sources combination
refers to its capacity to reduce complexity while maintaining
a high decision quality. Therefore, the evaluation focuses on
the Computation Time (CT) and the recall, which are defined
as follows:


$$CT = \sum_{t} ET_t, \qquad recall = \frac{\sum_{t} TA_t}{\sum_{t} GT_t} \quad (20)$$


where $ET_t$ is the execution time of the frame t, and $TA_t$ and
$GT_t$ are the numbers of true associations and ground truth
associations respectively.

Table VI compares the running time of the combination step
using two approaches according to the number of objects. The
first combines all the sources and the second combines the
k-significant sources where k ∈ [2,4]. To show the real-time
aspect of the proposed approach, the association process is
applied for 24 frames. The results confirm that the proposed
approach needs less computation time than combining all
sources. The smaller the number of combined sources, the
shorter the computation time. With n = m = 13, the proposed
approach (k = 2) needs 1.33 ms on 24 frames while combining
all sources takes ≈ 4 minutes, which is not acceptable
for real-time applications. In addition, the computation cost of
combining all sources grows exponentially with (n, m), while
the time complexity of the proposed approach is polynomial,
which makes it well-suited for real-time applications (Cf.
Fig. 7).


Figure 4. The number of objects per frame vs. percentage of frames.
[Bar chart, one bar per sequence against the percentage of frames (%). Sequence labels: Seq. 0, 154 frames; Seq. 1, 447 frames; Seq. 2, 233 frames; Seq. 3, 144 frames; Seq. 4, 314 frames; Seq. 5, 297 frames; Seq. 6, 270 frames; Seq. 7, 800 frames; Seq. 8, 390 frames; Seq. 9, 802 frames; Seq. 10, 294 frames; Seq. 11, 373 frames; Seq. 12, 75 frames; Seq. 13, 340 frames; Seq. 14, 106 frames; Seq. 15, 376 frames; Seq. 16, 209 frames; Seq. 17, 145 frames; Seq. 18, 339 frames; Seq. 19, 1059 frames; Seq. 20, 837 frames.]


Table VII
COMPUTATION TIME (ms) OF THE DECISION-MAKING
STEP FOR 24 FRAMES CONTAINING (n, m) OBJECTS.

(n, m)    JPP        Our method   Comp. time gain
(2, 2)    0.21       0.16         23.91%
(3, 3)    1.2        0.16         86.66%
(4, 4)    9          0.21         97.66%
(5, 5)    104        0.27         99.74%
(6, 6)    > 9 s      0.33         99.99%
(7, 7)    > 46 min   0.90         99.99%

Table VII compares the complexity of the proposed
decision-making algorithm with the JPP method according
to the number of objects. Both methods are based
on a global optimization. The results show that the proposed
algorithm needs less computation time than JPP to make
association decisions. With more than 4 detected
objects, the computation time is reduced by more than 97%. For
instance, with n = m = 7, our proposed algorithm needs
less than 1 ms to assign the detected objects on 24 frames, while
JPP takes more than 46 minutes. Fig. 8
confirms that our algorithm is characterized by a polynomial
complexity while JPP has an exponential complexity, which
makes its application to the KITTI sequences impractical. For
this reason, the rest of the results presented in this section are
obtained with our simple decision-making algorithm.

To measure the gain on complexity, the variation in the
computation time of a system without ($CT^i_{all}$) and with the
k-significant sources combination ($CT^i_k$) is computed for each
sequence (i) (21). The higher the gain, the better the complexity
reduction. In the same manner, the recall gain is
computed (22). The higher $Gain^i_{recall}$, the better the decision
quality; a higher $Gain^i_{recall}$ preserves the decision quality well.


$$Gain^i_{CT} = \frac{CT^i_{all} - CT^i_k}{CT^i_{all}} \times 100 \quad (21)$$

$$Gain^i_{recall} = \frac{recall^i_k - recall^i_{all}}{recall^i_{all}} \times 100 \quad (22)$$


The weighted average of gain based on all sequences is
given by:


$$Gain^{avg}_{CT} = \sum_{i=0}^{20} w_i \, Gain^i_{CT}, \qquad Gain^{avg}_{recall} = \sum_{i=0}^{20} w_i \, Gain^i_{recall} \quad (23)$$


where the weight $w_i$ is $w_i = n_i / \sum_{i=0}^{20} n_i$, $n_i$ being the
number of associations of the i-th sequence.
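
These evaluation measures can be computed with a short Python sketch. It assumes that each gain is the relative variation with respect to the system combining all sources, expressed as a percentage, with a positive recall gain meaning an improvement; the function names and the numerical values in the usage example are hypothetical.

```python
def gain_ct(ct_all, ct_k):
    """Relative computation time reduction (%) brought by the selection."""
    return (ct_all - ct_k) / ct_all * 100.0


def gain_recall(recall_all, recall_k):
    """Relative recall variation (%); positive when the selection helps."""
    return (recall_k - recall_all) / recall_all * 100.0


def weighted_average(gains, associations):
    """Average of per-sequence gains weighted by the number of associations."""
    total = sum(associations)
    return sum(n / total * g for g, n in zip(gains, associations))


if __name__ == "__main__":
    # Hypothetical per-sequence timings (ms) and association counts.
    gains = [gain_ct(200.0, 2.0), gain_ct(50.0, 10.0)]
    print(weighted_average(gains, [3000, 1000]))
```

Weighting by the number of associations prevents short sequences with few objects from dominating the average.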

Fig. 9 presents the weighted average of the computation
time gain versus k. These results are obtained by varying the
number of significant sources selected, i.e. k. For the whole dataset,
i.e. more than 30000 associations, the gain exceeds 99.90%, which
is well-suited for real-time applications. This gain is explained
by the fact that our approach has a polynomial complexity
while combining all sources is characterized by an exponential
complexity (Cf. Fig. 7). In addition, the obtained results show
that the computation time reduction decreases as k increases,
as shown in Table VI. Indeed, by reducing
the number of significant sources, the combination complexity
decreases, which allows a larger gain. Even if the difference in
gain, which is expressed as a percentage, seems small between
the different values of k ∈ [2,7], it remains important for real-time
constraints.


Figure 5. Examples of images provided by KITTI [30].


The gain also depends on the number of perceived objects.
In fact, contrary to our approach, the computation time of combining all
sources increases exponentially with the number of
detected and known objects (n, m). Therefore, the more objects
in the scene, the greater the gain (Cf. Fig. 10). That
is why for sequences 3, 6, 8, 10, and 12, where the number of
detections is mostly less than 4, the gain is less than 40%,
while for the other sequences it is more than 80%. Therefore, the
obtained results lead to the conclusion that the more complex the
sequence, the larger the computation time reduction.

Now, what about the decision quality? Does combining only the
significant sources affect the decisions? Fig. 11 presents
the weighted average of the recall gain versus k. It is clear
that the gain is insignificant, $-0.1\% < Gain^{avg}_{recall} < 0.05\%$.
This result shows that focusing only on significant information
does not necessarily affect the decision quality. Furthermore,
the obtained results also show that ignoring the useless
information can slightly improve the quality of decisions. For
instance, on sequences 11, 17, and 18 the association decisions
are improved by more than 4% (Cf. Fig. 12). Therefore, the
proposed solution provides good performance by significantly
reducing the computation time while preserving the
association decisions.


Figure 6. The illustration of the distance between a detected and a known
object [30].


Figure 7. Computation time of the combination step as a function of the
number of objects. [Plot: computation time (ms) against (n detected objects, m known objects), from (2,2) to (7,7). Curves: combine all sources; combine the 2-, 3- and 4-significant sources.]

The choice of the parameter k depends on the application
context and on the desired performance. For object
association in road environments, and based on our tests, k = 3
appears to be a good setting.


VI. CONCLUSION 


This paper presented a new evidential data association based
on significant sources combination and a simple decision-making
algorithm. The main objective of the proposed approach
is to reduce the complexity and time consumption of
data fusion based on DSmT techniques (PCR6 and DSmP).
This approach focuses only on information having belief in the top
k highest masses and removes useless information. Therefore,
only k-significant sources are combined to deal with the
association problem.


Figure 8. Computation time of the decision-making step as a function of the
number of objects. [Plot: computation time (ms) against (n detected objects, m known objects). Curves: JPP; our proposition.]


Figure 9. Computation time gain as a function of the parameter k. [Vertical axis: comp. time gain (%).]


Figure 10. Computation time gain of 3-sig. sources approach on each
sequence. [Axes: sequences; comp. time gain (%).]


Figure 11. Recall gain as a function of the parameter k.


Figure 12. Recall gain of 3-sig. sources approach on each sequence.


Applied to intelligent vehicles perception, the experimental 
results show the effectiveness of the proposed approach in 
the reduction of the complexity by more than 99% in dense 
scenes. Besides, experimental results show that the proposed 
solution preserves well the decision-quality. It can be noted 
that the k-significant sources combination is not intended only 
for road environment perception. It can be applied to any data 
association process based on these DSmT techniques. 


Future work should combine heterogeneous sensor data to
enhance the object association. Also, we plan to evaluate whether an
improvement of the PCR6 rule of combination would be helpful
for data association problems.


Abstract—Due to the lack of knowledge concerning their
construction and their history (breaks and repairs, extensions...),
fluvial levees are often badly characterized. Failures of these
structures are likely to lead to disastrous consequences such as loss of
lives and economic disasters. In order to prevent the risk
of breakage, special supervision of the protection levee is re- 
quired. Recognized methodologies for the assessment of hydraulic 
structures include complementary geotechnical and geophysical 
reconnaissance methods. This work presents a new way of math- 
ematically combining data from these two types of information 
sources, taking into account the specificities of each kind of 
method (level of imperfection associated with the data, spatial 
distribution of the information). This new methodology considers 
the framework fixed by the theory of belief masses and improves 
the characterization of lithological sets within levees. It provides 
information on the level of conflict between information sources 
while proposing a confidence index associated with the results. 
The methodology is implemented through a subsoil section 
characterized by a real earthen levee investigation campaign. 
This campaign involves electrical resistivity tomography as well 
as particle size distribution from laboratory testing and on-site 
cone penetrometer test. The results highlight the ability of this 
fusion methodology to characterize the considered materials as 
well as to specify the positions of the interfaces and the associated 
levels of confidence. 


Keywords: geophysics, geotechnics, protection levee, belief 
functions. 


I. INTRODUCTION 


Fluvial levees are elevated manmade structures, built up 
for flood protection, between channels and floodplains [1]. 
Unfortunately, some hydraulic earthworks cannot ensure their 
role in flood episode, and are likely to break, leading to 
catastrophic events (human, material or economic damages). 
There is therefore a real need to prevent the risks of rupture by 
characterizing these complex human structures. To answer this 
need, investigation campaigns are set up for subsoils character- 
ization and weak zone identification. These campaigns usually 
involve the use of geophysical and geotechnical methods [2] 
to make a diagnosis and assess the levee stability. 

Geophysical investigation methods are non-intrusive and
provide physical information over a large volume of subsoil.
This information, however, is potentially tainted with important
uncertainties, notably due to the indirect and integrating
aspects of the methods as well as to the limited resolution of
the inverse problems. In a complementary way, geotechnical 
methods are intrusive but provide spatially more punctual 
and more precise information, being directly in contact with 
the material to be identified. An important outcome for the 
characterization of the investigation campaigns is to be able 
to combine the information acquired by these two sets of 
methods, while taking advantage of their specificities and their 
respective uncertainties, inaccuracies and spatial distributions 
[3]. The complementarity of these methods is rarely used to 
its full potential and the results are often simply graphically 
superimposed instead of being mathematically merged [4]. 

To characterize a levee and its possible weak areas, it is 
necessary to distinguish the different geological materials in 
place. The positions of interfaces must also be located, as 
well as the presence of any anomalies (low-density zone, 
presence of pipe,...). It is at these interfaces or anomalies 
that the internal erosion is likely to be initiated, eventually 
leading to the rupture of the structure [5]. A characterization of 
these geological sets and interfaces with associated confidence 
indexes could be of great help if they were included in failure 
hazard models. 

In this work, we propose a novel methodology for the 
fusion of information based on the use of belief functions [6], 
[7]. We compare two different combination rules to merge 
geophysical and geotechnical data while taking into account 
the specificities of each method (spatial distribution, inaccu- 
racy, uncertainty and incompleteness). We use belief functions 
(BFs) theory since it does not require learning periods or 
having very large data sets as the use of artificial neural 
networks would require [8]. The theory of the BFs makes 
it possible to quantify the conflict between the sources of 
information and to quantify uncertainty where the probabilistic 
theory only considers equiprobabilities. Finally, the strong 
point of the BFs theory with respect to Zadeh’s theory of 
possibilities [9] is that it is possible to merge disjoint intervals. 
In the field of geosciences, some works use the BFs to
provide results for slope instability [10], [11], ground water
[12] or flood susceptibility mapping [13]. To our knowledge,
no work has been published on the merging of geophysical
and geotechnical data for an investigation campaign of a river
embankment levee.

Our fusion methodology has already been tested and val- 
idated considering two sources of information, as part of 
numerical simulations [14], [15] and of a laboratory test 
bench experiment [16]. In this work, we present the results 
of our approach applied to data acquired by three sources of 
information, on a real levee of the Loire River, located in Saint- 
Clément-des-Levées (France). It is the first in-situ validation of 
this merging process. The geotechnical methods used are the 
cone penetration test (CPT) and the particle size distribution 
from laboratory testing, after on-site drilling. The considered 
geophysical method is the electrical resistivity tomography 
(ERT). The objective of this study is to highlight the ability 
of our methodology to distinguish the major geological sets 
constitutive of the levee by suggesting their distribution and 
proposing the location of the interfaces. The presented results 
are associated with confidence indexes. 

This article is organized as follows: in section II, we present
the investigated site and the three investigation
methods, both geophysical and geotechnical, used in this
campaign. In section III, we describe the fusion methodology
by first introducing the BFs concept and the considered 
combination rules. We then show how the geophysical and 
geotechnical data are respectively processed, and we propose 
two brief parametric studies on different belief mass allocation 
methods. We finally present the final fusion results in section 
IV and discuss them in section V to highlight the interests, 
drawbacks and perspectives of such a fusion methodology. 


II. INVESTIGATED LEVEE AND INVESTIGATION METHODS 
A. Saint-Clément-des-Levées fluvial levee 


The studied structure is a fluvial levee located in Saint-Clément-des-Levées
near Saumur (France) along the River
Loire in the Val d’Authion area (Fig. 1-b). It is a clayey-sandy
embankment with Turonian bedrock overlaid by alluvial materials
(Fig. 1-a). This levee has been the subject of many studies
presenting both geophysical (ERT, radio-magnetotelluric, Slingram,
ground penetrating radar) and geotechnical (laboratory
tests on collected samples: particle size distribution, clay
content, moisture content, density) investigation campaigns to
detect possible water circulation within the structure [17], [18],
[19]. The investigation campaigns were carried out during
the day between June 26 and July 5, 2018. On this earthen
hydraulic structure, we carried out a geophysical campaign
using the ERT method and a geotechnical campaign with
coring and CPT tests carried out on the levee crest. The
positions of the tests and the electrode line are displayed in
Fig. 1-c.


B. Electrical resistivity tomography (ERT) 


The basic principle of DC-resistivity methods consists in
injecting an electric current of known intensity (A) by means
of two “current” electrodes and measuring a voltage (V)
between two “potential” electrodes. Such measurements are
acquired for several stations (positions of the current and the
potential electrodes). Depending on parameters such as
electrode layout and topography, apparent resistivity values
can be computed. A two-dimensional (2D) ERT, such as the
one considered in this study, consists in aligning a series
of electrodes and acquiring a large number of measurements
based on four-electrode configurations. The acquired apparent
resistivity data are then inverted using an inversion software
to reconstruct a complete 2D section of electrical resistivity
(Ω·m). In this work, we used the Res2Dinv software (ver.
3.71.118) [20].

In 2008, as part of the French ERINOH Project [21], the
levee was instrumented using two electrode lines. Each line is
composed of 48 electrodes with an inter-electrode spacing of 2
meters, buried 1.10 meters below the roadway. The
embedded electrode lines are borehole resistivity cables that
were laid horizontally in the levee subsoil at installation
time. Each take-out on these cables is a molded stainless
steel cylinder with a length of 60 mm and a diameter of 12
mm. These molded take-outs are directly in contact with the
soil and act as electrodes without any additional rods. In
this study, we used the electrode profile located on the land
side of the crest (Fig. 1-c). The acquisition was carried out
in a Wenner-Schlumberger array configuration and three data
points (standard deviation greater than 5%) were removed. The
acquisition system is a Syscal Pro Switch 96 (Iris Instruments)
multi-channel resistivity meter. The current transmission signal
is a regular step function. A step duration (half-period) of 250
ms was used, and each received potential measurement was
stacked 6 times (except very few that were stacked 9 times),
which yielded relative standard deviation values of less than
2% for most of the 1591 data points (and up to about 10-15%
for only a couple of data points). All ground-to-electrode
contact resistances are smaller than about 2 kΩ·m, enabling
sufficient current transmission and high-quality potential
measurement. Indeed, received potential drops range from 12 mV
to 1100 mV, which allows the resistivity meter to deliver data
with high signal-to-noise ratios. Since the ERT method is an
integrative method and the electrodes are buried, we considered
a 1.10 meter thick layer above the electrode line in the
inversion process, with an associated resistivity of about
65 Ω·m. This value was determined after a first inversion
process, considering the average resistivity value over the
first 50 centimeters of the subsoil. The section of the
inversion results used for the rest of our study is displayed
in Figure 2. We do not display inverted resistivity values
below 12 m since they cannot be merged with geotechnical data
due to limited borehole depth. The inversion was stopped after
4 iterations, since the change in RMS error is below 0.3%
between iterations 3 and 4, considering an $L_1$ norm
regularization [22]. This corresponds to a robust inversion,
which emphasizes the resistivity contrasts between the
geological sets while limiting the effects of overly noisy data.
In a geoelectrical acquisition, the electrical current flowing
in the extra-trapezium zones is minor compared to that flowing
in the intra-trapezium zones. Even so, we use an extended model
discretization. This choice is made because the geometry of a
rectangular section simplifies the processing of the data as
well as their fusion. It makes it possible to work with cells
of identical surface at a fixed depth. In addition, this avoids
having to work with large extra-trapezium meshes having very
important sensitivity values. However, to account for the
difference in reliability of the results between intra- and
extra-trapezium cells, the resistivity imprecision values
resulting from the inversion are taken into account during the
belief mass attribution stage. In future work, it could be
pertinent to also integrate sensitivity values.

Fig. 1. (a) Saint-Clément-des-Levées levee cross section displaying the geological materials and the installed monitoring devices [18]. Only electrode line
A was used for our study. (b) Map of France.

Looking at the results, a more resistive upper part about
3 meters thick stands out. The underlying part seems more
conductive, with an area that seems even more conductive from
horizontal position x = 46 m to x = 70 m and from vertical
position z = 3.86 m to z = 10.7 m.


C. Drilling cores and Particle size distribution 


Drilling cores were collected on the levee crest at four
locations, displayed in Fig. 1-c. The boreholes were drilled
down to 7.40 meters deep using a Texoma machine producing
10 cm diameter cores. Once returned to the laboratory, the
cores were visually inspected in order to delineate sections of
material that could be considered as belonging to the same
particle size class. Some samples were collected to perform
particle size distribution analysis following the NF P94-056
French standard [23]. The results show the existence of two
major particle size classes according to the NF P11 300 [24]
soil classification: the fine materials (designated as
“A materials”) and the sandy to gravelly materials with presence
of fines (designated as “B materials”). In this study, the Dmax
value [23] is always lower than 50 mm; thus we use the
cumulative fraction passing the 80 µm sieve to distinguish A
from B materials. When this value is greater than 35%, the
geological material is considered to belong to the A class;
otherwise it is considered to belong to the B class. The results
of the particle size distribution tests with the associated
material classes are shown in Figure 3, with the depth 0 m
corresponding to the position of the buried electrodes. The
horizontal black lines stand for the delimitation of the
materials made by visual inspection. These results indicate
that B materials seem present in the upper part of the section
(from 0 to 3.40 m deep for boreholes 2 and 3) while finer A
materials tend to be located below.


D. Cone penetrometer test (CPT) 


The CPT method consists of pushing rods with a conical tip at
their end into the soil at a controlled rate in order to record
tip resistance, $q_c$ [MPa], and sleeve friction, $f_s$ [MPa],
values. Four tests were carried out at the same locations as the
drilling tests (Fig. 1-c) using a Gouda machine with a tip of
3.6 cm diameter and one measurement every 10 cm, following
the French standard NF P94-113 [25]. The tests were carried
out over a vertical length of 8.80 m. Using the two measured
parameters, and denoting by $R_f = 100\, f_s / q_c$ the friction
ratio (in %), it is possible to determine $I_{SBT}$, the Soil
Behavior Type Index proposed by Robertson [26] and presented in
Eq. (1):


$$I_{SBT} = \left[ \left( 3.47 - \log\left( q_c / P_a \right) \right)^2 + \left( \log\left( R_f \right) + 1.22 \right)^2 \right]^{0.5}, \qquad (1)$$


with $P_a$ = 0.1 MPa. The $I_{SBT}$ index provides information
on the nature of the soil in terms of particle size class. The
results obtained are shown in Figure 4, with the depth 0 m
corresponding to the position of the buried electrodes.
According to the computed $I_{SBT}$ values, most of the
materials appear to be sand mixtures, which seems to contradict
the particle size distribution below 3.40 m depth (Figure 3).

Fig. 2. Levee modeled resistivity longitudinal section obtained by inverting Wenner-Schlumberger apparent resistivity data acquired with electrode line A
shown in Fig. 1-a. The depth 0 m corresponds to the position of the electrodes.

Fig. 3. Photograph of extracted material from BH1 and results of the particle size distribution analysis with the associated soil classes (A and B) in the four
drilling cores, locations specified in Fig. 1-c. Arrows symbolize the vertical extension of the information thanks to the visual characterization of a technician.

Fig. 4. $I_{SBT}$ vertical profiles for each CPT test and associated soil classes.


III. FUSION METHODOLOGY
A. Belief functions and combination rules


Shafer [7] introduced the BFs by developing the mathematical
theory of evidence inspired by earlier works of Dempster [6].
Hence, Shafer’s theory is often referred to as the
Dempster-Shafer theory (DST). This theory allows the
computation of the belief and the plausibility of a hypothesis
(corresponding to soil material classes in this work) from
distinct sources of information (measured data). The practical
benefit of using BFs lies in their ability to manage information
from different sources, associated with their respective levels
of uncertainty and inaccuracy. In this work, we consider three
sources of information: two geotechnical (CPT and particle size
distribution) and one geophysical (ERT). Another feature of the
BF theory is its ability to assess the level of conflict
($\emptyset$) between sources, i.e. when the information given
by one source contradicts the information given by another one.
Following Smets [27], we consider that uncertainties correspond
to degrees of confidence that are given to a value, whereas
inaccuracies correspond to intervals of values that can be
directly associated with measurement errors related to the
investigation method. For example, the uncertainty on measuring
a value of a geotechnical parameter identical to the one
measured in a borehole increases with the distance to that
borehole. The inaccuracy can for its part be associated with
the error bar of the corresponding measured datum. In addition,
BFs allow the ignorance and incompleteness of the information
to be taken into consideration. It is indeed possible to assign
belief to the set of all possible results in order to quantify
our ignorance, whereas probability theory would simply assign
an equal probability to each single hypothesis. More details
concerning the theory can be found in [28].

To define and use BFs, it is required (i) to set a Frame
of Discernment (FoD), (ii) to assign belief mass values to the
hypotheses of this set for each source of information, (iii)
to implement a fusion rule for merging the information, and
(iv) to provide a representation of the combined information.
The FoD $\Theta$ consists of all the possible hypotheses of
the problem under concern. The elements of the FoD are
exhaustive and exclusive, such that for n hypotheses:


$$\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}. \qquad (2)$$


In our problem, the possible hypotheses of the FoD
correspond to classes of geological materials that can be
associated with intervals of values of geophysical and
geotechnical parameters. Here, we consider that $\theta_1$
stands for fine-grained materials and $\theta_2$ stands for
coarser-grained materials. We also consider a third hypothesis
$\theta_3$ that is associated with intervals of values of
geophysical and geotechnical parameters that are not included
in the first two sets. Thus, we can qualify $\theta_3$ as being
“another” material and we have:


$$\Theta = \{\theta_1, \theta_2, \theta_3\}. \qquad (3)$$


The set of all subsets of $\Theta$ (including the conflict
hypothesis, $\emptyset$) is called the “powerset” and written
$2^\Theta$. In our case, we get:


$$2^\Theta = \{\emptyset, \theta_1, \theta_2, \theta_1 \cup \theta_2, \theta_3, \theta_1 \cup \theta_3, \theta_2 \cup \theta_3, \theta_1 \cup \theta_2 \cup \theta_3\}. \qquad (4)$$

The belief mass function $m_j$ is defined for a source of
information $S_j$ (for $j$ = 1, 2 or 3 in our study) and assigns
to each subset $X$ (defined on $2^\Theta$) a value in [0,1] such
that, as in probability theory, the closer $m(X)$ is to 1, the
greater the confidence in $X$:


$$\sum_{X \in 2^\Theta} m(X) = 1. \qquad (5)$$


The main difference with probability theory is that $X$
can represent the union of several hypotheses. For example, if
belief mass is attributed to $\theta_1 \cup \theta_2$, it means
that either $\theta_1$ or $\theta_2$ is possible. Thus, it is
possible to model uncertainty and lack of knowledge. Belief and
plausibility functions, Bel and Pl respectively, are considered
as lower and upper bounds of an unknown probability $P$ such
that for any $X \in 2^\Theta$, $Bel(X) \leq P(X) \leq Pl(X)$.
Belief and plausibility functions are in one-to-one relation
with the belief mass, $m(\cdot)$, and defined by:


$$Bel(X) = \sum_{Z \in 2^\Theta \,|\, Z \subseteq X} m(Z), \qquad (6)$$

$$Pl(X) = \sum_{Z \in 2^\Theta \,|\, Z \cap X \neq \emptyset} m(Z). \qquad (7)$$
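
As an illustration of Eqs. (4), (6) and (7), the following minimal Python sketch represents the subsets of the FoD as `frozenset` objects and a belief mass function as a dictionary. The function names, the string labels `theta1`, `theta2`, `theta3` and the mass values are assumptions chosen for the example; they are not data from the study. The example mass function has no mass on the empty set.

```python
from itertools import combinations


def powerset(frame):
    """All subsets of the frame of discernment, as frozensets (Eq. (4))."""
    items = sorted(frame)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def belief(m, X):
    """Bel(X), Eq. (6): sum of the masses of all focal sets Z included in X."""
    return sum(v for Z, v in m.items() if Z <= X)


def plausibility(m, X):
    """Pl(X), Eq. (7): sum of the masses of all focal sets Z intersecting X."""
    return sum(v for Z, v in m.items() if Z & X)


T1, T2, T3 = "theta1", "theta2", "theta3"
THETA = frozenset({T1, T2, T3})

# Illustrative mass function (assumed values, not taken from the study).
m = {
    frozenset({T1}): 0.5,
    frozenset({T1, T2}): 0.3,
    THETA: 0.2,
}

fine = frozenset({T1})
print(len(powerset(THETA)))                    # number of subsets of the FoD
print(belief(m, fine), plausibility(m, fine))  # bounds on P(theta1)
```

In this example the mass placed on $\theta_1 \cup \theta_2$ and on the full FoD widens the interval [Bel, Pl] for $\theta_1$, which is how partial ignorance is expressed.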


In our study, we only use belief mass functions since
fusion rules are set directly from the belief masses allocated
by each information source. The approach developed by
Smets [29] in his Transferable Belief Model (TBM) (i.e.
conjunctive fusion, the so-called “open-world assumption”)
allows the assignment of a belief mass to the conflict
represented by the empty set, so that one considers:


$$m_{1,2,\ldots,S}(\emptyset) \geq 0, \qquad (8)$$


where $m_{1,2,\ldots,S}(\cdot)$ denotes the merged belief mass
resulting from the combination of information from the different
sources. The belief mass $m_{1,2,\ldots,S}(X)$ resulting from the
conjunctive fusion of information from all sources (from
$j = 1$ to $S$) is written:


$$m_{1,2,\ldots,S}(X) = \sum_{\substack{X_1, X_2, \ldots, X_S \in 2^\Theta \\ X_1 \cap X_2 \cap \ldots \cap X_S = X}} \; \prod_{j=1}^{S} m_j(X_j), \qquad (9)$$

with $m_j(X_j)$ the belief mass attributed to hypothesis $X_j$
by information source $j$.
The conflict level between the $S$ considered sources of
information can therefore be written as:


$$m_{1,2,\ldots,S}(\emptyset) = \sum_{\substack{X_1, X_2, \ldots, X_S \in 2^\Theta \\ X_1 \cap X_2 \cap \ldots \cap X_S = \emptyset}} \; \prod_{j=1}^{S} m_j(X_j). \qquad (10)$$
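
The conjunctive rule of Eq. (9) and the conflict level of Eq. (10) can be sketched as follows. This is a minimal illustration, assuming that each source is given as a dictionary mapping `frozenset` focal sets to masses; the function name and the two example sources are hypothetical.

```python
from itertools import product
from math import prod


def conjunctive(*sources):
    """Unnormalized conjunctive fusion of any number of mass functions.

    The mass accumulated on the empty frozenset is the conflict level.
    """
    fused = {}
    for combo in product(*(m.items() for m in sources)):
        inter = frozenset.intersection(*(X for X, _ in combo))
        fused[inter] = fused.get(inter, 0.0) + prod(v for _, v in combo)
    return fused


T1, T2, T3 = "theta1", "theta2", "theta3"
THETA = frozenset({T1, T2, T3})

# Two illustrative sources (assumed values, not taken from the study).
m1 = {frozenset({T1}): 0.7, THETA: 0.3}
m2 = {frozenset({T2}): 0.6, frozenset({T1, T2}): 0.4}

fused = conjunctive(m1, m2)
conflict = fused.get(frozenset(), 0.0)
print(conflict)
```

Here the only conflicting pair is ($\theta_1$, $\theta_2$), so the conflict equals the product of the two corresponding masses, and the fused masses, conflict included, still sum to 1.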
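According to Shafer’s approach and unlike Smets’ rule, the
DS rule does not allow the attribution of a belief mass to
the conflict. Thus, in DST (which uses the “closed-world
assumption”), one has by definition:


$$m^{DS}_{1,2,\ldots,S}(\emptyset) = 0. \qquad (11)$$

The conflict mass is then reallocated through a normalization,
so that for any $X \neq \emptyset$:


$$m^{DS}_{1,2,\ldots,S}(X) = \frac{1}{1 - m_{1,2,\ldots,S}(\emptyset)} \sum_{\substack{X_1, X_2, \ldots, X_S \in 2^\Theta \\ X_1 \cap X_2 \cap \ldots \cap X_S = X}} \; \prod_{j=1}^{S} m_j(X_j). \qquad (12)$$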
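
For two sources, the DS rule amounts to a conjunctive product followed by a normalization by one minus the conflict. The sketch below is a minimal illustration under that reading; the function name and the example masses are assumptions, and the case of total conflict is rejected explicitly since the normalization is then undefined.

```python
from itertools import product


def dempster(m1, m2):
    """DS rule for two sources: conjunctive product, then normalization.

    Returns the normalized masses and the conflict that was removed.
    """
    fused, conflict = {}, 0.0
    for (X1, v1), (X2, v2) in product(m1.items(), m2.items()):
        inter = X1 & X2
        if inter:
            fused[inter] = fused.get(inter, 0.0) + v1 * v2
        else:
            conflict += v1 * v2
    if conflict >= 1.0:
        raise ValueError("total conflict: the DS rule is undefined")
    return {X: v / (1.0 - conflict) for X, v in fused.items()}, conflict


T1, T2, T3 = "theta1", "theta2", "theta3"
THETA = frozenset({T1, T2, T3})

# Two illustrative sources (assumed values, not taken from the study).
m1 = {frozenset({T1}): 0.7, THETA: 0.3}
m2 = {frozenset({T2}): 0.6, frozenset({T1, T2}): 0.4}

ds, k = dempster(m1, m2)
print(k, ds)
```

After normalization the conflict is no longer visible in the fused masses, which is why the function returns it separately.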


The drawback of this combination rule is that the conflict
between the sources is no longer displayed. Furthermore, it
is possible to obtain counterintuitive results when the conflict
level is high. However, the PCR6 (Proportional Conflict
Redistribution No. 6) combination rule [30] allows the
redistribution of all partial conflicts, in proportion to the
masses of the hypotheses involved in these conflicts:


$$m^{PCR6}_{1,2,\ldots,S}(X) = m_{1,2,\ldots,S}(X) + \sum_{k=1}^{S-1} \; \sum_{\substack{X_{i_{k+1}}, \ldots, X_{i_S} \in 2^\Theta \setminus \{X\} \\ \left( \bigcap_{j=k+1}^{S} X_{i_j} \right) \cap X = \emptyset}} \; \sum_{(i_1, i_2, \ldots, i_S) \in P^S} \left[ m_{i_1}(X) + m_{i_2}(X) + \ldots + m_{i_k}(X) \right] \cdot \frac{m_{i_1}(X) \cdots m_{i_k}(X)\, m_{i_{k+1}}(X_{i_{k+1}}) \cdots m_{i_S}(X_{i_S})}{m_{i_1}(X) + \ldots + m_{i_k}(X) + m_{i_{k+1}}(X_{i_{k+1}}) + \ldots + m_{i_S}(X_{i_S})}, \qquad (13)$$

where $P^S$ is the set of all permutations of the elements
$\{1, 2, \ldots, S\}$. It should be emphasized that
$X_{i_{k+1}}, X_{i_{k+2}}, \ldots, X_{i_S}$ may be different from
each other, equal, or some equal and some different, etc. In
this paper, for the sake of conciseness and since the PCR6 and
DS rules provide quite similar results, we only display fusion
results using the Smets and PCR6 rules of combination.
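
The redistribution principle of PCR6 can also be written procedurally: every combination of focal sets with an empty intersection is a partial conflict, and its product of masses is given back to the sets involved, in proportion to the mass each source committed to its own set. The sketch below is a minimal illustration of that principle, not the implementation used in the study; the function name and the example values are assumptions, and the input sources are assumed to carry no mass on the empty set.

```python
from itertools import product
from math import prod


def pcr6(*sources):
    """PCR6 fusion by enumeration of all combinations of focal sets."""
    fused = {}
    for combo in product(*(m.items() for m in sources)):
        masses = [v for _, v in combo]
        p = prod(masses)
        if p == 0.0:
            continue
        inter = frozenset.intersection(*(X for X, _ in combo))
        if inter:
            # Non-conflicting combination: usual conjunctive transfer.
            fused[inter] = fused.get(inter, 0.0) + p
        else:
            # Partial conflict: give p back to each involved set,
            # proportionally to the mass that supported it.
            total = sum(masses)
            for X, v in combo:
                fused[X] = fused.get(X, 0.0) + p * v / total
    return fused


A, B = frozenset({"theta1"}), frozenset({"theta2"})

# Two illustrative, fully conflicting-capable sources (assumed values).
m1 = {A: 0.6, B: 0.4}
m2 = {A: 0.3, B: 0.7}

fused = pcr6(m1, m2)
print(fused[A], fused[B])
```

With these two sources the conjunctive rule would leave a conflict of 0.54; PCR6 returns it to $\theta_1$ and $\theta_2$, so the fused masses sum to 1 with nothing on the empty set. The enumeration grows quickly with the number of sources and focal sets, which is acceptable for the three sources considered here.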


B. Attribution of belief masses from geophysical data 


To attribute belief masses from electrical resistivity data, it
is first necessary to define the limits of the resistivity
intervals corresponding to the hypotheses of the FoD. To do so,
we use a representation in modal classes of the number of cells
of the 2D section resulting from the inversion (Figure 2)
according to the resistivity values represented in log scale.
The two-decimal accuracy of the class values shown in Figure 5
makes little difference in the methodology we use to
characterize the boundaries. We are aware that this precision
is superfluous for a geophysical interpretation. What matters
is the general trend of the distribution of values, which
highlights the large sets of materials constituting the subsoil
section. This subjective division into classes (abscissa axis,
Figure 5) comes from the computation of the upper and lower
bound values, considering the following geometric sequence:


$$b_{n+1} = 1.1\, b_n, \qquad (14)$$


with $b_n$ the lower bound of interval $n$, $b_{n+1}$ its upper
bound and $b_0 = 2$ Ω·m.
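
The geometric sequence of Eq. (14) is straightforward to generate. The short sketch below, in which the function name and the number of generated bounds are assumptions, shows that the terms of rank 22, 29, 34 and 41 fall on the class limits 16.28, 31.73, 51.10 and 99.57 Ω·m used in Eq. (15).

```python
def class_bounds(b0=2.0, ratio=1.1, n=45):
    """Bounds b_0, ..., b_n of the sequence b_{k+1} = ratio * b_k (in ohm.m)."""
    return [b0 * ratio ** k for k in range(n + 1)]


bounds = class_bounds()
selected = [round(bounds[k], 2) for k in (22, 29, 34, 41)]
print(selected)
```

Because consecutive bounds share a constant ratio, all classes have the same width in log scale, which is consistent with the log-scale representation used for Figure 5.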


The representation in modal classes (Figure 5), combined with
the reading of Figure 2, suggests the presence of two materials
of different kinds: i) a material of lower resistivity (in blue,
Figure 5) that can be associated with the hypothesis $\theta_1$
of a fine-grained material and ii) a material of higher
resistivity (in orange, Figure 5) that can be associated with
the hypothesis of a coarser-grained material ($\theta_2$).

The delimitation between these two classes, however, is
not straightforward. We propose to associate the intermediate
resistivity values with the hypothesis $\theta_1 \cup \theta_2$,
suggesting that these resistivity values can be related to the
hypothesis “$\theta_1$ or $\theta_2$”. Thus the bounds of the
electrical resistivity classes (Ω·m) are fixed such that:


$$\begin{cases} [16.28,\ 31.73] & \text{is associated with } \theta_1, \\ ]51.10,\ 99.57] & \text{is associated with } \theta_2, \\ ]31.73,\ 51.10] & \text{is associated with } \theta_1 \cup \theta_2, \\ [5.10,\ 16.28[ \; \cup \; ]99.57,\ 312.45] & \text{is associated with } \theta_3. \end{cases} \qquad (15)$$


To limit the possible biases due to the computation of
the distance between intervals (approach detailed below), we
consider the intervals associated with the hypothesis $\theta_3$
(i.e. [5.10; 16.28[ and ]99.57; 312.45]) of same width in log
scale (i.e. same ratio between upper and lower bound values) to
the intervals associated with $\theta_1$ and $\theta_2$. Once
the DC-resistivity intervals are defined, belief masses must be
associated with each of the considered hypotheses. This belief
mass attribution process has to be carried out for each cell of
the section mesh resulting from the inversion (Figure 2). We
propose three alternative approaches for the attribution of the
belief masses.

The first one, referred to as $D_g$, consists in considering a
Gaussian probability distribution, Eq. (16), centered on the
inverted resistivity value:


$$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right), \qquad (16)$$


with $\mu$ the value of inverted resistivity in the considered
cell and $\sigma$ the inaccuracy provided by the inversion
process, resulting from the computation of the covariance
matrix. Given that the area under the Gaussian distribution is
equal to 1, the mass is assigned to the hypotheses according to
the proportion of the area intersecting the defined resistivity
intervals. This is done in accordance with Eq. (5), so that each
cell is associated with a normalized belief mass distribution.
The results of this approach are presented in Figure 6. Figure
6-a highlights the hypothesis having the highest belief mass for
each cell of the section while Figure 6-b displays the
associated belief mass values. We find that the belief masses
associated with this approach ($D_g$) are very large (often
close to 1), suggesting that the ERT method is completely
reliable and able to characterize the subsoil materials. Such a
level of confidence may seem exaggerated. Thus, we propose a
second belief mass assignment method.
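
The first attribution approach can be sketched with the Gaussian cumulative distribution function. The code below is a minimal illustration and rests on several assumptions that the text does not settle: the Gaussian is taken on the resistivity value itself (not on its logarithm), the area falling outside the overall range [5.10, 312.45] Ω·m is discarded before normalization, and the labels and function names are hypothetical.

```python
from math import erf, sqrt

# Class limits in ohm.m, taken from Eq. (15).
CLASSES = [
    ("theta3", 5.10, 16.28),
    ("theta1", 16.28, 31.73),
    ("theta1_or_theta2", 31.73, 51.10),
    ("theta2", 51.10, 99.57),
    ("theta3", 99.57, 312.45),
]


def gaussian_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian N(mu, sigma^2)."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


def gaussian_masses(mu, sigma):
    """Masses proportional to the Gaussian area over each class interval."""
    areas = {}
    for name, lo, hi in CLASSES:
        area = gaussian_cdf(hi, mu, sigma) - gaussian_cdf(lo, mu, sigma)
        areas[name] = areas.get(name, 0.0) + area
    total = sum(areas.values())
    if total <= 0.0:
        raise ValueError("the Gaussian does not overlap any class interval")
    return {name: a / total for name, a in areas.items()}


# Illustrative cell: inverted resistivity 30 ohm.m with inaccuracy 3 ohm.m.
masses = gaussian_masses(mu=30.0, sigma=3.0)
print(masses)
```

For this cell most of the area lies in the $\theta_1$ interval and the rest in $\theta_1 \cup \theta_2$. When $\sigma$ is small compared with the class width, nearly all the mass goes to a single hypothesis, which matches the very large masses reported for $D_g$.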

This other approach, referred to as $D_w$, relies on the
calculation of Wasserstein distances [31], as previously used
in [14], [15]. To associate the belief masses with the FoD
hypotheses, we consider the intervals of inverted resistivity
values with their associated inaccuracies (example in red in
Figure 7). The belief masses are derived from the computation
of the Wasserstein distances, considering two resistivity
intervals $A = [a_1, a_2]$ and $B = [b_1, b_2]$, $A$ being the
interval corresponding to a defined hypothesis, Eq. (15), and
$B$ being an interval of inverted values. We take into
consideration the imprecision level resulting from the inversion
for each cell of the mesh, such that $b_1 = \mu - \sigma$ and
$b_2 = \mu + \sigma$:

$$D_W(A, B) = \left[ \left( \frac{\log(a_1 a_2)}{2} - \frac{\log(b_1 b_2)}{2} \right)^2 + \frac{1}{3} \left( \frac{\log(a_2 / a_1)}{2} - \frac{\log(b_2 / b_1)}{2} \right)^2 \right]^{1/2}. \qquad (17)$$

This computation gives the Wasserstein distance between
two intervals, taking their size and the distance between
them into account. The Wasserstein distances are computed
between the inverted values with estimated inaccuracies and
the intervals associated with each hypothesis, Eq. (15). In the
example illustrated in Figure 7, the Wasserstein distance would
be computed between $[b_1, b_2]$ and each of the other intervals
$[a_0, a_1]$, $[a_1, a_2]$, $[a_2, a_3]$, $[a_3, a_4]$ and
$[a_4, a_5]$.

Fig. 5. Modal classes’ distribution of the cells displayed in Figure 2, according to the inverted electrical resistivity values (Ω·m) and intervals associated with
the soil classes.

Fig. 6. a) Representation of the hypotheses having the highest belief mass according to the masses attribution from electrical resistivity data considering a
Gaussian probability distribution centered on the inverted resistivity value, and b) the associated belief mass values.


Fig. 7. Diagram displaying the classes described in Eq. (15) with the red interval [b1; b2] corresponding to an interval of inverted electrical resistivity values
from one cell of the 2D section of subsoil, used for Wasserstein distances’ computation.

Each cell is then associated with a normalized belief
mass distribution in accordance with Eq. (5), with masses
inversely proportional to the distance values. This way, the
smaller the distance between an electrical resistivity interval
resulting from the inversion and a hypothesis of the FoD, the
larger the associated belief mass, and conversely. The results
of this belief mass assignment approach, $D_w$, are displayed
in Figure 8. The belief masses derived from this approach
($D_w$) are lower than those displayed in Figure 6 ($D_g$) and
the hypothesis $\theta_1 \cup \theta_2$ emerges more often. This
approach ($D_w$) is therefore more cautious.
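
The distance-based attribution can be sketched as follows. This is a hedged illustration only: the distance is the usual Wasserstein distance between two intervals (squared difference of midpoints plus one third of the squared difference of half-widths), applied here to base-10 logarithms of the bounds, which is our reading of the log-scale setting. Keeping the nearest of the two $\theta_3$ intervals, the function names and the labels are further assumptions.

```python
from math import log10, sqrt

# Class limits in ohm.m, taken from Eq. (15).
CLASSES = [
    ("theta3", 5.10, 16.28),
    ("theta1", 16.28, 31.73),
    ("theta1_or_theta2", 31.73, 51.10),
    ("theta2", 51.10, 99.57),
    ("theta3", 99.57, 312.45),
]


def wasserstein_log(a, b):
    """Wasserstein distance between intervals a and b in log10 scale."""
    mid_a = (log10(a[0]) + log10(a[1])) / 2.0
    mid_b = (log10(b[0]) + log10(b[1])) / 2.0
    rad_a = (log10(a[1]) - log10(a[0])) / 2.0
    rad_b = (log10(b[1]) - log10(b[0])) / 2.0
    return sqrt((mid_a - mid_b) ** 2 + (rad_a - rad_b) ** 2 / 3.0)


def wasserstein_masses(mu, sigma):
    """Masses inversely proportional to the distance to each hypothesis."""
    if mu - sigma <= 0.0:
        raise ValueError("mu - sigma must be positive to take logarithms")
    cell = (mu - sigma, mu + sigma)
    dist = {}
    for name, lo, hi in CLASSES:
        d = wasserstein_log((lo, hi), cell)
        dist[name] = min(dist.get(name, float("inf")), d)  # nearest interval
    inv = {name: 1.0 / max(d, 1e-12) for name, d in dist.items()}
    total = sum(inv.values())
    return {name: w / total for name, w in inv.items()}


# Illustrative cell: inverted resistivity 30 ohm.m with inaccuracy 3 ohm.m.
masses = wasserstein_masses(mu=30.0, sigma=3.0)
print(masses)
```

For this illustrative cell the mass is spread over all hypotheses, with $\theta_1$ and $\theta_1 \cup \theta_2$ receiving comparable shares and no single hypothesis reaching one half, which illustrates the more cautious behavior of $D_w$.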

We finally propose a third approach, referred to as $D_{gw}$,
combining the two previously described. This approach
($D_{gw}$) is similar to the previous one ($D_w$) in that it
distributes mass over the defined hypotheses using Wasserstein
distances. However, the mass allocated in this way is here
equal to m = 1/2 instead of 1. The remaining mass (m = 1/2) is
allocated proportionally to the area under the Gaussian
distribution, on the hypotheses associated with the intersected
resistivity intervals, as described above for the first
approach, $D_g$. The results are displayed in Figure 9 and are
intermediate between the results of the first two methods
displayed in Figures 6 and 8.
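
Since each of the first two approaches yields a normalized mass distribution, the third one reduces to an equal-weight mixture of the two. A minimal sketch is given below; the function name is hypothetical and the two input distributions are rounded illustrative values, not results from the study.

```python
def mix_masses(m_w, m_g, weight_w=0.5):
    """Mixture of two normalized mass distributions over the same labels."""
    labels = set(m_w) | set(m_g)
    return {
        name: weight_w * m_w.get(name, 0.0) + (1.0 - weight_w) * m_g.get(name, 0.0)
        for name in labels
    }


# Illustrative distributions for one cell (assumed values).
m_g = {"theta1": 0.72, "theta1_or_theta2": 0.28}
m_w = {"theta1": 0.39, "theta1_or_theta2": 0.38, "theta2": 0.13, "theta3": 0.10}

m_gw = mix_masses(m_w, m_g)
print(m_gw)
```

The mixture remains normalized, and the highest mass of the cell lies between the values given by the two original approaches.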


C. Attribution of belief masses from geotechnical data 


For the two geotechnical information sources, belief masses
must be associated with the different hypotheses of the FoD for
each cell of the vertical boreholes. To do so, we consider the
geotechnical parameter values available at each depth (Figure
10-a) with their respective associated inaccuracies. Thus, we
obtain intervals of values, as for the attribution of belief
masses from geophysical data. We generate a mesh for each
geotechnical source (particle size distribution and CPT)
consisting of as many cells in depth as the number of
geotechnical measurements in each borehole (Figure 10-b). The
cells all have the same dimensions. At each borehole position,
a belief mass of a given value (see details below) is assigned
to the hypothesis corresponding to the geotechnical parameter
value. We then construct a new mesh (Figure 10-c), covering the
full section of the subsoil, according to the depth of the
boreholes. In order to characterize the entire section of the
model, as the ERT method does, and to associate belief mass
values with each newly generated cell, we impose an exponential
lateral decay of the belief mass from each borehole towards the
neighboring one, with a decay rate that depends on the values
observed in the neighboring borehole. Thus, we get for a
specific depth:

$$M(x) = M(0)\, e^{-C_v x}, \qquad (18)$$


with $x$ being the horizontal distance from the considered cell
to the reference borehole in meters ($x = 0$ in the borehole),
$M(x)$ the belief mass value assigned to each hypothesis in
the FoD at position $x$, and $M(0)$ the belief mass value
assigned in the borehole. $C_v$ corresponds to the coefficient
of variation expressed in Eq. (19), as used in Phoon and
Kulhawy [32]:


$$C_v = \frac{1}{Q} \sqrt{\frac{1}{n_{mesh} - 1} \sum_{i=1}^{n_{mesh}} (Q - Q_i)^2}, \qquad (19)$$


where $Q$ is the geotechnical parameter value of the considered
cell in the borehole and $Q_i$ the geotechnical parameter values
in the nearby borehole centered on the same depth. For Figure
10-b, and more broadly in this study, we considered
$n_{mesh} = 3$, so that the computation of $C_v$ takes into
account 3 cells in the nearby borehole. Indeed, for two
consecutive boreholes with similar values at a fixed depth, we
consider the soil to be less variable laterally and the decay
of the confidence to be slower than for two consecutive
boreholes displaying drastically different values. This
decrease of belief mass is carried out to the left and to the
right of each borehole. If the belief mass associated with a
hypothesis $X$ is less than 1 ($m(X) < 1$), then the remainder
of belief mass to be allocated to satisfy Eq. (5) is reallocated
to the hypothesis “any type of geological material”, symbolized
by the union of all hypotheses, such that:

$$m(\theta_1 \cup \theta_2 \cup \theta_3) = 1 - m(X). \qquad (20)$$
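
The lateral extension of the borehole information can be sketched as below. This is only an illustration under stated assumptions: the coefficient of variation is taken as the root of the sum of squared deviations between the borehole value and the neighboring values, divided by the number of neighboring cells minus one, then divided by the borehole value; the function names, labels and numerical values are hypothetical.

```python
from math import exp, sqrt

FRAME = "theta1_or_theta2_or_theta3"  # label for the union of all hypotheses


def coefficient_of_variation(q, neighbours):
    """Coefficient of variation between a borehole cell and nearby cells."""
    n = len(neighbours)
    if n < 2:
        raise ValueError("at least two neighbouring cells are required")
    return sqrt(sum((q - qi) ** 2 for qi in neighbours) / (n - 1)) / abs(q)


def lateral_masses(hypothesis, m0, cv, x):
    """Mass on the borehole hypothesis at distance x, remainder on the FoD."""
    mx = m0 * exp(-cv * x)
    return {hypothesis: mx, FRAME: 1.0 - mx}


# Illustrative values: parameter 40 in the borehole, 38, 41 and 45 in the
# three cells of the neighbouring borehole centred on the same depth.
cv = coefficient_of_variation(40.0, [38.0, 41.0, 45.0])
at_borehole = lateral_masses("theta1", 0.99, cv, 0.0)
ten_metres = lateral_masses("theta1", 0.99, cv, 10.0)
print(cv, ten_metres)
```

The closer the neighboring values are to the borehole value, the smaller the decay rate and the further the borehole information extends laterally; the mass lost with distance goes to the union of all hypotheses, so each cell keeps a normalized distribution.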


In each cell of the section, the information on the belief
masses coming from the borehole on the left is merged with
the information coming from the borehole on the right, using
the DS rule, in accordance with Eq. (12). Considering the 4
boreholes of this study, numbered from 1 to 4 from left to
right, borehole 1 cannot be compared to any borehole to its
left, nor can borehole 4 be compared to any borehole to its
right. Therefore, for a given depth, we consider an equal $C_v$
for the left and right directions for the boreholes located at
the beginning and at the end of the section. In our case, the
sections to the left of the first borehole and to the right of
the last borehole are of little interest since they are not
covered by the ERT investigation. The chosen hypotheses
(geological material) at the borehole positions for each
geotechnical method are displayed in Figure 11 and detailed
below.

Fig. 8. a) Representation of the hypotheses having the highest belief mass according to the masses attribution based on Wasserstein distances applied to the
inverted resistivity data (Figure 2), and b) the corresponding belief mass values.

Fig. 9. a) Representation of the hypotheses having the highest belief mass according to the masses attribution based on Wasserstein distances and considering
a Gaussian probability distribution applied to the inverted resistivity data (Figure 2) and b) the corresponding belief mass values.

Fig. 10. Construction of a geotechnical discretization mesh from two vertical borehole acquisitions (BH1 and BH2). a) Representation of the geotechnical
parameter values for BH1 and BH2 with depth. b) The boreholes are split in cells of same thickness associated with belief mass equal to a given value for
the considered hypothesis. c) Construction of a full section mesh according to the depth of the boreholes.


D. Attribution of belief masses from particle size distribution 


For the particle size distribution, materials described as
“A materials” in Section II are considered as belonging
to $\theta_1$ and B materials as belonging to $\theta_2$. The
inaccuracies taken into account correspond to 0.1% of the
weighed value, as indicated in the French standard NF P94-056
[23]. If the value of the cumulative fraction passing the 80 µm
sieve cannot be characterized as being greater or less than 35%,
taking into account the inaccuracies, then the selected
hypothesis is $\theta_1 \cup \theta_2$. This reflects our
inability to choose. Where materials have not been collected
from the core for analysis (black areas in Figure 3), we
consider that we have no information. Therefore, the belief
mass is attributed to the union of all hypotheses,
$\theta_1 \cup \theta_2 \cup \theta_3$, which represents the
highest possible uncertainty (lowest knowledge level).

Fig. 11. 2D representation of the levee section displaying borehole positions in dashed lines and associated $I_{SBT}$ (white dotted line) and particle size distribution
(white solid line) corresponding classes.


In the boreholes, we consider a belief mass $m(\cdot) = 0.99$ on
the characterized hypothesis, at the depths for which the soil
samples were analyzed, and a belief mass of $m(\cdot) = 0.01$
on the union of all other hypotheses, in accordance with
Eq. (5). A mass of $M(0) = 0.99$ was chosen because the
established hypotheses are soil particle size distribution
classes by definition. Thus, we consider that particle size
information is the most appropriate kind of information that
one can obtain. However, a mass of $M(0) = 1$ has not been set
in order to avoid any total conflict that may arise in the
fusion process. For depths at which the materials have not been
analyzed, but which still belong to the same geological set
(limits established by a geotechnical engineer and displayed in
Figure 3) as the analyzed materials, we consider a vertical
extension of the information.

A vertical decrease of the confidence level associated with
the hypothesis $\theta_i$ is carried out from the limit depth
(p = 0 m) of the collected sample up to the limit between the
two geological sets established visually by the geotechnical
engineer (p = 1 m in the example, Figure 12). The distance
between these two depths is d. The vertical decay of the belief
mass on the considered hypothesis $\theta_i$ is expressed as
follows:


$$m(\theta_i; p) = 0.99\,(1 - e^{p - d}), \qquad (21)$$

so that $m(\theta_i; p) = 0$ at the boundary between the two
characterized lithological sets. The complementary belief mass
is allocated to $\theta_1 \cup \theta_2 \cup \theta_3$ in
accordance with Eq. (5).
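
The vertical extension can be illustrated with a few lines of code. The sketch assumes that the decay term is the exponential of (p - d), which is the reading under which the mass vanishes at the boundary p = d; the function name and the label for the union of all hypotheses are hypothetical.

```python
from math import exp

FRAME = "theta1_or_theta2_or_theta3"  # label for the union of all hypotheses


def vertical_masses(hypothesis, p, d, m0=0.99):
    """Masses at distance p below the sample limit, boundary at distance d."""
    if not 0.0 <= p <= d:
        raise ValueError("p must lie between 0 and d")
    mp = m0 * (1.0 - exp(p - d))
    return {hypothesis: mp, FRAME: 1.0 - mp}


# Example of Figure 12: sample limit at 2.40 m, visual boundary at 3.40 m,
# hence d = 1 m.
for p in (0.0, 0.5, 1.0):
    print(p, vertical_masses("theta2", p, d=1.0))
```

Under this reading the mass on the extended hypothesis decreases monotonically with depth and reaches exactly zero at the visually established boundary, where all the mass is on the union of all hypotheses.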

Figure 12 shows the results of the particle size distribution
tests carried out on Borehole 2 (Figure 3). The colored areas
correspond to the depths at which the collected samples
have been analyzed, while the arrows symbolize the vertical
extension of the information where the materials have not
been analyzed. In this example, the values of p and d apply
to the vertical extension of the information from the lower
bound of the analyzed sample, at a depth of 2.40 m, to the
boundary between the two materials, established visually by
the technician, at a depth of 3.40 m.


Fig. 12. Representation of the vertical extension of particle size distribution
information. Example of results from particle size distribution carried out on
extracted materials from BH2 (vertical axis: depth in m).


Then, the information is extended laterally, as detailed 
above, and the 2D section is cut so that it corresponds to 
the dimensions and coordinates of the ERT section (Figure 
2). The results are displayed in Figure 13, with Figure 13- 
a highlighting the hypotheses having the highest belief mass 
for each cell section while Figure 13-b is displaying their 
associated belief mass values. 


E. Attribution of belief masses from CPT 


For the characterization of geological materials by the CPT method, we consider them to be fine-grained materials belonging to $\theta_1$ when $I_{SBT} > 2.6$. When $I_{SBT} < 2.6$, the materials are deemed to be coarser-grained and therefore belong to $\theta_2$. As recommended in the NF P94-113 French standard [25], we consider a maximum inaccuracy on the computation of $I_{SBT}$, the maximum tolerated inaccuracy being the smallest of the following values:

• 5% of the measured value (for $q_c$ and $f_s$),
• 1% of the maximum value of the measuring range (for $q_c$ and $f_s$).

Fig. 13. a) Representation of the hypotheses having the highest belief mass according to the belief masses attribution from particle size distribution data considering $M(0) = 0.99$ in the borehole points and b) the associated belief mass values.

Fig. 14. Considering i) $M(0) = 0.25$, ii) $M(0) = 0.5$, iii) $M(0) = 0.75$ and iv) $M(0) = 0.99$, a) representation of the hypotheses having the highest belief mass according to the belief masses attribution from CPT data in the borehole points and b) the associated belief mass values.


If the value of $I_{SBT}$ cannot be characterized as greater or less than 2.6, then the selected hypothesis is $\theta_1 \cup \theta_2$, reflecting our inability to select a single geological material. The attribution of a belief mass of $M(0) = 0.99$ at the borehole positions, used for the particle size distribution, is questionable for the $I_{SBT}$ index. Indeed, the characterization of geological sets in terms of particle size distribution is less reliable with such an index: its value is obtained from a computation involving the two recorded parameters, and no sample is extracted. Therefore, we propose a brief parametric study of the attribution of the belief masses for the CPT method, using a value of $M(0) = 0.25$ (Figure 14-i), $M(0) = 0.5$ (Figure 14-ii), $M(0) = 0.75$ (Figure 14-iii) and $M(0) = 0.99$ (Figure 14-iv) on the hypothesis concerned at the borehole positions. As in the case of the particle size distribution, after horizontal extension of the information and cutting the 2D section in accordance with the dimensions and coordinates of the ERT section (Figure 2), we obtain the results shown in Figure 14.
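The decision logic of this subsection can be sketched as follows. This is only an illustration under stated assumptions: the index is assumed to be already available as an interval [low, high] that accounts for the tolerated inaccuracy, the complementary mass is assumed to go to total ignorance (as done for the particle size distribution), and the function names and labels are hypothetical.

```python
def max_inaccuracy(measured, range_max):
    """Smallest of 5% of the measured value and 1% of the measuring-range maximum."""
    return min(0.05 * abs(measured), 0.01 * range_max)


def classify_index(low, high, threshold=2.6):
    """Select a hypothesis from the interval [low, high] of the CPT index."""
    if low > threshold:
        return "theta1"            # fine-grained material
    if high < threshold:
        return "theta2"            # coarser-grained material
    return "theta1 U theta2"       # interval straddles the threshold: no decision


def cpt_masses(low, high, m0):
    """Put m0 on the selected hypothesis; the rest goes to ignorance (assumption)."""
    return {classify_index(low, high): m0, "Theta": 1.0 - m0}


print(max_inaccuracy(10.0, 100.0))   # tolerance on a hypothetical reading
print(cpt_masses(2.7, 2.9, 0.5))     # clearly fine-grained
print(cpt_masses(2.5, 2.7, 0.5))     # undecided between the two materials
```

Lowering `m0` in this sketch mirrors the parametric study: the selected hypothesis is unchanged, but less mass is committed to it at the borehole.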


F. Dimensioning of the mesh prior to the fusion 


Each investigation method has its specific mesh. In order to merge the belief masses from the geophysical information source (ERT) and the two geotechnical sources (particle size distribution and CPT), it is necessary to have a common mesh containing the belief masses from the three sources for each cell. We chose to superimpose the three discretization grids. To avoid altering the quality of the information, neither interpolation nor extrapolation is carried out. We thus obtain an irregular mesh, but without any approximation of the cells and associated values (Figure 15). The bounds of the x-axis (right/left) and z-axis are imposed by the ERT section, even though the data of boreholes 1 and 4 (outside the electrode line) were taken into account in the attribution of belief masses from geotechnical data (as displayed in section II-C).
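The superimposition can be illustrated along one axis; the 2D case would apply the same idea to the x and z boundaries. The sketch below is an assumption-laden illustration, not the authors' implementation: the helper names and the toy grids are hypothetical. Every merged cell lies entirely inside one cell of each source grid, so values are inherited without interpolation.

```python
from bisect import bisect_right


def merge_edges(*edge_lists):
    """Sorted union of the cell boundaries of several 1D grids."""
    return sorted(set().union(*edge_lists))


def lookup(edges, values, x):
    """Value of the source cell containing x, or None outside the grid."""
    if x < edges[0] or x >= edges[-1]:
        return None
    return values[bisect_right(edges, x) - 1]


def superimpose(grids):
    """grids: list of (edges, values). Returns merged edges and per-cell value tuples."""
    merged = merge_edges(*(edges for edges, _ in grids))
    cells = []
    for lo, hi in zip(merged, merged[1:]):
        mid = 0.5 * (lo + hi)  # any interior point identifies the parent cells
        cells.append(tuple(lookup(e, v, mid) for e, v in grids))
    return merged, cells


# Hypothetical grids: (cell boundaries, one value per cell).
ert = ([0, 2, 4], ["a", "b"])
psd = ([0, 1, 4], ["p", "q"])
cpt = ([0, 3, 4], ["u", "v"])
merged, cells = superimpose([ert, psd, cpt])
print(merged)
print(cells)
```

The merged grid is irregular (its boundaries are the union of all source boundaries), which matches the description of a mesh without approximation of cells or values.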


Fig. 15. Example of a geophysical mesh (in blue) and two geotechnical meshes (in red and in green) superimposed to provide a new irregular mesh used for the fusion computation and the fusion result representation.


IV. DATA FUSION RESULTS 


The results of merging the data from the three information sources are displayed in Figures 16 and 17. First, let us compare the results obtained using the two rules of combination. Unlike the PCR6 rule, for which a normalization process is carried out (Figures 16-c and 17-c), the Smets rule of combination makes it possible to highlight the conflict zones (Figures 16-a and 17-a). The conflict is greater close to the boreholes, from the interface between $\theta_1$ and $\theta_2$, which seems to lie at 3.40 m depth, down to the maximum depth of geotechnical investigation, for all $M(0)$ values and for both belief mass attribution approaches ($D_w$ and $D_{wg}$). This is due to the fact that the $I_{SBT}$ index essentially indicates a coarse-grained material in the levee (Figure 14) where the particle size distribution (Figure 13) and the ERT (Figures 8 and 9) indicate fine-grained materials (from about 3.40 m depth). The conflict level decreases away from the borehole positions, since the confidence level in the geotechnical information decreases with the distance to these geotechnical testing points; comparatively, the influence of the geophysical information gradually becomes more important. The geotechnical information made it possible to characterize the $\theta_1$/$\theta_2$ interface at 3.40 m depth, unlike the 2.5 m interface proposed by the ERT (Figures 8 and 9). This interface is also defined more precisely. The geotechnical information also allowed the discrimination between $\theta_1$ and $\theta_2$ materials where the geophysics left a doubt between $\theta_1$ and $\theta_2$ (i.e. $\theta_1 \cup \theta_2$). Some doubts about the nature of the material, however, remain between x = 0 m and x = 7 m. The areas with the highest associated confidence level are those close to the borehole points and between boreholes BH2 and BH3 between z = 3.5 m and z = 5.5 m. This is explained by the concordance between geophysical information (Figures 8 and 9) and particle size information (Figure 13). Indeed, Figure 13 indicates a significant belief mass on $\theta_1$ because the particle size distribution values at such depths are quite similar between boreholes BH2 and BH3.
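To make the difference between the two rules concrete, here is a minimal sketch of both combinations for a single cell. It assumes focal elements are encoded as Python frozensets (the empty set carrying the conflict) and uses the general PCR6 principle, in which each conflicting product is returned to the elements involved proportionally to their masses. The function name `combine`, the encoding and the example masses are hypothetical, not the authors' code.

```python
from itertools import product


def combine(sources, rule="pcr6"):
    """Combine belief masses given as {frozenset: mass} dictionaries."""
    fused = {}
    for combo in product(*(s.items() for s in sources)):
        sets = [focal for focal, _ in combo]
        vals = [mass for _, mass in combo]
        prod_mass = 1.0
        for mass in vals:
            prod_mass *= mass
        if prod_mass == 0.0:
            continue
        inter = frozenset.intersection(*sets)
        if inter or rule == "smets":
            # Smets: conflicting products stay on the empty set.
            fused[inter] = fused.get(inter, 0.0) + prod_mass
        else:
            # PCR6: give the conflicting product back to the elements involved,
            # proportionally to the masses that created the conflict.
            total = sum(vals)
            for focal, mass in combo:
                fused[focal] = fused.get(focal, 0.0) + prod_mass * mass / total
    return fused


T1, T2 = frozenset({"theta1"}), frozenset({"theta2"})
THETA = T1 | T2
s1 = {T1: 0.6, THETA: 0.4}   # hypothetical source favouring theta1
s2 = {T2: 0.5, THETA: 0.5}   # hypothetical source favouring theta2
smets = combine([s1, s2], rule="smets")
pcr6 = combine([s1, s2], rule="pcr6")
print(smets[frozenset()])        # conflict kept visible by the Smets rule
print(sum(pcr6.values()))        # PCR6 result sums to one, with no empty-set mass
```

The same function accepts three sources, as in the fusion of ERT, particle size distribution and CPT masses.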


Now let us compare the results obtained with the two approaches of belief mass attribution for the geophysical source of information, $D_w$ and $D_{wg}$ (Figure 16). The belief masses associated with the selected hypotheses are larger with the $D_{wg}$ approach, since this approach grants more mass initially (Figures 16-ii.b and ii.d). The horizontal interface between $\theta_1$ and $\theta_2$ is well characterized and the doubt lies on the left part of the subsoil section. Using the Smets rule of combination, there is no significant difference in the distribution of the conflict between the $D_w$ and $D_{wg}$ approaches (Figures 16-i.a and ii.a). In comparison with the $D_{wg}$ approach, a noticeable difference with the $D_w$ approach lies in the appearance of sets of coarser-grained materials for PCR6. This material is identified close to the borehole points between z = 6 m and z = 9 m (Figure 16-i.c). This is due in particular to the results of the CPT (Figure 14) and to the lower confidence level brought by the geophysical information at these locations (Figure 8-b). When geophysical belief masses increase using the $D_{wg}$ attribution approach, the $\theta_1$ material becomes dominant again and erases the presence of the $\theta_2$ material. However, Figure 16-ii.d shows that these areas remain areas of lesser confidence.


Keeping the $D_{wg}$ approach of geophysical belief mass attribution, let us compare the effect of the $M(0)$ value used for the CPT characterization. The conflict level increases with the value of $M(0)$. Indeed, for a low value of $M(0) = 0.25$ (Figure 17-i.a), there is almost no conflict, while conflict covers nearly a third of the subsoil section for a high value of $M(0) = 0.99$ (Figure 17-iii.a). Using the PCR6 rule of combination, the presence of $\theta_2$ below 5 m depth increases with the $M(0)$ value (Figures 17-i.c, 16-ii.c, 17-ii.c, 17-iii.c) and the associated levels of confidence decrease (Figures 17-i.d, 16-ii.d, 17-ii.d, 17-iii.d). This can be explained by the fact that the CPT characterization goes against the ERT and the particle size distribution characterizations. Thus, putting more confidence in the CPT characterization by increasing the $M(0)$ value can only increase the conflict level and potentially change the proposed characterization after normalization of the conflicting masses (PCR6 rule). However, the areas characterized as coarse-grained materials between 6 and 9 m depth close to boreholes BH2 and BH3 (Figures 17-ii.c and iii.c) are associated with low belief mass values (Figures 17-ii.d and iii.d).
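The growth of the conflict with $M(0)$ can be checked on a deliberately simplified case. The derivation below is a sketch under stated assumptions, not a result from the study: only two sources are considered in a single cell at the borehole, the CPT commits $M(0)$ to $\theta_2$, and one other source commits a hypothetical mass $a$ to $\theta_1$, the remainder of each going to total ignorance $\Theta$.

```latex
% Assumption: m_C(\theta_2) = M(0), m_C(\Theta) = 1 - M(0) for the CPT,
% and m_O(\theta_1) = a, m_O(\Theta) = 1 - a for the other source.
\begin{align}
m_{\mathrm{Smets}}(\emptyset) &= M(0)\,a, \\
m_{\mathrm{Smets}}(\theta_1) &= a\,\bigl(1 - M(0)\bigr), \\
m_{\mathrm{Smets}}(\theta_2) &= M(0)\,(1 - a), \\
m_{\mathrm{Smets}}(\Theta) &= \bigl(1 - M(0)\bigr)(1 - a), \\
m_{\mathrm{PCR6}}(\theta_2) &= M(0)\,(1 - a) + \frac{M(0)^2\,a}{M(0) + a}.
\end{align}
% With a hypothetical a = 0.8:
%   M(0) = 0.25 gives m(\emptyset) = 0.20, and M(0) = 0.99 gives m(\emptyset) = 0.792.
```

In this simplified setting the conflict is linear in $M(0)$, and the share of the conflict returned to $\theta_2$ by PCR6 also grows with $M(0)$, which is consistent with the trends described above.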

Fig. 16. Hypotheses having the highest belief mass in a) and c) and their associated mass values in b) and d). i) Belief mass attribution from electrical resistivity data considering only the computation of the Wasserstein distance and ii) belief mass attribution from electrical resistivity data considering the computation of the Wasserstein distance and also a Gaussian probability distribution centered on the inverted resistivity value. Panels (a,b) are results of the Smets rule of combination; panels (c,d) are results of PCR6. The borehole positions are shown as dashed lines, with $M(0) = 0.5$.

Fig. 17. Hypotheses having the highest belief mass in a) and c) and their associated mass values in b) and d), considering i) $M(0) = 0.25$, ii) $M(0) = 0.75$ and iii) $M(0) = 0.99$. We consider a belief mass attribution from electrical resistivity data considering the computation of the Wasserstein distance and also a Gaussian probability distribution centered on the inverted resistivity value. Panels (a,b) are results of the Smets rule of combination; panels (c,d) are results of PCR6. The borehole positions are shown as dashed lines.


V. DISCUSSION


The results of our fusion methodology suggest the presence of two material layers: first, a coarse-grained material layer about 3.5 m thick, and then a fine-grained material layer below. It is possible to associate the first layer with the embankment (levee) and the second layer with alluvium. These results are in agreement with the internal representation of the levee that was proposed before the investigation campaign (Figure 1-a) [18]. The drawback of conducting an investigation campaign on a real levee is that there is no “true” model to which we can compare our results.

Most of the conflict present in the results is related to the characterization made by the $I_{SBT}$ index from CPT data. This characterization essentially indicates a coarse-grained material in almost the whole levee section, contrary to the ERT and the particle size analysis, which quite clearly separate two sets of materials. It appears that the characterization of materials by the $I_{SBT}$ index is not ideal in this case. This is why we consider that a mass of $M(0) = 0.5$ should be retained in the CPT boreholes, as opposed to a mass of $M(0) = 0.99$ for particle size analyses. In the future, it could be relevant to find a more appropriate index for particle size characterization of materials from the parameters measured with the CPT.

We proposed a brief parametric study bringing to light the results associated with four belief mass attribution values in the sampling boreholes ($M(0)$). This parametric study highlights the fact that increasing the value of $M(0)$ widens the lateral extension of the selected hypothesis. It seems that a large value of $M(0)$ cannot be equally attributed to both the $I_{SBT}$ values and the particle size distribution results when these two sources of information seem contradictory in many locations (Figure 11). We believe that for a real investigation campaign, the value associated with each geotechnical method and at each borehole point could be adjusted by a well-informed geotechnical engineer, relying on an elicitation process [33] and based on the ability of the method to provide information on the nature of the investigated material.

In this study, the water table height and its time variations 
were ignored. However, we believe that this choice has no 
significant impact on the proposed results. First, water level 
variations are very low on the dates of the investigation 
campaign [18], so there should be no problem in considering 
the data as if they had been acquired at the same time. 
Secondly, the water table height is more than 8 m below 
the surface, where fine-grained materials are deemed to be 
present (low resistivities). The hydraulic conductivity of these 
soils (alluvium) is probably very low and their hydric state is 
potentially quite insensitive to seasonal changes of the water 
table. 

However, for other cases, it may be relevant to take into account the water table level and its variations (especially for long-term monitoring of the levee and for winter investigation campaigns). To do this, it would be interesting to consider more hypotheses within the FoD $\Theta$. For example, the hypothesis $\theta_1$ could be associated with a fine material in the dry state and the hypothesis $\theta_2$ with this same material in a saturated state. The number of hypotheses constituting $\Theta$ is not limited, and adding hypotheses does not require any modification of our methodology. However, the computational cost could vary significantly.

Recently, much equipment for using ERT in long-term monitoring has been developed [34], [35]. Thus, it would be interesting to combine such use of ERT with this fusion methodology. Once the fusion parameters are fixed, the methodology could be automated in order to propose a daily update of a levee section for a long-term monitoring system. It would then be a matter of integrating data from time-domain reflectometry moisture sensors as well as piezometric surveys, making it possible to monitor the fluctuation of the water table as a function of rainfall and periods of irrigation. It is possible to draw inspiration from works such as [36], which focus on the effects of environmental perturbations (variations in water table height, temperature and rainfall) and propose calibration curves from ERT data for a specific investigation site.

For this study, we considered data from three investigation 
methods, usually used for levee characterization [37], [38], 
[39], each one having its own belief mass attribution method. 
A limitation of our methodology is our ability to define the 
values of the upper and lower bounds of the physical parame- 
ters characterizing the constitutive hypotheses of the FoD for 
each investigation method. We currently rely on bibliographic 
references [26], standards [24] or visual distribution of the 
data (Figure 5). Our ability to define these interval values 
is a function of the type of investigation method. If these 
intervals are poorly defined, the fusion results may not be able 
to display any coherent solution. Automating the procedure 
of characterization of the intervals of the FoD would be a 
great contribution to this methodology (e.g., by means of data 
mining [40]). 


VI. CONCLUSION 


In this work, we present a novel fusion methodology based 
on the use of belief functions to optimize the combination of 
data from three different sources of information composed of 
a geophysical method (ERT) and two geotechnical methods 
(CPT and particle size distribution) applied to levee character- 
ization. These sources of information are considered comple- 
mentary, each one having its own spatial distribution and as- 
sociated level of uncertainties and inaccuracies. This method- 
ology was tested on real datasets acquired on an earthen 
fluvial levee located in St-Clément-des-Levées (France) and 
has proved to be efficient. We compared the results obtained 
with different geophysical belief mass attribution approaches 
(Wasserstein distance computation and Gaussian probability 
distribution centered on the inverted resistivity value), as well 
as for different belief mass values in the boreholes for CPT 
investigations and different combination rules (Smets and 
PCR6). 

A representation of the merged information associated with degrees of belief was proposed. We advocate that this representation is more relevant and informative than a simple superposition of the different physical parameters recorded.
The ability of our methodology to distinguish two geological 
sets of different nature in terms of particle size distribution 
(fine-grained material and coarser-grained material) as well 
as its ability to accurately characterize a horizontal interface 
(at about 3.40 m depth below the electrode line) was demon- 
strated. A coarser-grained geological material was identified at 
the top of the structure and a fine-grained material is present 
underneath. The characterization seems more reliable between 
boreholes 2 and 3 and the results highlight a doubt about 
the nature of the material at the left part of the section. Two 
zones of lesser confidence are also located in the lower part of 
the section (below 7.5 m depth) near the geotechnical points 
intersecting the ERT section. 

Finally, in the proposed results (Figures 16 and 17), the areas of lesser confidence indicate where the investigation could be strengthened. Moreover, the conflict zones show where at least two sources of information disagree. These two types of outcome are believed to be valuable for a real investigation campaign. The level of confidence would also be important for decision support (e.g., models of failure hazards).

This methodology should be readily transferable to other applications (landslides, pedology, pollutant tracking, mining, etc.) when at least two sources of information (geotechnical and geophysical) are involved [41].


Abstract: A reliable lithological characterization of earthen
dikes constitutes an important asset by virtue of enhancing a 
good diagnosis, which is of immense value in preventing dike 
breakage. Ruptures of hydraulic works can lead to disastrous 
consequences (loss of life, severe environmental and economic 
impacts). Recognized methodologies for characterizing earthen 
dikes include complementary geophysical and geotechnical in- 
vestigation methods. This article explores a fusion methodology 
to combine data from these two types of information sources 
in considering actual datasets from a canal dike investigation 
campaign. This campaign involves electrical resistivity tomogra- 
phy as well as a multi-channel analysis of surface waves and a 
particle size analysis derived from laboratory testing. Our fusion 
methodology is based on the use of belief masses to enhance 
the characterization of lithological sets within earthen structures. 
While taking into consideration the particularities of each method 
(spatial distribution, data imperfection), this approach provides 
information on the conflict level between information sources 
and moreover displays a confidence index associated with the 
results. This work contributes several improvements to the 
fusion methodology (including the fusion of two distinct geo- 
physical datasets and the implementation of K-means clustering 
algorithms) in addition to new application possibilities (larger 
area of investigation, more complex structure and lithological 
variability). It also offers fusion results and dike characterization 
whether considering zero, four or seven boreholes. Fusion results 
highlight the ability of this enhanced methodology to identify the 
position of lithological sets (fine and coarse fill materials with 
limestone breccia, marls and limestones) as well as specify the 
interface positions and associated levels of confidence, ensuring 
consistency with available knowledge on the geological setting and 
presence of a fault. These results also display good consistency 
between the geoelectrical and seismic characterizations for this 
specific investigation site despite the inability to characterize each 
material individually. 


Keywords: Canal dike, Belief functions, Data fusion, Electri- 
cal Resistivity Tomography, Multi-channel analysis of surface 
waves, Particle size analysis. 


I. INTRODUCTION 


Hydraulic works such as river and canal dikes are built to maintain a given flow of water. To prevent the possible breakage of these works, which could lead to catastrophic events (i.e. casualties, property damage, economic impacts), an effective characterization of these complex structures is required. For
this purpose, investigation campaigns involving geophysical 
and geotechnical methods [1] are typically employed for dike 
characterization and weak zone identification. The objective 
of these campaigns is to generate a hazard assessment. 

Whereas geophysical methods are non-intrusive and pro- 
vide information on a large volume of subsoil, geotechnical 
methods are intrusive and yield detailed spatial information. In 
addition, the quality of the information derived from the two 
methods differs. The uncertainties associated with geophysical 
information are significant, especially owing to the integrative 
and indirect aspects of the method as well as to the resolution 
step for inverse problems [2]. On the other hand, the informa- 
tion acquired by geotechnical methods is much more reliable 
due to a direct contact with the material. It is therefore implied 
that these two types of techniques are mutually beneficial. 
Unfortunately, too few of the methodologies available actually 
consider a mathematical combination of the acquired data, 
opting instead for a simple graphic superposition of the results 
[3], [4], [5]. 

In order to facilitate the diagnosis of hydraulic embank- 
ments, it is important to identify the lithological materials 
present within the structure and distinguish them. The inter- 
faces as well as the presence of any anomalies must also be 
located. This information is in fact likely to provide indications 
on the subsequent development of internal erosion zones or 
areas of physical instability [6]. Such a characterization, asso- 
ciated with confidence indices, would be useful in producing 
failure hazard models. 

An information fusion methodology based on the use of belief functions [7], [8] was developed and proposed in [9] to merge data from geophysical and geotechnical information sources. Furthermore, bibliographic research on earthen dike properties and failure modes is available in [10], and parametric studies on several parameters involved in the fusion methodology are available in [11], [12], [13]. Conclusive studies were also carried out on numerical models as well as on an experimental test bench in [14], and then on an actual fluvial earthen dike in [15]. This methodology is unique by virtue of taking into account the various types of imperfections associated with information (uncertainty, imprecision,
incompleteness), as well as the respective spatial expressions 
of information and representations of the inconsistency be- 
tween information sources (investigation methods). It could 
be applied well beyond dike characterization (e.g. liquefaction 
risk, landslide, pedology, pollutant tracking, mining) when at 
least two sources are involved [16]. In the field of geosciences, 
some research works have applied the use of belief functions 
to propose results for: slope instability [17], [18], groundwater 
[19], and flood susceptibility mapping [20]. Also, many works 
on joint inversion [21], [22], [23], [24] have proposed utiliz- 
ing large amounts of geophysical information through joint 
inversion in order to avoid some of the ambiguity inherent in 
the methods when applied individually. However, these works 
differ from our approach, which considers the information 
sources to be mutually independent. 

We are proposing herein the application of this belief 
function-based methodology to a new type of structure, namely 
a canal dike owned by the French EDF electric utility com- 
pany. This structure includes the presence of two distinct sub- 
stratum materials, two fill materials and a fault. In comparison 
with earlier research, this study features numerous advances. 
First, the research area is substantially larger than previously: 
around 1,800 meters long by 24 meters high, as compared 
to 100 meters long by 12 meters high in the Dezert et al. 
study [15]. The structure under investigation is also more 
complicated, offering more lithological variation. In addition 
to the ERT (Electrical Resistivity Tomography) approach, the 
MASW (Multi-channel Analysis of Surface Waves) method 
is employed as a second source of geophysical information. 
Also taken into account are the sensitivities associated with 
resistivity values, thus indicating the extent to which a change 
in resistivity will influence the potential measured by the array. 
Moreover, the possibility of associating physical parameter 
values with lithological materials has been integrated through 
use of the K-means clustering method. The purpose here is to 
automate a procedure that had previously been conducted by 
means of simple expert opinion. 
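As an illustration of how clustering can replace expert-defined intervals, the sketch below runs a basic K-means (Lloyd's algorithm) on scalar values and derives interval bounds between clusters. It is a simplified stand-in under stated assumptions: the values are taken to be log10 resistivities, the initialization and the midpoint rule for bounds are choices made for this example, and all names and numbers are hypothetical.

```python
def kmeans_1d(values, k, iters=100):
    """Lloyd's algorithm on scalars, with a deterministic quantile-based start."""
    data = sorted(values)
    # Initial centroids spread evenly across the sorted data.
    centroids = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Keep the old centroid when a cluster is empty.
        new = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids, clusters


def interval_bounds(centroids):
    """Midpoints between consecutive centroids, used here as interval limits."""
    ordered = sorted(centroids)
    return [0.5 * (a + b) for a, b in zip(ordered, ordered[1:])]


# Hypothetical log10 resistivities forming three groups.
log_rho = [1.0, 1.1, 0.9, 2.0, 2.1, 1.9, 3.0, 3.1, 2.9]
centroids, clusters = kmeans_1d(log_rho, 3)
print(centroids)
print(interval_bounds(centroids))
```

Each cluster would then be associated with one lithological hypothesis, the bounds playing the role of the interval limits previously set by expert opinion.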

This article is organized as follows. First, the studied 
canal dike and geology will be introduced along with the 
three investigation methods considered (ERT, MASW, core 
drillings). Then, the fusion methodology will be described 
by use of the two combination rules (Smets and Proportional 
Conflict Redistribution Rule No. 6), with an analysis of the 
data acquired for each method. This section will also present 
clustering and the belief masses computation. In the next 
section, the belief mass results for each individual information 
source as well as the overall fusion results will be dis- 
played. The fusion process is operated by initially considering 
only the geophysical information sources and then adding 
the geotechnical information source in two situations: one 
including the information from four boreholes, and the other 
composed of seven boreholes. Lastly, the results of this work 
will be discussed in terms of their advantages, limitations and 
perspectives. 


II. INVESTIGATED DIKE AND INVESTIGATION METHODS 
A. Hydraulic embankment and geological context 


The studied hydraulic embankment is a canal dike owned 
by the EDF electric utility company and located in the 
south of France. Since for reasons of confidentiality it is 
not authorized to disclose the precise geographic location of 
this canal dike, the Kilometric Point (KP) notation is being 
used to identify the geophysical and geotechnical investigation 
positions. These KPs (denoted in kilometers) correspond to the 
dike length, along the crest moving from the upstream part 
to the downstream part. The stretch selected for our study 
is located on the right bank and extends from KP 10.35 to 
KP 12.13. Five geological formations have been identified 
throughout the whole structure. For the present study, this 
particular section has the benefit of intersecting two distinct 
geological formations, with the presence of a fault oriented 
NE-SW that lowers the western compartment. Up to KP 10.80 
approximately, the canal is essentially rock-based on more 
or less marly limestone terrain from the Lower Cretaceous 
(shown in yellow, Figure 1). Beyond that point, the substratum 
is generally formed of more or less clayey and indurated marls 
from the Oligocene (purple, Figure 1). Between KP 11.50 
and KP 12.13, the presence of more cohesive materials, most 
likely corresponding to Cretaceous limestones, is suspected. 
The respective positions of the three distinct investigation 
methods implemented (two geophysical and one geotechnical) 
are displayed in Figure 1. 


B. Electrical Resistivity Tomography 


The basic principle behind DC-resistivity methods consists of transmitting a direct electrical current (DC) of known intensity [A] by means of two “current” electrodes and then measuring the voltage drop [V] between two “potential” electrodes. Depending on parameters such as electrode layout and acquisition array, apparent resistivity values can be computed. Such a measurement, however, yields indirect information due to the integrative aspect of this geophysical method. Some available forward modeling software can integrate the electrode layout (inter-electrode spacing) and topography in order to simulate a measurement (e.g. COMSOL Multiphysics, BERT, CESAR-LCPC). Marescot [25] recalled that apparent resistivity is the measured transfer resistance divided by the simulated transfer resistance of a model (with topography) at a resistivity of 1 Ω·m. A two-dimensional (2D) ERT, like the one considered in this work, consists of an alignment of electrodes and an inversion of a large number of measurements based on a four-electrode configuration (two current electrodes for electrical current injection and two potential electrodes to measure an electric potential drop).
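The definition of apparent resistivity recalled above can be written in a few lines. The sketch below assumes a flat homogeneous half-space, for which the simulated transfer resistance at 1 Ω·m is the inverse of the classical dipole-dipole geometric factor; with topography, that value would instead come from a forward model. Function names and the numerical reading are hypothetical.

```python
import math


def dipole_dipole_factor(a, n):
    """Geometric factor k = pi * n * (n + 1) * (n + 2) * a on a flat half-space."""
    return math.pi * n * (n + 1) * (n + 2) * a


def apparent_resistivity(voltage, current, unit_transfer_resistance):
    """Measured transfer resistance V/I divided by the one simulated at 1 ohm.m."""
    return (voltage / current) / unit_transfer_resistance


# Hypothetical reading: 5 m dipoles, separation factor n = 1, 10 mV for 100 mA.
k = dipole_dipole_factor(5.0, 1)
rho_a = apparent_resistivity(0.01, 0.1, 1.0 / k)
print(k, rho_a)  # geometric factor in m, apparent resistivity in ohm.m
```

On flat ground this reduces to the familiar product of the geometric factor and V/I; the division form is the one that generalizes to a simulated model with topography.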

A geoelectric campaign was performed in 2014 on the dike 
crest. The array consisted of 48 electrodes with an inter- 
electrode spacing of 5 meters and a dipole-dipole configuration 
array set up with an ABEM Terrameter LS resistivity meter. 
Cables were rolled so that the information covered the entire stretch of dike with a constant theoretical depth of investigation and without any blind areas. All measured contact resistances were below 5 kΩ. This electrode layout extended from KP 10.35 to KP 12.13 (blue profile, Figure 1).

Fig. 1. Locations of the geotechnical borings and geophysical profiles on the crest of the studied dike section. The depths of contact between fill materials and substratum at each borehole are displayed on top.

The apparent resistivity data acquired (Figure 2a) were then inverted, considering an L1-norm regularization on both the data and the model [26] and using inversion software [27], to reconstruct a complete 2D section of electrical resistivity [Ω·m] compatible with the data. This step corresponds to a robust inversion that accentuates the resistivity contrast between lithological formations, in addition to limiting the effects of overly noisy data. This work uses the Res2Dinv software (version 3.71.118) [28].

For the inversion process, a flat topography is considered 
since the elevation variation is negligible all along the dike 
crest (only 11 cm). To simplify the processing of data as 
well as their fusion, the extended model option in Res2Dinv 
has been used. This option extends the model cells’ vertical 
division to the edges of the survey line. However, to account 
for the difference in reliability of the resistivity values between 
meshes located at the center, bottom or sides, the resistivity 
imprecision values [28] resulting from the inversion are taken 
into account during the belief mass attribution stage. Moreover, 
in this new work, sensitivity values [28] are considered in 
the procedure. The sensitivity function indicates the extent to 
which a change in resistivity of a section of the subsurface 
influences the potential measured by the array. The higher the 
value of the sensitivity function, the greater the influence of 
the subsurface region on the measurement. Mathematically, the 
sensitivity function is given by the Fréchet derivatives [29], as 
detailed in [28]. 


C. Multi-channel Analysis of Surface Waves 


The MASW method consists of studying the dispersion of seismic surface waves (waveform deformation) in order to determine the shear wave velocity. As described in [30], use of this method comprises three stages: (i) data acquisition, (ii) determination of the Rayleigh dispersion curve, and (iii) the inversion process with a determination of shear wave velocities. The seismic campaign using MASW was carried out in 2017. A device with towed streamers shooting every 24 m was used, and the acquisition was performed with a Geode device from Geometrics, allowing for the characterization of three sections within the considered dike area (Figure 1). These three sections (yellow lines, Figure 1) extend respectively from KP 10.74 to KP 10.9, KP 11.1 to KP 11.3 and KP 11.5 to KP 11.7, with a geophone spacing of 2 meters. The MASW method assumes a laterally invariant medium. This assumption is verified by comparing the dispersion curves obtained for the recordings of the direct and reverse shots, as well as for the recordings of ambient vibrations. A non-horizontally layered medium will generally produce different dispersion curves for the forward and reverse shots.


Velocity profiles are computed by: i) selecting one
dispersion curve per section, and ii) carrying out an inversion
to obtain a $V_s$ profile that correctly explains the data using the
Surfseis 5.0 software from the Kansas Geological Survey [31].
Unlike the electrical resistivity data, no imprecision values
associated with the shear wave velocities are provided by the
inversion. Quantifying this imprecision remains a complex issue
on which the community has yet to reach a consensus [32], [33].


D. Core drillings with particle size analysis testing 


This work considers the geotechnical information from 
seven core drillings carried out on the crest of the studied 
area in 2016, five of which are located on the ERT profile
(shown in red, Figure 1). These core drillings feature variable 
investigation depths (from 13 to 24 m), unlike those considered 
in [15]. The vertical resolution of information contained in 
the core is very fine (0.1 m). Particle size analysis tests 
were conducted in the laboratory on a large portion of the 
extracted samples; moreover, the classification outlined in 
French standard AFNOR NF P 11-300 [34] has been applied. 
The cohesive materials unable to undergo particle size analysis 
have been identified thanks to technician observations. Taking 
these lithological basements into account constitutes an inno- 
vation in the methodology and was not previously introduced 
in [15]. 


III. DATA ANALYSIS AND PROCESSING


A. Acquired and processed data


1) Electrical Resistivity Tomography: The inverse model
resistivity section, which will be used throughout this paper,
is displayed in Figure 2c, and both the measured and calculated
apparent resistivity ($\rho_a$) pseudo-sections are shown respectively
in Figures 2a and 2b. The inversion was carried out over 4
iterations, yielding a final RMS error of 5.3%. A greater number
of iterations tends to overstructure, geologically speaking, our
imaging result, as explained by Descloitre et al. [35]. Figure 3
highlights the strong correlation existing in our study between
measured and calculated apparent resistivity values.

The resistivity values (Figure 2c) suggest the local presence 
of resistive materials: from 8 to 16 m deep between KP 10.4 
and KP 10.52; near the crest surface from KP 10.55 to KP 
10.7; from 8 to 24 m deep between KP 10.8 and KP 10.95; and 
from 6 to 14 m deep at KP 12 to KP 12.1. Lower resistivities 
are observed over an area extending more than 500 meters in 
length from KP 11.1 to KP 11.65. The exact positions of the 
interfaces between the lithological sets cannot be accurately 
determined. 



Fig. 2. Resistivities of the longitudinal dike section displayed in Figure 1: 
a) measured apparent resistivities, b) calculated apparent resistivities and c) 
model resistivity section by inverting dipole-dipole apparent resistivity data. 

Fig. 3. Measured apparent resistivity vs. calculated apparent resistivity from
data acquired on the ERT profile displayed in Figure 1. 


2) Multi-channel Analysis of Surface Waves: From the
data acquisition, the Rayleigh dispersion curves can be used
to plot the phase velocities vs. frequency. The maximum
amplitude values are then picked from such a plot (Figure
4). After verifying the laterally invariant medium assumption,
20 seismic velocity profiles ($V_s$ in m·s$^{-1}$) were obtained after
an inversion process from the picked values of the dispersion
curve. Each profile is representative of a 24-m long section
with variable depths and a vertical discretization of 0.1 m.
These velocity profiles are displayed in Figure 5; they all
indicate lower velocities near the surface and increasing values
at depths below 10 meters for the first (KP 10.74 to KP 10.9)
and third (KP 11.5 to KP 11.7) sections. The second section
(KP 11.1 to KP 11.3) primarily involves low $V_s$ values.

Fig. 4. Example of a Rayleigh dispersion curve with the maximum extracted
amplitude values. This curve corresponds to data acquired on the profile 
extending from KP 11.572 to KP 11.596, with an offset from the source 
equal to 54 m. 


Fig. 5. The 20 $V_s$ profiles acquired by applying the MASW method within
the studied dike area. 


3) Core drillings with particle size analysis testing: The 
core drillings provide information on the presence of both fill 
materials (fine or coarse-grained) and the basement (marls or 
limestones). Figure 6 shows the presence of fine fill materials 
near the surface from boreholes B3 to B7 as well as marls 
below for the B4 and B5 drillings. Limestone is identified 
below the fill materials in boreholes B1, B2 and B6. The 
lithological basement seems particularly high in B2. Let’s note 
that the B3, B4 and B6 core drillings are located in MASW 
investigation areas. 


B. Belief functions and combination rules 


Fig. 6. Representative diagram of the studied dike section, with the positions
of the seven core drillings and the identified lithological materials.


Belief functions (BFs) were introduced by Shafer [7] in
the mathematical theory of evidence inspired by the works
of Dempster [8], which is why belief function theory is
often referred to as the Dempster-Shafer theory (DST). It
serves to compute the belief and plausibility of a hypothesis
(corresponding, in this article, to lithological materials) from
various sources of information (ERT, MASW, core drillings).

The main advantage of BFs is their ability to manage 
information from various sources, in association with their 
respective imperfections (uncertainties and imprecisions). BF 
theory is also able to assess the level of conflict between 
sources, i.e. when information given by one source is incon- 
sistent with that given by another source. According to the 
definition offered by Smets [36], it can be considered that 
uncertainties correspond to degrees of confidence related to 
a physical value, whereas imprecisions correspond to value 
intervals directly associated with measurement errors related 
to the investigation method. For example, the uncertainty of 
measuring a geotechnical parameter value identical to the one 
measured in a borehole increases with the distance to that bore- 
hole. However, imprecision may be associated with the error 
bar of the measured datum. In our methodology, given that 
uncertainties correspond to the belief masses associated with 
the various defined hypotheses, the imprecisions associated 
with each type of data will be detailed in the corresponding 
sections. 

Furthermore, BFs can take into account ignorance and 
the incompleteness of information. It is possible to grant a 
credit on all possible results (all possible types of lithological 
materials) in order to quantify our ignorance, while probability 
theory would simply assign an equiprobability to each single 
hypothesis. Martin et al. [?] provided a detailed explanation 
of BF theory for the interested reader. 

The implementation of BFs has been divided into four
stages, namely: (i) define a Frame of Discernment (FoD),
denoted $\Theta$; (ii) assign belief mass values to the hypotheses
of this FoD for each information source; (iii) select and use
a combination rule for the information fusion step; and (iv)
provide a representation of the merged information. $\Theta$ consists
of all possible hypotheses within the problem under concern.
The elements of the FoD are exhaustive and exclusive, e.g. for
$n$ hypotheses:


$$\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}. \tag{1}$$


For the problem under consideration in this article, the possible
hypotheses of the FoD correspond to lithological materials
potentially found in the studied dike section. In light of
available knowledge, let’s set an FoD of five hypotheses,
common for all sources, such that:


$$\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4, \theta_5\}, \tag{2}$$


where


• $\theta_1$ corresponds to fine fill materials,

• $\theta_2$ corresponds to marls,

• $\theta_3$ corresponds to coarse fill materials with limestone
breccia,

• $\theta_4$ corresponds to limestones,

• $\theta_5$ corresponds to any material different from the four
listed above.


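Note that an additional hypothesis ($\theta_5$ here) is introduced to
represent one or more unexpected materials, in order to cover the
entire field of possibilities. The space of belief mass functions
is the set of all subsets of $\Theta$, written $2^\Theta$. It comprises all
the disjunctions of hypotheses together with the empty set (denoted
$\emptyset$), which represents the conflict between information sources,
yielding in the present case:


$$
\begin{aligned}
2^\Theta = \{ & \emptyset,\ \theta_1,\ \theta_2,\ \theta_1 \cup \theta_2,\ \theta_3,\ \theta_1 \cup \theta_3,\ \theta_2 \cup \theta_3,\ \theta_1 \cup \theta_2 \cup \theta_3, \\
& \theta_4,\ \theta_1 \cup \theta_4,\ \theta_2 \cup \theta_4,\ \theta_1 \cup \theta_2 \cup \theta_4,\ \theta_3 \cup \theta_4, \\
& \theta_1 \cup \theta_3 \cup \theta_4,\ \theta_2 \cup \theta_3 \cup \theta_4,\ \theta_1 \cup \theta_2 \cup \theta_3 \cup \theta_4, \\
& \theta_5,\ \theta_1 \cup \theta_5,\ \theta_2 \cup \theta_5,\ \theta_1 \cup \theta_2 \cup \theta_5,\ \theta_3 \cup \theta_5, \\
& \theta_1 \cup \theta_3 \cup \theta_5,\ \theta_2 \cup \theta_3 \cup \theta_5,\ \theta_1 \cup \theta_2 \cup \theta_3 \cup \theta_5, \\
& \theta_4 \cup \theta_5,\ \theta_1 \cup \theta_4 \cup \theta_5,\ \theta_2 \cup \theta_4 \cup \theta_5,\ \theta_1 \cup \theta_2 \cup \theta_4 \cup \theta_5, \\
& \theta_3 \cup \theta_4 \cup \theta_5,\ \theta_1 \cup \theta_3 \cup \theta_4 \cup \theta_5,\ \theta_2 \cup \theta_3 \cup \theta_4 \cup \theta_5, \\
& \theta_1 \cup \theta_2 \cup \theta_3 \cup \theta_4 \cup \theta_5 \}.
\end{aligned}
\tag{3}
$$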


The belief mass function $m_j$, taking values in [0,1], is defined for each
information source $S_j$ (with $j$ = 1, 2 or 3 in this study) and
attributed to a subset $X$ (defined on $2^\Theta$). As in probability
theory, the closer $m_j(X)$ is to 1, the higher the confidence
in $X$. Furthermore, the definition of a belief mass function
implies that the sum of the masses (over all subsets) of a
given source of information equals 1:


$$\sum_{X \in 2^\Theta} m_j(X) = 1. \tag{4}$$


The essential difference with probability theory is that $X$ can
represent the union of two or more hypotheses. For example,
if a belief mass is allocated to $\theta_1 \cup \theta_2$, this means that either $\theta_1$
or $\theta_2$ is a possible solution, thus making it possible to model
uncertainty and lack of knowledge. Belief and plausibility
functions, Bel and Pl respectively, are considered as lower
and upper bounds of an unknown probability $P$ such that
for any $X \in 2^\Theta$, $Bel(X) \leq P(X) \leq Pl(X)$. Belief and
plausibility functions are in a one-to-one relationship with the
belief mass function, $m(\cdot)$, and defined by:


$$Bel(X) = \sum_{Z \subseteq X,\ Z \in 2^\Theta} m(Z), \tag{5}$$

$$Pl(X) = \sum_{Z \cap X \neq \emptyset,\ Z \in 2^\Theta} m(Z). \tag{6}$$
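
As an illustration of Eqs. 5 and 6, the following minimal Python sketch computes Bel and Pl from a mass function stored as a dictionary of focal elements. The labels `t1` to `t5` (standing for the five hypotheses) and the mass values are hypothetical and chosen only for the example.

```python
# Hypothetical mass function: keys are focal elements (subsets of the FoD),
# values are belief masses summing to 1 (Eq. 4).
m = {
    frozenset({"t1", "t2"}): 0.60,
    frozenset({"t3", "t4"}): 0.10,
    frozenset({"t1", "t2", "t3", "t4"}): 0.25,
    frozenset({"t5"}): 0.05,
}

def bel(m, X):
    """Belief of X: sum of the masses of focal elements included in X (Eq. 5)."""
    return sum(v for Z, v in m.items() if Z <= X)

def pl(m, X):
    """Plausibility of X: sum of the masses of focal elements intersecting X (Eq. 6)."""
    return sum(v for Z, v in m.items() if Z & X)

X = frozenset({"t1", "t2"})
print(bel(m, X), pl(m, X))  # Bel(X) <= Pl(X) always holds
```

In this example, the gap between Bel and Pl for `{t1, t2}` comes entirely from the mass placed on the larger union, which may or may not support the subset.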

However, in our methodology, only belief mass functions
have been used since the combination rules are set directly
from the allocated belief masses for each information source.
Two combination rules have been applied herein. First, Smets’ rule
[38], in the Transferable Belief Model, enables the assignment
of a belief mass to the conflict, $\emptyset$, such that:


$$m_{1,2,\ldots,S}(\emptyset) \geq 0. \tag{7}$$


The conjunctive fusion of information from all sources (from $j = 1$
to $S$) is written:


$$m_{1,2,\ldots,S}(X) = \sum_{\substack{X_1, \ldots, X_S \in 2^\Theta \\ X_1 \cap \ldots \cap X_S = X}} \ \prod_{j=1}^{S} m_j(X_j). \tag{8}$$


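The level of inconsistency between the $S$ information
sources is then expressed as:


$$m_{1,2,\ldots,S}(\emptyset) = \sum_{\substack{X_1, \ldots, X_S \in 2^\Theta \\ X_1 \cap \ldots \cap X_S = \emptyset}} \ \prod_{j=1}^{S} m_j(X_j). \tag{9}$$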
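
The conjunctive rule and the resulting level of conflict can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `conjunctive` and the two mass functions on a two-hypothesis frame {A, B} are assumptions made for the example.

```python
from itertools import product

def conjunctive(*sources):
    """Unnormalized conjunctive rule: the mass of every combination of focal
    elements goes to their intersection. The mass collected on the empty set
    (frozenset()) is the level of conflict between the sources."""
    fused = {}
    for combo in product(*(s.items() for s in sources)):
        inter = frozenset.intersection(*(X for X, _ in combo))
        weight = 1.0
        for _, v in combo:
            weight *= v
        fused[inter] = fused.get(inter, 0.0) + weight
    return fused

A, B = frozenset({"A"}), frozenset({"B"})
AB = A | B
m1 = {A: 0.6, AB: 0.4}  # hypothetical source 1
m2 = {B: 0.3, AB: 0.7}  # hypothetical source 2

fused = conjunctive(m1, m2)
print(fused[frozenset()])  # conflict between the two sources
```

Here the only conflicting pair is (A, B), so the conflict equals the product of the two corresponding masses.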


$m_j(X_j)$ stands for the belief mass assigned to hypothesis
$X_j$ by information source $j$. The second combination rule to
be applied is the Proportional Conflict Redistribution Rule No.
6 (PCR6) [39]. According to Shafer’s approach and unlike
Smets’ rule, the PCR6 rule does not allow assigning any belief
mass to the conflict. Thus, in PCR6, one has by definition:


$$m^{PCR6}_{1,2,\ldots,S}(\emptyset) = 0. \tag{10}$$


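Hence, PCR6 allows for the reallocation of all partial
conflicts, in proportion to the masses of the subset concerned
by these conflicts, so that the specificity of the information
is preserved. For any $X \in 2^\Theta \setminus \{\emptyset\}$, the rule is written:


$$
\begin{aligned}
m^{PCR6}_{1,2,\ldots,S}(X) = {} & m_{1,2,\ldots,S}(X) + \sum_{k=1}^{S-1} \ \sum_{\substack{X_{i_{k+1}}, \ldots, X_{i_S} \in 2^\Theta \setminus \{X\} \\ \left(\bigcap_{j=k+1}^{S} X_{i_j}\right) \cap X = \emptyset}} \ \sum_{(i_1, i_2, \ldots, i_S) \in P^S(\{1, \ldots, S\})} \left[ m_{i_1}(X) + \ldots + m_{i_k}(X) \right] \\
& \cdot \frac{m_{i_1}(X) \cdots m_{i_k}(X) \, m_{i_{k+1}}(X_{i_{k+1}}) \cdots m_{i_S}(X_{i_S})}{m_{i_1}(X) + \ldots + m_{i_k}(X) + m_{i_{k+1}}(X_{i_{k+1}}) + \ldots + m_{i_S}(X_{i_S})},
\end{aligned}
\tag{11}
$$


where $P^S(\{1, \ldots, S\})$ is the set of all permutations of the
elements $\{1, 2, \ldots, S\}$. It should be emphasized that $X_{i_{k+1}}$,
..., $X_{i_S}$ may be different from each other, equal, or some
equal and some different, etc.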

For example, let’s consider two hypotheses $A$ and $B$,
with $2^\Theta = \{\emptyset, A, B, A \cup B\}$ and two information sources,
such that $m_1(A) = 0.6$ and $m_2(B) = 0.3$. With PCR6, the
partial conflicting mass $m_1(A) m_2(B) = 0.6 \cdot 0.3 = 0.18$ is
redistributed to $A$ and $B$ only with respect to the following
proportions: $x_A = 0.12$ and $x_B = 0.06$ because:

$$\frac{x_A}{0.6} = \frac{x_B}{0.3} = \frac{0.18}{0.6 + 0.3} = 0.2.$$


More numerical examples along these lines can be found in 
[39]. 
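
The two-source case of the redistribution described above can be sketched in Python as follows. This is a minimal sketch restricted to two sources, not the general S-source formula; the function name is hypothetical, and the masses placed on A ∪ B (0.4 and 0.7) are assumptions added only so that each source sums to 1.

```python
def pcr6_two_sources(m1, m2):
    """PCR6 restricted to two sources (inputs are assumed to carry no mass on
    the empty set). Non-conflicting products go to the intersection; each
    partial conflict m1(X)*m2(Y) with X and Y disjoint is split between X and
    Y in proportion to m1(X) and m2(Y)."""
    fused = {}
    for X, a in m1.items():
        for Y, b in m2.items():
            inter = X & Y
            if inter:
                fused[inter] = fused.get(inter, 0.0) + a * b
            elif a + b > 0:
                fused[X] = fused.get(X, 0.0) + a * a * b / (a + b)
                fused[Y] = fused.get(Y, 0.0) + b * b * a / (a + b)
    return fused

A, B = frozenset({"A"}), frozenset({"B"})
AB = A | B
m1 = {A: 0.6, AB: 0.4}  # m1(A) = 0.6 as in the text; 0.4 on A ∪ B is assumed
m2 = {B: 0.3, AB: 0.7}  # m2(B) = 0.3 as in the text; 0.7 on A ∪ B is assumed

fused = pcr6_two_sources(m1, m2)
print(fused)
```

With these inputs, A receives its conjunctive mass plus the share $x_A = 0.12$, B receives its conjunctive mass plus $x_B = 0.06$, and no mass remains on the empty set.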


C. Geophysical definition of the FoD using the K-means 
clustering classification 


Once the geophysical data have been acquired, the next step 
is to determine the belief mass distributions associated with the 
various hypotheses of the FoD $\Theta$. These sets of belief masses
are specific to each information source and associated with 
each grid cell representative of the dike section. Before the 
fusion stage, the methodology indeed requires all information 
sources to have sets of belief masses defined on the same 
section and on a common mesh. The dimensions of this section 
are to be fixed by the source covering the largest area, which 
here would be the ERT method. 

The hypotheses of the FoD (Eq. 2) must be associated with 
physical values (electrical resistivity, shear wave velocity). 
Previously, in Dezert et al. [14], [15], a representation of 
the distribution of geophysical values, in the form of modal 
classes, was employed. Such a representation enables high- 
lighting the minima and maxima, under expert interpretation, 
that would be used to set the bounds of the intervals associated 
with the FoD hypotheses. Obviously, the general trend in the 
value distribution serves to identify the lithological material 
groups making up the studied section. 

A new procedure is proposed herein to determine
the geophysical parameter intervals associated with the FoD
hypotheses, based on K-means clustering [40]. This clustering
method allows classifying the geophysical parameter values
(electrical resistivity and shear wave velocity, respectively)
into $K$ clusters, as derived following the resolution of a
combinatorial optimization problem. The K-means clustering
algorithm iteratively minimizes the sum of squared distances
between each data point and the centroid of its cluster, starting
from an initial set of centroids (initialization).
This algorithm modifies the affiliation of the points of each
cluster until the sum of the distances can no longer decrease
(least squares method), resulting in a set of compact and
delimited clusters. This type of clustering algorithm
seems relevant for our problem set-up since
our data do not display clearly delimited sets, while it
remains easy for the user to indicate the desired
number of clusters.
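
The iterative procedure described above can be sketched for a single scalar parameter as follows. This is a minimal pure-Python sketch: the deterministic initialization, the use of log10 resistivities and the sample values are assumptions made for the example and are not taken from the survey data.

```python
import math

def assign(values, centroids):
    """Index of the nearest centroid (squared distance) for each value."""
    return [min(range(len(centroids)), key=lambda c: (v - centroids[c]) ** 2)
            for v in values]

def kmeans_1d(values, k, n_iter=100):
    """K-means (Lloyd's iterations) on scalars; returns (centroids, labels)."""
    data = sorted(values)
    # Assumed initialization: centroids spread evenly over the sorted data.
    centroids = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(n_iter):
        labels = assign(values, centroids)
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:  # affiliations no longer change
            break
        centroids = updated
    return centroids, assign(values, centroids)

# Hypothetical resistivities (Ohm.m), clustered in log scale
rho = [5, 8, 12, 20, 150, 200, 180, 2000, 3000, 5000]
centroids, labels = kmeans_1d([math.log10(r) for r in rho], k=3)
print(labels)
```

The bounds of each cluster (minimum and maximum of its members) can then serve as candidate interval limits for the FoD hypotheses.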

Even though clustering methods are typically performed on
multi-variate parameters [41], K-means clustering cannot be
run on co-located resistivity and seismic velocity data in our
methodology since the information sources are assumed to be
independent in belief function theory and unable to interact
with each other. Such use of K-means clustering would contradict
that assumption. Therefore, the geophysical methods are to be
considered individually, with clustering applied on individual
parameters.

The number of clusters is a subjective choice that relies
on an interpretation of the geophysical acquisitions (Figures 2
and 5) and modal class distribution of the physical parameters
(Figures 7 and 8). A more objective manner would be to
use existing indices, such as the Davies-Bouldin index, to
fix the number of clusters $K$. However, such an index would
require a large number of clusters that may prove impossible
to associate with specific hypotheses. Our opinion is that
a reading and interpretation of the results from geophysical
acquisitions is still valuable to fixing the number of clusters
to be associated with the FoD hypotheses. This K-means
clustering classification then serves as an aid in selecting
precise values for the physical intervals.

1) Electrical Resistivity Tomography: From an observation
of Figure 2c, as previously described, three sets of resistivities
emerge. It has been decided to associate low resistivity values
with materials $\theta_1 \cup \theta_2$ (more conductive materials: fine-grained
fill or marl basement) and high resistivity values with
materials $\theta_3 \cup \theta_4$ (more resistive materials: coarse-grained fill
or limestone basement). Since the intermediate values do not
provide information on the exact nature of the lithological
material, they will be associated with the union of the four
hypotheses, $\theta_1 \cup \theta_2 \cup \theta_3 \cup \theta_4$. The creation of three clusters by
means of the K-means clustering classification technique will
therefore be considered.

A modal class representation of the number of cells of the
2D-ERT section (Figure 2c), with respect to the resistivity
values depicted in log scale, is displayed in Figure 7. This
figure highlights the clustering proposed by the K-means
algorithm for the three defined clusters. The intervals of values
associated with the FoD hypotheses for the ERT method are
thus defined in Ω·m for the characterization as:

• $[2.5, 75]$ associated with $\theta_1 \cup \theta_2$ (fine-grained fill or
marl),

• $[354, 10^4]$ associated with $\theta_3 \cup \theta_4$ (coarse-grained fill
with breccia or limestone),

• $]75, 354[$ associated with $\theta_1 \cup \theta_2 \cup \theta_3 \cup \theta_4$ (one of the
four materials described),

• $[0.1, 2.5[ \, \cup \, ]10^4, 3 \cdot 10^{?}]$ associated with $\theta_5$ (none of the
previously described materials).

By definition, the intervals associated with the $\theta_5$ hypothesis
(other lithological materials) do not contain any resistivity
value present in the section. However, the association of a
resistivity interval with this hypothesis is needed in order
to provide it with a “physical reality” in terms of resistivity
and compute the belief masses associated with $\theta_5$. As described
hereafter, the computation of belief masses associated with
each hypothesis does require the computation of distances
between physical intervals of values.

2) Multi-channel Analysis of Surface Waves: On the available
seismic velocity profiles (Figure 5), as for the ERT
method, three sets emerge: low, intermediate, and high shear
wave velocities. Low velocities are associated with finer
materials $\theta_1 \cup \theta_2$ and high velocities with coarser materials
$\theta_3 \cup \theta_4$. Since the intermediate values of $V_s$ do not provide
information on the exact nature of the lithological material,
they can be associated with the union of the four hypotheses
$\theta_1 \cup \theta_2 \cup \theta_3 \cup \theta_4$. Low velocities were initially associated with less
cohesive materials (fine and coarse fill) and higher velocities
with more cohesive materials (marl and limestone). However,
this characterization of the FoD generated a significant conflict
between the seismic information source and the other information
sources. The FoD characterization used herein agrees
more closely with both the ERT characterization and particle
size analysis. The high velocities associated with $\theta_3$ could be
attributed to the numerous cohesive limestone breccia present
in the coarse-grained fill materials.


Figure 8 displays the same type of representation as that
proposed in Figure 7; it highlights the clustering proposed
for the three defined clusters. The velocity value intervals
associated with the FoD hypotheses for the MASW method
are therefore defined in m·s$^{-1}$ as:

• $[180, 450]$ associated with $\theta_1 \cup \theta_2$ (fine-grained fill or
marl),

• $[670, 1.3 \cdot 10^3]$ associated with $\theta_3 \cup \theta_4$ (coarse-grained
fill with breccia or limestone),

• $]450, 670[$ associated with $\theta_1 \cup \theta_2 \cup \theta_3 \cup \theta_4$ (one of the
four materials described),

• $[1, 180[ \, \cup \, ]1.3 \cdot 10^3, 2 \cdot 10^{?}]$ associated with $\theta_5$ (none of
the previously described materials).

Fig. 7. Modal class distribution of electrical resistivities in the form of three
clusters using the K-means clustering classification and the positions of the
respective centroids.

Fig. 8. Modal class distribution of shear wave velocities divided into three
clusters using the K-means clustering classification and the positions of the
respective centroids.


D. Belief mass computations from single sources of information


1) Electrical Resistivity Tomography: Once the FoD, $\Theta$,
has been characterized with resistivity values, belief masses
need to be assigned to each hypothesis of $2^\Theta$, for each
of the common meshes. It is therefore necessary, from the 
standpoint of a resistivity value, to assign a distribution of 
belief masses on all hypotheses. This step entails taking into 
account the imprecision on the inverted resistivities, stemming 
from inaccuracies and sensitivities provided by the inversion 
process. More specifically, the imprecision on resistivity values 
equals the ratio of the inaccuracy to the sensitivity, expressed 
as a percent. Thus, instead of considering a simple resistivity 
value, it is possible to consider an interval of resistivities 
with a lower bound (resistivity value minus its associated 
imprecision) and an upper bound (calculated resistivity plus 
its associated imprecision), whose central value is the inverted 
resistivity value. Thus, the greater the imprecision, the wider 
the interval obtained. 

Once the interval has been determined, the ascribed belief 
mass is calculated as a function of the “distance” between this 
interval and the intervals associated with the FoD hypothesis. 
These distances are computed by considering “Wasserstein dis- 
tances” [42], whereby the shorter the distance of a resistivity 
interval to an FoD hypothesis, the greater the belief mass and 
vice-versa. This procedure is explained in detail in [9] as well 
as in [15]. 
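
To make the two steps above more concrete, the sketch below builds the imprecision interval around an inverted resistivity and measures its distance to the FoD intervals. Several points are assumptions made for illustration only: the imprecision is applied as a relative percentage of the value, the distance is the 2-Wasserstein distance between uniform distributions on the intervals, the $\theta_5$ intervals are omitted for brevity, and the inverse-distance normalization is a hypothetical placeholder that only reproduces the qualitative behavior (shorter distance, greater mass). The actual mass attribution scheme is the one detailed in [9] and [15].

```python
import math

def interval_from_imprecision(value, imprecision_pct):
    """Interval centered on the inverted value; imprecision taken as a relative percentage."""
    delta = value * imprecision_pct / 100.0
    return (value - delta, value + delta)

def wasserstein_uniform(a, b):
    """2-Wasserstein distance between uniform distributions on intervals a and b:
    sqrt((difference of midpoints)^2 + (difference of half-widths)^2 / 3)."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    dc = (a_lo + a_hi) / 2 - (b_lo + b_hi) / 2
    dr = (a_hi - a_lo) / 2 - (b_hi - b_lo) / 2
    return math.sqrt(dc * dc + dr * dr / 3)

# ERT intervals in Ohm.m for three of the hypotheses (labels are hypothetical)
FOD_INTERVALS = {
    "t1|t2": (2.5, 75.0),
    "t1|t2|t3|t4": (75.0, 354.0),
    "t3|t4": (354.0, 1e4),
}

def masses_from_distances(interval, eps=1e-9):
    """Placeholder attribution: masses proportional to the inverse distances."""
    inv = {h: 1.0 / (wasserstein_uniform(interval, iv) + eps)
           for h, iv in FOD_INTERVALS.items()}
    total = sum(inv.values())
    return {h: w / total for h, w in inv.items()}

cell_interval = interval_from_imprecision(40.0, 10.0)  # hypothetical cell: 40 Ohm.m, 10 %
masses = masses_from_distances(cell_interval)
print(cell_interval, masses)
```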

Belief masses are assigned on all hypotheses with defined
resistivity intervals (here $\theta_1 \cup \theta_2$, $\theta_3 \cup \theta_4$, $\theta_1 \cup \theta_2 \cup \theta_3 \cup \theta_4$
and $\theta_5$), while the other masses are set to 0. In accordance with the
definition of a belief mass function, the sum of these masses
equals 1 (Eq. 4). Each mesh is thus associated with normalized
belief mass functions.

2) Multi-channel Analysis of Surface Waves: As for the
ERT method, once the FoD, $\Theta$, has been characterized with
$V_s$ values, belief masses must be assigned to each hypothesis
of $2^\Theta$, for each cell (sized 24 × 0.1 m²). It is therefore
necessary, from the standpoint of a shear wave velocity value
associated with a cell, to assign a belief mass distribution on
all hypotheses. The process is identical to that introduced for
the ERT. However, unlike the ERT method, no imprecision values
are available for the shear wave velocities.
Hence, imprecision values are randomly simulated
[43] according to a normal distribution (with a mean imprecision
value of 10% and an associated standard deviation of 5%).

Belief masses are thus assigned to all hypotheses with
defined shear wave velocity intervals (here $\theta_1 \cup \theta_2$, $\theta_3 \cup \theta_4$,
$\theta_1 \cup \theta_2 \cup \theta_3 \cup \theta_4$ and $\theta_5$), with the other masses being set to
0. As mentioned above, by virtue of the definition of a belief
mass function, the sum of these masses equals 1 (Eq. 4).
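
The simulation of imprecision values for the shear wave velocities can be sketched as below. The function name, the fixed seed and the clipping of negative draws to zero are assumptions made for this illustration; only the mean (10%) and the standard deviation (5%) come from the text.

```python
import random

def simulate_imprecisions(n, mean_pct=10.0, std_pct=5.0, seed=0):
    """Draw n imprecision values (in percent) from a normal distribution.
    Negative draws are clipped to 0, which is an assumption of this sketch."""
    rng = random.Random(seed)
    return [max(0.0, rng.gauss(mean_pct, std_pct)) for _ in range(n)]

imprecisions = simulate_imprecisions(10000)
mean_imprecision = sum(imprecisions) / len(imprecisions)
print(round(mean_imprecision, 2))
```

Each simulated percentage can then be used to build a velocity interval around the inverted $V_s$ value, in the same way as for the resistivities.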

The dike section covered by the ERT method is larger
than that covered by the MASW method. However, the ERT
section dimensions serve as a reference in our methodology.
As such, belief mass distributions need to be proposed for the
second source of geophysical information (i.e. MASW), where
no seismic velocity information is available. For this step, a
belief mass of $m_2(\theta_1 \cup \theta_2 \cup \theta_3 \cup \theta_4 \cup \theta_5) = 1$ is chosen. This
mass represents the “complete” uncertainty, in recognition of
the absence of information in those areas not covered by the
MASW campaign.


3) Core analysis: As for the two geophysical methods
previously discussed, it is necessary, for each mesh of the
section, to associate the results of particle size analyses or
other geotechnical observations with the hypotheses of the FoD, $\Theta$.
Unlike the ERT and MASW methods, the materials extracted
by coring serve to discriminate the four hypotheses $\theta_1$, $\theta_2$, $\theta_3$
and $\theta_4$. Imprecisions associated with weighing of the materials
are taken into account in the procedure but do not alter our
ability to discriminate the hypotheses.


As mentioned above, the classification of French standard
AFNOR NF P 11-300 [34] is used to distinguish fine-grained
from coarse-grained fill materials. Moreover, this standard
allows associating fine-grained fill materials with hypothesis
$\theta_1$ and coarse-grained fill materials with hypothesis $\theta_3$. The
cohesive materials, i.e. marly and limestone basements, that
have not undergone particle size analysis are respectively
associated with hypothesis $\theta_2$ (marl) and $\theta_4$ (limestone) thanks
to the visual characterization carried out by the technicians.

A belief mass close to 1, i.e. $m_3(\cdot) = 0.99$, associated with
the characterized hypothesis is fixed at the sampling points.
In a complementary manner, a mass of $m_3(\cdot) = 0.01$ is then
attributed to the union of all hypotheses ($\theta_1 \cup \theta_2 \cup \theta_3 \cup \theta_4 \cup \theta_5$),
in agreement with the definition of the belief mass function,
which requires that the sum of the belief masses assigned by
an information source equal 1 (Eq. 4). The other belief masses
are all set to 0.

This mass value of 0.99 can theoretically be modified 
depending on the ability of the geotechnical method to char- 
acterize the lithological material under investigation. Given 
our belief that core extraction and observation along with 
particle analysis constitute the best means for characterizing 
the lithology of a material, the high level of confidence in 
boreholes finds its justification. If the information provided 
by the geotechnical observation is missing or incomplete 
and no particle size analysis has been carried out, then the 
entire belief mass is allocated to the absolute uncertainty such
that $m_3(\theta_1 \cup \theta_2 \cup \theta_3 \cup \theta_4 \cup \theta_5) = 1$ (e.g. B6 borehole depth
between 13 and 15 m, Figure 6).

Core drillings provide spatially specific information com- 
pared to the output of the ERT and MASW methods, both of 
which cover larger areas. Since the dimensions of the ERT 
section serve as a geometrical reference in our methodology, 
belief masses must be assigned to the geotechnical source of 
information where no core drilling is carried out. A discretiza- 
tion of the whole section is thus performed to cover the entire 
inter-borehole space. 

An exponential decrease of the belief masses is imposed
laterally from each borehole point to the adjacent borehole, to
both the left and right. The rate of decrease is a function of
two parameters: i) the lateral decay coefficient $k$, and ii) the
coefficient of variation of particle size values $C_v$. As a result,
for a given depth, we obtain:


$$M(x) = 0.99 \cdot e^{-k C_v x}, \tag{12}$$


with $x$ being the horizontal distance from the considered
mesh to the reference borehole, in meters (with $x = 0$ in
the borehole), $M(x)$ the belief mass values assigned to each
hypothesis in the FoD for a position $x$, with 0.99 as the belief
mass value assigned to the hypothesis identified in the borehole
($M(0)$).

The coefficient $k$ is set by the user of the methodology and
depends on the lateral variability of the investigated medium.
The value of $k$ should increase with subsoil variability. A
parametric study on the influence of this lateral decay coefficient
has been proposed in [14], suggesting a value of $k = 0.1$.
The coefficient of variation of particle size $C_v$ is computed for
a considered borehole, at a fixed depth, by using values from
the borehole as well as from the adjacent borehole. As such,
for two consecutive boreholes displaying similar particle size
values at the same depth, the decrease in confidence is smaller
than for two consecutive boreholes with radically different
values. The expression of $C_v$ is shown in Eq. 13.


(13) 


where $Q$ is the geotechnical parameter value of the considered
cell in the borehole (cumulative sieve less than 80 µm), and $Q_i$
the value in the adjacent borehole centered at the same depth.
This study considers $N_{mesh} = 7$, so that the computation of $C_v$
takes into account 7 cells in the adjacent borehole (i.e. 70 cm
thick when assuming a vertical resolution of 10 cm).

Consequently, in our studied section, boreholes B1 and B7
are of interest, even though they are absent from the ERT
profile (Figure 2c). These two boreholes make it possible to
compute the coefficients of variation and therefore the decrease
rate of B2 belief masses to the left and the decrease rate
of B6 belief masses to the right. For a given cell, when
the belief mass associated with a hypothesis is less than 1,
the remainder of the mass to be allocated is assigned to the
“any material” hypothesis ($\theta_1 \cup \theta_2 \cup \theta_3 \cup \theta_4 \cup \theta_5$). Beyond
the maximum depth of geotechnical investigation, since no
information is available, $m_3(\theta_1 \cup \theta_2 \cup \theta_3 \cup \theta_4 \cup \theta_5) = 1$.
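
The lateral extension of the borehole information can be sketched as follows: the mass of the hypothesis identified in the borehole decays exponentially with distance, and the remainder goes to the union of all hypotheses. The function name, the label `"Theta"` for the full union and the value $C_v = 1$ are hypothetical; $k = 0.1$ is the value suggested in [14].

```python
import math

def borehole_masses(hypothesis, x, k=0.1, c_v=1.0, m0=0.99):
    """Belief masses at horizontal distance x (m) from a borehole in which
    `hypothesis` was identified: exponential decay of the borehole mass m0,
    with the remainder assigned to the union of all hypotheses ("Theta")."""
    m = m0 * math.exp(-k * c_v * x)
    return {hypothesis: m, "Theta": 1.0 - m}

at_borehole = borehole_masses("t1", x=0.0)
at_10_m = borehole_masses("t1", x=10.0)
print(at_borehole, at_10_m)
```

A larger $C_v$ (dissimilar adjacent boreholes) or a larger $k$ (more variable subsoil) makes the mass fall toward total ignorance over a shorter distance.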

When the materials are characterized visually by a technician
but not analyzed in the laboratory, the $C_v$ value cannot
be computed due to the absence of particle size data. In this
case, two $C_v$ values are set by the user of the methodology,
i.e.: i) a low value making it possible to extend the information
widely when two identical materials are present at the same
depth for adjacent boreholes (e.g. between 10 and 12 m deep,
to the right of borehole B4, Figure 6); and ii) a high value
serving to limit the extension of information locally when
two different materials are present at the same depths for two
adjacent boreholes (e.g. between 10 and 12 m deep to the left
of borehole B4, Figure 6).

Extending the geotechnical information from a borehole
to adjacent boreholes implies double information at the
level of the inter-borehole meshes (information originating
from both the left and right). In order to have just one
distribution of geotechnical belief masses in each cell, this
double information is merged into a single one. For this
step, the information originating from the left is considered as
a first source and that from the right as a second source.
A preliminary fusion process is then conducted between these
two belief mass distributions using the PCR6 rule (Eq. 11), in
each cell of the section.


IV. RESULTS 


Before displaying the fusion results, since belief mass 
distributions have been computed for all individual sources, 
they will first be displayed in Figures 9 to 12. It is essential 
to mention that this fusion process can take place between all 
information sources or else by considering them in pairs. This 
work thus presents the results for the fusion of information 
acquired solely by the geophysical ERT and MASW methods 
(Figure 13) and then by all three methods (ERT, MASW 
and core drillings) in considering respectively four boreholes 
(Figure 14) and seven boreholes (Figure 15). 


A. Belief mass distributions for individual information sources 


1) Electrical Resistivity Tomography: We start with a display 
consisting of two complementary figures (9a and 9b) 
that highlight two distinct types of information. Figure 9a 
shows, for each cell of the ERT section, the hypotheses of 
$\Theta$ having the greatest belief mass, while the associated mass 
values are provided in Figure 9b. The presentation of these 
results serves to highlight the ability of our methodology to 
represent the uncertainty (union of hypotheses with associated 
confidence indices) while taking imprecision into account. The 
materials most plausibly present in the dike section according 
to the ERT method are thus depicted in Figure 9a, with the 
associated confidence index in Figure 9b. The four areas of 
high resistivities described in Figure 2c appear in green in 
Figure 9a. 


[Figure 9 legend: $\theta_1$ (fine fill materials), $\theta_2$ (marl), $\theta_3$ (coarse fill materials with limestone breccia), $\theta_4$ (limestone), and unions of these hypotheses] 


Fig. 9. a) lithological section of the dike based on the hypotheses having the 
highest belief mass according to the mass attribution from electrical resistivity 
data, with an FoD characterization by means of clustering; b) dike section of 
the mass values associated with the hypotheses shown in a). 
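
The two-panel display used in Figure 9 (and in the following figures) can be sketched in a few lines: for each cell, the focal element with the largest belief mass is retained (panel a) together with that mass as a confidence index (panel b). The snippet below is only an illustrative sketch; the cell values and the names `cells`, `best_hypothesis`, `hypothesis_map` and `confidence_map` are assumptions and do not come from the study.

```python
# Hypothetical belief mass distributions for three cells of a section.
# Keys are focal elements (sets of hypotheses), values are belief masses.
cells = [
    {frozenset({"theta1"}): 0.7, frozenset({"theta1", "theta2"}): 0.3},
    {frozenset({"theta3", "theta4"}): 0.55, frozenset({"theta1", "theta2"}): 0.45},
    {frozenset({"theta1", "theta2", "theta3", "theta4", "theta5"}): 1.0},
]


def best_hypothesis(bba):
    """Return the focal element with the largest belief mass and that mass."""
    focal = max(bba, key=bba.get)
    return focal, bba[focal]


hypothesis_map = []  # analogue of panel a): most supported hypothesis per cell
confidence_map = []  # analogue of panel b): associated belief mass per cell
for bba in cells:
    focal, mass = best_hypothesis(bba)
    hypothesis_map.append(" U ".join(sorted(focal)))
    confidence_map.append(mass)

for name, mass in zip(hypothesis_map, confidence_map):
    print(f"{name}: {mass:.2f}")
```

In this sketch the third cell carries the complete uncertainty (union of all hypotheses) with a mass of 1, which corresponds to a cell that would be displayed in black.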


The high uncertainty (low belief masses) is due to a lack of 
real measured data in the extended ERT model, as well as to 
the resolution of ERT data intrinsically decreasing with depth. 
This decrease in confidence with depth could not be demon- 
strated in previous works [14], [15] since sensitivity values 
had not been taken into consideration. Figure 9 indicates that 
the characterization of the section using the ERT method fails 
to discriminate the four hypotheses (fine fill, coarse fill, marly 
basement, limestone) featuring common resistivity values. 


2) Multi-channel Analysis of Surface Waves: The litholog- 
ical materials most plausibly present in the subsoil according 
to the MASW method are displayed in Figure 10a, along with 
the associated confidence index in Figure 10b. It appears that 
these results are in close agreement with the characterization 
proposed by the ERT method in Figure 9, in particular with 
both the characterization of $\theta_1 \cup \theta_2$ at shallow depth for the 
three sections covered and the characterization of $\theta_3 \cup \theta_4$ 
beyond a depth of 10 m for the first section, i.e. around KP 
10.8. 


[Figure 10 legend: $\theta_1$ (fine fill materials), $\theta_2$ (marl), $\theta_3$ (coarse fill materials with limestone breccia), $\theta_4$ (limestone), and unions of these hypotheses] 


Fig. 10. a) representation of the hypotheses having the highest belief mass 
according to the mass attribution from shear wave velocity data, with an FoD 
characterization by clustering; b) representation of the mass values associated 
with the hypothesis presented in a). 


3) Core analysis: The lithological materials most plausibly 
present in the subsoil according to the core drilling method 
are displayed in Figures 11a and 12a, with the associated 
confidence indices in Figures 11b and 12b. Figure 11 shows 
the results when considering four boreholes (B1, B2, B5 and 
B7), while the results in Figure 12 consider all available 
boreholes (B1 through B7). 

As expected, these two figures highlight the strong confi- 
dence (belief masses close to 1) near the borehole locations 
(Figures 11b and 12b) as well as their variable lateral decrease 
depending on the materials encountered in adjacent boreholes. 
For example, when looking at borehole B4 (Figure 12), 
while the confidence associated with the hypothesis $\theta_1$ (fine 
fill materials) extends widely to the right over the first 8 
meters because B5 characterizes the same material at similar 
depths, the extent of confidence in this hypothesis is much 
more restricted between z = 8 and z = 10 m, since $\theta_2$ is 
characterized in B5 at these depths. As for the MASW method, 
these figures substantiate the ability of the methodology to 
represent the lack of information (incompleteness). Note that 
the complete uncertainty ($\theta_1 \cup \theta_2 \cup \theta_3 \cup \theta_4 \cup \theta_5$) is displayed 
in black in Figures 11a and 12a. 


[Figure 11 legend: $\theta_1$ (fine fill materials), $\theta_2$ (marl), $\theta_3$ (coarse fill materials with limestone breccia), $\theta_4$ (limestone), $\theta_1 \cup \theta_2 \cup \theta_3 \cup \theta_4 \cup \theta_5$; boreholes B2 and B5 labeled, B1 and B7 outside the section] 


Fig. 11. a) representation of the hypothesis with the highest belief mass 
according to the mass attribution from four core drilling data; b) representation 
of the mass values associated with the hypothesis presented in a). 


[Figure 12 legend: $\theta_1$ (fine fill materials), $\theta_2$ (marl), $\theta_3$ (coarse fill materials with limestone breccia), $\theta_4$ (limestone), $\theta_1 \cup \theta_2 \cup \theta_3 \cup \theta_4 \cup \theta_5$; borehole B4 labeled, B1 and B7 outside the section] 


Fig. 12. a) representation of the hypothesis with the highest belief mass ac- 
cording to the mass attribution from seven core drilling data; b) representation 
of the mass values associated with the hypothesis presented in a). 


B. ERT and MASW fusion results 


The fusion results of ERT and MASW presented in Figures 
13a and 13c do not allow the lithological materials to be 
characterized individually, since neither method can do so on 
its own (see Figures 9 and 10). Figures 13b and 13d reveal that 
the confidence level is enhanced when the same materials 
are characterized by the two geophysical methods, especially 
near the dike crest in the case of fine materials. Two conflict 
zones appear, with the larger one being located below the 3rd 
section of MASW (KP 11.5-11.7) at a depth of 15 m. The MASW 
method characterizes the $\theta_3 \cup \theta_4$ hypothesis (Figure 10a) 
while the ERT method characterizes the $\theta_1 \cup \theta_2$ hypothesis 
(Figure 9a). On the whole, the results proposed here are very close 
to the characterization produced by ERT alone (Figure 9) since 
ERT provides information on a much larger area than the MASW 
method. A large part of the dike section is characterized with 
$m(\theta_1 \cup \theta_2 \cup \theta_3 \cup \theta_4 \cup \theta_5) = 1$ for the MASW method (in 
black, Figure 10a), which yields zero information. Also, the 
masses associated with the PCR6 rule are higher and closer to 
1 (Figure 13d) than those associated with Smets’ rule (Figure 13b). 
This outcome stems from the fact that the conflict mass $m(\emptyset)$ 
in Smets’ rule is reallocated to the other FoD hypotheses with 
the PCR6 rule, as described in Eq. (11), hence $m^{PCR6}(\emptyset) = 0$. 
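
The difference between the two rules can be illustrated on a single cell. The sketch below assumes the standard two-source forms: Smets' unnormalized conjunctive rule, which leaves the conflict on the empty set, and the two-source PCR6 rule (which coincides with PCR5 for two sources), which sends each partial conflict back to the two hypotheses involved, proportionally to their masses. The mass values and the function names are hypothetical and are not taken from the study; input BBAs are assumed to carry no mass on the empty set.

```python
from itertools import product


def conjunctive(m1, m2):
    """Smets' unnormalized conjunctive rule: conflict stays on the empty set."""
    out = {}
    for (x, a), (y, b) in product(m1.items(), m2.items()):
        z = x & y
        out[z] = out.get(z, 0.0) + a * b
    return out


def pcr6_two_sources(m1, m2):
    """Two-source PCR6 (same as PCR5 here): redistribute each partial conflict."""
    out = {}
    for (x, a), (y, b) in product(m1.items(), m2.items()):
        z = x & y
        if z:
            out[z] = out.get(z, 0.0) + a * b
        elif a + b > 0:
            # The conflicting product a*b is shared between x and y
            # proportionally to the masses a and b that produced it.
            out[x] = out.get(x, 0.0) + a * a * b / (a + b)
            out[y] = out.get(y, 0.0) + b * b * a / (a + b)
    return out


T12 = frozenset({"theta1", "theta2"})
T34 = frozenset({"theta3", "theta4"})
ALL = frozenset({"theta1", "theta2", "theta3", "theta4", "theta5"})

m_ert = {T12: 0.6, ALL: 0.4}   # hypothetical ERT masses for one cell
m_masw = {T34: 0.5, ALL: 0.5}  # hypothetical MASW masses for the same cell

smets = conjunctive(m_ert, m_masw)
pcr6 = pcr6_two_sources(m_ert, m_masw)

print("Smets conflict mass:", round(smets[frozenset()], 3))
print("PCR6 masses:", {" U ".join(sorted(k)): round(v, 3) for k, v in pcr6.items()})
```

With these assumed inputs, the conjunctive result keeps a mass on the empty set, whereas the PCR6 result has none and assigns correspondingly larger masses to the two conflicting unions.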


[Figure 13 legend: $\theta_1$ (fine fill materials), $\theta_2$ (marl), $\theta_3$ (coarse fill materials with limestone breccia), $\theta_4$ (limestone), and unions of these hypotheses; color scale: belief mass values] 


Fig. 13. a,c) representations of the hypotheses with the highest belief mass 
according to the mass attribution from ERT and MASW fusion using Smets’ 
and PCR6 rules; b,d) representation of the mass values associated with the 
hypotheses presented in a) and c) respectively. 


C. ERT, MASW and core drillings fusion results 


In comparison with the results displayed above, the contribution 
of core drillings is helpful, as they make it possible to 
distinguish the lithological materials individually and to propose 
precise interface positions, as opposed to the smooth ones shown 
in Figure 2. It can be observed in Figures 14b, 14d, 15b 
and 15d that the conflict level decreases when moving away 
from the borehole positions since the confidence level on 
the geotechnical information decreases with distance to these 
borehole positions. Comparatively speaking, the influence of 
the geophysical information becomes more significant. 


[Figure 14 legend: $\theta_1$ (fine fill materials), $\theta_2$ (marl), $\theta_3$ (coarse fill materials with limestone breccia), $\theta_4$ (limestone), and unions of these hypotheses; borehole B5 labeled; horizontal axis: KP; color scale: belief mass values] 


Fig. 14. a,c) representation of the hypotheses with the highest belief mass 
according to the mass attribution from the fusion of ERT, MASW and core 
drillings (four boreholes), using the Smets’ and PCR6 rules respectively; b,d) 
representation of the mass values associated with the hypotheses presented in 
a) and c) respectively. 


Overall, the belief masses of the characterized hypotheses 
seem to be lower with three sources of information (Figures 
14b, 14d, 15b and 15d) than with two (Figures 13b and 13d). 
However, this finding does not imply that the results are of 
lower quality. Indeed, the materials characterized are 
individual hypotheses (Figures 14a, 14c, 15a and 15c), while 
a union of hypotheses was depicted in Figure 13. Therefore, 
the characterization using three sources of information is 
more precise than that proposed by considering 
just the ERT and MASW methods. 

It is interesting to compare the fusion results considering 
four (Figure 14) and seven boreholes (Figure 15) by examining 
the contribution of the three additional core drillings (B3, B4, 
B6). Overall, confidence is higher with seven geotechnical 
investigation boreholes (Figures 15b and 15d) than with four 
(14b and 14d). Some conflict zones present with four bore- 
holes are reduced when integrating the three additional ones 
(e.g. on the first 10 meters, KP 10.8), yet new conflict zones 
can also appear when the particle size information contradicts 
the geophysical characterization (e.g. B3, 10 meters deep). As 
for the fusion of two information sources, the PCR6 results 
indicate higher belief masses and remove the conflict displayed 
by results using Smets’ rule. On the other hand, they also show 
that the zones where conflict is present with Smets’ rule exhibit 
low confidence with PCR6 (e.g. KP 10.8 - 11 at 10 meters in 
depth, conflict represented in Figure 15a, and associated low 
masses following use of the PCR6 rule in Figure 15d). 


[Figure 15 legend: $\theta_1$ (fine fill materials), $\theta_2$ (marl), $\theta_3$ (coarse fill materials with limestone breccia), $\theta_4$ (limestone), and unions of these hypotheses; boreholes B3, B4, B5 and B6 labeled; horizontal axis: KP; color scale: belief mass values] 


Fig. 15. a,c) representation of the hypotheses with the highest belief mass 
according to the mass attribution from the fusion of ERT, MASW and core 
drillings (seven boreholes in white, with the black lines indicating major 
contact between cohesive and non-cohesive materials), using the Smets’ and 
PCR6 rules respectively; b,d) representation of the mass values associated 
with the hypotheses presented in a) and c) respectively. 


V. DISCUSSION 


A. Lithological characterization of the dike section 


The results obtained by our fusion methodology are con- 
sistent with the geological description of the dike section 
established in the beginning of this article. Figure 15c reveals 
the presence of a limestone basement up to KP 10.6, with 
coarse fill materials above. Then, between KP 10.6 and KP 
10.8, the NE-SW fault is most likely being detected, lowering 
the western compartment. This finding explains why coarse fill 
materials are characterized at such depths at KP 10.8, whereas 
they are much closer to the surface at KP 10.55. This fault zone 
between KPs 10.55 and 10.8 is poorly characterized (union of 
hypotheses displayed in Figure 15c, and low associated belief 
masses in Figure 15d), which suggests the possibility of a dis- 
placed region. Moreover, fine fill materials are located up to a 
depth of about 10 m, with the presence of an underlying marly 
basement. From KP 11.6, the characterization is less obvious, 
most likely with an alternation of finer and coarser fill materials. 
The uncertainty associated with geophysical information and 
the lack of geotechnical information (with boreholes B6 and 
B7 being very far apart) leave this area beyond 10 m deep 
poorly characterized. The available knowledge suggests that 
Cretaceous Limestone-type materials could be present between 
KP 11.5 and KP 12.13; this characterization thus appears to 
be consistent with our fusion results. 

These fusion results allow proposing more precise litholog- 
ical interface positions thanks to the geotechnical characteri- 
zation. In addition to being able to individually characterize 
lithological materials, it is very helpful to have areas with 
no doubt existing between two or even four materials. This 
information is valuable in order to determine where it would be 
pertinent to strengthen the geotechnical investigations. The dis- 
played results highlight the ability to take advantage of com- 
plementary information from geophysical and geotechnical 
methods. While the ERT and MASW methods did not serve to 
distinguish the four described materials in Figure 13, the core 
drilling results did clarify the section by both distinguishing 
the materials and specifying the interface locations. This step 
is performed, for example, at the level of borehole B3 (around 
11.5 m deep) between fine and coarse fill materials. While 
the ERT and MASW methods suggested a higher interface 
(Figure 13c), the information contained in the core drillings 
makes it possible to readjust this position and express this 
contradiction between sources thanks to Smets’ combination 
rule, with a representation of the conflict (in red, Figure 15a). 
This notion of conflict is valuable to understanding both the 
data and results; furthermore, it seems that no other existing 
work exposes this type of information. 

Since the conflict is not a solution in itself, the PCR6 rule 
is also essential in order to expose the material most likely 
present despite this high level of conflict. It is also required 
to observe closely and simultaneously the fusion results and 
associated belief masses. Although these mass values should 
not be considered as values serving as absolute indicators for a 
hazard study, they do provide a great deal of information when 
compared in relative terms. Thus, in Figure 15c, the presence 
of fine fill materials is observed over the first 8 meters of 
thickness, from KP 10.8 to the end of the section. However, 
Figure 15d makes it possible to qualify this characterization 
with a drastic drop in confidence to the left of borehole 
B3, as well as a gradual decrease in confidence to the right 
of borehole B6. The characterization of $\theta_1$ is therefore less 
reliable at KP 12 than at KP 11.3. 

At some positions of the section, the union of two hypotheses 
is represented after the fusion process (e.g. $\theta_1 \cup \theta_2$ below 
a depth of 15 m at KP 11.2). As such, it seems impossible 
to characterize a single material, yet an expert’s observation 
should be sufficient to propose the most plausible solution. In 
this example, although borehole B4 does not extend deeper 
than 15 m, it seems rational that the materials present beyond 
are also marls and not fine fill materials. Thus, according 
to the positioning of the union of characterized hypotheses 
and thanks to expert observation, even if the geotechnical 
investigation does not extend to the base of the section, it 
is still possible to make reasonable suggestions as to the 
materials present beyond the geotechnical investigation depths. 

The fusion results presented for four (Figure 14) and 
seven core drillings (Figure 15) underscore how the section 
characterization differs depending on the number and position 
of geotechnical investigations. It may be relevant to use this 
fusion methodology during geotechnical acquisitions in order 
to ascertain where to strengthen the investigation campaign 
and where it would be valuable to acquire information. Specif- 
ically, the campaign should be reinforced as a priority where 
no material is precisely characterized (in gray, Figures 14a, 
14c, 15a and 15c) and where the confidence level associated 
with the hypotheses is low (in blue, Figures 14b, 14d, 15b and 
15d). 


B. Methodology improvements, limitations and outlook 


Compared to the previous works of Dezert et al. [14], [15], 
several improvements have been introduced in this article. 
First, the bounds of the intervals associated with the FoD 
hypotheses were previously fixed by the user under expert 
interpretation. Here, the K-means clustering method makes 
it possible to objectify these bound values. It appears that 
the results obtained are consistent with the interpretation an 
expert could issue of the geoelectrical and seismic model 
inversion results (Figures 2 and 5). It also seems that the 
choices of bound values are not aberrant when placed into 
perspective with the distributions of physical values in the form 
of modal classes (Figures 7 and 8). However, it is important 
to keep in mind that an expert’s interpretation is still essential 
to determine the number of desired clusters. In this study, 
three clusters have been set for both the ERT and MASW 
methods, but other studies may require a different number of 
clusters depending on the methods used. It is also the expert’s 
responsibility to know which FoD hypotheses to associate with 
each cluster, based on their knowledge. 
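
As an illustration of how clustering can replace user-defined interval bounds, the sketch below runs a plain one-dimensional K-means on a handful of physical values and places the bounds halfway between neighbouring centroids. This is a minimal sketch under stated assumptions: the data values, the deterministic initialization and the midpoint convention for the bounds are all hypothetical choices made for the example, and are not described in the study.

```python
def kmeans_1d(values, k, iterations=100):
    """Plain Lloyd iterations on scalar data; returns the sorted centroids."""
    data = sorted(values)
    # Deterministic start (assumption): spread centroids over the sorted data.
    centroids = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            groups[nearest].append(v)
        # Keep the previous centroid if a cluster happens to be empty.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


# Hypothetical log10 resistivity values for a few cells of a section.
log_rho = [1.1, 1.2, 1.3, 1.9, 2.0, 2.1, 2.9, 3.0, 3.2]

# The number of clusters (three here) remains an expert choice.
centroids = kmeans_1d(log_rho, k=3)

# Interval bounds taken halfway between neighbouring centroids (assumption).
bounds = [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]

print("centroids:", [round(c, 3) for c in centroids])
print("bounds:", [round(b, 3) for b in bounds])
```

The association of each resulting interval with one or several FoD hypotheses is not automated in this sketch; as stated above, it remains the expert's responsibility.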

The second improvement of this work consists of having 
integrated data acquired by the MASW method. Although 
the integration of this geophysical method into the fusion 
methodology was made feasible, the information provided 
by the method for this case study is not extremely valuable 
compared to the characterization produced by the ERT method. 
On the one hand, the area covered by the seismic investigation 
is much smaller than that covered by the ERT (approx. 20% 
of the section covered), while on the other, poor complemen- 
tarity exists between these two geophysical methods when 
it comes to discriminating various hypotheses. This finding 
is certainly due to the presence of limestone blocks in the 
coarse fill materials, which gives these materials high seismic 
velocities similar to those of the Cretaceous limestones. One 
point of discussion regarding the MASW method concerns 
the imprecisions associated with the calculated shear wave 
velocities. The use of random values has been proposed in this 
study, yet it would be relevant to quantify these imprecisions 
in an alternative manner. To date, however, no consensus 
appears to have been reached in the literature [32], [33]. 

Another improvement pertains to the inclusion of lithological 
materials that are extracted but whose cohesion does not allow for 
laboratory analysis (marl and limestone). This consideration 
is valuable since it allows the geotechnical information to be 
considered more in depth than the case of focusing solely on 
fill materials. Without this consideration, several areas after 
the fusion process would still be poorly characterized, e.g. 
$\theta_1 \cup \theta_2$ or $\theta_3 \cup \theta_4$. The final improvement of this paper is 
the inclusion of ERT sensitivities into the mass assignment 
process, thus displaying a less confident characterization at 
the bottom and sides of the section. It would be worthwhile to 
integrate information of such a nature in the future for other 
geophysical sources (e.g. the MASW method). 

Finally, since this study focuses on a hydraulic work, 
transversal topography and the presence of water on one side 
of the studied object may have an impact on the resistivity 
values. However, these effects are minor and relatively 
homogeneous all along the acquisition profile [44]. We are 
aware that software such as BERT or pyGIMLi could provide 
an advantage in investigating 3D structures. However, in this 
work, we focused on the fusion of data classically available 
to the project manager (longitudinal ERT inverted in 2D). 
Also, for the seismic characterization, the MASW data are 
1D (almost 2D because of a longitudinal distribution). Since 
the objective of this work is to look for a 2D model of the dike, 
the inversion of ERT data in 3D would not appear consistent 
with the complete approach. 


VI. CONCLUSION 


This paper has presented several improvements to the fusion 
methodology, based on the use of belief functions first pro- 
posed by Dezert et al. [14], [15], along with a detailed appli- 
cation of the methodology to a section of canal dike with large 
dimensions and complex lithology using two combination 
rules (Smets and PCR6). This methodology serves to take into 
consideration the various forms of imperfections associated 
with information (uncertainty, imprecision, incompleteness), 
as well as the spatial expressions specific to each informa- 
tion source. The level of contradiction (conflict) between the 
information sources has also been quantified. In terms of 
methodological improvements, in addition to integrating data 
from the MASW method, the sensitivity values associated 
with electrical resistivities have now been taken into account. 
An automated procedure for associating physical values with 
lithological materials using the K-means clustering method 
has been proposed; moreover, materials extracted but not 
analyzed in the laboratory are included in the characterization 
of the studied dike section. 

The results obtained thanks to this fusion process make it 
possible to highlight the significant variability in lithology as 
well as the location of the fault between KP 10.6 and KP 
10.8. They also provide information on the position of fine 
and coarse fill materials, marls and limestones. The presence 
of a limestone substratum up to KP 10.6 with coarse fill 
materials above has been successfully characterized. The fine 
fill materials then appear to be present from the top to a 
depth of approximately 10 m, i.e. from KP 10.8 to the end of 
the section. Below 10 m of depth, while coarse fill materials 
are located at KP 10.83, the marl substratum is present in 
the center of the studied section. Below the fine fill material 
layer and beyond KP 11.6, an alternation of finer and coarser 
fill materials seems to be the most plausible characterization. 
The results displayed in the figures herein help locate areas 
of: high confidence level (high belief masses), doubt between 
two (union of two characterized hypotheses) or four materials 
(union of four characterized hypotheses), and conflict between 
the information sources (high belief masses associated with $\emptyset$). 
The fine fill materials close to the surface are characterized 
with a strong confidence level, while the area where the fault 
is assumed to be present has not been well constrained. These 
results also highlight consistency between the characterization 
made by the ERT and the MASW methods at this specific 
investigation site with a very low level of conflict. 

Areas of a lower confidence level could indicate where 
investigation should be strengthened in the future and might 
also be valuable for decision support in failure hazards models. 

Abstract—This paper presents a method for fusing identification 
(attribute) information provided by two types of sensors: 
combined primary and secondary (IFF) surveillance radars and 
ESM (Electronic Support Measures) sensors. In the first section, 
the basic taxonomy of attribute identification is adopted in 
accordance with the standards STANAG 1241 ed. 5 and 
STANAG 1241 ed. 6 (draft). These standards provide the 
following basic values of the attribute identification: FRIEND, 
HOSTILE, NEUTRAL and UNKNOWN, and the additional values 
ASSUMED FRIEND and SUSPECT. The theoretical basis is the 
Dezert-Smarandache theory (DSmT) of inference. To combine 
identification information from different ESM sensors and 
radars, the paper presents and applies six information fusion 
rules proposed by DSmT: the Proportional Conflict 
Redistribution rules (PCR1, PCR2, PCR3, PCR4, PCR5 and PCR6). 
Rules for determining attribute information by an ESM sensor 
equipped with a database of radar emitters are presented. It is 
proposed that each signal vector sent by the ESM sensor 
contain an extension specifying a randomized identification 
declaration (hypothesis). This declaration specifies the 
reliability of the identification information, i.e. the basic 
belief assignment (BBA) on the identification information set. 
The paper also presents a model for determining the basic 
belief assignment for a combined primary and secondary radar. 
Each sensor report sent to the information fusion center 
contains a vector of belief masses of attribute identification. 
Results of combining sensor information with the PCR rules for 
different scenarios of the radio-electronic situation 
(deterministic and Monte Carlo) are presented in the final part 
of the paper. The conclusions given at the end of the paper 
confirm the legitimacy of using Dezert-Smarandache theory in 
information fusion for primary radars, secondary radars and 
ESM sensors. 


Keywords: information fusion, Dezert-Smarandache theory 
(DSmT) of inference, conflict redistribution rules, radar emitter 
recognition, electronic support measures (ESM), primary 
and secondary radars. 


I. INTRODUCTION 


The paper is devoted to the fusion of identification informa- 
tion from ESM sensors and combined primary and secondary 
radar (IFF) using the rules of Dezert-Smarandache theory 
(DSmT) called proportional conflict redistribution rules. The 
first part of the paper presents the applied interpretation of 
attribute identification in accordance with the NATO STANAG 
1241 standard. It should be noted that this is one of the 
possible interpretations of the adopted definitions. It leads to 
the Bayesian model of the basic belief assignment. 
The identification classification method depends on the 
organization that operates the ESM sensors. In the paper, 
one assumes that the sensor identification classification is 
consistent with the NATO STANAG 1241 standard [1], [2]. In 
addition, one assumes that five identification classes are used: 
three primary and two secondary ones. Sensors can transmit 
identification information in the form of a hard decision, 
sometimes referred to as non-randomized, or a soft decision, 
sometimes referred to as a randomized decision. In the paper, 
one assumes that the sensors send identification information 
to the system in a randomized form, i.e. in the form of a basic 
belief assignment on the set of identification classes. This 
assignment determines the sensor’s belief that the detected 
emitter belongs to each of the identification classes. 

The next part of the paper presents the mathematical form 
of the DSmT proportional conflict redistribution rules PCR1, 
PCR2, PCR3, PCR4, PCR5 and PCR6 [3], [4] for two sensor 
inputs and PCR5 and PCR6 for three sensor inputs, assuming 
the Bayesian model of the basic belief assignment of hypotheses. 
The next two sections show how to determine the basic 
belief assignment for combined primary and secondary (IFF) 
radar and ESM sensors. 

Combined primary and secondary (IFF) radars are the main 
source of identification information about air and maritime 
objects. A primary radar only allows an object to be detected 
in a supervised area. The detection of the object is the 
precondition for the secondary radar (interrogator) to send a 
request to the object. The interpretation of the object response 
depends on the type of request. The so-called civilian modes 
only make it possible to determine whether the detected object 
replies to an interrogation or not. The paper presents a method 
for determining the basic belief assignment of airborne targets 
moving in the observation space of combined primary and 
secondary (IFF) radar sensors. 

ESM (electronic intelligence - electronic support measures) 
electronic surveillance sensors consist of passive receivers and 
direction-finders, which allow them to capture emitter signals 
coming from certain directions. In this way, the electronic 
recognition system can receive, among other things, information 
on radar emitters mounted on air or maritime platforms. 
Reports sent from the ESM sensors include, among other things, 
the characteristics of the intercepted signal, the emitter’s azimuth 
and the so-called identification information. The paper also 
assumes that sensors are equipped with specialized databases, 
called databases of emitter signal patterns, in which 
information about previously captured, processed, analyzed, 
recognized and described radar emitter signals is stored along 
with additional information about the type and operating mode 
of the emitter, the platform on which these emitters can be 
installed, and the national or organizational affiliation of these 
platforms. The detected signals are subjected to an analysis 
procedure, which makes it possible to determine the so-called 
distinctive features of the signal and then to assign this 
information to a specific electronic entity (already existing or 
created ad hoc) [5]. The basis for assigning distinctive 
information to an electronic entity is the azimuth angle of the 
incoming signal. 

In the case of a high density of targets, identification 
information may fluctuate due to incorrect assignment of 
signal information to the electronic entity [6]. The impact of 
this negative phenomenon can be significantly reduced by an 
efficient estimation of the emitter positions [7]. Assuming that 
sensors send all reports on the tracked electronic entities to the 
superior operation center in the electronic recognition system, 
such a center (called the information fusion center (IFC) in the 
paper) can perform the fusion of the identification information. 
The fusion of identification information ensures greater 
stability of this information, i.e. resistance to accidental 
changes in sensor decisions. Each sensor report sent to the 
information fusion center contains a vector of belief masses for 
all attribute identification values. Results of the Proportional 
Conflict Redistribution rules for combining sensor information 
in selected deterministic and Monte Carlo scenarios are presented 
in the final part of the paper. The identification information 
fusion can be realized based on three basic theories: the 
Bayesian theory of inference, the Dempster-Shafer theory (also 
called the theory of evidence) and the Dezert-Smarandache 
theory. The methods of Dezert-Smarandache information fusion 
are used in this paper. In addition, their effectiveness is 
compared with that of Dempster’s rule of inference. 

At the end of the paper conclusions are given. They confirm 
the legitimacy of the use of Dezert-Smarandache theory in 
information fusion for primary radars, secondary radars and 
ESM sensors. 


II. INTERPRETATION OF ATTRIBUTE IDENTIFICATION 
ACCORDING TO STANAG 1241 


The set of possible values of attribute identifications used 
by sensors can be adopted based on standardization documents 
of the organizations that operate these sensors [1], [2], [8]-[10]. 
This paper assumes a basic taxonomy of identification in 
accordance with the draft of STANAG 1241 ed. 6 [2]. Other 
similar documents include STANAG 4420 and STANAG 1241 
ed. 5, which provide the 
following basic values of the attribute identifications: 

- FRIEND (F), 

- HOSTILE (H), 

- NEUTRAL (N), 

- UNKNOWN (U). 


Each of these documents contains its own definitions of the 
declarations. 

The following definitions of these basic values of the 
attribute identification are used in the paper (in accordance 
with [2]): 


- FRIEND - an allied/coalition military track, object or 
entity; a track, object or entity, supporting friendly forces 
and belonging to an allied/coalition nation or a declared 
or recognized friendly faction or group, 

- HOSTILE - a track, object or entity whose characteristics, 
behavior or origin indicate that it belongs to opposing 
forces or poses a threat to friendly forces or their mission, 

- NEUTRAL - a military or civilian track, object or entity, 
neither belonging to allied/coalition military forces nor to 
opposing military forces, whose characteristics, behavior, 
origin or nationality indicates that it is neither supporting 
nor opposing friendly forces or their mission, 

- UNKNOWN - an evaluated track, object or entity, which 
does not meet the criteria for any other standard identity. 


These standards introduce two additional values of the
identification attribute:


- ASSUMED FRIEND,
- SUSPECT.


Attention should be paid to these two more recent identities
contained in [1] and to their definitions [2]:


- ASSUMED FRIEND - a track, object or entity which is 
assumed to be friend or neutral because of its character- 
istics, behavior or origin, 

- SUSPECT - a track, object or entity whose characteris- 
tics, behavior or origin indicate that it potentially belongs 
to opposing forces or potentially poses a threat to friendly 
forces or their mission. 


The identification definitions in [1], [2] can lead to differ- 
ent interpretations. This paper adopts the interpretation, the 
graphical form of which is shown in Figure 1. 



Figure 1. The interpretation of STANAG 1241 using the Venn diagram. 


III. FUSION OF INFORMATION FROM ESM SENSORS AND
RADARS IN THE INFORMATION FUSION CENTER (IFC)


A. Diagram of the process of information fusion for two 
sensors in the information fusion center 


In this work, it is assumed that ESM sensors send reports
asynchronously to the information fusion center. These reports
contain sensor decisions regarding the identification of the objects
emitting the detected signals. The set of possible identifications is
as follows:


$$\Theta = \{\theta_i,\ i = 1, \ldots, 6\} \tag{1}$$


wherein the following interpretation is used:
$\theta_1$ - FRIEND (F),
$\theta_2$ - HOSTILE (H),
$\theta_3$ - NEUTRAL (N),
$\theta_4$ - ASSUMED FRIEND (AF),
$\theta_5$ - SUSPECT (S),
$\theta_6$ - UNKNOWN (U).


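According to Figure 1, the hypotheses are mutually exclusive, i.e.


$$\theta_i \cap \theta_j = \begin{cases} \theta_i, & \text{if } i = j, \\ \emptyset, & \text{if } i \neq j. \end{cases} \tag{2}$$


Each sensor with the number $i$ ($i \in \mathbb{N}$) sends its decisions
as so-called soft decisions, i.e. as vectors of BBA measures (BBA
- basic belief assignment)


$$\mathbf{m}_i = [m_i(\theta_1), \ldots, m_i(\theta_6)]. \tag{3}$$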


One should also introduce a vector of generalized BBA
measures for the information fusion center


$$\mathbf{m}_F = [m_F(\theta_1), \ldots, m_F(\theta_6)]. \tag{4}$$


The paper adopts the Bayesian BBA model because this model
has been adopted as valid in the STANAG 4162 standard [9].
This means that equation (5) applies in addition to (1) and (2):


$$\sum_{j=1}^{6} m_F(\theta_j) = \sum_{j=1}^{6} m_i(\theta_j) = 1. \tag{5}$$


In the first case, two sensors each send, asynchronously within
one cycle, one report containing decisions in the form of a BBA
related to the target. After receiving a report from a sensor, the
IFC system fuses the information contained in two vectors: the
current generalized BBA vector
$\mathbf{m}_F = [m_F(\theta_1), \ldots, m_F(\theta_6)]$ and either the
BBA vector $\mathbf{m}_1$ from sensor 1 or the BBA vector
$\mathbf{m}_2$ from sensor 2.

The information fusion procedure performed in the IFC is
carried out in accordance with the following formula:


$$\mathbf{m}'_F = R_F(\mathbf{m}_F, \mathbf{m}_i), \quad (i = 1 \text{ or } i = 2) \tag{6}$$


wherein $\mathbf{m}'_F$ is the vector of generalized BBA measures
determined by the $R_F$ rule from the previous generalized
BBA measure vector $\mathbf{m}_F$ and the new BBA measure vector
$\mathbf{m}_i$ sent by the $i$-th sensor. The diagram of the identification
information fusion from the ESM sensors is shown in Figure 2.


Figure 2. The diagram of the information fusion process in the information
fusion center IFC for two sensors. Explanations: $\mathbf{m}_i$ - BBA measure vector
of the i-th sensor, $\mathbf{m}_F$ - generalized BBA measure vector that is a part of the
electronic entity record in IFC, EER - electronic entity record in IFC database.

In the second case, two sensors again each send, asynchronously
within one cycle, one report containing decisions in the form of a
BBA related to the target. The IFC system waits for the reports
from both sensors within the cycle and stores them in registers.
Only when both registers are full does the IFC system fuse the
information contained in three vectors: the BBA vector
$\mathbf{m}_F = [m_F(\theta_1), \ldots, m_F(\theta_6)]$, the BBA vector
$\mathbf{m}_1$ from sensor 1 and the BBA vector $\mathbf{m}_2$ from
sensor 2. This method has a drawback: the information stored in
the registers loses credibility.


In this case, the information fusion procedure performed
in the IFC is carried out in accordance with the following
formula:


$$\mathbf{m}'_F = R_F(\mathbf{m}_F, \mathbf{m}_1, \mathbf{m}_2) \tag{7}$$


wherein $\mathbf{m}'_F$ is the vector of generalized BBA measures
determined by the $R_F$ rule from the previous generalized BBA
measure vector $\mathbf{m}_F$ and the new BBA measure vectors
$\mathbf{m}_1$ and $\mathbf{m}_2$ sent by both sensors. The diagram of the
identification information fusion for this case is shown in Figure 3.

The following part of the paper describes the rules for
combining the BBA vector from the $i$-th sensor with the
generalized BBA vector in the IFC.


B. The rules of combination of BBA measures vectors 


This section presents the formulas defining the combination
rules used to calculate basic belief assignments for the systems
shown in Figure 2 and Figure 3. The general forms are
described in detail in [4], [11], [12]. The information fusion
rules of DSmT are presented below under the following
constraints:

- the properties of the set of hypotheses are described by
formulas (1) and (2),

- for the first scheme (Figure 2), the information fusion
procedure handles two information inputs: on one input,
reports from two ESM sensors appear alternately; on the
second input, electronic entity records from the IFC database
appear,

- for the second scheme (Figure 3), the information fusion
procedure handles three information inputs: on the first
input, reports from a combined primary and secondary
surveillance radar appear; on the second input, reports
from an ESM sensor appear; on the third input, electronic
entity records from the IFC database appear.


Figure 3. The diagram of the information fusion process in the information
fusion center IFC for two sensors and electronic entity record from IFC
database. Explanations: $\mathbf{m}_i$ - BBA measure vector of the i-th sensor, $\mathbf{m}_F$ -
generalized BBA measure vector that is a part of the electronic entity record
in IFC, EER - electronic entity record in IFC database.


Dempster’s rule 


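Dempster's rule [13], [14] for combining the BBA measure vector
$\mathbf{m}_i$ sent by the $i$-th sensor with the generalized BBA measure
vector $\mathbf{m}_F$ in the IFC is described for each $\theta_j \in \Theta$ by the
following formula:


$$m'_F(\theta_j) = m_{DS}(\theta_j) = \frac{\sum_{\substack{k,l = 1, \ldots, 6 \\ \theta_k \cap \theta_l = \theta_j}} m_F(\theta_k)\, m_i(\theta_l)}{1 - \sum_{\substack{k,l = 1, \ldots, 6 \\ \theta_k \cap \theta_l = \emptyset}} m_F(\theta_k)\, m_i(\theta_l)} = \frac{m_F(\theta_j)\, m_i(\theta_j)}{1 - \sum_{k=1}^{6} \sum_{\substack{l=1 \\ l \neq k}}^{6} m_F(\theta_k)\, m_i(\theta_l)} = \frac{m_{Fi}(\theta_j)}{1 - k_{Fi}} \tag{8}$$


wherein the degree of conflict $k_{Fi}$ is defined by the formula:


$$k_{Fi} = \sum_{\substack{k,l = 1, \ldots, 6 \\ \theta_k \cap \theta_l = \emptyset}} m_F(\theta_k)\, m_i(\theta_l) = \sum_{k=1}^{6} \sum_{\substack{l=1 \\ l \neq k}}^{6} m_F(\theta_k)\, m_i(\theta_l), \tag{9}$$


while


$$m_{Fi}(\theta_j) = m_F(\theta_j)\, m_i(\theta_j). \tag{10}$$


One can notice that


$$\sum_{k=1}^{6} \sum_{l=1}^{6} m_F(\theta_k)\, m_i(\theta_l) = 1, \tag{11}$$


which can be split into a conflicting and a non-conflicting part:


$$\sum_{\substack{k,l = 1, \ldots, 6 \\ \theta_k \cap \theta_l = \emptyset}} m_F(\theta_k)\, m_i(\theta_l) + \sum_{\substack{k,l = 1, \ldots, 6 \\ \theta_k \cap \theta_l \neq \emptyset}} m_F(\theta_k)\, m_i(\theta_l) = \sum_{k=1}^{6} \sum_{\substack{l=1 \\ l \neq k}}^{6} m_F(\theta_k)\, m_i(\theta_l) + \sum_{k=1}^{6} m_F(\theta_k)\, m_i(\theta_k) = 1. \tag{12}$$


In particular, if


$$k_{Fi} = 0, \tag{14}$$


then there is no conflict.


$m'_F(\cdot)$ is the Dempster-Shafer fusion result if and only if the
denominator of expression (8) is non-zero, i.e. the degree
of conflict $k_{Fi}$ is less than 1.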
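
As an illustration of the rule above, the following minimal Python sketch combines two Bayesian BBAs over six mutually exclusive hypotheses. The function name `dempster_bayesian` and the numerical values are assumptions made for this example only; they are not taken from the paper's experiments.

```python
def dempster_bayesian(m_f, m_i):
    """Dempster's rule for two Bayesian BBAs on mutually exclusive hypotheses.

    m_f: generalized BBA vector held by the fusion center
    m_i: BBA vector reported by a sensor
    Returns the fused vector and the degree of conflict k_Fi.
    """
    if len(m_f) != len(m_i):
        raise ValueError("BBA vectors must have the same length")
    # Conjunctive masses: only identical hypotheses intersect.
    m_fi = [a * b for a, b in zip(m_f, m_i)]
    # Degree of conflict: products over all pairs of different hypotheses.
    k_fi = sum(a * b
               for k, a in enumerate(m_f)
               for l, b in enumerate(m_i)
               if k != l)
    if k_fi >= 1.0 - 1e-12:
        raise ValueError("total conflict: Dempster's rule is undefined")
    fused = [v / (1.0 - k_fi) for v in m_fi]
    return fused, k_fi


# Illustrative (hypothetical) input vectors, each summing to 1.
m_F = [0.7, 0.0, 0.0, 0.0, 0.0, 0.3]
m_2 = [0.6, 0.1, 0.0, 0.2, 0.0, 0.1]
fused, k = dempster_bayesian(m_F, m_2)
print("degree of conflict:", k)
print("fused BBA:", fused)
```

In this sketch the whole conflicting mass is removed by the normalization, so hypotheses supported by only one of the two vectors receive zero fused mass.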


The Proportional Conflict Redistribution rule PCR1 


The PCR1 rule is the simplest version of the
proportional conflict redistribution rule. It computes the total
conflicting mass (without considering the partial conflicting
masses) and redistributes it to all non-empty sets of
hypotheses proportionally to their corresponding
non-zero column sums in the associated mass matrix. The
PCR1 rule is defined for every non-empty hypothesis $\theta_j \in \Theta$ in the
following way:


$$m'_F(\theta_j) = m_{PCR1}(\theta_j) = \Big[ \sum_{\substack{k,l = 1, \ldots, 6 \\ \theta_k \cap \theta_l = \theta_j}} m_F(\theta_k)\, m_i(\theta_l) \Big] + \frac{c_{Fi}(\theta_j)}{d_{Fi}}\, k_{Fi} = m_F(\theta_j)\, m_i(\theta_j) + \frac{c_{Fi}(\theta_j)}{d_{Fi}}\, k_{Fi} = m_{Fi}(\theta_j) + \frac{c_{Fi}(\theta_j)}{d_{Fi}}\, k_{Fi} \tag{15}$$


where $c_{Fi}(\theta_j)$ is the non-zero sum of the column corresponding
to the hypothesis $\theta_j$ in the mass matrix


$$\mathbf{M} = \begin{bmatrix} \mathbf{m}_F \\ \mathbf{m}_i \end{bmatrix} \tag{16}$$


specified by the formula


$$c_{Fi}(\theta_j) = m_F(\theta_j) + m_i(\theta_j) \tag{17}$$


where:


- $\mathbf{m}_i$ ($i = 1, 2$) is a row vector of the basic belief assignment
masses of the $i$-th sensor's hypotheses $\theta_j$,

- $\mathbf{m}_F$ is a row vector of the basic belief assignment masses
of the IFC system's hypotheses,

- $k_{Fi}$ is the degree of mass conflict specified by the formula


$$k_{Fi} = \sum_{\substack{k,l = 1, \ldots, 6 \\ \theta_k \cap \theta_l = \emptyset}} m_F(\theta_k)\, m_i(\theta_l) = \sum_{k=1}^{6} \sum_{\substack{l=1 \\ l \neq k}}^{6} m_F(\theta_k)\, m_i(\theta_l), \tag{18}$$

- $d_{Fi}$ is the sum of all non-zero column sums of all non-empty
sets


$$d_{Fi} = \sum_{j=1}^{6} [m_F(\theta_j) + m_i(\theta_j)] = \sum_{j=1}^{6} c_{Fi}(\theta_j). \tag{19}$$

In our case $d_{Fi} = 2$ because


$$\sum_{j=1}^{6} m_F(\theta_j) = \sum_{j=1}^{6} m_i(\theta_j) = 1. \tag{20}$$

In addition


$$m_{Fi}(\theta_j) = m_F(\theta_j)\, m_i(\theta_j). \tag{21}$$


The Proportional Conflict Redistribution rule PCR2 


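In the PCR2 rule, the total conflicting mass $k_{Fi}$ is distributed
only to the non-empty sets involved in the conflict (not to all
non-empty sets), proportionally to their
corresponding non-zero column sums.

A non-empty set $\theta_j \in \Theta$ is considered involved in the
conflict if there exists another set $\theta_k \in \Theta$, which is neither
included in $\theta_j$ nor includes $\theta_j$, such that $\theta_j \cap \theta_k = \emptyset$ and
$m_{Fi}(\theta_j \cap \theta_k) > 0$. The PCR2 rule is defined for every
non-empty hypothesis $\theta_j \in \Theta$ in the following way:


$$m'_F(\theta_j) = m_{PCR2}(\theta_j) = \Big[ \sum_{\substack{k,l = 1, \ldots, 6 \\ \theta_k \cap \theta_l = \theta_j}} m_F(\theta_k)\, m_i(\theta_l) \Big] + C(\theta_j) \frac{c_{Fi}(\theta_j)}{e_{Fi}}\, k_{Fi} = m_F(\theta_j)\, m_i(\theta_j) + C(\theta_j) \frac{c_{Fi}(\theta_j)}{e_{Fi}}\, k_{Fi} = m_{Fi}(\theta_j) + C(\theta_j) \frac{c_{Fi}(\theta_j)}{e_{Fi}}\, k_{Fi} \tag{22}$$

where

$$C(\theta_j) = \begin{cases} 1, & \text{if } \theta_j \text{ is involved in the conflict}, \\ 0, & \text{otherwise}. \end{cases} \tag{23}$$


Formula (23) can be written in the form (25),
taking into account the definition of involvement in a conflict
and formula (24) [12]:


$$m_{Fi}(\theta_j \cap \theta_k) = m_F(\theta_j)\, m_i(\theta_k) + m_F(\theta_k)\, m_i(\theta_j) \tag{24}$$

$$C(\theta_j) = \begin{cases} 1, & \text{if } \exists\, \theta_k \in \Theta,\ k \neq j, \text{ such that } m_{Fi}(\theta_j \cap \theta_k) > 0, \\ 0, & \text{otherwise}. \end{cases} \tag{25}$$


$c_{Fi}(\theta_j)$ is the non-zero sum of the column corresponding to
the hypothesis $\theta_j$ in the mass matrix $\mathbf{M}$ (16), specified by the
formula


$$c_{Fi}(\theta_j) = m_F(\theta_j) + m_i(\theta_j), \tag{26}$$


where:


- $\mathbf{m}_i$ ($i = 1, 2$) is a row vector of the basic belief assignment
masses of the $i$-th sensor's hypotheses $\theta_j$,

- $\mathbf{m}_F$ is a row vector of the basic belief assignment masses
of the IFC system's hypotheses,

- $k_{Fi}$ is the degree of mass conflict specified by (18),

- $e_{Fi}$ is the sum of all non-zero column sums of the non-empty
sets involved in the conflict only


$$e_{Fi} = \sum_{j \in CF} [m_F(\theta_j) + m_i(\theta_j)] = \sum_{j \in CF} c_{Fi}(\theta_j) = \sum_{j=1}^{6} C(\theta_j) [m_F(\theta_j) + m_i(\theta_j)] = \sum_{j=1}^{6} C(\theta_j)\, c_{Fi}(\theta_j) \tag{27}$$


where

$$CF = \{ j = 1, \ldots, 6 : \exists\, \theta_k \in \Theta,\ k \neq j \mid m_{Fi}(\theta_j \cap \theta_k) > 0 \} \tag{28}$$


and $m_{Fi}(\theta_j \cap \theta_k)$ is defined by (24).
In addition


$$m_{Fi}(\theta_j) = m_F(\theta_j)\, m_i(\theta_j). \tag{29}$$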

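It will be shown below that for the data used in the
numerical experiments (section VI) $e_{Fi} = 2$, which means that
the PCR2 rule is equivalent to the PCR1 rule. The BBA vectors
used there contain only values less than 1, which means that


$$\forall j = 1, \ldots, 6 : m_F(\theta_j) < 1 \wedge m_i(\theta_j) < 1. \tag{30}$$


It follows that each BBA vector contains at least two non-zero
components, that is $\forall j = 1, \ldots, 6$, $\exists k = 1, \ldots, 6$ with
$k \neq j$ such that


$$0 < m_F(\theta_j) < 1 \wedge 0 < m_F(\theta_k) < 1, \tag{31}$$

$$0 < m_i(\theta_j) < 1 \wedge 0 < m_i(\theta_k) < 1. \tag{32}$$

From (31) and (32) it follows that if $m_F(\theta_j) > 0$, then there
exists at least one value $k \neq j$ such that $m_i(\theta_k) > 0$, which
can be written in the following form


$$\forall j = 1, \ldots, 6 : 0 < m_F(\theta_j) < 1 \Rightarrow \exists k = 1, \ldots, 6,\ k \neq j : 0 < m_i(\theta_k) < 1. \tag{33}$$

From (33) it follows that


$$\forall j = 1, \ldots, 6 : 0 < m_F(\theta_j) < 1 \Rightarrow \exists k = 1, \ldots, 6,\ k \neq j : m_F(\theta_j)\, m_i(\theta_k) > 0. \tag{34}$$

The same applies to $m_i$:

$$\forall j = 1, \ldots, 6 : 0 < m_i(\theta_j) < 1 \Rightarrow \exists k = 1, \ldots, 6,\ k \neq j : m_i(\theta_j)\, m_F(\theta_k) > 0. \tag{35}$$


Taking into account (34), (35) and (25) one can obtain


$$\forall j = 1, \ldots, 6 : 0 < m_F(\theta_j) < 1 \Rightarrow \exists k = 1, \ldots, 6,\ k \neq j \text{ such that } m_F(\theta_j)\, m_i(\theta_k) + m_i(\theta_j)\, m_F(\theta_k) > 0, \tag{36}$$

$$\forall j = 1, \ldots, 6 : 0 < m_i(\theta_j) < 1 \Rightarrow \exists k = 1, \ldots, 6,\ k \neq j \text{ such that } m_i(\theta_j)\, m_F(\theta_k) + m_F(\theta_j)\, m_i(\theta_k) > 0. \tag{37}$$

From (36) and (37) it follows that

$$\forall j = 1, \ldots, 6 : 0 < m_F(\theta_j) < 1 \Rightarrow C(\theta_j) = 1, \tag{38}$$

$$\forall j = 1, \ldots, 6 : 0 < m_i(\theta_j) < 1 \Rightarrow C(\theta_j) = 1. \tag{39}$$


This means that any hypothesis with a non-zero BBA value
in either of the two BBA vectors is involved in a conflict. From (27)
it follows that


$$e_{Fi} = \sum_{j=1}^{6} C(\theta_j)\, m_F(\theta_j) + \sum_{j=1}^{6} C(\theta_j)\, m_i(\theta_j). \tag{40}$$


Using (36), (37) and (40), the value $e_{Fi}$ will be determined.
Because one has


$$\sum_{j=1}^{6} C(\theta_j)\, m_F(\theta_j) = \sum_{j=1}^{6} m_F(\theta_j) = 1, \tag{41}$$

$$\sum_{j=1}^{6} C(\theta_j)\, m_i(\theta_j) = \sum_{j=1}^{6} m_i(\theta_j) = 1, \tag{42}$$


we get


$$e_{Fi} = \sum_{j=1}^{6} C(\theta_j)\, m_F(\theta_j) + \sum_{j=1}^{6} C(\theta_j)\, m_i(\theta_j) = 2. \tag{43}$$


Considering (43), it can be said that in this case the PCR2
rule is equivalent to the PCR1 rule. For this reason, the
results of the PCR2 rule are not presented in section VI,
as they would be identical to the results of the PCR1 rule
because we work only with Bayesian BBAs in this application.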
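
The equivalence argued above can be checked numerically. The sketch below is a minimal Python rendering of the PCR1 and PCR2 formulas for two Bayesian BBAs over mutually exclusive hypotheses; the function names and input values are hypothetical and serve only as an illustration.

```python
def conjunctive(m_f, m_i):
    """Return the conjunctive masses m_Fi and the degree of conflict k_Fi."""
    m_fi = [a * b for a, b in zip(m_f, m_i)]
    k_fi = sum(a * b
               for k, a in enumerate(m_f)
               for l, b in enumerate(m_i)
               if k != l)
    return m_fi, k_fi


def pcr1(m_f, m_i):
    """PCR1: spread the total conflict over all column sums c_Fi."""
    m_fi, k_fi = conjunctive(m_f, m_i)
    c = [a + b for a, b in zip(m_f, m_i)]
    d = sum(c)  # equals 2 for normalized Bayesian BBAs
    return [m + cj / d * k_fi for m, cj in zip(m_fi, c)]


def involved(m_f, m_i):
    """C(theta_j): True if theta_j takes part in at least one partial conflict."""
    n = len(m_f)
    return [any(m_f[j] * m_i[k] + m_f[k] * m_i[j] > 0
                for k in range(n) if k != j)
            for j in range(n)]


def pcr2(m_f, m_i):
    """PCR2: spread the total conflict over the involved hypotheses only."""
    m_fi, k_fi = conjunctive(m_f, m_i)
    c = [a + b for a, b in zip(m_f, m_i)]
    flags = involved(m_f, m_i)
    e = sum(cj for cj, flag in zip(c, flags) if flag)
    if e == 0:  # no hypothesis is in conflict, nothing to redistribute
        return m_fi
    return [m + (cj / e * k_fi if flag else 0.0)
            for m, cj, flag in zip(m_fi, c, flags)]


# Illustrative (hypothetical) Bayesian BBAs with all entries below 1.
m_F = [0.7, 0.0, 0.0, 0.0, 0.0, 0.3]
m_2 = [0.6, 0.1, 0.0, 0.2, 0.0, 0.1]
print("PCR1:", pcr1(m_F, m_2))
print("PCR2:", pcr2(m_F, m_2))
```

For these inputs every hypothesis with a non-zero column sum is involved in a conflict, so both functions redistribute the conflict over the same denominator.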


The Proportional Conflict Redistribution rule PCR3 


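In the PCR3 rule, one distributes the partial conflicting masses,
instead of the total conflicting mass $k_{Fi}$, to the non-empty sets
involved in the partial conflict. If an intersection is empty,
for instance $\theta_j \cap \theta_k = \emptyset$, then the mass $m(\theta_j \cap \theta_k)$ of the
partial conflict is transferred to the non-empty sets $\theta_j$ and
$\theta_k$ proportionally with respect to the non-zero sum of masses
assigned to $\theta_j$ and respectively to $\theta_k$ by the BBAs $m_F(\cdot)$ and
$m_i(\cdot)$. The PCR3 rule works if at least one set between $\theta_j$
and $\theta_k$ is non-empty and its column sum is non-zero.

The PCR3 rule is defined for every non-empty hypothesis
$\theta_j \in \Theta$ in the following way:


$$m'_F(\theta_j) = m_{PCR3}(\theta_j) = \Big[ \sum_{\substack{k,l = 1, \ldots, 6 \\ \theta_k \cap \theta_l = \theta_j}} m_F(\theta_k)\, m_i(\theta_l) \Big] + \Big[ c_{Fi}(\theta_j) \sum_{\substack{k = 1, \ldots, 6 \\ \theta_k \cap \theta_j = \emptyset}} S^{PCR3}_{Fi}(\theta_j, \theta_k) \Big] = m_{Fi}(\theta_j) + \Big[ c_{Fi}(\theta_j) \sum_{\substack{k=1 \\ k \neq j}}^{6} S^{PCR3}_{Fi}(\theta_j, \theta_k) \Big] \tag{44}$$


where


$$S^{PCR3}_{Fi}(\theta_j, \theta_k) = \begin{cases} 0, & \text{for } c_{Fi}(\theta_j) + c_{Fi}(\theta_k) = 0, \\ \dfrac{m_F(\theta_k)\, m_i(\theta_j) + m_F(\theta_j)\, m_i(\theta_k)}{c_{Fi}(\theta_j) + c_{Fi}(\theta_k)}, & \text{otherwise}. \end{cases} \tag{45}$$

$c_{Fi}(\theta_j)$ is the non-zero sum of the column corresponding
to the hypothesis $\theta_j$ in the mass matrix $\mathbf{M}$ (16), specified by
the formula


$$c_{Fi}(\theta_j) = m_F(\theta_j) + m_i(\theta_j). \tag{46}$$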


The Proportional Conflict Redistribution rule PCR4 


The PCR4 rule redistributes the partial conflicting masses
only to the sets involved in the partial conflict, in proportion
to the non-zero mass sums assigned to $\theta_j$ and $\theta_k$ by the
conjunctive rule, according to the following formula:


m'p(0;) = mecra(9;) 


= mpi(0;) + [mri(O > STS (6;, Ox) 
ce 
= mpi(;) + [mri(O 2 STS (6;, Ox)] 
ae 
(47) 
where 
0, for cy = 0, 
mRFil(OjNOy 
SPCR. 8.) = Se for c, ~ 0, and co £ 0, 
mrilO;N0x) 


for cy ~ 0, and cz = 0, 

(48) 
where cy a cri(6;) + cri(Ox) and c2 a mri(; )mpi(Ox), 
and wherein 


cri(0;)+cri(x)? 


$$m_{Fi}(\theta_j \cap \theta_k) = m_F(\theta_k)\, m_i(\theta_j) + m_F(\theta_j)\, m_i(\theta_k) \tag{49}$$

$$m_{Fi}(\theta_j) = m_F(\theta_j)\, m_i(\theta_j) \tag{50}$$

$$c_{Fi}(\theta_j) = m_F(\theta_j) + m_i(\theta_j) \tag{51}$$

If at least one of the BBAs $m_F(\cdot)$ or $m_i(\cdot)$ is zero, the fraction
is discarded and the mass $m_{Fi}(\theta_j \cap \theta_k)$ is transferred to $\theta_j$
and $\theta_k$ proportionally with respect to their non-zero column
sums of masses $c_{Fi}(\cdot)$.


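The Proportional Conflict Redistribution rule PCR5 for
two BBAs (two sources)


Similarly to the PCR2-PCR4 rules, PCR5 redistributes the
partial conflicting mass to the hypotheses involved in the partial
conflict. PCR5 provides the most mathematically precise [4],
[11], [12] redistribution of conflicting mass to non-empty sets
in accordance with the logic of the conjunctive rule. However,
it is more difficult to implement. The PCR5 rule is defined
for every non-empty hypothesis $\theta_j \in \Theta$ in the following way:


$$m'_F(\theta_j) = m_{PCR5}(\theta_j) = m_{Fi}(\theta_j) + \sum_{\substack{k = 1, \ldots, 6 \\ \theta_k \cap \theta_j = \emptyset}} S^{PCR5}_{Fi}(\theta_j, \theta_k) = m_{Fi}(\theta_j) + \sum_{\substack{k=1 \\ k \neq j}}^{6} S^{PCR5}_{Fi}(\theta_j, \theta_k), \tag{52}$$


where


$$S^{PCR5}_{Fi}(\theta_j, \theta_k) = \begin{cases} 0, & \text{for } c_3 = 0 \text{ or } c_4 = 0, \\ \dfrac{m_F(\theta_j)^2\, m_i(\theta_k)}{m_F(\theta_j) + m_i(\theta_k)} + \dfrac{m_i(\theta_j)^2\, m_F(\theta_k)}{m_i(\theta_j) + m_F(\theta_k)}, & \text{for } c_3 \neq 0 \text{ and } c_4 \neq 0, \end{cases} \tag{53}$$

where $c_3 = m_F(\theta_j) + m_i(\theta_k)$ and $c_4 = m_i(\theta_j) + m_F(\theta_k)$,
and wherein


$$m_{Fi}(\theta_j) = m_F(\theta_j)\, m_i(\theta_j). \tag{54}$$


In formula (52), the component $S^{PCR5}_{Fi}$ is equal to zero
if both denominators are equal to zero. In formula (53),
if a denominator is zero, then the corresponding component is discarded.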
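
The two-source rule can be sketched in a few lines of Python for Bayesian BBAs on mutually exclusive hypotheses. This is only an illustrative sketch: the function name `pcr5_two_sources` is hypothetical, and the code assumes the reading in which each fraction with a zero denominator is discarded separately.

```python
def pcr5_two_sources(m_f, m_i):
    """PCR5 for two Bayesian BBAs over mutually exclusive hypotheses.

    Each partial conflict m_F(theta_j) * m_i(theta_k), j != k, is split
    between theta_j and theta_k in proportion to the two masses involved.
    """
    if len(m_f) != len(m_i):
        raise ValueError("BBA vectors must have the same length")
    n = len(m_f)
    fused = []
    for j in range(n):
        total = m_f[j] * m_i[j]  # conjunctive mass m_Fi(theta_j)
        for k in range(n):
            if k == j:
                continue
            c3 = m_f[j] + m_i[k]
            c4 = m_i[j] + m_f[k]
            if c3 > 0:  # share of the conflict m_F(theta_j) m_i(theta_k)
                total += m_f[j] ** 2 * m_i[k] / c3
            if c4 > 0:  # share of the conflict m_i(theta_j) m_F(theta_k)
                total += m_i[j] ** 2 * m_f[k] / c4
        fused.append(total)
    return fused


# Illustrative (hypothetical) two-hypothesis example.
result = pcr5_two_sources([0.6, 0.4], [0.2, 0.8])
print(result, sum(result))
```

Because every partial conflict is fully shared between its two hypotheses, the fused vector stays normalized without any division by $1 - k_{Fi}$.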


The Proportional Conflict Redistribution rules PCR5 and 
PCR6 for three BBAs (three sources) 


In [4], the improved proportional conflict redistribution rules of
combination of basic belief assignments, PCR6, PCR5+ and
PCR6+, are presented. The authors point out that these rules
should be applied if and only if more than two BBAs are to be
combined. If there are only two BBAs to combine ($s = 2$), we
always get $m_{PCR5} = m_{PCR5^+} = m_{PCR6} = m_{PCR6^+}$ because in
this case the PCR5, PCR5+, PCR6 and PCR6+ rules coincide.
Below are the formulas that define the PCR5 and PCR6 rules
for 3 BBAs.


(55) 


wherein 
m"(;) = mp12(9;) 
+ S- Stix (03, 0x, 91) 
k=1,...,6 
1=1,...,6 
040100; =0 
+ » SUB3 (9;,9%) + » Si (8, Or) 
hie ae sae ae 
= mPri2(9;) 
on > » Sr | (0;, Ax, 61) 
Peg” aah 
a alerral 0. Ox) sone 0. Q 
» Fi ( jy? k) 2s, Fi2 ( jo k) 
ar a 
= mPri2(9;) 
» » Sip ( (0;, 9%, 91) 
k=1,...,6 1=1,..., 
we ahi 
+ SIRS (05, 9x) + S2ety (9;,9%)|, (56) 
with 
0; aye my (Ox)m2(4) 
GPCRS 6;,04,0 _ mr ( 
Fis (05,06, 91) = ay + am (0g) + mal) 
st mer (61)m1 (8; )?m2(Ox) 
mp (91) + m1 (93) + mo(9x) 
mp (6;.)m1(01)m2(9;)? (57) 
mer (Ox) + m1(61) + m2(8;)’ 
0 ) m4 (Ox )ma2(Oz¢) 
S1PCR5(9, 6,,) = _ mr (Gj)? m1 (Bx )m2(Gx) 
F12 ( j k) mp (0; ) +(x) + m2(6) 
mr (9x) + m1(8;) + m2(Ox) 
mr (Ox)m4 (0% )mM2(4;)? 
4 , (58) 
mr (Ox) + m1(0,) + m2(4;) 
0 )?m1(0;)?m2(01) 
S2PCR5 (9. 9.) = __ mr (O5)"171 (95) m2(8) 
F12 ( jo k) mr (6; ) +m (6; ) + mo(6z) 
mr (9x) (95 )?m2(0;)? 
mp(Ox) + m1(9;) Tr m2(0;) 
mr (95)? m1 (9x)m2(95)? (59) 
mp(6;) + mi (9x) + m2(0;) 
and 
$$m_{F12}(\theta_j) = m_F(\theta_j)\, m_1(\theta_j)\, m_2(\theta_j). \tag{60}$$


In the formulas (57)-(59), if a denominator is zero, then 
component is discarded. 

The quotient in formula (55) ensures the normalization
of the BBA vector $\mathbf{m}'_F$, so that


$$\sum_{j=1}^{6} m'_F(\theta_j) = \sum_{j=1}^{6} m_{PCR5}(\theta_j) = 1.$$

The PCR6 rule for three BBAs (three sources) is defined
for every non-empty hypothesis $\theta_j \in \Theta$ in the following way:


$$m'_F(\theta_j) = m_{PCR6}(\theta_j) = \frac{m''(\theta_j)}{\sum_{w=1}^{6} m''(\theta_w)} \tag{61}$$


wherein 


+ S> SPSS°(0;, O45 1) 


048110; =) 


aI » Ste (9;, 9x) 


seis 


ail 2 ia (9;, 9x) 


siting 


= mri2(9;) 


Tr a » Seis ( (0;, 9%, 1) 


Balas TL hexeg 
kj aki 


+ » SUB 
eee 


oF 2 Say ( (9;, 9x) 
6 


(9;, 9%) 


"ey 
= mFi2(6; ;) 


2 ie >> SPER... 8) 


k=1,...,6 l=1,..., 
ead eyhioe 


+ SIS (0;, 0%) + S2ea°(0;,9x)|, (62) 
with 


mr (9;)?m1 (9% )m2(91) 
mr(0;) + m1 (Ax) + m2(61) 

mp (61)m4 (9; )?m2(4;) 
mer (%) + ™m1(9;) + m2(Gx) 

mr (9x) (41)m2(;)* 


eC Oem 


Sie (9;, 9k, a) _ 


mp (0j)?m1 (9;.)m2 (Ox) 
mr (8;) + mi(Ox) + m2(Gx) 
mp (Ox) (0;)?m2(Ox) 
mp(Ox) +m1(8;) + m2(Ox) 
MP (0x )™m1 (0% )m2 (85)? 
mr(Ox%) + m1(6%) + m2(4;) 


S1 VE (8;, 9%) = 


, (64) 


S213 (8; 9%) = Ge, ee) ele) 


mer (8;) + m1 (9;) + m2(Gx) 
mr (Ox)m1 (9; )"m2(4;) 
mr (Oz) + mM 


MEF 0 j 
mer (93) + m1 (Bx) + m2(8;)’ 
mp (0; )ma (Ox )m2(9;) 
+ Tap) +11 (6x) +1220; 
and 
$$m_{F12}(\theta_j) = m_F(\theta_j)\, m_1(\theta_j)\, m_2(\theta_j). \tag{66}$$


In formulas (63)-(65), if a denominator is zero, then the
corresponding component is discarded. The quotient in formula (61)
ensures the normalization of the BBA vector $\mathbf{m}'_F$, so that


$$\sum_{j=1}^{6} m'_F(\theta_j) = \sum_{j=1}^{6} m_{PCR6}(\theta_j) = 1.$$


Comparing the two fusion schemes (Figures 2 and 3), it
should be noted that sequential and global information fusion
generally produce different results [4], i.e.


$$PCR5(\mathbf{m}_F, \mathbf{m}_1, \mathbf{m}_2) \neq PCR5(PCR5(\mathbf{m}_F, \mathbf{m}_1), \mathbf{m}_2) \neq PCR5(PCR5(\mathbf{m}_F, \mathbf{m}_2), \mathbf{m}_1). \tag{67}$$


In addition, the article experimentally verifies the theorem,
presented in [4], on the inequality of the results of the PCR5
and PCR6 rules for three BBAs (three sources):


$$PCR5(\mathbf{m}_F, \mathbf{m}_1, \mathbf{m}_2) \neq PCR6(\mathbf{m}_F, \mathbf{m}_1, \mathbf{m}_2). \tag{68}$$
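
The last inequality in (67), between the two sequential orders, can be illustrated with the two-source PCR5 rule alone. The following Python sketch uses hypothetical two-hypothesis BBAs and a hypothetical helper `pcr5_pair`; it does not implement the three-source formulas.

```python
def pcr5_pair(m_a, m_b):
    """Two-source PCR5 for Bayesian BBAs over mutually exclusive hypotheses."""
    n = len(m_a)
    fused = []
    for j in range(n):
        total = m_a[j] * m_b[j]
        for k in range(n):
            if k == j:
                continue
            if m_a[j] + m_b[k] > 0:
                total += m_a[j] ** 2 * m_b[k] / (m_a[j] + m_b[k])
            if m_b[j] + m_a[k] > 0:
                total += m_b[j] ** 2 * m_a[k] / (m_b[j] + m_a[k])
        fused.append(total)
    return fused


# Hypothetical BBAs: fusion center record and two sensor reports.
m_F = [0.6, 0.4]
m_1 = [0.2, 0.8]
m_2 = [0.9, 0.1]

first_1_then_2 = pcr5_pair(pcr5_pair(m_F, m_1), m_2)
first_2_then_1 = pcr5_pair(pcr5_pair(m_F, m_2), m_1)
print("sensor 1 first:", first_1_then_2)
print("sensor 2 first:", first_2_then_1)
```

With these values the mass of the first hypothesis differs noticeably between the two orders, which is why the order of arrival of asynchronous reports matters in the sequential scheme of Figure 2.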


IV. BASIC BELIEF ASSIGNMENT FOR COMBINED PRIMARY 
AND SECONDARY SURVEILLANCE RADARS 


The paper assumes that the analyzed radar sensor consists of
two radars: primary and secondary. Therefore, the probability
of correct detection and correct identification of a target is
expressed by the following formula:


$$P_{di} = P_d\, P_{IFF}, \tag{69}$$


where $P_d$ is the probability of correct detection of the target
by the primary radar, and $P_{IFF}$ is the probability of a correct
reply to the interrogation. If a target has been detected by the
primary radar but has not been properly identified by
the secondary radar, one can assume that the target has the
identification attribute value UNKNOWN (U). So, one can
write the following relation:


$$m(U) = P_d (1 - P_{IFF}), \tag{70}$$


where $m(U)$ is the probability mass for the UNKNOWN value
of the identification attribute.

A method of calculating the probabilities $P_d$ and $P_{IFF}$ is
presented in [5], [15], [16].

The allocation of the remaining probability mass
$(1 - m(U))$ is described in this section. It is assumed
that every simulated target has a base value of the
identification attribute from the set


$$Z_{BI} = \{N_B, F_B, H_B\}, \tag{71}$$


- $N_B$ - base NEUTRAL identity,
- $F_B$ - base FRIEND identity,
- $H_B$ - base HOSTILE identity.


In addition to the basic set of identification attribute values,
STANAG 1241 introduces secondary (additional) values:
SUSPECT (S) and ASSUMED FRIEND (AF). According to Fig. 1,
one can introduce a table of possible transitions between the
set (71) and the set of secondary identification attribute values:


$$Z_{SI} = \{N_S, F_S, H_S, AF, S\}. \tag{72}$$

The belief mass values contained in Table I determine how
the mass of the base belief assignment is transformed
into the mass of the secondary belief assignment. They can be
estimated as empirical frequencies based on recorded archive
events.


Table I 
TRANSFORMATION OF THE BASE BELIEF ASSIGNMENT MASS INTO THE 
SECONDARY BELIEF ASSIGNMENT MASS 


Base identification — 


Of course, the normalization conditions are satisfied:
$\sum_{z \in Z_{SI}} m(z|F_B) = 1$, $\sum_{z \in Z_{SI}} m(z|N_B) = 1$, and
$\sum_{z \in Z_{SI}} m(z|H_B) = 1$.

The final values of the belief mass of the secondary
identification attribute values are calculated according to the
formulas:


1) For a target with the FRIEND base value of the
identification attribute

$$m(U) = P_d (1 - P_{IFF}),$$
$$m(AF) = m(AF|F_B)(1 - m(U)),$$
$$m(F_S) = m(F_S|F_B)(1 - m(U)).$$


2) For a target with the NEUTRAL base value of the
identification attribute

$$m(U) = P_d (1 - P_{IFF}),$$
$$m(AF) = m(AF|N_B)(1 - m(U)),$$
$$m(S) = m(S|N_B)(1 - m(U)),$$
$$m(N_S) = (1 - m(AF|N_B) - m(S|N_B))(1 - m(U)).$$


3) For a target with the HOSTILE base value of the
identification attribute

$$m(U) = P_d (1 - P_{IFF}),$$
$$m(H_S) = m(H_S|H_B)(1 - m(U)),$$
$$m(S) = m(S|H_B)(1 - m(U)).$$


The other final values of the belief mass of the secondary
identification attribute values are equal to zero.
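
The three cases above can be sketched as a single Python function. The conditional masses in `TABLE_I` below are hypothetical placeholders (the actual values of Table I are not reproduced here), and the names `radar_bba` and `TABLE_I` are introduced only for this illustration.

```python
# Hypothetical conditional masses m(z | base identity); each row sums to 1.
TABLE_I = {
    "F": {"F": 0.8, "AF": 0.2},
    "N": {"N": 0.7, "AF": 0.15, "S": 0.15},
    "H": {"H": 0.8, "S": 0.2},
}


def radar_bba(base, p_d, p_iff, table=TABLE_I):
    """Belief masses reported by a combined primary/secondary radar.

    base:  base identity of the target, one of "F", "N", "H"
    p_d:   probability of correct detection by the primary radar
    p_iff: probability of a correct reply to the interrogation
    """
    if base not in table:
        raise ValueError("unknown base identity: %r" % (base,))
    m_u = p_d * (1.0 - p_iff)  # mass of UNKNOWN, formula (70)
    bba = {z: 0.0 for z in ("F", "H", "N", "AF", "S", "U")}
    for z, cond_mass in table[base].items():
        bba[z] = cond_mass * (1.0 - m_u)  # share of the remaining mass
    bba["U"] = m_u
    return bba


# Example with assumed radar parameters.
print(radar_bba("N", p_d=0.9, p_iff=0.95))
```

Since each row of the conditional table sums to 1, the returned masses sum to 1 for any values of the two probabilities.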


V. BASIC BELIEF ASSIGNMENT FOR ESM SENSORS 


An ESM sensor is a passive sensor that captures incoming
electromagnetic signals generated primarily by radar emitters
mounted on air or maritime platforms. The sensor recognizes
radar signals by determining the values of their distinctive
features. This paper does not deal in detail with methods of
radar signal recognition. However, information about these
methods is used to identify the platforms generating the signals
according to STANAG 1241 (NATO Standardization Agreement)
and Dezert-Smarandache theory. As stated previously,
we are interested in three basic identification values (friend,
hostile and neutral) and two secondary values (suspect and
assumed friend). In addition, we assume that in some
situations it is not possible to determine the identity of the
emitter carrier platform. To clarify this issue, we briefly
describe the method of determining the identification of the
emitter carrier platform that generated the captured signal. The
sensor recognition system is equipped with a database that
can be divided into three components: a platform database,
an emitter name list and a geopolitical list [10]. The platform
database (PDB) contains information about the platforms that can
be encountered in the area of interest, along with the emitters
they carry. The emitter name list (ENL) includes all emitters
corresponding to each platform of the PDB and contains the
values of the distinctive signal features for each emitter. These
values are the basis of the procedure for recognizing
a captured signal. The geopolitical list (GPL)
provides the allegiance of various countries and platforms and
makes it possible to identify them in accordance with STANAG 1241.

The signal recognition algorithm is carried out in two stages:

1) Verification at the level of signal quality features. The
second stage is executed only after a positive assessment of
the conformity of the quality features.

2) The signal recognition procedure determines the distances
between the distinctive features of the recognized
signal and the distinctive features of all pattern signals
stored in the emitter list.


Let us introduce the following notation:
$\mathbf{x}_s$ - vector of distinctive features of the recognized signal,
$\mathbf{x}_i$ - vector of distinctive features of the $i$-th pattern signal ($i$
is the number of the pattern signal, $i \in \{1, \ldots, M\}$),
$d_{s,i} = d(\mathbf{x}_s, \mathbf{x}_i)$ - the distance between the distinctive
features vector of the recognized signal and the distinctive
features vector of the $i$-th pattern signal; the distance $d_{s,i}$
is the Mahalanobis distance, which takes into account the
correlations of the distinctive features.

The signal recognition classifier compares the distance
$d(\mathbf{x}_s, \mathbf{x}_i)$ with the acceptable classification distance
$\delta$. The distance $\delta$ is the limit that we interpret as the
boundary of emitter pattern recognition. We divide the set
of pattern signals into two subsets: the patterns satisfying the
positive classification condition in relation to the recognized
signal $s$, denoted $D_s^{+}$, and the patterns that do not satisfy the
positive classification condition, denoted $D_s^{-}$. The formal
definition is as follows:

$$D_s^{+} = \{ i \in \{1, \ldots, M\} \mid d_{s,i} \leq \delta \} \tag{73}$$

$$D_s^{-} = \{ i \in \{1, \ldots, M\} \mid d_{s,i} > \delta \} \tag{74}$$

In the paper we propose the following method of determining
the basic belief assignment on a set of pattern signals,
which is related to the distance between a signal and a pattern
in the distinctive features space:


$$m_s(i) = e^{-d(\mathbf{x}_s, \mathbf{x}_i)} \tag{75}$$


As one can see from formula (75), if $d(\mathbf{x}_s, \mathbf{x}_i) = 0$ then
$m_s(i) = 1$, whereas if $d(\mathbf{x}_s, \mathbf{x}_i) > 0$ then $0 < m_s(i) < 1$.
The above measure is not normalized, hence we normalize
it as

$$\bar{m}_s(i) = m_s(i) \Big/ \sum_{l=1}^{M} m_s(l) \tag{76}$$

The sum of the measures assigned to all the emitters whose
distinctive features lie outside the limit $\delta$ is treated as the
measure assigned to the base hypothesis "unknown" (U)


$$m_s(U) = \sum_{i \in D_s^{-}} \bar{m}_s(i) \tag{77}$$
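
The computation of the normalized pattern masses and of the mass of the "unknown" hypothesis can be sketched as follows. The Mahalanobis distances are assumed to be already computed; the function name `esm_pattern_masses` and the numbers in the example are hypothetical.

```python
import math


def esm_pattern_masses(distances, delta):
    """Normalized pattern masses and the mass of the "unknown" hypothesis.

    distances: distances d_{s,i} between the recognized signal and each
               pattern signal (assumed to be precomputed)
    delta:     acceptable classification distance
    """
    if not distances:
        raise ValueError("at least one pattern signal is required")
    raw = [math.exp(-d) for d in distances]          # formula (75)
    total = sum(raw)
    normalized = [r / total for r in raw]            # formula (76)
    accepted = [i for i, d in enumerate(distances) if d <= delta]
    rejected = [i for i, d in enumerate(distances) if d > delta]
    m_unknown = sum(normalized[i] for i in rejected) # formula (77)
    return normalized, accepted, m_unknown


# Example with three hypothetical pattern signals.
masses, accepted, m_u = esm_pattern_masses([0.0, 1.0, 5.0], delta=2.0)
print("normalized masses:", masses)
print("accepted patterns:", accepted)
print("m_s(U):", m_u)
```

In this sketch the indices are zero-based, so pattern number $i$ of the text corresponds to list position $i - 1$.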


To determine the belief measures of the other base hypotheses
(H, F, N) and of the secondary hypotheses (AF and S), we should
introduce formal definitions of the sets contained in the sensor
database and used for the recognition of captured signals. As
mentioned above, the set of all the data necessary for
platform identification can be divided into three sets: PDB -
a platform database, ENL - an emitter name list and GPL -
a geopolitical list:

- PDB - the platform database contains information about
all platforms observed in the area of interest, including
information on all emitters mounted on each platform;
we assume that one platform can have many emitters
and the same type of emitter can be installed on many
platforms; the PDB also contains information on the
national affiliation of each platform,

- ENL - the emitter name list is a set of information
about all recognized emitters in the area of interest; this
set contains the mean values of the distinctive features
of emitter signals (so-called signal patterns) and their
standard deviations,

- GPL - the geopolitical list contains the base values of
the identification attribute (H, F, N) assigned to the various
countries.

We will also introduce additional notation used in this
paper:


- PDBL - the list of platform numbers that are stored in
the PDB,

- PL(i) - the set of numbers of the platforms which have the
emitter with number "i",

- IPL(j) - the base identification attribute of the platform
with number "j", determined on the basis of the information
contained in the PDB and ENL (IPL(j) $\in$ {F, H, N}).

The set of signal patterns satisfying the positive classification
condition in relation to the recognized signal $s$, denoted
$D_s^{+}$, can be divided into disjoint subsets according to the
values of the carrier platform identification features:


$$D_s^{+} = D_s^{+F} \cup D_s^{+H} \cup D_s^{+N} \cup D_s^{+AF} \cup D_s^{+S}, \tag{78}$$

$$D_s^{+k} \cap D_s^{+l} = \emptyset, \quad k \neq l, \quad k, l \in \{F, H, N, AF, S\}. \tag{79}$$


Each subset of the set $D_s^{+}$ for the base identification is
defined as follows:


$$D_s^{+F} = \{ i \in D_s^{+} \mid \forall j \in PL(i),\ IPL(j) = F \}, \tag{80}$$

$$D_s^{+H} = \{ i \in D_s^{+} \mid \forall j \in PL(i),\ IPL(j) = H \}, \tag{81}$$

$$D_s^{+N} = \{ i \in D_s^{+} \mid \forall j \in PL(i),\ IPL(j) = N \}. \tag{82}$$


In a similar way, one can define the subsets of the set $D_s^{+}$ for
the secondary identification (AF, S):


$$D_s^{+AF} = \{ i \in D_s^{+} \mid \exists j \in PL(i),\ IPL(j) = F \ \wedge\ \exists j \in PL(i),\ IPL(j) = N \}, \tag{83}$$

$$D_s^{+S} = \{ i \in D_s^{+} \mid \exists j \in PL(i),\ IPL(j) = H \ \wedge\ \exists j \in PL(i),\ IPL(j) = N \}. \tag{84}$$


Note that we assume in this paper that no emitter type
can be installed simultaneously on platforms with the
identifications F and H:


$$\{ i \in D_s^{+} \mid \exists j \in PL(i),\ IPL(j) = F \ \wedge\ \exists j \in PL(i),\ IPL(j) = H \} = \emptyset. \tag{85}$$
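
The partition defined by (78)-(85) can be sketched in Python as follows. The dictionaries `pl` and `ipl` stand in for PL(i) and IPL(j); their contents, the emitter and platform numbers, and the function names are hypothetical and serve only as an illustration.

```python
def emitter_subset(platform_ids):
    """Map the identities of the platforms carrying an emitter to F, H, N, AF or S."""
    ids = set(platform_ids)
    if not ids:
        raise ValueError("an emitter must be mounted on at least one platform")
    if not ids <= {"F", "H", "N"}:
        raise ValueError("unexpected base identity in %r" % (sorted(ids),))
    if {"F", "H"} <= ids:
        raise ValueError("assumption (85) violated: emitter on F and H platforms")
    if ids == {"F", "N"}:
        return "AF"  # carried by friendly and neutral platforms
    if ids == {"H", "N"}:
        return "S"   # carried by hostile and neutral platforms
    return next(iter(ids))  # all carrying platforms share one identity


def partition_accepted(accepted, pl, ipl):
    """Split the accepted pattern numbers into the five disjoint subsets."""
    subsets = {"F": [], "H": [], "N": [], "AF": [], "S": []}
    for i in accepted:
        label = emitter_subset(ipl[j] for j in pl[i])
        subsets[label].append(i)
    return subsets


# Hypothetical database content: emitter -> platforms, platform -> identity.
pl = {1: [10], 2: [10, 30], 3: [20, 30], 4: [30], 5: [20]}
ipl = {10: "F", 20: "H", 30: "N"}
print(partition_accepted([1, 2, 3, 4, 5], pl, ipl))
```

Each accepted emitter falls into exactly one subset, which reflects the disjointness condition (79).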


It should be emphasized that the method presented here
differs from that in [6], [17]. Those papers assume that ESM
sensors can only generate basic declarations with the attribute
values FRIEND, HOSTILE and NEUTRAL, whereas in this paper
we assume that ESM sensors can generate declarations from
an extended set of attribute values (additionally ASSUMED
FRIEND, SUSPECT and UNKNOWN).


VI. NUMERICAL EXPERIMENTS OF FUSION OF 
IDENTIFICATION IN INFORMATION FROM ESM SENSORS 


A. Simulation scenarios 

Paper [6] presents a typical simulation scenario for
testing identification information fusion. Its authors
formulated several requirements that such a scenario should
meet. It should:


1) adequately represent the known ground truth of the
emitter identification,

2) include a sufficient number of incorrect associations to
be realistic and to test the robustness of the rules to
temporary incorrect sensor decisions,

3) provide only partial knowledge about the ESM sensor
declarations, and thus contain uncertainty,

4) make it possible to show stability in the case of
countermeasures,

5) make it possible to switch the identification when the
ground truth changes.


The authors of [6] propose the following parameters of the
scenario:


1) the ground truth of identification is FRIEND (F) for the first
50 iterations of the scenario and HOSTILE (H) for the
last 50 iterations,

2) the number of correct associations is 80% of all
iterations; the number of incorrect associations caused by
countermeasures is 20% of all iterations, at randomly
selected moments of time,

3) ESM sensor declarations have a mass of 0.7 for the most
credible identification and 0.3 for the identification
UNKNOWN (U).


Requirement 5) is not considered in this paper, since it is assumed
that the real object does not change its real identity while
performing the mission. Therefore, parameter 1) of the scenario
from [6] becomes obsolete. The following assumptions
concerning the parameters of the scenario are made in
this paper:

1) the real value of identification is constant in each scenario
and is equal to FRIEND (F) in scenarios 1,
2 and 5, and HOSTILE (H) in scenarios 3, 4 and 6;

2) the above declarations are transmitted by sensor number
1 with the real identification mass equal to 0.7 and
the mass of the complementary identification (UNKNOWN)
equal to 0.3;

3) the second sensor transmits its declarations in accordance
with Tables II and III for scenarios 1
and 2 and with Tables IV and V for scenarios 3
and 4.


Table II 
BELIEF MASSES FOR THE SECOND SENSOR FOR THE SCENARIOS 1 AND 5. 


Correct identif. (80% of events)   | 0.6 | 0.1 | 0   | 0.2 | 0   | 0.1 | 
Incorrect identif. (20% of events) | 0   | 0.1 | 0.6 | 0   | 0.2 | 0.1 | 


Table III 
BELIEF MASSES FOR THE SECOND SENSOR FOR THE SCENARIO 2. 


Correct identif. (80% of events)   | 0.7 | 0.1 | 0   | 0.1 | 0   | 0.1 | 
Incorrect identif. (20% of events) | 0   | 0.1 | 0.7 | 0   | 0.1 | 0.1 | 


One should note that scenario 2 differs from scenario 1 by 
a greater belief mass assigned to incorrect identification 
of the recognized emitter. Scenarios 3 and 4 differ in the 
same way. 


Table IV 
BELIEF MASSES FOR THE SECOND SENSOR FOR THE SCENARIO 3. 


Correct identif. (80% of events)   | 0   | 0.1 | 0.6 | 0   | 0.2 | 0.1 | 
Incorrect identif. (20% of events) | 0.6 | 0.1 | 0   | 0.2 | 0   | 0.1 | 


Table V 
BELIEF MASSES FOR THE SECOND SENSOR FOR THE SCENARIOS 4 AND 6. 


Correct identif. (80% of events)   | 0   | 0.1 | 0.7 | 0   | 0.1 | 0.1 | 
Incorrect identif. (20% of events) | 0.7 | 0.1 | 0   | 0.1 | 0   | 0.1 | 

Scenarios 1-6 for sensor 1 are presented in Figures 4 and 5. 

Figure 4. The course of scenarios number 1, 2 and 5 for sensor 1. 

Figure 5. The course of scenarios number 3, 4 and 6 for sensor 1. 

All deterministic scenarios for sensor 2 are presented in 
Figures 6-9. 

Figure 6. The course of scenario number 1 for sensor 2. 

Figure 7. The course of scenario number 2 for sensor 2. 

Figure 8. The course of scenario number 3 for sensor 2. 

The Monte Carlo method of generating the scenario 
for sensor 2 is also used in this paper. Moments in 
which incorrect identifications occur are generated by 
the pseudo-random integer number generator from the range 
(0, 100]. Examples of scenarios are shown in Figures 10 and 
11. 

Figure 10. The course of Monte Carlo scenario number 5 for sensor 2. 

Figure 11. The course of Monte Carlo scenario number 6 for sensor 2. 
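
The Monte Carlo generation of the sensor 2 scenario can be sketched as below. This is a minimal illustration, not the authors' generator: the rule that a draw of at most 20 marks an incorrect report is an assumption chosen to match the 20% share of incorrect identifications, and the function name `monte_carlo_scenario`, the seed and the number of time steps are hypothetical. The mass vectors are those of Table II, kept in table column order.

```python
import random

# Sensor-2 belief-mass vectors copied from Table II (scenarios 1 and 5).
# The six columns are kept in table order.
CORRECT = (0.6, 0.1, 0.0, 0.2, 0.0, 0.1)
INCORRECT = (0.0, 0.1, 0.6, 0.0, 0.2, 0.1)


def monte_carlo_scenario(n_steps, p_incorrect=20, seed=None):
    """Return one belief-mass vector per time step.

    A pseudo-random integer is drawn from (0, 100], i.e. 1..100 inclusive.
    Assumption: a draw not exceeding p_incorrect marks an incorrect report,
    so about p_incorrect percent of the steps are misidentifications.
    """
    rng = random.Random(seed)
    scenario = []
    for _ in range(n_steps):
        draw = rng.randint(1, 100)  # integers in (0, 100]
        scenario.append(INCORRECT if draw <= p_incorrect else CORRECT)
    return scenario


scenario = monte_carlo_scenario(200, seed=1)
share_incorrect = sum(v is INCORRECT for v in scenario) / len(scenario)
print(f"share of incorrect reports: {share_incorrect:.2f}")
```

Unlike the deterministic scenarios, the incorrect reports produced this way may cluster in time, which is what makes the Monte Carlo runs a harder test for the fusion rules.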
B. Calculation results for deterministic scenarios 


Dempster’s rule 


Dempster’s rule is not resistant to the situation in which 
the degree of conflict equals 1. This is the total conflict 
between the mass vector sent by the sensor and the mass vector 
of the information fusion center, which occurs when each 
non-zero belief mass value sent by the sensor corresponds to 
a zero belief mass value of the vector determined by the 
information fusion center and vice versa. 
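
The failure under total conflict can be illustrated with a short sketch of Dempster's rule for two basic belief assignments. The frame {F, H}, the modelling of UNKNOWN as the whole frame, and the function name `dempster` are assumptions made only for this illustration; the frame of discernment used in the paper is richer.

```python
from itertools import product


def dempster(m1, m2):
    """Combine two BBAs given as {frozenset: mass} with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass falling on the empty set
    if conflict >= 1.0 - 1e-12:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {x: w / (1.0 - conflict) for x, w in combined.items()}, conflict


F, H = frozenset({"F"}), frozenset({"H"})
U = F | H  # assumption: UNKNOWN modelled as the whole frame

fused, k = dempster({F: 0.7, U: 0.3}, {H: 0.6, U: 0.4})
print(fused, k)

try:
    dempster({F: 1.0}, {H: 1.0})  # every non-zero mass meets a zero mass
except ValueError as err:
    print(err)
```

In the second call the normalization factor 1 - k is zero, so no fused vector exists; the proportional conflict redistribution rules avoid this division.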

The simulation results of the identification information 
fusion using Dempster’s rule are presented for the 
deterministic scenarios 1 and 3 in Figures 12 and 13. 

Figure 12. The values of the resulting belief mass for scenario 1 and 
Dempster’s rule. 

Figure 13. The values of the resulting belief mass for scenario 3 and 
Dempster’s rule. 


The PCR1 rule 


The simulation results of the identification information 
fusion using the PCR1 rule for the deterministic scenarios 1, 
2, 3 and 4 are presented in Figures 14-17. 

Figure 14. The values of the resulting belief mass for scenario 1 and the 
PCR1 rule. 

Figure 15. The values of the resulting belief mass for scenario 2 and the 
PCR1 rule. 

Figure 16. The values of the resulting belief mass for scenario 3 and the 
PCR1 rule. 

Figure 17. The values of the resulting belief mass for scenario 4 and the 
PCR1 rule. 


The PCR3 rule 


The simulation results of the identification information 
fusion using the PCR3 rule for the deterministic scenarios 1, 
2, 3 and 4 are presented in Figures 18-21. 

Figure 18. The values of the resulting belief mass for scenario 1 and the 
PCR3 rule. 

Figure 19. The values of the resulting belief mass for scenario 2 and the 
PCR3 rule. 

Figure 20. The values of the resulting belief mass for scenario 3 and the 
PCR3 rule. 

Figure 21. The values of the resulting belief mass for scenario 4 and the 
PCR3 rule. 


The PCR4 rule 


The simulation results of the identification information 
fusion using the PCR4 rule for the deterministic scenarios 1, 
2, 3 and 4 are presented in Figures 22-25. 

Figure 22. The values of the resulting belief mass for scenario 1 and the 
PCR4 rule. 

Figure 25. The values of the resulting belief mass for scenario 4 and the 
PCR4 rule. 


The PCR5 rule for 2 BBAs 


The simulation results of the identification information 
fusion using the PCR5 rule for the deterministic scenarios 1, 
2, 3 and 4 are presented in Figures 26-29. 

Figure 26. The values of the resulting belief mass for scenario 1 and the 
PCR5 rule for 2 BBAs. 

Figure 27. The values of the resulting belief mass for scenario 2 and the 
PCR5 rule for 2 BBAs. 

Figure 28. The values of the resulting belief mass for scenario 2a and the 
PCR5 rule for 2 BBAs. 

Figure 29. The values of the resulting belief mass for scenario 2b and the 
PCR5 rule for 2 BBAs. 

The PCR5 rule for 3 BBAs 


The simulation results of the identification information 
fusion using the PCR5 rule for 3 BBAs for the deterministic 
scenarios 1, 2, 3 and 4 are presented in Figures 30-33. 

Figure 30. The values of the resulting belief mass for scenario 1 and the 
PCR5 rule for 3 BBAs. 

Figure 31. The values of the resulting belief mass for scenario 2 and the 
PCR5 rule for 3 BBAs. 

Figure 32. The values of the resulting belief mass for scenario 3 and the 
PCR5 rule for 3 BBAs. 

Figure 33. The values of the resulting belief mass for scenario 4 and the 
PCR5 rule for 3 BBAs. 


The PCR6 rule for 3 BBAs 


The simulation results of the identification information 
fusion using the PCR6 rule for 3 BBAs for the deterministic 
scenarios 1, 2, 3 and 4 are presented in Figures 34-37. 

Figure 34. The values of the resulting belief mass for scenario 1 and the 
PCR6 rule for 3 BBAs. 

Figure 35. The values of the resulting belief mass for scenario 2 and the 
PCR6 rule for 3 BBAs. 

Figure 36. The values of the resulting belief mass for scenario 3 and the 
PCR6 rule for 3 BBAs. 

Figure 37. The values of the resulting belief mass for scenario 4 and the 
PCR6 rule for 3 BBAs. 


The presented results (Figures 12-37) allow us to conclude 
that the applied methods of managing conflicts in the 
information fusion make it possible to draw correct 
conclusions about the real identification of the recognized 
object. 

Applying a decision threshold on the belief mass of 0.37 for 
the PCR1 rule and of 0.45 for the PCR3, PCR4 and PCR5 rules 
in scenarios 1 and 3 allows us to properly evaluate the 
identification of the recognized object: FRIEND for scenario 
1 and HOSTILE for scenario 3. For scenarios 2 and 4, the 
optimal thresholds are 0.4 for the PCR1 rule and 0.45 for 
the PCR3, PCR4 and PCR5 rules. When assessing the interval 
between the minimum resultant mass for correct identification 
and the maximum resultant mass for misidentification, the 
PCR1 rule gives the worst results, while the PCR3, PCR4 and 
PCR5 rules behave similarly and are better than the PCR1 
rule. 
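
The two evaluation criteria used above, a decision threshold on the fused mass and the interval between the correct and incorrect masses, can be sketched as follows. The decision logic (the largest fused mass must reach the threshold) is an assumed reading of the text, the function names are hypothetical, and the numeric inputs are illustrative values that are not taken from the figures.

```python
def decide(masses, threshold):
    """Return the label whose fused mass is largest, if it reaches the threshold.

    masses: {label: fused belief mass}; returns None when no label qualifies.
    """
    label, value = max(masses.items(), key=lambda item: item[1])
    return label if value >= threshold else None


def separation_margin(correct_series, wrong_series):
    """Interval between the minimum mass of the correct identification
    and the maximum mass of the misidentification over a whole run."""
    return min(correct_series) - max(wrong_series)


# Illustrative values only; they are not taken from the figures.
print(decide({"FRIEND": 0.52, "HOSTILE": 0.31}, threshold=0.45))  # FRIEND
print(decide({"FRIEND": 0.41, "HOSTILE": 0.39}, threshold=0.45))  # None
print(separation_margin([0.52, 0.48, 0.55], [0.31, 0.40, 0.28]))
```

A positive margin means that a single threshold separates the correct identification from the misidentification over the whole run; a negative one means no such threshold exists.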

The research carried out for deterministic scenarios shows 
that the PCR5 rule for 3 BBAs and the PCR6 rule for 3 BBAs 
behave very similarly. They restore the correct identification 
after a temporary misidentification much faster than the 
PCR1 to PCR5 rules for 2 BBAs. 


C. Calculation results for Monte Carlo scenarios 


Dempster’s rule 


In the Monte Carlo scenarios, Dempster’s rule behaves as it 
does in the deterministic scenarios. It is not resistant to 
the situation in which the degree of conflict equals 1. This 
is the total conflict between the mass vector sent by the 
sensor and the mass vector of the information fusion center, 
which occurs when each non-zero belief mass value sent by the 
sensor corresponds to a zero belief mass value of the vector 
determined by the information fusion center and vice versa. 


The simulation results of the identification information 
fusion using Dempster’s rule are presented for the Monte Carlo 
scenarios 5 and 6 in Figures 38 and 39. 

Figure 38. The values of the resulting belief mass for Monte Carlo scenario 
5 and Dempster’s rule. 

Figure 39. The values of the resulting belief mass for Monte Carlo scenario 
6 and Dempster’s rule. 


The PCR1 rule 


The simulation results of the identification information 
fusion using the PCR1 rule for the Monte Carlo scenarios 
5 and 6 are presented in Figures 40 and 41. 

Figure 40. The values of the resulting belief mass for Monte Carlo scenario 
5 and the PCR1 rule. 

Figure 41. The values of the resulting belief mass for Monte Carlo scenario 
6 and the PCR1 rule. 


The PCR3 rule 


The simulation results of the identification information 
fusion using the PCR3 rule for the Monte Carlo scenarios 
5 and 6 are presented in Figures 42 and 43. 

Figure 42. The values of the resulting belief mass for Monte Carlo scenario 
5 and the PCR3 rule. 

Figure 43. The values of the resulting belief mass for Monte Carlo scenario 
6 and the PCR3 rule. 


The PCR4 rule 


The simulation results of the identification information 
fusion using the PCR4 rule for the Monte Carlo scenarios 
5 and 6 are presented in Figures 44 and 45. 

Figure 44. The values of the resulting belief mass for Monte Carlo scenario 
5 and the PCR4 rule. 

Figure 45. The values of the resulting belief mass for Monte Carlo scenario 
6 and the PCR4 rule. 


The PCR5 rule for 2 BBAs 


The simulation results of the identification information 
fusion using the PCR5 rule for 2 BBAs for the Monte Carlo 
scenarios 5 and 6 are presented in Figures 46 and 47. 

Figure 46. The values of the resulting belief mass for Monte Carlo scenario 
5 and the PCR5 rule for 2 BBAs. 

Figure 47. The values of the resulting belief mass for Monte Carlo scenario 
6 and the PCR5 rule for 2 BBAs. 


The PCR5 rule for 3 BBAs 


The simulation results of the identification information 
fusion using the PCR5 rule for 3 BBAs for the Monte Carlo 
scenarios 5 and 6 are presented in Figures 48 and 49. 

Figure 48. The values of the resulting belief mass for Monte Carlo scenario 
5 and the PCR5 rule for 3 BBAs. 

Figure 49. The values of the resulting belief mass for Monte Carlo scenario 
6 and the PCR5 rule for 3 BBAs. 


The PCR6 rule for 3 BBAs 


The simulation results of the identification information 
fusion using the PCR6 rule for 3 BBAs for the Monte Carlo 
scenarios 5 and 6 are presented in Figures 50 and 51. 

Figure 50. The values of the resulting belief mass for Monte Carlo scenario 
5 and the PCR6 rule for 3 BBAs. 

Figure 51. The values of the resulting belief mass for Monte Carlo scenario 
6 and the PCR6 rule for 3 BBAs. 


The presented results show that, due to the high intensity 
of reports with incorrect identifications in the middle 
part of the scenarios, the information fusion rules (apart from 
the PCR4, PCR5 and PCR6 rules) determine the maximum 
resulting mass for incorrect identification. The PCR5 rule 
for 3 BBAs and the PCR6 rule for 3 BBAs are the fastest to 
restore the correct identification after receiving several 
incorrect reports. 


VII. NUMERICAL EXPERIMENTS OF FUSION OF 
IDENTIFICATION INFORMATION FROM ESM SENSORS 
AND RADARS 


A. Numerical experiments scenarios 


We assume that attribute information from 2 sensors is 
combined: a combined primary and secondary surveillance 
radar and an ESM sensor. These sensors work asynchronously. 
Upon receipt of a sensor’s declaration in the form of a 
vector of masses, we fuse this vector with the vector of 
the current values of the declaration masses for the fuser’s 
frame of discernment. The frequency of transmission of sensor 
declarations depends on the rules of the data exchange network 
and on the technical characteristics of the sensors. Various 
combination methods are presented in [13], [14]. In this paper, 
two proportional conflict redistribution rules 
(PCR5 and PCR6 [4], [11]) have been used. The numerical 
model of combined primary and secondary surveillance radars 
was taken from [5], [15]. 
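
The fusion step described above, in which an incoming declaration is combined with the masses currently held by the fuser, can be sketched with the PCR5 rule for two basic belief assignments. This is a minimal illustration under simplifying assumptions: exclusive hypotheses F and H, UNKNOWN modelled as total ignorance, and hypothetical names (`pcr5`, `state`, `report`); the frame of discernment used in the paper contains more declarations.

```python
from itertools import product


def pcr5(m1, m2):
    """PCR5 combination of two BBAs given as {frozenset: mass}."""
    fused = {}
    for (x, wx), (y, wy) in product(m1.items(), m2.items()):
        common = x & y
        if common:
            # conjunctive consensus
            fused[common] = fused.get(common, 0.0) + wx * wy
        elif wx + wy > 0.0:
            # partial conflict wx*wy goes back to x and y,
            # proportionally to the masses that produced it
            fused[x] = fused.get(x, 0.0) + wx * wx * wy / (wx + wy)
            fused[y] = fused.get(y, 0.0) + wy * wy * wx / (wx + wy)
    return fused


F, H = frozenset({"F"}), frozenset({"H"})
U = F | H  # assumption: UNKNOWN modelled as total ignorance

state = {F: 0.7, U: 0.3}        # current masses held by the fusion centre
report = {H: 0.6, U: 0.4}       # hypothetical incoming sensor declaration
state = pcr5(state, report)
print({"".join(sorted(k)): round(v, 4) for k, v in state.items()})
```

Because the conflicting mass is returned only to the two hypotheses that produced it, the fused vector still sums to one and no normalization by 1 - k is needed. Repeating the call for each new report gives the sequential fusion of the asynchronous sensors.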
Numerical experiments have been performed for the follow- 
ing data: 
- for combined primary and secondary surveillance radars 
sensor? Peg = 10-°, RE, = 100 km, P] = 0.7, of = 
2 m?, Prrr = 0.962 and the following table of masses 
(compare Table I): 


Table VI 
TRANSFORMATION OF THE BASE BELIEF ASSIGNMENT MASS INTO THE 
SECONDARY BELIEF ASSIGNMENT MASS FOR COMBINED PRIMARY AND 
SECONDARY SURVEILLANCE RADAR. 


(Scenario #, Base identification) 

The flight path of the air object was 30 km away from the 
sensor (in the horizontal plane), the flight altitude was 
1 km and the radar cross-section was 1 sq. m. 

- for the ESM sensor: 

1) the real value of identification is constant in each 
scenario and is equal to FRIEND (F) in the 
first scenario and to HOSTILE (H) in the second 
scenario; 

2) the above declarations are transmitted by sensor 
number 1 with the real identification mass equal to 
0.7 and the mass of complementary identification 
(UNKNOWN) equal to 0.3; 

3) the second sensor transmits its declarations in 
accordance with Tables I and II for scenarios 
1 and 2 and with Tables III and IV for 
scenarios 3 and 4. 

Tables VII-XII present the mass values of all possible 
declarations of the ESM sensor for the six scenarios. 


Table VII 
BELIEF MASSES FOR THE SENSOR 2 (ESM) FOR THE SCENARIO 1. 


Correct identif. (80% of events) | 0.6 | 0.1 | 

Incorrect identif. (20% of events) | 0.7 | 0.1 | 0 | 0.1 | 0 | 0.1 | 


Scenarios 1-6 for sensor 1 are presented in 
Figures 52, 53 and 54. 


Table IX 
BELIEF MASSES FOR THE SENSOR 2 (ESM) FOR THE SCENARIO 3. 


Table X 
BELIEF MASSES FOR THE SENSOR 2 (ESM) FOR THE SCENARIO 4. 


Table XII 
BELIEF MASSES FOR THE SENSOR 2 (ESM) FOR THE SCENARIO 6. 


Type of identification 
Correct identif. (80% of events) | 0.1 | 0.7 | 0.1 | 0 | 0 | 0.1 | 


Figure 52. The course of scenarios number 1 and 4 for sensor 1. 

Figure 53. The course of scenarios number 2 and 5 for sensor 1. 

Figure 54. The course of scenarios number 3 and 6 for sensor 1. 


Scenarios 1-3 assume relatively small changes in the mass 
of all declarations (small errors). Scenarios 4-6 assume 
significant changes in the mass of all declarations (large 
errors). 

All deterministic scenarios for sensor 2 are presented in 
Figures 55-60. 

Figure 55. The course of scenario number 1 for sensor 2. 

Figure 56. The course of scenario number 2 for sensor 2. 

Figure 57. The course of scenario number 3 for sensor 2. 

Figure 58. The course of scenario number 4 for sensor 2. 

Figure 59. The course of scenario number 5 for sensor 2. 

Figure 60. The course of scenario number 6 for sensor 2. 


B. Calculation results for four proportional conflict 
redistribution rules 


The PCR5 rule for 2 BBAs 


The simulation results of the identification information 
fusion using the PCR5 rule for 2 BBAs for the deterministic 
scenarios 1-6 are presented in Figures 61-66. 

Figure 61. The values of the resulting belief mass for scenario 1 and the 
PCR5 rule for 2 BBAs. 

Figure 62. The values of the resulting belief mass for scenario 2 and the 
PCR5 rule for 2 BBAs. 

Figure 63. The values of the resulting belief mass for scenario 3 and the 
PCR5 rule for 2 BBAs. 

Figure 64. The values of the resulting belief mass for scenario 4 and the 
PCR5 rule for 2 BBAs. 

Figure 65. The values of the resulting belief mass for scenario 5 and the 
PCR5 rule for 2 BBAs. 

Figure 66. The values of the resulting belief mass for scenario 6 and the 
PCR5 rule for 2 BBAs. 

For the PCR5 rule for 2 BBAs, applying decision thresholds 
on the belief mass of 0.43 for scenario 4, 0.35 for scenario 
5, and 0.5 for scenario 6 allows us to properly evaluate the 
identification of the recognized object for most time 
moments. 


The PCR5 rule for 3 BBAs 


The simulation results of the identification information 
fusion using the PCR5 rule for 3 BBAs for the deterministic 
scenarios 1-6 are presented in Figures 67-72. 

Figure 67. The values of the resulting belief mass for scenario 1 and the 
PCR5 rule for 3 BBAs. 

Figure 68. The values of the resulting belief mass for scenario 2 and the 
PCR5 rule for 3 BBAs. 

Figure 69. The values of the resulting belief mass for scenario 3 and the 
PCR5 rule for 3 BBAs. 

Figure 70. The values of the resulting belief mass for scenario 4 and the 
PCR5 rule for 3 BBAs. 

Figure 71. The values of the resulting belief mass for scenario 5 and the 
PCR5 rule for 3 BBAs. 

Figure 72. The values of the resulting belief mass for scenario 6 and the 
PCR5 rule for 3 BBAs. 

For the PCR5 rule for 3 BBAs, applying a decision threshold 
on the belief mass of 0.42 for scenario 4 allows us to 
properly evaluate the identification of the recognized 
object. For the same rule, a decision threshold of 0.37 for 
scenarios 5 and 6 allows us to properly evaluate the 
identification of the recognized object for most time 
moments. 


The PCR6 rule for 3 BBAs 


The simulation results of the identification information 
fusion using the PCR6 rule for 3 BBAs for the deterministic 
scenarios 1-6 are presented in Figures 73-78. 


For the PCR6 rule for 3 BBAs, applying a decision threshold 
on the belief mass of 0.45 for scenario 4 allows us to 
properly evaluate the identification of the recognized 
object. For the same rule, decision thresholds of 0.37 for 
scenario 5 and 0.4 for scenario 6 allow us to properly 
evaluate the identification of the recognized object for 
most time moments. Comparing Figures 64-66 
with Figures 70-72 and 76-78, one can conclude that the 
PCR5 rule for 3 BBAs and the PCR6 rule for 3 BBAs provide 
more stable combined belief masses (smaller amplitude of 
changes). Due to the large dispersion of belief mass changes 
for scenarios 5 and 6, it is not possible to correctly evaluate the 
identification of the recognized object for all time moments. 




Figure 73. The values of the resulting belief mass for scenario 1 and the 
PCR6 rule for 3 BBAs. 

Figure 74. The values of the resulting belief mass for scenario 2 and the 
PCR6 rule for 3 BBAs. 

Figure 75. The values of the resulting belief mass for scenario 3 and the 
PCR6 rule for 3 BBAs. 

Figure 76. The values of the resulting belief mass for scenario 4 and the 
PCR6 rule for 3 BBAs. 

Figure 77. The values of the resulting belief mass for scenario 5 and the 
PCR6 rule for 3 BBAs. 

Figure 78. The values of the resulting belief mass for scenario 6 and the 
PCR6 rule for 3 BBAs. 


The presented results (Figures 61-78) allow us to conclude 
that the applied methods of removing conflicts in the 
information fusion make it possible to draw correct 
conclusions about the real identification of the recognized 
object. 


VIII. CONCLUSION 


The proposed basic belief assignment model for ELINT- 
ESM sensors and radars can be used to build identification 
information fusion systems. Models conforming to STANAG 
1241 are of primary practical importance. Because conflicts 
between the ELINT-ESM sensor declarations are assumed in 
this work, Dezert-Smarandache theory is used to determine the 
basic belief assignment of the declarations produced by the 
fusion of identification information sent by these 
sensors. Supplementing standard reports on detected signals 
with random identification declarations allows the use of 
methods of identification information fusion in the information 
fusion center. The test results confirm the full usefulness, 
for reports from ELINT-ESM sensors, of the conflict 
redistribution rules developed as a part of 
Dezert-Smarandache theory, with the best results obtained 
by the PCR5 and PCR6 rules. 

The basic belief assignment model for ESM sensors and 
for combined primary and secondary (IFF) surveillance radars 
[15] can be applied to build models of different identification 
data fusion systems. Models compatible with STANAG 1241 are 
of primary practical significance. This standard contains 
definitions corresponding to intersections of basic 
identification declarations. Therefore, the paper uses 
Dezert-Smarandache theory for calculating the basic belief 
assignment. 

The conducted research showed that the best results were 
obtained for the PCR6 rule when reports from three sources 
(from two sensors and the fusion system database) were 
processed simultaneously. This corresponds to synchronous 
processing of reports and involves delayed processing of a 
report from one of the sources. The research confirmed a slight 
advantage of the PCR6 rule over the PCR5 rule. This was 
mainly the case when the sensors sent information with a high 
degree of conflict. 


Abstract: Infrared image recognition by means of FLIR 
(forward-looking infrared) cameras is one of the elements of 
the recognition of the maritime situation, and in many 
situations it supports the creation of the so-called 
maritime picture. This work presents the results of research 
on two FLIR image classifiers. It presents the use of SVM 
(Support Vector Machine) to classify images of maritime 
objects. The SVM network uses the one-against-all method to 
perform multi-class classification. Both 
classifiers use the pre-processed FLIR images as input data in 
the form of the brightness of all image pixels and a histogram 
of oriented gradients (for training and testing). All FLIR color 
images have been transformed into grayscale images, segmented 
using the Otsu algorithm with a possible manual correction, 
rescaled, centered and leveled. In the further part of the work, 
a method of determining the basic belief assignment is proposed 
for SVM classifiers. In the final part of the work, test results 
of both classification methods and of their fusion by the 
Dezert-Smarandache PCR5 rule are presented for a set of FLIR 
images of maritime objects registered in the Baltic Sea. 


Keywords: FLIR images recognition, image classifiers, SVM 
networks, time series comparison, basic belief assignment, the 
Dezert-Smarandache rule of information fusion, proportional 
conflict redistribution. 


I. INTRODUCTION 


FLIR (forward-looking infrared) passive infrared sensors 
are used for short- to medium-range recognition, from 2 
to 20 nautical miles depending on the size of the object 
being recognized and the conditions of observation. They are 
mounted on maritime and air platforms in the armed forces 
and border guards of many countries. FLIR cameras create 
a monochrome image in which the luminance of each pixel 
is proportional to the temperature of the observed point [1]. 
These cameras often artificially color the image to present 
it to the operator. The method of assigning the color to the 
temperature is usually shown on the image. The natural way is 
to assign higher temperatures to yellow colors, and the lowest 
temperatures to blue and purple colors. From the point of view 
of image recognition, these colors are artificial and should be 
removed from the image and converted into shades of gray. 

Recognition of maritime objects based on FLIR images 
should first answer the question whether the registered object 
is a maritime object. If the answer is positive, the next 
question is whether the object being recognized belongs to 
one of the classes from the training set (previously recorded 
and classified images); if it does not, the inability to 
recognize the type should be stated. 

FLIR images can be distorted due to specific atmospheric 
conditions (fog or rain) and solar lighting containing infrared 
radiation as well as due to physical processes taking place 
in the camera. The geometry of the silhouettes of maritime 
objects can be changed as a result of changing camera settings, 
different distance of the object from the camera and different 
object observation angles (so-called aspect angles). The above- 
mentioned factors make the process of recognizing maritime 
objects based on FLIR images a multi-stage process. 

The process of comparing and recognizing objects takes into 
account their specific features called distinctive features. The 
choice of features is related to the specifics of the recognized 
objects, the method of recording images, and the methods 
and recognition algorithms used. In the case of recognition of 
maritime object images, the basic element of the image being 
analyzed is the silhouette of this object. It can be characterized 
by various sets of distinctive features. The principal component 
analysis (PCA) method or methods using deep neural networks 
take into account the luminance of all these silhouette pixels. 

In this work, the descriptor of a grayscale image of a 
maritime object without compression is stored as a row 
vector formed by concatenating the rows of the original 
image. It is called the linear image descriptor. The second 
type of descriptor is the histogram of oriented gradients 
(HOG) [2,3]. 
This descriptor compresses the image by determining the 
brightness gradients of the image in evenly spaced image 
sections (cells). Gradient vectors describe the shape of the 
objects contained in the image. 
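
The linear image descriptor described above amounts to a row-major flattening of the image. A minimal NumPy sketch follows; the function name `linear_descriptor` and the tiny example image are hypothetical.

```python
import numpy as np


def linear_descriptor(image):
    """Concatenate the rows of a 2-D grayscale image into one row vector."""
    image = np.asarray(image)
    if image.ndim != 2:
        raise ValueError("expected a 2-D grayscale image")
    return image.reshape(1, -1)  # row-major order: row 0, then row 1, ...


img = np.array([[10, 20, 30],
                [40, 50, 60]], dtype=np.uint8)
print(linear_descriptor(img))  # [[10 20 30 40 50 60]]
```

All images must be rescaled to a common size beforehand, otherwise their descriptors have different lengths and cannot be fed to the same classifier.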

The work uses the method of classifying the silhouettes 
of marine objects based on multi-class classification using 
the SVM (Support Vector Machine) network [4,5,6,7,8]. This 
method is based on SVM classifiers, which in the literature 
are also called SVM networks because of their similarity to 
neural networks. These classifiers are, by their very nature, 
two-class classifiers, and thus they answer the question 
whether the recognized object belongs to one 
of two classes. In real problems, the set of patterns is usually 
multi-class, hence methods for using two-class classifiers for 
multi-class classification have been developed. One such 
method, called “one against all”, is used in this work. It has 
been modified for the fusion of SVM networks in such a way 
that it is possible to determine the basic belief assignment over 
the set of pattern types. 

Another problem to be addressed was the assessment of 
the linear separability of the training set histograms. It is 
related to the possible transformation of the original space 
of distinctive features into a space with a much larger number 
of dimensions and the use of the appropriate kernel function 
of the transformation. In this work, a hint contained in [6] is 
used. It allows the use of a linear kernel when the number of 
patterns in the training set is much smaller than the number 
of distinctive features of the patterns. 

The SVM classifiers require FLIR images to be pre-processed 
beforehand, including the preparation of a histogram of the 
vertical brightness projection. The purpose of this process 
is, among other things, 
to eliminate unnecessary information about the background 
of the object and interference, as well as to normalize the 
silhouette of the object. The image pre-processing process 
may include segmentation, brightness normalization, silhouette 
scaling, silhouette centering, silhouette leveling and extraction 
of distinctive features. Some problems of information pre- 
processing have been presented in [4]. 

One of the important objectives of our work was to evaluate 
the effectiveness of information fusion methods [9,10] applied 
to the results of SVM classifiers. In papers [9,10,11], methods 
for determining the basic belief assignment on a set of possible 
decisions of SVM classifiers were proposed. 

In the final part of the work, the results of tests of SVM 
classifiers with two different image descriptors and the results 
of the fusion of these classifiers using the Dezert-Smarandache 
PCR5 rule [9,11] are presented. 


II. HISTOGRAM OF ORIENTED GRADIENTS AS A VECTOR 
OF DISTINCTIVE FEATURES OF A MARITIME OBJECT IMAGE 


As mentioned in section I of this work, it was assumed that 
the histogram of oriented gradients HOG is created on the 
basis of the image formed in the process of pre-processing, 
segmentation and secondary processing of the primary image. 

Histograms of oriented gradients (HOG) are image descrip- 
tors that describe the shapes of objects within the image. 
The idea of their operation is based on counting gradients 
occurring in the same spatial orientation (at a specific angle), 
in a certain precisely defined fragment of the image. These 
gradients are counted in evenly distributed portions of the 
image. To improve the efficiency of object detection, local 
contrast normalization is applied in overlapping regions [2,3]. 

HOG descriptors are created by dividing the image into 
fragments, called cells. Gradients are then calculated for each 
fragment. The distribution of these gradients is represented by 
the so-called edge orientation histogram. Cells are grouped 
into blocks to perform contrast normalization. This operation 
is aimed at increasing resistance to shadows and differences in 
lighting [2,3]. The histograms computed in all blocks form 
a vector. This vector is called the HOG. An example of the 
HOG implementation is shown in Figure 1. 


Figure 1. Example implementation of HOG: a) input image, b) gradients for 
the input image with cells 16 x 16, c) gradients for the input image with cells 
8x8. 
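The cell-based construction described above can be sketched in a few lines of Python. This is a simplified illustration, not the implementation used in the work: the function names, the 9 unsigned orientation bins and the per-cell L2 normalization (instead of normalization over overlapping blocks) are assumptions made to keep the example short.

```python
import math

def cell_histograms(image, cell=8, bins=9):
    """Return one magnitude-weighted orientation histogram per cell."""
    rows, cols = len(image), len(image[0])
    n_cy, n_cx = rows // cell, cols // cell
    hists = [[[0.0] * bins for _ in range(n_cx)] for _ in range(n_cy)]
    for r in range(n_cy * cell):
        for c in range(n_cx * cell):
            # Central differences, with indices clamped at the image border.
            gx = image[r][min(c + 1, cols - 1)] - image[r][max(c - 1, 0)]
            gy = image[min(r + 1, rows - 1)][c] - image[max(r - 1, 0)][c]
            magnitude = math.hypot(gx, gy)
            # Unsigned orientation in [0, 180) degrees.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            b = min(int(angle / (180.0 / bins)), bins - 1)
            hists[r // cell][c // cell][b] += magnitude
    return hists

def hog_vector(image, cell=8, bins=9, eps=1e-6):
    """Concatenate L2-normalized cell histograms into one feature vector."""
    vector = []
    for row in cell_histograms(image, cell, bins):
        for hist in row:
            norm = math.sqrt(sum(v * v for v in hist)) + eps
            vector.extend(v / norm for v in hist)
    return vector

# Hypothetical 16 x 16 test image with a single vertical edge.
image = [[0] * 8 + [255] * 8 for _ in range(16)]
features = hog_vector(image, cell=8, bins=9)
print(len(features))  # 2 x 2 cells, 9 bins each -> 36
```

Halving the cell size quadruples the number of cells, and therefore the length of the feature vector, which is the trade-off illustrated by the two cell sizes in Figure 1.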


III. LINEAR DESCRIPTOR AS A VECTOR OF DISTINCTIVE 
FEATURES OF A MARITIME OBJECT IMAGE 


In this work, the descriptor of an uncompressed grayscale 
image of a maritime object is stored as a horizontal vector 
formed by concatenating the rows of the original image. It is 
called the linear image descriptor. If A denotes an array 
containing a grayscale image of size m x n, then the linear 
descriptor of this image can be represented as (in MATLAB 
notation): 


[A(1,:) A(2,:) ... A(m,:)]. (1) 
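The same row concatenation can be written in Python as a minimal sketch. The function name and the tiny 2 x 3 image are illustrative assumptions, not taken from the work.

```python
def linear_descriptor(image):
    """Concatenate the rows of a 2-D grayscale image into one flat vector."""
    return [pixel for row in image for pixel in row]

# Hypothetical 2 x 3 image used only for illustration.
A = [[10, 20, 30],
     [40, 50, 60]]
descriptor = linear_descriptor(A)
print(descriptor)  # [10, 20, 30, 40, 50, 60]
```

For an m x n image the descriptor has m*n elements, so its length grows with the image area.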


IV. CHARACTERISTICS OF SVM CLASSIFIER 
A. Introduction 


Support Vector Machine SVM is a useful technique for data 
classification [5,6,7,8,12,13,14,15,16]. The SVM machine in 
the Polish literature on the subject is most often called the 
SVM network due to its simple interpretation using neural 
networks. First, basic information explaining the principles of 
data classification using SVM in the case of binary (two-class) 
classification will be presented. This technique will then be 
extended to the problem of multi-class classification. 


B. Binary classification 


The SVM classifier belongs to the set of classifiers that 
maximize the separation margin [5,6,7,8]. These classifiers 
recognize patterns belonging to two classes by specifying 
a decision surface that provides the maximum distance to the 
nearest points in the training set, called support vectors. 


The distance of any point x from the hyperplane (3) is 

$$d(\mathbf{x}) = \frac{|\mathbf{w}^T \mathbf{x} + b|}{\|\mathbf{w}\|}. \qquad (2)$$

Let us assume that a set of training pairs $(\mathbf{x}_i, y_i)$ is given for 
$i = 1, \ldots, p$, wherein each point $\mathbf{x}_i \in \mathbb{R}^N$ belongs to one of 
two classes of patterns identified by labels $y_i = +1$ (class 1) 
or $y_i = -1$ (class 2). Assuming a linear separability of classes, 
the equation of the separating hyperplane can be written using 
the formula 

$$f(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + b = 0, \qquad (3)$$

where $\mathbf{w} = [w_1, w_2, \ldots, w_N]^T$ is an N-dimensional weight 
vector, and $\mathbf{x} = [x_1, x_2, \ldots, x_N]^T$ is a vector of the distinctive 
feature values of the object. The value b specifies the hyperplane 
offset relative to the origin of the coordinate system. 
Decision equations of classification take the following form: 

$$\mathbf{w}^T \mathbf{x}_i + b > 0 \ \text{for } y_i = +1, \qquad \mathbf{w}^T \mathbf{x}_i + b < 0 \ \text{for } y_i = -1. \qquad (4)$$


Intuitively, one can say that the greater the distance (2) of 
point x from the hyperplane (3), the greater the reliability of 
the classification. 
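A short Python sketch of the decision rule and of the point-to-hyperplane distance may help here. The weight vector, offset and test points are made-up values chosen so that the arithmetic is easy to follow; the function names are illustrative.

```python
import math

def decision_value(w, b, x):
    """Value of f(x) = w^T x + b for a linear classifier."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def distance_to_hyperplane(w, b, x):
    """Distance of x from the hyperplane f(x) = 0, i.e. |f(x)| / ||w||."""
    return abs(decision_value(w, b, x)) / math.sqrt(sum(wi * wi for wi in w))

def classify(w, b, x):
    """Return +1 (class 1) for f(x) > 0 and -1 (class 2) otherwise."""
    return 1 if decision_value(w, b, x) > 0 else -1

# Hypothetical hyperplane 3*x1 + 4*x2 - 5 = 0 with ||w|| = 5.
w, b = [3.0, 4.0], -5.0
for point in ([3.0, 4.0], [0.0, 0.0]):
    print(point, classify(w, b, point), distance_to_hyperplane(w, b, point))
```

The first point lies four units from the hyperplane on the positive side and the second one unit away on the negative side, so the first decision would be regarded as the more reliable of the two. Assigning points that lie exactly on the hyperplane to class 2 is a convention of this sketch only.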

Because the training data are assumed to be linearly 
separable, no training point satisfies the equation 
$\mathbf{w}^T \mathbf{x}_i + b = 0$. It follows that the width of 
the separation margin is greater than zero, which means that 
inequalities (4), after normalization, can be written in the 
following form 

$$\mathbf{w}^T \mathbf{x}_i + b \geq 1 \ \text{for } y_i = +1, \qquad \mathbf{w}^T \mathbf{x}_i + b \leq -1 \ \text{for } y_i = -1. \qquad (5)$$

One can write these two inequalities in one formula 

$$y_i(\mathbf{w}^T \mathbf{x}_i + b) \geq 1. \qquad (6)$$

If a pair $(\mathbf{x}_i, y_i)$ satisfies (6) with equality, then the vector $\mathbf{x}_i$ 
is called a support vector (SV). 

Assuming a linear separability of training data, only these 
vectors determine the location of the optimal separation 
hyperplane and the width of the separation margin, which has 
the value 

$$\frac{2}{\|\mathbf{w}\|}. \qquad (7)$$

The optimal separating hyperplane (3) and the separating margin in 
SVM in a two-dimensional space are presented in Figure 2. 


The task of optimal separation margin design is to find 
the margin of maximum width. The problem 
of optimal selection of the separation hyperplane and the 
separation margin width comes down to solving the quadratic 
programming task in the following form 

$$\min_{\mathbf{w}} \frac{1}{2}\|\mathbf{w}\|^2 \qquad (8)$$

with constraints 

$$y_i(\mathbf{w}^T \mathbf{x}_i + b) \geq 1. \qquad (9)$$

This is a quadratic programming task with constraints 
that can be solved by the Lagrange multipliers method with 
multipliers $\boldsymbol{\alpha} = [\alpha_1\ \alpha_2\ \ldots\ \alpha_N]^T$ using Karush-Kuhn-Tucker 
conditions [17]. The Lagrange function is as follows 

$$L(\mathbf{w}, b, \boldsymbol{\alpha}) = \frac{1}{2}\mathbf{w}^T\mathbf{w} - \sum_{i=1}^{N} \alpha_i y_i(\mathbf{w}^T \mathbf{x}_i + b) + \sum_{i=1}^{N} \alpha_i. \qquad (10)$$

Figure 2. Optimal separating hyperplane and separating margin in SVM in 
a two-dimensional space. 


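The solution to this optimization task is as follows 
[2,3,4,5,6] 

$$\mathbf{w} = \sum_{i=1}^{N} \alpha_i y_i \mathbf{x}_i, \qquad (11)$$

wherein non-zero Lagrange multipliers correspond only to 
support vectors. 

To determine the constant value b, one can use the fact that, 
according to Karush-Kuhn-Tucker conditions at the saddle 
point of the Lagrange function, the product of the multiplier 
by the constraint associated with the support vector $\mathbf{x}_{sv}$ is 
zero [17]: $\alpha_{sv}(\mathbf{w}^T \mathbf{x}_{sv} + b \pm 1) = 0$, $\alpha_{sv} > 0$. From here one 
obtains 

$$b = -\mathbf{w}^T \mathbf{x}_{sv} \mp 1, \qquad (12)$$

where the upper sign applies to a support vector of class 2 
($y_{sv} = -1$) and the lower sign to a support vector of class 1 
($y_{sv} = +1$). 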


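The equation of the optimal separation hyperplane is as 
follows 

$$\sum_{i=1}^{N} \alpha_i y_i \mathbf{x}_i^T \mathbf{x} + b = 0, \qquad (13)$$

while the decision function is as follows 

$$f(\mathbf{x}) = \sum_{i=1}^{N} \alpha_i y_i \mathbf{x}_i^T \mathbf{x} + b \geq 1 \ \Rightarrow\ y = 1, \qquad f(\mathbf{x}) = \sum_{i=1}^{N} \alpha_i y_i \mathbf{x}_i^T \mathbf{x} + b \leq -1 \ \Rightarrow\ y = -1. \qquad (14)$$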
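The passage from Lagrange multipliers to the hyperplane parameters can be checked on a toy problem. The sketch below assumes a hypothetical training set of two points, (1, 1) with label +1 and (-1, -1) with label -1, for which the dual solution is 0.25 for both multipliers; it recovers the weight vector as the multiplier-weighted sum of training points, computes the offset from a support vector (written here as b = y_sv - w^T x_sv, which is the same condition expressed with the label instead of a sign), and evaluates the decision function in its dual form. All names are illustrative.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def weights_from_dual(alphas, labels, points):
    """w = sum_i alpha_i * y_i * x_i."""
    dim = len(points[0])
    return [sum(a * y * x[d] for a, y, x in zip(alphas, labels, points))
            for d in range(dim)]

def bias_from_support_vector(w, x_sv, y_sv):
    """b follows from y_sv * (w^T x_sv + b) = 1 with y_sv in {-1, +1}."""
    return y_sv - dot(w, x_sv)

def dual_decision(alphas, labels, points, b, x):
    """f(x) = sum_i alpha_i * y_i * x_i^T x + b."""
    return sum(a * y * dot(xi, x) for a, y, xi in zip(alphas, labels, points)) + b

# Hypothetical two-point training set; both points are support vectors.
points = [[1.0, 1.0], [-1.0, -1.0]]
labels = [1, -1]
alphas = [0.25, 0.25]  # dual solution of this toy problem

w = weights_from_dual(alphas, labels, points)
b = bias_from_support_vector(w, points[0], labels[0])
margin = 2.0 / math.sqrt(dot(w, w))
print(w, b, margin)
print(dual_decision(alphas, labels, points, b, [1.0, 1.0]))
```

In this example the margin width equals the distance between the two training points, as expected when each class contains a single point.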


More complex models of SVM linear networks include the 
possibility of incompletely linearly separable training data. 
Suitable formulas can be found in [5,6,7,8]. 

In the above considerations, a linear separability of training 
data was assumed. The linear inseparability of training data 
does not mean that the data cannot be separated at all. A common 
solution is the non-linear projection of original data into 
another functional space in which transformed patterns are 
linearly separable or the probability of their separability is very 
high. The condition is the use of a non-linear transformation 
with a sufficiently high dimension $K \gg N$ of the feature 
space. The above-mentioned construction of the separation 
hyperplane and decision rule can be applied in a new space 
that is specified by the projection function $\Phi$. The key property 
of the projection function $\Phi$ is that the scalar product of 
vectors $\Phi(\mathbf{x}_i)^T \Phi(\mathbf{x})$ in the result space can be represented as 
a certain kernel function $K(\mathbf{x}_i, \mathbf{x})$. The basic kernel functions 
are the following: 

- linear kernel: $K(\mathbf{x}_i, \mathbf{x}) = \mathbf{x}_i^T \mathbf{x}$, 

- polynomial kernel: $K(\mathbf{x}_i, \mathbf{x}) = (\alpha \mathbf{x}_i^T \mathbf{x} + 1)^d$, $\alpha > 0$, 

- Gaussian kernel (RBF, radial basis function): 

$K(\mathbf{x}_i, \mathbf{x}) = \exp(-\beta \|\mathbf{x}_i - \mathbf{x}\|^2)$, $\beta > 0$, 

- sigmoid kernel: $K(\mathbf{x}_i, \mathbf{x}) = \tanh(\gamma \mathbf{x}_i^T \mathbf{x} + r)$. 

The values of $\alpha$, $\beta$, $\gamma$, r and d are the parameters of the 
kernels. 
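The four basic kernels can be written directly as Python functions. This is only a sketch: the function names and the default parameter values are arbitrary assumptions, and the polynomial kernel is taken with a constant offset of 1.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def linear_kernel(xi, x):
    return dot(xi, x)

def polynomial_kernel(xi, x, alpha=1.0, d=2):
    return (alpha * dot(xi, x) + 1.0) ** d

def gaussian_kernel(xi, x, beta=1.0):
    squared_distance = sum((a - b) ** 2 for a, b in zip(xi, x))
    return math.exp(-beta * squared_distance)

def sigmoid_kernel(xi, x, gamma=1.0, r=0.0):
    return math.tanh(gamma * dot(xi, x) + r)

# Hypothetical feature vectors used only for illustration.
xi, x = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(xi, x))              # 11.0
print(polynomial_kernel(xi, x))          # (11 + 1)^2 = 144.0
print(gaussian_kernel(xi, x, beta=0.5))  # exp(-0.5 * 8)
print(sigmoid_kernel(xi, x, gamma=0.1))  # tanh(1.1)
```

Each function returns the scalar product of the two vectors in the transformed space without ever forming the transformed vectors explicitly, which is what makes high-dimensional projections affordable.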

In [6] there are some guidelines regarding situations in 
which a linear kernel can be used and in which an RBF radial 
kernel can be used. If the number of patterns in the training 
set is much smaller than the number of distinctive features of 
the patterns, a linear kernel can be used. In this work, a linear 
kernel was used, because the number of training patterns was 
340, and the length of the distinctive features vectors was 4800 
or 1368. 

The equation of the separating hyperplane after applying 
the transformation of the primary space of distinctive features 
by means of the kernel function is as follows 

$$\sum_{i=1}^{N} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b = 0, \qquad (15)$$

while the decision function is as follows 

$$f(\mathbf{x}) = \sum_{i=1}^{N} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b \geq 1 \ \Rightarrow\ y = 1, \qquad f(\mathbf{x}) = \sum_{i=1}^{N} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b \leq -1 \ \Rightarrow\ y = -1. \qquad (16)$$


C. Multiclass classification 


SVM networks divide data into two classes. Unlike classic 
neural networks, where we can have multiple outputs (each 
output is associated with one class), recognition of multiple 
classes requires the implementation of multiple classification 
tasks using multiple SVM networks. The best-known strategies 
for solving the problem of multi-class classification are “one- 
against-one” and “one-against-all” methods [5, 15,18]. Suppose 
the training base has M types of patterns. 

In the case of the “one-against-one” method, M(M - 1)/2 
SVM classifiers are constructed. Each of them distinguishes 
one pair of classes from the training set. One obtains a decision 
function for each pair of classes i and j 

$$f_{ij}(\mathbf{x}) = \mathbf{w}_{ij}^T \Phi(\mathbf{x}) + b_{ij}, \quad i, j \in \mathcal{M} = \{1, \ldots, M\}, \ i \neq j. \qquad (17)$$

After training all SVM networks, one can proceed to 
classify objects from the test set. If $\mathrm{sgn}(f_{ij}(\mathbf{x}))$ indicates the 
i-th class, the counter of indications of this class is increased 
by 1; otherwise the counter of the j-th class is increased by 1. 
Finally, we choose the class whose 
counter has reached the highest value. 
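The voting scheme can be sketched as follows. The dictionary of pairwise decision values is a made-up example, the convention that a positive value of the pairwise decision function favors class i is an assumption of this sketch, and ties are broken in favor of the lowest class number.

```python
from itertools import combinations

def number_of_pairwise_classifiers(num_classes):
    """M(M - 1)/2 classifiers are needed for M classes."""
    return len(list(combinations(range(1, num_classes + 1), 2)))

def one_against_one_vote(pairwise_decisions, num_classes):
    """pairwise_decisions maps (i, j) with i < j to the value f_ij(x)."""
    votes = {m: 0 for m in range(1, num_classes + 1)}
    for (i, j), value in pairwise_decisions.items():
        votes[i if value > 0 else j] += 1
    winner = max(votes, key=votes.get)  # first maximum wins on ties
    return winner, votes

# Hypothetical decision values for M = 3 classes.
decisions = {(1, 2): 0.7, (1, 3): -0.2, (2, 3): -1.1}
winner, votes = one_against_one_vote(decisions, 3)
print(winner, votes)                      # 3 {1: 1, 2: 0, 3: 2}
print(number_of_pairwise_classifiers(9))  # 36
```

For nine classes this scheme needs 36 classifiers, which illustrates the quadratic growth of the number of SVM networks with the number of pattern types.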

In the case of the “one-against-all” method, M SVM 
classifiers are constructed, each network being trained on a 
different training set. Suppose we train the m-th SVM two- 
class network. Class 1 includes m-th type patterns, while class 
2 includes all other types. Finally, we obtain a decision function 
for each network 

$$f_m(\mathbf{x}) = \mathbf{w}_m^T \Phi(\mathbf{x}) + b_m, \quad m \in \mathcal{M} = \{1, \ldots, M\}. \qquad (18)$$

After training all SVM networks, one can proceed to classify 
objects from the test set. If $\mathrm{sgn}(f_m(\mathbf{x}))$ indicates the m-th 
class, one should increase the number of indications in this 
class by 1, and in the opposite case one should increase the 
number of indications by 1 for all classes in the combined 
class. Finally, we choose the class whose counter achieved 
the highest number of wins. The authors of [18] prefer the “one- 
against-all” method because of the linear dependence of the 
number of SVM networks on the number of pattern types in 
the training set. 

The possibility of another extended interpretation of the 
results obtained by the “one-against-all” method is presented 
in [15]. According to [18], the higher the value of the function 
$f_m(\mathbf{x})$ (18), the more reliable the classification result is. In the 
case of a linear kernel, such a criterion of reliability may be the 
distance of the recognized object from the separation plane, 
which, up to the factor $1/\|\mathbf{w}_m\|$, is equal to 

$$f_m(\mathbf{x}) = \mathbf{w}_m^T \mathbf{x} + b_m, \quad m \in \mathcal{M} = \{1, \ldots, M\}. \qquad (19)$$

In Section V-C of the work, the value of the $f_m(\mathbf{x})$ decision 
function was used to construct the basic belief assignment on 
the results of the SVM multi-class classification. 


In SVM multi-class classification, each m-th classifier 
determines the value of its decision function $f_m(\mathbf{x})$. Considering 
the classification results as a whole, one of three situations 
may occur: 

1) Only one $f_m(\mathbf{x})$ has a positive value, and all the 
others are negative. In this case, the number of the positive 
classifier specifies the pattern type number. 

2) More than one of the $f_m(\mathbf{x})$ are positive. If we assume 
that the higher the value of the decision function, the 
more reliable the classification result is, then the number 
of the classifier corresponding to the highest value of the 
function $f_m(\mathbf{x})$ determines the number of the pattern 
type, which can be written as follows 

$$id = \arg\max_{m \in \mathcal{M} = \{1, \ldots, M\}} f_m(\mathbf{x}), \qquad (20)$$

where id is the number of the recognized type. 

3) None of the $f_m(\mathbf{x})$ values is positive. This should be 
interpreted as the new image belonging to a maritime object 
whose type is not included in the training set (unknown 
object). 



V. FUSION OF INFORMATION FROM TWO SVM 
CLASSIFIERS 


A. The process of fusion of information from two classifiers 


Each classifier used in the work transfers to the fusion 
module a vector of basic belief assignment masses defined on 
the set of types of objects in the training set. In this work, it 
was assumed that both classifiers have the same training set. 
The set of possible hypotheses related to individual types of 
objects (the frame of discernment) is as follows 

$$\Theta = \{\theta_i, \ i = 1, \ldots, M\}, \qquad (21)$$

wherein the index i numbers the type of the maritime object 
whose images are stored in the training set, and M is the number 
of pattern types. 


The hypotheses (21) are exhaustive and mutually exclusive, 
i.e. 

$$\theta_i \cap \theta_j = \emptyset, \quad i \neq j. \qquad (22)$$


Each classifier sends its decisions in the form of a BBA 
(basic belief assignment) vector 

$$\mathbf{m}_i = [m_i(\theta_1), \ldots, m_i(\theta_M)], \qquad (23)$$

wherein the index i = 1 denotes the BBA vector 
calculated by the SVM classifier of linear descriptor vectors 
and the index i = 2 denotes the BBA vector 
calculated by the SVM classifier of HOG feature vectors. 
The information fusion procedure is described by the 
following formula: 

$$\mathbf{m}_F = R_F(\mathbf{m}_1, \mathbf{m}_2), \qquad (24)$$

wherein $\mathbf{m}_F$ is the vector of BBA masses determined 
by the information fusion rule $R_F$ from the vectors 
$\mathbf{m}_1$ and $\mathbf{m}_2$ of BBA masses. 


B. The proportional conflict redistribution rule PCR5 for two 
BBAs (two sources) 


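$$m_F(\theta_i) = m_{PCR5}(\theta_i) = m_{12}(\theta_i) + \sum_{\substack{j = 1, \ldots, M \\ \theta_i \cap \theta_j = \emptyset}} S_{12}^{PCR5}(\theta_i, \theta_j) = m_{12}(\theta_i) + \sum_{\substack{j = 1, \ldots, M \\ j \neq i}} S_{12}^{PCR5}(\theta_i, \theta_j), \qquad (25)$$

where 

$$S_{12}^{PCR5}(\theta_i, \theta_j) = \begin{cases} 0, & \text{for } c_1 = 0 \text{ or } c_2 = 0, \\ \dfrac{m_1(\theta_i)^2 m_2(\theta_j)}{m_1(\theta_i) + m_2(\theta_j)} + \dfrac{m_2(\theta_i)^2 m_1(\theta_j)}{m_2(\theta_i) + m_1(\theta_j)}, & \text{for } c_1 \neq 0 \wedge c_2 \neq 0, \end{cases} \qquad (26)$$

where $c_1 = m_1(\theta_i) + m_2(\theta_j)$ and $c_2 = m_2(\theta_i) + m_1(\theta_j)$, and 
wherein 

$$m_{12}(\theta_i) = m_1(\theta_i) m_2(\theta_i). \qquad (27)$$

In formula (25), the component $S_{12}^{PCR5}(\theta_i, \theta_j)$ is equal 
to zero if both denominators are equal to zero. In formula 
(26), if a denominator is zero, then the corresponding fraction is discarded. 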
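For two Bayesian BBAs over the same set of singleton hypotheses, the PCR5 combination can be sketched in Python as below. The sketch follows the usual convention that only a fraction whose denominator is zero is discarded; the function name and the example masses are assumptions made for illustration.

```python
def pcr5_bayesian(m1, m2):
    """PCR5 fusion of two Bayesian BBAs given as equal-length mass lists."""
    n = len(m1)
    fused = []
    for i in range(n):
        mass = m1[i] * m2[i]  # conjunctive (agreement) part
        for j in range(n):
            if j == i:
                continue
            c1 = m1[i] + m2[j]
            c2 = m2[i] + m1[j]
            # Each partial conflict is redistributed proportionally to the
            # masses involved; a fraction with zero denominator is skipped.
            if c1 > 0:
                mass += m1[i] ** 2 * m2[j] / c1
            if c2 > 0:
                mass += m2[i] ** 2 * m1[j] / c2
        fused.append(mass)
    return fused

# Hypothetical BBAs of two classifiers over three object types.
m1 = [0.6, 0.3, 0.1]
m2 = [0.2, 0.7, 0.1]
fused = pcr5_bayesian(m1, m2)
print(fused, sum(fused))
```

The fused masses again sum to one, because every partial conflict $m_1(\theta_i) m_2(\theta_j)$ is split between exactly the two hypotheses that produced it. In this example the second hypothesis receives the largest fused mass.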


C. The method of determining the BBA for the SVM classifiers 


The procedure of image recognition by means of the SVM 
“one-against-all” method, in accordance with the content of 
Section IV-C and [15,18], makes it possible to determine the basic belief 
assignment (BBA) on a set of pattern types. Each k-th pattern 
type is associated with one SVM, and the identification process 
determines the value of the decision function 

$$f_k(\mathbf{x}) = \sum_{i=1}^{N_k} \alpha_i^k y_i^k K(\mathbf{x}_i^k, \mathbf{x}) + b_k. \qquad (28)$$

The value of this function can be used to determine the 
degree of belief that the recognized (tested) object 
belongs to the class with the number k (k = 1, ..., M). In 
[8] it was proposed to use the logistic regression function in 
accordance with the following formula 

$$m(\mathbf{x}, k) = \frac{e^{f_k(\mathbf{x})}}{1 + e^{f_k(\mathbf{x})}}. \qquad (29)$$

In formula (29), k numbers the SVMs. As one can 
see, $0 < m(\mathbf{x}, k) < 1$. The above measure is not normalized, 
therefore we normalize it by 

$$\bar{m}(\mathbf{x}, k) = \frac{m(\mathbf{x}, k)}{\sum_{j=1}^{M} m(\mathbf{x}, j)}. \qquad (30)$$


One should note that the above method of mass determina- 
tion is simplified, because it does not take into account the lack 
of the type of image pattern corresponding to the recognized 
(tested) image. 
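The mapping from decision values to a normalized BBA can be sketched as follows. The sketch assumes that the logistic function has the form e^f / (1 + e^f) and that the normalization divides each value by the sum over all classes; the function names and the example decision values are illustrative.

```python
import math

def logistic_belief(f):
    """Logistic function e^f / (1 + e^f), evaluated without overflow."""
    if f >= 0:
        return 1.0 / (1.0 + math.exp(-f))
    e = math.exp(f)
    return e / (1.0 + e)

def bba_from_decisions(decision_values):
    """Map the M decision values f_k(x) to a normalized Bayesian BBA."""
    raw = [logistic_belief(f) for f in decision_values]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("all beliefs underflowed to zero")
    return [m / total for m in raw]

# Hypothetical decision values of three one-against-all SVMs.
bba = bba_from_decisions([2.0, -1.0, -3.0])
print(bba, sum(bba))
```

Because the normalized masses always sum to one, an object of a type absent from the training set still receives a full assignment over the known types, which is the simplification noted above.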


Each SVM network in the “one-against-all” method calculates 
the value of the decision function $f_k(\mathbf{x})$ used to calculate 
the k-th component of the BBA vector using (29) and (30). 
The construction of the “one-against-all” method justifies the 
form of the frame of discernment (21). In this method, there 
are as many SVM networks as there are training image classes 
in the training set. The way of calculating the BBA vector (29) 
and (30) ensures that for each test image, its specific type will 
be determined. It follows from these considerations that the 
basic belief assignment is a Bayesian assignment regardless of 
how the vector of image features (linear descriptor or HOG) 
is determined. Formulas (21) and (22) are supplemented by 
the following formula: 

$$\sum_{i=1}^{M} m_1(\theta_i) = 1, \qquad \sum_{i=1}^{M} m_2(\theta_i) = 1. \qquad (31)$$


VI. RESULTS OF MARITIME OBJECT RECOGNITION USING 
SVM CLASSIFIER WITH LINEAR IMAGE DESCRIPTORS AND 
HOG IMAGE DESCRIPTORS AND THE FUSION OF THESE 
CLASSIFIERS 


The study had a limited number of images of maritime 
objects of nine classes. The size of the training set is presented 
in Table I. 

Table I 
THE SIZE OF THE TRAINING SET. 


Maritime object class | Size of the training set 
56 


Two training sets were created for each class. In one of 
the sets, the images were represented by linear descriptor 
vectors, while in the other, the images were represented by 
HOG feature vectors. 

The next step was to train two types of SVM models. Thus, 
a total of 18 models were created (two models for each of the 
nine classes). A linear kernel function was 
used in this work according to the guidelines given in [6]. 


A. Results of maritime objects recognition for SVM classifier 
with linear descriptor vectors 


Due to the small number of available original images from 
FLIR cameras, it was decided to extend the test sets with 
images on which Gaussian noise and salt and pepper noise 
were applied. Finally, the recognition of maritime objects was 
carried out on the basis of test sets, which included 130 images 
of objects belonging to each class. Nine training classes were 
used, so a total of 1170 images were tested. As part of the 
work, the achieved accuracy of classification was examined, 
and the results were presented in the form of tables in the 
following subsections. 

The results of recognition of maritime object images using 
the SVM classifier with linear descriptor vectors are presented 
in Table II. The right column of Table II gives the 
mean value of correctly recognized maritime objects (CRMO). 


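Table II 
THE RESULTS OF RECOGNITION OF MARITIME OBJECT IMAGES USING THE 
SVM CLASSIFIER WITH LINEAR DESCRIPTOR VECTORS. 

Object type | # of tested images | # of correct recognitions | # of incorrect recognitions | Mean value of CRMO 
1 | 130 | 130 | 0 | 100% 
2 | 130 | 130 | 0 | 100% 
3 | 130 | 130 | 0 | 100% 
4 | 130 | 130 | 0 | 100% 
5 | 130 | 130 | 0 | 100% 
6 | 130 | 124 | 6 | 95.38% 
7 | 130 | 130 | 0 | 100% 
8 | 130 | 130 | 0 | 100% 
9 | 130 | 130 | 0 | 100% 
Total | 1170 | 1164 | 6 | 99.48% 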

Only 6 out of 1170 images were not recognized correctly. 
The accuracy was 100% for all classes except class 6, 
despite the image noise. The average value of correctly 
recognized objects in all classes was 99.48%. Such high 
accuracy may be due to the ability of the SVM network to 
generalize knowledge. This network is also characterized by 
low sensitivity to the amount of training data used. Therefore, 
high accuracy can be achieved even with a small 
database [5]. In addition, the correctness of the classification 
can be positively influenced by the preprocessing of images, 
as well as a well-chosen kernel function. In this case, a linear 
function was used, which works well with input data of 
sufficiently many dimensions [8]. 


Table III contains the measured execution times of 
the classification task by SVM models using linear image 
descriptors. 


Table III 
RECOGNITION TIMES FROM THE TEST SET BY THE SVM CLASSIFIER 
USING LINEAR IMAGE DESCRIPTORS. 

Maritime object class | Classification time of images 
1 | 0.010306 
2 | 0.011566 
3 | 0.010264 
4 | 0.011591 
5 | 0.011203 
6 | 0.010379 
7 | 0.009611 
8 | 0.011033 
9 | 0.111119 


B. Results of maritime objects recognition for the SVM clas- 
sifier using the histogram of oriented gradients 


The results obtained by testing an SVM network using a 
histogram of oriented gradients (HOG) are presented in the 
Table IV. 


Table IV 
RESULTS OF MARITIME OBJECT IMAGE RECOGNITION WITH THE SVM 
CLASSIFIER USING THE HISTOGRAM OF ORIENTED GRADIENTS. 


Object # of # of correct | # of incorrect | Mean value 
type tested images | recognitions recognitions of CRMO 
1 130 0 


0 
0 
0 
0 
il 
0 
0 
2 


The use of HOG made it possible to achieve an even greater number 
of correctly recognized maritime objects. The average value of 
correctly recognized objects belonging to all classes was 
99.74%. Therefore, it can be concluded that training the SVM 
network on histograms of oriented gradients 
increases the accuracy of classification compared to 
the use of full image information (brightness of all pixels). 

Table V contains the measured execution times of the 
classification task by SVM models using the histogram of 
oriented gradients. 


Table V 
RECOGNITION TIMES FROM THE TEST SET BY THE SVM CLASSIFIER 
USING THE HISTOGRAM OF ORIENTED GRADIENTS. 

Maritime object class | Classification time of images 
1 | 0.002925 
2 | 0.002263 
3 | 0.002203 
4 | 0.002268 
5 | 0.002443 
6 | 0.002189 
7 | 0.002088 
8 | 0.002155 
9 | 0.002172 


The time required to recognize objects from the test set 
by SVM classifiers whose learning data was in the form of 
HOG was an order of magnitude less than the time required 
by classifiers operating on the brightness of all image pixels. 
This is due to the reduced dimensionality of the problem, as 
the input vectors were characterized by a length of 4800 bytes 
for the whole image, while in the case of HOG their length 
was only 1368 bytes. 


C. Information fusion results from SVM classifiers obtained 
using the PCR5 rule 


Table VI shows the classification results obtained by 
using a fusion of SVM classifiers based on training sets 
containing the brightness of all image pixels and histograms of 
oriented gradients. 


Table VI 
INFORMATION FUSION RESULTS FROM BOTH TYPES OF SVM CLASSIFIERS 
OBTAINED USING THE PCR5 RULE. 


Object type | # of tested images | # of correct recognitions | # of incorrect recognitions | Mean value of CRMO 


; 
0 
0 
0 
1 
0 
0 
: 


Total | 1170 | 1168 | 2 | 99.83% 


The use of information fusion from both types of SVM 
classifiers yielded the largest number of correctly 
recognized objects. Only 2 objects out of 1170 were not 
assigned to the appropriate class. In this case, the average 
accuracy of classification of images belonging to all classes was 
99.83%. This is the largest value among the 
recognition methods described above. 

It can therefore be concluded that fusion of information 
from different sources is reasonable and improves the quality 
of classification. 


VII. CONCLUSIONS 


Table VII compares the effectiveness of maritime object 
recognition based on two SVM classifiers with the linear image 
descriptor and the HOG descriptor, and the classifier with 
information fusion according to the PCR5 rule. This comparison 
supports the usefulness of the 
classifier fusion. 


Table VII 
INFORMATION FUSION RESULTS FROM BOTH TYPES OF SVM CLASSIFIERS 
OBTAINED USING THE PCR5 RULE. 


Object type | # of tested images | Effectiveness of SVM (linear descriptor) | Effectiveness of SVM (HOG) | Effectiveness of PCR5 rule for SVM fusion 
Total | 1170 | 99.48% | 99.74% | 99.83% 


The results obtained are consistent with the information 
contained in publications on SVM networks. These networks 
achieve very high efficiency, even when images are noisy. In 
addition, the support vector technique is not time-consuming 
despite the use of data of significant volume, and it shows low 
sensitivity to smaller training sets [5,13]. Both tested classifiers 
showed high efficiency in recognizing maritime objects. 
However, the use of the histogram of oriented gradients gives 
slightly higher accuracy than the use of training data in the 
form of a vector containing the brightness of all image pixels. 
The best results, however, were obtained using a fusion of both 
of these classifiers. The fusion was performed using one of 
the most accurate rules for proportional conflict redistribution, 
PCR5. The use of fusion helped to improve efficiency, so 
it can be considered reasonable to combine information from 
different classifiers when conflicts arise. 

Abstract: In recent years, wearable sensor-based human activity 
recognition (HAR) has become increasingly attractive, 
especially in health monitoring and sports management. However, 
in order to obtain high-quality HAR, it is often necessary to 
collect sufficient labeled activity data, which is very difficult, 
time-consuming and costly in a natural environment. To tackle this 
problem, multi-source domain adaptation is a promising method 
that aims to learn sufficient multi-source prior knowledge from 
labeled activity data, and then transfer this learned knowledge 
to the target unlabeled dataset. Thus, this paper presents a 
novel multi-source weighted domain adaptation with evidential 
reasoning (w-MSDAER) for HAR, which can effectively utilize 
complementary knowledge between multiple sources. Specifically, 
we first use the strategy of distribution alignment to learn local 
domain-invariant classifiers based on multi-source domains. 
Then, the reliabilities of these derived classifiers are 
comprehensively evaluated according to the belief function based 
technique for order preference by similarity to ideal solution 
(BF-TOPSIS). Finally, the discounting fusion method is used to 
fuse the local classification results. Comprehensive experiments 
are conducted on two open-source datasets, and the results show 
that the proposed w-MSDAER significantly outperforms other 
state-of-the-art methods. 


Keywords: human activity recognition, multi-source domain 
adaptation, evidential reasoning, reliability assessment. 


I. INTRODUCTION 
A. Background and Research Motivation 


Wearable sensor-based human activity 
recognition (HAR) uses the raw signals collected by 
wearable sensors to identify human activities, for example to help 
patients deal with chronic injuries or to provide personalized 
medical advice. Given its good application prospects, sensor-based 
HAR has been comprehensively discussed in recent 
surveys [1], [2] and has also been widely used in many 
real-life scenarios such as health care, ubiquitous computing, 
and human-computer interaction [3]-[8]. Generally speaking, 
high-precision activity recognition relies on recognition models 
that generalize well, which means that sufficient labeled 
data must be acquired in advance to train reliable models. 


However, one typical and common scenario is that labeled 
data collected from specific positions or individuals are often 
limited. Still, we hope that the recognition models learned 
from such labeled data can identify the unknown activities 
of many other positions or people [9]. For example, the 
pre-trained HAR model built into a smartwatch needs to identify 
each user’s activities. However, it is impossible to label all 
consumers’ personal data during the manufacturing of 
smartwatches. Many recent cross-domain references [10]- 
[12] pointed out that a recognition model learned on some 
positions or specific people cannot be well generalized 
to other positions and people. To directly demonstrate the 
negative effects of domain differences, cross-location and 
cross-person activity recognition experiments were conducted 
on the mHealth¹ dataset based on previous works [3], [13]: 1) 
for the same subject, a Long Short-Term Memory (LSTM) model 
was first trained on data collected from the accelerometer 
installed on the chest, and then this trained LSTM model 
was applied to identify the unknown activities based on the 
data collected by the sensors located on the chest, left 
ankle and right lower arm, respectively; 2) the LSTM model 
trained for Subject1 (St1) was used to recognize the unknown 
activities of St1, Subject2 (St2), Subject3 (St3) and Subject4 
(St4), respectively. As shown in Fig. 1, the average accuracies 
of self-activity recognition are 88.65% for Chest → 
Chest and 83.61% for St1 → St1, while the accuracies of 
cross-position and cross-person recognition are only 14.84% 
(Chest → Left Ankle), 5.64% (Chest → Right Lower Arm) and 
32.84% (St1 → St2), 34.34% (St1 → St3), 35.6% (St1 → St4). 
This result clearly shows that domain differences degrade the 
classifiers’ performance. 

¹ http://archive.ics.uci.edu/ml/datasets/mhealth+dataset 

Figure 1. Classification Accuracy in Cross-Domain Activity Recognition. 

To reduce the negative impacts of domain differences, many 
recent works have successfully applied domain adaptation 
(DA) approaches to deal with cross-domain recognition 
problems. The classical single-domain adaptations often rely on 
domain-invariant feature extraction [11] or domain mapping 
strategies [12]. For example, the Joint Distribution Adaptation 
(JDA) method was proposed by Long et al. [14] to adapt 
both the marginal and conditional distributions simultaneously. 
Since deep neural networks have the capability of powerful 
feature extraction, many deep feature transformation methods 
have also been presented to adapt the distribution. Ghifary et 
al. [15] proposed a domain adaptive neural network (DaNN) 
to reduce the distribution mismatch between the source and 
target domains. However, the aforementioned methods only 
consider the situation of a single source domain. In practical 
applications, there may be multiple source domains, and how 
to effectively utilize the complementary knowledge between 
different source domains has become a serious concern [16], 
[17]. Currently, many sophisticated techniques have been 
proposed to take advantage of differences among source domains 
or the relationship between sources and targets. Zhao et al. [18] 
proposed a separate domain classifier for each source domain 
and computed a loss based on the lowest domain error among 
these classifiers. Liu et al. [19] presented a novel evidential 
framework for combining multi-source domain adaptations. 
In [19], the reliabilities of source domains were directly 
evaluated and the multiple classification results derived from 
multi-source were fused with a novel decision-level cautious 
combination rule. Moreover, Liu et al. [20] also proposed a 
new method called distribution adaptation based on evidence 
theory to improve the classification accuracy by combining the 
complementary information derived from both the source and 
target domains. 


As illustrated earlier in Fig. 1, traditional sensor-based 
action recognition models have difficulties in solving 
cross-domain HAR problems. To tackle these problems, Hong et 
al. [21] proposed a novel single-source DA method based 
on a semi-population strategy; Wang et al. [22] presented a 
novel deep network coupled with transfer learning for 
cross-position HAR (TNNAR); and an extreme learning machine based 
kernel fusion was proposed by Wang et al. [23] to deal with 
domain alignment in HAR. Besides, Wilson et al. 
[24] presented a convolutional deep domain adaptation model 
for time series data (CoDATS) from multiple source domains 
to improve the accuracy over prior single-source methods. 
However, TNNAR and CoDATS did not fully 
consider the reliabilities/weights of source domains in their 
multi-source domain adaptations, which to some degree affects 
the final classification results. Inspired by the weighted 
combination strategy for multi-source domain adaptation proposed 
in [19], we propose a new multi-source weighted domain 
adaptation with evidential reasoning (w-MSDAER) for the 
human activity recognition problem. The difference in weight 
calculation between w-MSDAER and Liu’s method in [19] 
lies mainly in the stage at which the reliability of the 
source domains is evaluated. Liu’s method in [19] can be regarded as a 
pre-weight calculation strategy applied before basic belief assignment 
(BBA) generation: in [19], the reliability of each source 
domain is directly evaluated by using the domain distance 
before and after the distribution matching steps. In our proposed 
w-MSDAER, the reliabilities of source domains are indirectly 
evaluated based on a multi-criteria strategy after BBA 
generation. These BBAs are obtained from the outputs of local 
domain-invariant classifiers trained on the source domains 
and are comprehensively evaluated from two aspects: the distance 
degrees between BBAs and the imprecision degree inside each 
BBA. Thus, w-MSDAER can be regarded as a post hoc, indirect 
weight evaluation method compared to Liu’s method in [19]. 


B. Challenges 


It is essential to fuse the complementary multi-domain 
knowledge effectively. However, two main issues need to be 
addressed: 


• How to comprehensively evaluate the reliability of different
  source domains? Since the classifiers using data
  from multi-source domains may have distinct abilities to
  classify activities in the unknown domain, it is necessary
  to analyze the reliabilities of multi-source domains in the
  process of domain adaptation;

• How to fuse the outputs of local domain-invariant classifiers
  learned from different source domains? Considering
  that conflicts may exist between the classification results
  from different source domains, our fusion rules are required
  to effectively solve the highly conflicting fusion
  problem.


C. Main Contributions 


To solve these two problems, we first use
the manifold embedded distribution alignment (MEDA) to
learn local domain-invariant classifiers based on the source domains.
Then, the outputs of these classifiers are transformed
into BBAs and evaluated by the multi-criteria
evaluation strategy BF-TOPSIS².

The main contributions of this work are summarized as 
follows: 


• The novel fusion framework of weighted multi-source
  domain adaptation based on evidential reasoning (w-MSDAER)
  is presented. Here, w-MSDAER can combine
  the complementary knowledge among different source
  domains;

• In w-MSDAER, the reliabilities of local domain-invariant
  classifiers learned from multi-source domains are comprehensively
  assessed according to the BF-TOPSIS strategy.
  In BF-TOPSIS, two well-selected criteria are chosen:
  distances between BBAs and the self-imprecision degree
  inside the BBA;

• Through comprehensive experiments on two public activity
  recognition data sets, the superiority of w-MSDAER
  is shown.

² BF-TOPSIS is an extension of the technique for order preference by
similarity to ideal solution (TOPSIS) [25] based on belief functions (BF).


The rest of this paper is organized as follows. In Section
II, the basics of evidential reasoning are presented. Section III
describes the proposed w-MSDAER. The experimental results
are given in detail in Section IV. Finally, the article concludes
and gives several future research directions in the last section.


II. BASICS OF EVIDENTIAL REASONING 


This section briefly introduces the basic knowledge of
evidential reasoning, also known as Dempster-Shafer Theory
(DST), which is necessary for presenting the proposed w-MSDAER.

In DST, the Frame of Discernment (FoD) is a set of
exhaustive and exclusive elements, denoted as
$\Theta = \{\theta_1, \ldots, \theta_n\}$ ($n > 2$). The power set of $\Theta$,
which is the set of all subsets of $\Theta$ (including the empty set $\emptyset$
and $\Theta$ itself), is denoted $2^{\Theta}$ because its cardinality is exactly
equal to $2^{|\Theta|}$. A BBA $m(\cdot)$ is defined by the mapping $2^{\Theta} \mapsto [0, 1]$,
verifying $m(\emptyset) = 0$ and $\sum_{\theta \in 2^{\Theta}} m(\theta) = 1$. The functions
$Bel(\theta) = \sum_{\theta' \in 2^{\Theta} \mid \theta' \subseteq \theta} m(\theta')$ and
$Pl(\theta) = \sum_{\theta' \in 2^{\Theta} \mid \theta' \cap \theta \neq \emptyset} m(\theta')$
define the belief and plausibility functions, respectively. The interval
$BI(\theta) = [Bel(\theta), Pl(\theta)]$ is called the belief interval of $\theta$,
which is usually interpreted as the interval to which the
unknown probability of $\theta$ must belong.
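The belief interval can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: a BBA is represented as a dictionary from `frozenset` focal elements to masses, and the frame `{walk, run}` and its masses are hypothetical values, not data from the paper.

```python
from typing import Dict, FrozenSet

BBA = Dict[FrozenSet[str], float]  # focal element -> mass (assumed representation)


def bel(m: BBA, theta: FrozenSet[str]) -> float:
    """Belief: total mass of the non-empty focal elements included in theta."""
    return sum(mass for focal, mass in m.items() if focal and focal <= theta)


def pl(m: BBA, theta: FrozenSet[str]) -> float:
    """Plausibility: total mass of the focal elements that intersect theta."""
    return sum(mass for focal, mass in m.items() if focal & theta)


# Hypothetical BBA over the frame {walk, run}
WALK, RUN = frozenset({"walk"}), frozenset({"run"})
m_example: BBA = {WALK: 0.5, RUN: 0.25, WALK | RUN: 0.25}

# Belief interval BI(walk) = [Bel(walk), Pl(walk)]
belief_interval_walk = (bel(m_example, WALK), pl(m_example, WALK))  # (0.5, 0.75)
```

The mass on the whole frame widens the interval: it counts towards the plausibility of `walk` but not towards its belief.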

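In order to combine two distinct sources of evidence, the
classical Dempster-Shafer rule (DS) in [26] was proposed and
defined by $m_{DS}(\emptyset) = 0$ and, $\forall \theta \in 2^{\Theta} \setminus \{\emptyset\}$,

$$m_{DS}(\theta) = \frac{\sum_{\theta', \theta'' \in 2^{\Theta} \mid \theta' \cap \theta'' = \theta} m_1(\theta')\, m_2(\theta'')}{1 - \sum_{\theta', \theta'' \in 2^{\Theta} \mid \theta' \cap \theta'' = \emptyset} m_1(\theta')\, m_2(\theta'')}. \tag{1}$$

To palliate the drawbacks of the DS rule (see discussions in [27]), in
Dezert-Smarandache Theory (DSmT) [28], the very interesting
Proportional Conflict Redistribution rule no. 5 (PCR5) was defined by
$m_{PCR5}(\emptyset) = 0$ and, $\forall \theta \in 2^{\Theta} \setminus \{\emptyset\}$,

$$m_{PCR5}(\theta) = m_{12}(\theta) + \sum_{\theta' \in 2^{\Theta} \setminus \{\theta\} \mid \theta \cap \theta' = \emptyset} \left[ \frac{m_1(\theta)^2\, m_2(\theta')}{m_1(\theta) + m_2(\theta')} + \frac{m_2(\theta)^2\, m_1(\theta')}{m_2(\theta) + m_1(\theta')} \right], \tag{2}$$

where $m_1(\cdot)$ and $m_2(\cdot)$ are two independent BBAs and
$m_{12}(\theta) = \sum_{\theta', \theta'' \in 2^{\Theta} \mid \theta' \cap \theta'' = \theta} m_1(\theta')\, m_2(\theta'')$.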
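To make the difference between the two rules concrete, the following Python sketch implements the DS rule and the two-source PCR5 rule for BBAs stored as dictionaries of `frozenset` focal elements. It is a minimal illustration, not the authors' implementation: the data structure, the function names and the two example BBAs are assumptions, and the input BBAs are assumed to carry no mass on the empty set.

```python
from itertools import product


def conjunctive(m1, m2):
    """Conjunctive consensus m12; the key frozenset() collects the conflict."""
    out = {}
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        out[a & b] = out.get(a & b, 0.0) + x * y
    return out


def ds_rule(m1, m2):
    """Dempster-Shafer rule: drop the conflict and renormalize."""
    m12 = conjunctive(m1, m2)
    conflict = m12.pop(frozenset(), 0.0)
    if conflict >= 1.0:
        raise ValueError("total conflict: the DS rule is undefined")
    return {k: v / (1.0 - conflict) for k, v in m12.items()}


def pcr5(m1, m2):
    """PCR5: each partial conflict x*y goes back to the two focal elements
    involved, proportionally to their masses x and y."""
    out = conjunctive(m1, m2)
    out.pop(frozenset(), None)
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        if not (a & b) and x + y > 0:  # conflicting pair
            out[a] = out.get(a, 0.0) + x * x * y / (x + y)
            out[b] = out.get(b, 0.0) + y * y * x / (x + y)
    return out


# Hypothetical two-class example
A, B = frozenset({"a"}), frozenset({"b"})
m1 = {A: 0.5, A | B: 0.5}
m2 = {B: 0.5, A | B: 0.5}
fused_ds = ds_rule(m1, m2)   # conflict 0.25 removed by normalization
fused_pcr5 = pcr5(m1, m2)    # conflict 0.25 split between A and B
```

In this example the DS rule spreads the conflicting mass over all focal elements through normalization, whereas PCR5 returns it only to the two singletons that caused the conflict.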


III. MULTI-SOURCE WEIGHTED DOMAIN ADAPTATION 
WITH EVIDENTIAL REASONING FOR HAR 


In this part, we first state the cross-domain HAR problem
and then briefly present the specific steps for learning the local
domain-invariant classifiers with MEDA and the multi-criteria
assessment of the reliabilities of these classifiers by using
BF-TOPSIS [25]. Finally, we present how to fuse all outputs
of the local classifiers based on the discounting combination rule
and make the final decision.


A. Problem Definition 


Assume that there exist $M$ labeled source domains
$D_{s_1}, D_{s_2}, \cdots, D_{s_M}$ and an unlabeled target domain $D_t$ in
the multi-source DA problem. A source domain
$D_s = \{(x_u^{D_s}, y_u^{D_s})\}_{u=1}^{N_s}$ contains $N_s$ labeled samples, where $x^{D_s}$
follows the domain distribution $pro^{D_s}(x, y)$ and
$y^{D_s} \in \{1, 2, \cdots, n\}$ ($n$ represents the number of categories) is the
related label of the samples in the source domain. Similarly,
$D_t = \{x_u^{D_t}\}_{u=1}^{N_t}$ is the target domain, which includes $N_t$ unlabeled
samples, and $x^{D_t}$ follows the target distribution $pro^{D_t}(x, y)$.
In the problems discussed in this paper, $D_s$ and $D_t$ share
the same feature space and the same label space:
$\mathcal{X}^{D_s} = \mathcal{X}^{D_t}$ and $\mathcal{Y}^{D_s} = \mathcal{Y}^{D_t}$. However, $D_s$ and $D_t$ follow
different distributions: $pro^{D_s}(x, y) \neq pro^{D_t}(x, y)$. With
the help of the multi-source domains $D_{s_1}, D_{s_2}, \cdots, D_{s_M}$, the
goal of our task is to obtain the labels $y^{D_t}$ of the unlabeled
samples in the target domain. Here, we use a simple cross-position
activity recognition problem to give a clearer and more specific
description of the terms discussed above. Assume that
there exist two source domains $D_{s_1} = \{(x^{D_{s_1}}, y^{D_{s_1}})\}$ and
$D_{s_2} = \{(x^{D_{s_2}}, y^{D_{s_2}})\}$, where $D_{s_1}$ and $D_{s_2}$ represent the
source domains derived from accelerometers located on the chest
and left arm, respectively. Besides, $x^{D_{s_1}}$ and $x^{D_{s_2}}$ are the raw
signals collected by the accelerometers, and
$y^{D_{s_1}}$ and $y^{D_{s_2}}$ represent the categories of activities, such as
standing, walking, and running. Our main task is to classify the
unlabeled samples $x^{D_t}$ in the target domain $D_t = \{x^{D_t}\}$,
which are collected from the accelerometer located on the right
ankle.


B. Local domain-invariant classifier based on manifold em- 
bedded distribution alignment 


As mentioned in [19], our goal is to use the classifier
$f : x^{D_t} \mapsto y^{D_t}$ learned from multi-source domains to realize
the classification of the unlabeled samples in the target
domain. Here, we use the MEDA strategy proposed by Wang
et al. [29] to learn the local domain-invariant classifier before
combination. More detailed discussions about MEDA are
given in [29].

Specifically, according to the structural risk minimization
(SRM) [30], the learned domain-invariant classifier $f(\cdot)$ can
be represented as

$$f = \arg\min_{f \in \mathcal{H}_K} \sum_{u=1}^{N_s} \left(y_u - f(z_u)\right)^2 + \eta \|f\|_K^2 + \lambda D_f(D_s, D_t) + \rho R_f(D_s, D_t), \tag{3}$$

where $z_u = g(x_u)$ are the transformed manifold features.
For computational efficiency, the
Geodesic Flow Kernel (GFK) [31] is applied to learn $g(\cdot)$.
The squared norm of $f$ is denoted as $\|f\|_K^2$. The dynamic
distribution alignment is represented by $D_f$, and $R_f$ is
the Laplacian regularization [29]. The Hilbert space $\mathcal{H}_K$ is induced
by the kernel function $K(\cdot, \cdot)$, and $\eta$, $\lambda$, $\rho$ are the
corresponding regularization parameters.


To obtain the local domain-invariant classifier $f$, the details
of each term in (3) are reformulated as follows:

1) Local classifier learned by SRM: According to the
representer theorem [32], $f$ can be expressed by

$$f(z) = \sum_{u=1}^{N_s + N_t} \beta_u K(z_u, z), \tag{4}$$

where the coefficient vector is denoted by
$\beta = (\beta_1, \beta_2, \cdots, \beta_{N_s+N_t})^T \in \mathbb{R}^{(N_s+N_t) \times 1}$ and $K(\cdot, \cdot)$ is
a kernel function. Afterwards, we can use the SRM strategy
for $D_s$:

$$\sum_{u=1}^{N_s} \left(y_u - f(z_u)\right)^2 + \eta \|f\|_K^2 = \sum_{u=1}^{N_s+N_t} A_{uu} \left(y_u - f(z_u)\right)^2 + \eta \|f\|_K^2 = \left\| (Y - \beta^T K) A \right\|_F^2 + \eta\, \mathrm{tr}\left(\beta^T K \beta\right), \tag{5}$$

where $K \in \mathbb{R}^{(N_s+N_t) \times (N_s+N_t)}$ represents the kernel matrix
with $K_{uv} = K(z_u, z_v)$. The Frobenius norm and trace
operators are denoted by $\|\cdot\|_F$ and $\mathrm{tr}(\cdot)$, respectively.
$A \in \mathbb{R}^{(N_s+N_t) \times (N_s+N_t)}$ is a binary diagonal domain indicator
matrix with $A_{uu} = 1$ if $u \in D_s$ and $A_{uu} = 0$ otherwise.
$Y = [y_1, \cdots, y_{N_s+N_t}]$ is the label matrix from the source and
the target domains.


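2) Distribution alignment: Here, the distribution alignment
is defined by:

$$D_f(D_s, D_t) = \mathrm{tr}\left(\beta^T K M K \beta\right), \tag{6}$$

where $M = (1 - \mu) M_0 + \mu \sum_{h=1}^{n} M_h$ represents the MMD
matrix, whose elements can be computed [29] by

$$(M_0)_{uv} = \begin{cases} \dfrac{1}{N_s^2}, & z_u, z_v \in D_s; \\[4pt] \dfrac{1}{N_t^2}, & z_u, z_v \in D_t; \\[4pt] -\dfrac{1}{N_s N_t}, & \text{otherwise}, \end{cases} \tag{7}$$

$$(M_h)_{uv} = \begin{cases} \dfrac{1}{N_{s_h}^2}, & z_u, z_v \in D_s^{(h)}; \\[4pt] \dfrac{1}{N_{t_h}^2}, & z_u, z_v \in D_t^{(h)}; \\[4pt] -\dfrac{1}{N_{t_h} N_{s_h}}, & z_u \in D_s^{(h)}, z_v \in D_t^{(h)} \text{ or } z_u \in D_t^{(h)}, z_v \in D_s^{(h)}; \\[4pt] 0, & \text{otherwise}, \end{cases} \tag{8}$$

where $N_{s_h} = |D_s^{(h)}|$ and $N_{t_h} = |D_t^{(h)}|$, $h \in \{1, 2, \cdots, n\}$, and $n$ is
the number of categories.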
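As a small illustration of the marginal MMD matrix $M_0$, whose entries are $1/N_s^2$ for source-source pairs, $1/N_t^2$ for target-target pairs and $-1/(N_s N_t)$ for mixed pairs, the sketch below builds it in plain Python as an outer product. It assumes that the $N_s$ source samples are indexed before the $N_t$ target samples; the function name and the sizes used in the example are hypothetical.

```python
def mmd_matrix_m0(n_source: int, n_target: int):
    """Marginal MMD matrix M0 as a nested list.

    Assumes samples are ordered source first, then target. With
    e = [1/Ns, ..., 1/Ns, -1/Nt, ..., -1/Nt], M0 is the outer product e e^T,
    which yields 1/Ns^2, 1/Nt^2 and -1/(Ns*Nt) in the three blocks.
    """
    e = [1.0 / n_source] * n_source + [-1.0 / n_target] * n_target
    return [[eu * ev for ev in e] for eu in e]


# Hypothetical sizes: 2 source samples and 4 target samples
M0 = mmd_matrix_m0(2, 4)
```

Because the entries of `e` sum to zero, all entries of `M0` also sum to zero, which is a quick sanity check on the construction.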


3) Laplacian regularization: The regularization can be expressed
[29] by

$$R_f(D_s, D_t) = \sum_{u,v=1}^{N_s+N_t} W_{uv} \left(f(z_u) - f(z_v)\right)^2 = \sum_{u,v=1}^{N_s+N_t} f(z_u)\, L_{uv}\, f(z_v) = \mathrm{tr}\left(\beta^T K L K \beta\right), \tag{9}$$

where

$$W_{uv} = \begin{cases} \mathrm{sim}(z_u, z_v), & z_u \in N_p(z_v) \text{ or } z_v \in N_p(z_u); \\ 0, & \text{otherwise}. \end{cases}$$

To measure the similarity between two points, we use
$\mathrm{sim}(\cdot, \cdot)$ (for example, cosine distance). The set of $p$-nearest
neighbors of $z_u$ is denoted by $N_p(z_u)$. The Laplacian matrix is
$L = D - W$, with the diagonal matrix $D_{uu} = \sum_{v=1}^{N_s+N_t} W_{uv}$.

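4) Overall Reformulation: Using the formulas (5), (6) and
(9), $f$ in (3) can be expressed as

$$f = \arg\min_{f \in \mathcal{H}_K} \left\| (Y - \beta^T K) A \right\|_F^2 + \eta\, \mathrm{tr}\left(\beta^T K \beta\right) + \mathrm{tr}\left(\beta^T K (\lambda M + \rho L) K \beta\right). \tag{10}$$

By setting the derivative $\partial f / \partial \beta = 0$, we get the corresponding
solution:

$$\beta^{*} = \left((A + \lambda M + \rho L) K + \eta I\right)^{-1} A Y^T. \tag{11}$$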
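The step from the objective to the closed-form solution can be made explicit. The derivation below is a sketch under the assumptions that $K$, $M$ and $L$ are symmetric, that $A$ is the binary diagonal indicator matrix (so $A A^T = A$), and that $K$ is invertible so that the common factor $2K$ can be cancelled; $J(\beta)$ is a name introduced here for the objective.

```latex
\begin{aligned}
J(\beta) &= \operatorname{tr}\!\big((Y-\beta^{T}K)\,A\,(Y-\beta^{T}K)^{T}\big)
          + \eta\,\operatorname{tr}(\beta^{T}K\beta)
          + \operatorname{tr}\!\big(\beta^{T}K(\lambda M+\rho L)K\beta\big),\\[4pt]
\frac{\partial J}{\partial \beta}
         &= -2KA\,(Y^{T}-K\beta) + 2\eta K\beta + 2K(\lambda M+\rho L)K\beta = 0,\\[4pt]
\Rightarrow\quad
         & \big((A+\lambda M+\rho L)K+\eta I\big)\,\beta = A\,Y^{T},\\[4pt]
\Rightarrow\quad
         & \beta^{*} = \big((A+\lambda M+\rho L)K+\eta I\big)^{-1} A\,Y^{T}.
\end{aligned}
```

The third line is obtained by dividing the second one by $2K$ and collecting the terms in $\beta$.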


C. Reliability assessment of BBAs generated from local 
domain-invariant classifiers based on BF-TOPSIS 


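As discussed earlier, the $M$ local domain-invariant classifiers
learned from the source domains $(D_{s_1}, D_{s_2}, \cdots, D_{s_M})$ provide $M$
BBAs $m_i$ $(i = 1, \cdots, M)$:

$$\begin{array}{c|ccccc}
 & \theta_1 & \theta_2 & \theta_3 & \cdots & \theta_{2^{|\Theta|}} \\ \hline
D_{s_1} & m_1(\theta_1) & m_1(\theta_2) & m_1(\theta_3) & \cdots & m_1(\theta_{2^{|\Theta|}}) \\
D_{s_2} & m_2(\theta_1) & m_2(\theta_2) & m_2(\theta_3) & \cdots & m_2(\theta_{2^{|\Theta|}}) \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
D_{s_M} & m_M(\theta_1) & m_M(\theta_2) & m_M(\theta_3) & \cdots & m_M(\theta_{2^{|\Theta|}})
\end{array} \tag{12}$$

where $\theta \in 2^{\Theta}$. Considering the possible conflicts and information
redundancy among the evidence sources provided by
multiple source domains, we need to evaluate the reliability of
the evidence sources before processing them through a fusion
step.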


1) Evaluation criteria: Two widely used criteria are applied
in our work, which are described as follows:

a) Distance degree: The average distance between $D_{s_i}$
and the other involved source domains $D_{s_k}$ is defined as follows:

$$d_{aver}(D_{s_i}) = \frac{1}{M-1} \sum_{k=1}^{M-1} d_{BI}\left(m_{D_{s_i}}(\cdot), m_{D_{s_k}}(\cdot)\right), \tag{13}$$

where the interval distance $d_{BI}$ was proposed by Han [33], and

• $d_{BI}(m_1, m_2) = \sqrt{N_f \sum_{i=1}^{2^n - 1} \left[d^I\left(BI_1(\theta_i), BI_2(\theta_i)\right)\right]^2}$;

• $N_f$ is the normalization factor: $N_f = 1/2^{n-1}$;

• $BI_1(\theta_i) = [Bel_1(\theta_i), Pl_1(\theta_i)]$;
• $BI_2(\theta_i) = [Bel_2(\theta_i), Pl_2(\theta_i)]$;
• $d^I([a, b], [c, d]) = \sqrt{\left[\frac{a+b}{2} - \frac{c+d}{2}\right]^2 + \frac{1}{3}\left[\frac{b-a}{2} - \frac{d-c}{2}\right]^2}$.

The interval distance degree $d_{aver}(D_{s_i})$ mainly
measures the difference between the specific domain $D_{s_i}$
and the other involved domains $D_{s_{i'}}$, $i' \in \{1, \cdots, M\}$, $i' \neq i$.
If this distance value is large, this domain
is quite different from the other domains, so the reliability of
this specific source domain is low, and vice versa. Because
we consider "the lower is better" preference ordering for
$d_{aver}(D_{s_i})$, we multiply its values by -1 and take as first
criterion $Cr_1 = -d_{aver}(D_{s_i})$ in order to apply the BF-TOPSIS formulas of
[25], which are established for "the greater is better" preference
ordering.
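The first criterion can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: BBAs are dictionaries of `frozenset` focal elements, the belief-interval distance follows Han's definition with a normalization factor of $1/2^{n-1}$ (the value assumed here), and the average is taken over all other source domains. All function names and the example BBAs are hypothetical.

```python
from itertools import combinations
from math import sqrt


def subsets(frame):
    """All non-empty subsets of the frame of discernment."""
    items = sorted(frame)
    return [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]


def belief_interval(m, theta):
    """[Bel(theta), Pl(theta)] for a BBA given as {frozenset: mass}."""
    bel = sum(v for f, v in m.items() if f and f <= theta)
    pl = sum(v for f, v in m.items() if f & theta)
    return bel, pl


def interval_distance(i1, i2):
    """Distance between intervals [a, b] and [c, d] (midpoints and half-widths)."""
    (a, b), (c, d) = i1, i2
    mid = (a + b) / 2 - (c + d) / 2
    half = (b - a) / 2 - (d - c) / 2
    return sqrt(mid ** 2 + half ** 2 / 3)


def d_bi(m1, m2, frame):
    """Belief-interval distance, normalized by 1 / 2^(n-1) (assumed factor)."""
    total = sum(interval_distance(belief_interval(m1, t), belief_interval(m2, t)) ** 2
                for t in subsets(frame))
    return sqrt(total / 2 ** (len(frame) - 1))


def average_distance(bbas, i, frame):
    """Mean distance between the i-th BBA and all the other BBAs."""
    others = [d_bi(bbas[i], m, frame) for k, m in enumerate(bbas) if k != i]
    return sum(others) / len(others)


# Hypothetical example: two fully conflicting classifiers and one duplicate
A, B = frozenset({"a"}), frozenset({"b"})
frame = A | B
m_a, m_b = {A: 1.0}, {B: 1.0}
criterion_1 = -average_distance([m_a, m_b, m_a], 0, frame)  # Cr1 = -d_aver
```

With this normalization two categorical BBAs on different singletons are at distance 1, so a domain that disagrees with every other domain receives the lowest possible score on this criterion.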


b) Imprecision degree: Within evidential reasoning, the
strife [34] is often used to determine the imperfection degree
within a BBA. The measure of strife is defined as:

$$St(m) = -\sum_{\theta \in F(m)} m(\theta) \log_2 \left( \sum_{\theta' \in F(m)} m(\theta') \frac{|\theta \cap \theta'|}{|\theta|} \right), \tag{14}$$

where $F(m)$ is the set of focal elements of $m$, and $|\theta \cap \theta'|$ and $|\theta|$ refer to the cardinalities of the subsets
$\theta \cap \theta'$ and $\theta$.

As formula (14) shows, $St(m)$ does not describe the
relationship between domains. It mainly measures the imprecision
degree within the BBA, and its value is determined
by the belief masses of the focal elements in the BBA.
If the output of the classifier of the local domain $D_{s_k}$ is
$m_k(\theta_1) = m_k(\theta_2) = \cdots = m_k(\theta_{2^{|\Theta|}}) = 1/2^{|\Theta|}$,
the value of $St(m_k)$ is the largest, which means that this BBA
does not help to make the final decision. On the contrary, if
$m_k(\theta_1) = 1$ and $m_k(\theta_2) = m_k(\theta_3) = \cdots = m_k(\theta_{2^{|\Theta|}}) = 0$, the
value of $St(m_k)$ is the smallest because we can make the final
decision ($\theta_1$) easily according to the principle of maximum
probability. Because we also consider "the lower is better"
preference ordering for the strife measure, we multiply its
values by -1 and consider as second criterion $Cr_2 = -St(m)$
in order to apply the BF-TOPSIS formulas [25].
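The strife measure is short enough to be written directly in Python. The sketch below assumes the same dictionary representation of a BBA (`frozenset` focal element to mass) and ignores zero-mass entries; the function name and the two example BBAs are illustrative only.

```python
from math import log2


def strife(m):
    """Strife St(m) = -sum_A m(A) * log2( sum_B m(B) * |A & B| / |A| ),
    where A and B range over the focal elements (mass > 0) of m."""
    focal = {f: v for f, v in m.items() if v > 0}
    return -sum(
        v * log2(sum(w * len(f & g) / len(f) for g, w in focal.items()))
        for f, v in focal.items()
    )


# Hypothetical examples over the frame {a, b}
A, B = frozenset({"a"}), frozenset({"b"})
st_certain = strife({A: 1.0})             # a categorical BBA has no strife
st_split = strife({A: 0.5, B: 0.5})       # two conflicting singletons
criterion_2 = -st_split                   # Cr2 = -St(m)
```

For BBAs whose focal elements are all singletons the measure reduces to the Shannon entropy, which is why the evenly split example evaluates to one bit.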


2) Evaluation of source domains’ reliabilities: 


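a) Scoring matrix: We first compute the reliabilities of
the multi-source domains according to each criterion $Cr_j$, $j = 1, \cdots, N_c$
(here, $N_c = 2$), and then the scoring matrix $S$
can be generated as follows:

$$S = \begin{array}{c|ccccc}
 & D_{s_1} & D_{s_2} & D_{s_3} & \cdots & D_{s_M} \\ \hline
Cr_1 & S_1(D_{s_1}) & S_1(D_{s_2}) & S_1(D_{s_3}) & \cdots & S_1(D_{s_M}) \\
Cr_2 & S_2(D_{s_1}) & S_2(D_{s_2}) & S_2(D_{s_3}) & \cdots & S_2(D_{s_M}) \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
Cr_{N_c} & S_{N_c}(D_{s_1}) & S_{N_c}(D_{s_2}) & S_{N_c}(D_{s_3}) & \cdots & S_{N_c}(D_{s_M})
\end{array} \tag{15}$$

b) Construction of local BBA for source domain $D_{s_i}$:

$$m_{ij}(D_{s_i}) = Bel_{ij}(D_{s_i}); \tag{16}$$
$$m_{ij}(\bar{D}_{s_i}) = Bel_{ij}(\bar{D}_{s_i}) = 1 - Pl_{ij}(D_{s_i}); \tag{17}$$
$$m_{ij}(D_{s_i} \cup \bar{D}_{s_i}) = Pl_{ij}(D_{s_i}) - Bel_{ij}(D_{s_i}). \tag{18}$$

Here, we treat all involved source domains $D_{s_1}, D_{s_2}, \ldots, D_{s_M}$
in (15) as the abstract focal elements of a special
FoD: $\Theta^D = \{D_{s_1}, D_{s_2}, \cdots, D_{s_M}\}$. $\bar{D}_{s_i}$ is defined as the
complement of $D_{s_i}$ in $\Theta^D$. Besides, $Bel_{ij}(D_{s_i})$, $Pl_{ij}(D_{s_i})$ and
$Bel_{ij}(\bar{D}_{s_i})$ in (16)-(18) are defined as follows [25]:

$$Bel_{ij}(D_{s_i}) \triangleq \frac{Sup_j(D_{s_i})}{\max_i Sup_j(D_{s_i})}, \quad Sup_j(D_{s_i}) = \sum_{i' \in \{1,\ldots,M\} \mid S_j(D_{s_{i'}}) \leq S_j(D_{s_i})} \left| S_j(D_{s_i}) - S_j(D_{s_{i'}}) \right|, \tag{19}$$

$$Bel_{ij}(\bar{D}_{s_i}) \triangleq \frac{Inf_j(D_{s_i})}{\min_i Inf_j(D_{s_i})}, \quad Inf_j(D_{s_i}) = -\sum_{i' \in \{1,\ldots,M\} \mid S_j(D_{s_{i'}}) \geq S_j(D_{s_i})} \left| S_j(D_{s_i}) - S_j(D_{s_{i'}}) \right|, \tag{20}$$

where $Bel_{ij}(D_{s_i})$ is the belief of $D_{s_i}$, i.e., the evidential
support of the hypothesis "$D_{s_i}$ is better than its competitors $\bar{D}_{s_i}$",
and the definition of $Bel_{ij}(\bar{D}_{s_i})$ is similar to that of $Bel_{ij}(D_{s_i})$.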


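c) BF-TOPSIS method [25]:

• Step 1: From the score matrix $S$, compute the BBAs
  $m_{ij}(D_{s_i})$, $m_{ij}(\bar{D}_{s_i})$ and $m_{ij}(D_{s_i} \cup \bar{D}_{s_i})$ using (16)-(18);

• Step 2: Calculate $d_{BI}(m_{ij}, m_i^{best})$ and
  $d_{BI}(m_{ij}, m_i^{worst})$, where $m_i^{best}(D_{s_i}) \triangleq 1$ and
  $m_i^{worst}(\bar{D}_{s_i}) \triangleq 1$ represent the best and worst ideal
  BBAs, respectively;

• Step 3: Calculate the weighted average distances
  $d^{best}(D_{s_i})$ and $d^{worst}(D_{s_i})$, where $w_j$ is the weight of criterion $Cr_j$:

  $$d^{best}(D_{s_i}) \triangleq \sum_{j=1}^{N_c} w_j\, d_{BI}(m_{ij}, m_i^{best}); \tag{21}$$
  $$d^{worst}(D_{s_i}) \triangleq \sum_{j=1}^{N_c} w_j\, d_{BI}(m_{ij}, m_i^{worst}); \tag{22}$$

• Step 4: Calculate the closeness degree:

  $$Closeness(D_{s_i}, D^{best}) \triangleq \frac{d^{worst}(D_{s_i})}{d^{worst}(D_{s_i}) + d^{best}(D_{s_i})}; \tag{23}$$

• Step 5: Compute the weight of each source domain based
  on $Closeness(D_{s_i}, D^{best}) \in [0, 1]$ using (24):

  $$w(D_{s_i}) = \frac{Closeness(D_{s_i}, D^{best})}{\sum_{i'=1}^{M} Closeness(D_{s_{i'}}, D^{best})}. \tag{24}$$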
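Parts of the BF-TOPSIS procedure can be sketched in a few lines of Python: the construction of the local BBA of one source domain from the scores of a single criterion, the closeness degree, and the final normalization of the closeness values into weights. The sketch rests on several assumptions: scores follow "the greater is better" ordering, the distances to the best and worst ideal BBAs are computed elsewhere, and the weights are obtained by dividing each closeness value by their sum, which is one plausible reading of the weighting step. All function names and the example scores are hypothetical.

```python
def local_bba(scores, i):
    """Masses (m(D_i), m(not D_i), m(D_i or not D_i)) of alternative i,
    built from the scores of all alternatives under one criterion."""
    # Sup: how much better alternative s is than the alternatives it dominates
    sup = [sum(abs(s - t) for t in scores if t <= s) for s in scores]
    # Inf: (negated) how much worse s is than the alternatives dominating it
    inf = [-sum(abs(s - t) for t in scores if t >= s) for s in scores]
    sup_max, inf_min = max(sup), min(inf)
    bel = sup[i] / sup_max if sup_max else 0.0
    bel_not = inf[i] / inf_min if inf_min else 0.0
    return bel, bel_not, 1.0 - bel - bel_not


def closeness(d_best, d_worst):
    """Closeness to the best ideal solution, in [0, 1]."""
    return d_worst / (d_worst + d_best)


def normalize(values):
    """Turn closeness degrees into weights that sum to one (assumed rule)."""
    total = sum(values)
    return [v / total for v in values]


# Hypothetical scores of three source domains under one criterion
scores = [0.0, 1.0, 3.0]
bba_worst = local_bba(scores, 0)   # all mass on "not D_1"
bba_middle = local_bba(scores, 1)  # mass split between the three elements
bba_best = local_bba(scores, 2)    # all mass on D_3
weights = normalize([1.0, 3.0])    # hypothetical closeness degrees
```

The best-scoring domain receives a BBA identical to the best ideal BBA, so its distance to the best ideal solution is zero and its closeness degree is one.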
D. Final combination with discounting and PCR5 rule


Once the weights of all local domain-invariant classifiers learned
from the source domains are derived, we discount the
classification results, which have been represented by BBAs.
The specific discounting rule is as follows for
$i = 1, \ldots, M$:

$$\begin{cases} m_i(\theta) = w_i \cdot m_{D_{s_i}}(\theta), & \theta \in 2^{\Theta}, \; \theta \neq \Theta \\ m_i(\Theta) = 1 - w_i + w_i \cdot m_{D_{s_i}}(\Theta) \end{cases} \tag{25}$$

One can see that the mass assigned to each focal element is
proportionally transferred to $\Theta$ according to the given weighting factor
$w_i$. Thus, a small weighting factor causes a big increase
of the mass of belief committed to ignorance. These $M$ discounted
classification results can be combined sequentially using the
PCR5 fusion rule (2) by

$$m_{fusion} = ((m_1 \oplus m_2) \oplus m_3) \cdots \oplus m_M, \tag{26}$$

where the $\oplus$ symbol denotes the fusion rule.

To reduce the computational complexity of the fusion rules,
we consider the reduced power set (i.e., $2^{\Theta}_{reduced} \triangleq \{\theta_1, \theta_2, \cdots, \theta_n, \Theta\}$),
which only includes the singletons and $\Theta$, in
the following activity recognition problem. Finally, according
to the combination of the multiple classification results provided
by the different source domains, a classification decision is made
on the unlabeled samples in the target domain. In this paper,
the final decision on the predicted class is made as
$\theta^{*} = \arg\max_{\theta} m_{fusion}(\theta)$, where $\theta$ is a singleton of $2^{\Theta}$,
i.e., based on the maximum of belief mass. The framework
of w-MSDAER is given in Fig. 2, and the pseudo-code is
described in Algorithm 1.
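The discounting step and the final decision can be sketched as follows. The sketch assumes BBAs stored as dictionaries of `frozenset` focal elements on the reduced power set (singletons plus the whole frame); the weight, the class names and the masses are hypothetical values chosen for illustration.

```python
def discount(m, w, frame):
    """Discount a BBA by reliability w: scale every focal element except the
    frame by w and move the removed mass to the frame (total ignorance)."""
    out = {f: w * v for f, v in m.items() if f != frame}
    out[frame] = 1.0 - w + w * m.get(frame, 0.0)
    return out


def decide(m):
    """Return the class whose singleton carries the largest mass."""
    singletons = {f: v for f, v in m.items() if len(f) == 1}
    best = max(singletons, key=singletons.get)
    return next(iter(best))


# Hypothetical classifier output over two activities, with weight 0.5
WALK, RUN = frozenset({"walk"}), frozenset({"run"})
frame = WALK | RUN
discounted = discount({WALK: 0.75, RUN: 0.25}, 0.5, frame)
predicted = decide(discounted)
```

Discounting does not change the ranking of the singletons of a single source; its effect appears in the fusion step, where a heavily discounted source, being close to the vacuous BBA, has little influence on the combined result.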


[Figure 2 (image not reproduced). Stages shown: Source Domains, Local-invariant Classifier, Reliability Assessment, Target Domain.]

Figure 2. Framework of Our Proposed Method for Cross-Domain Activity
Recognition.


IV. EXPERIMENTAL EVALUATION 


In this part, our proposed w-MSDAER is comprehensively
evaluated on two open source datasets: the daily
and sports activities dataset (DSADS)³ [35] and the physical
activity monitoring dataset (PAMAP2)⁴ [36]. Our experiments
mainly focus on two types of tasks: 1) cross-position HAR; 2) cross-person
HAR. In the following subsections, we first introduce
the experimental setting in detail. Then, the related recognition
results and the analysis of parameter sensitivity are

³ https://archive.ics.uci.edu/ml/datasets/daily+and+sports+activities
⁴ https://archive.ics.uci.edu/ml/datasets/pamap2+physical+activity+monitoring


Algorithm 1: w-MSDAER for HAR

Input: $M$ labeled source domains $D_{s_1}, D_{s_2}, \cdots, D_{s_M}$, with source
    domain $D_{s_i} = \{(x_u^{D_{s_i}}, y_u^{D_{s_i}})\}_{u=1}^{N_s}$, and target domain
    $D_t = \{x_u^{D_t}\}_{u=1}^{N_t}$.
Output: Prediction of the labels $y^{D_t}$ of the target domain.
Initialize: manifold subspace dimension $d$; regularization parameters
    $\eta$, $\lambda$, $\rho$; number of neighbors $p$; number of iterations.
Training of Local Domain-Invariant Classifiers:
for $i = 1, \cdots, M$ do
    Learn the transformed manifold features using GFK: $z_u = g(x_u)$;
    Train a basic classifier using $D_{s_i}$ and predict on $D_t$ to get its
        labels $\hat{y}^{D_t}$;
    Construct the kernel $K$ using the transformed features of $D_{s_i}$ and $D_t$;
    repeat
        Compute $(M_0)_{uv}$ and $(M_h)_{uv}$ using (7) and (8);
        Calculate $\beta^{*}$ using (11) and obtain $f_{D_{s_i}}$ via the representer
            theorem in (4);
        Update the soft labels of $D_t$: $\hat{y}^{D_t} = f_{D_{s_i}}(z_u)$;
    until convergence
    Output: classifier $f_{D_{s_i}}$;
end
Calculation of Discounting Factors:
    Construct the BBA matrix (12) and compute the scoring matrix $S$ (15)
        using the two criteria (13) and (14);
    Calculate the weights of all involved BBAs using (24);
    Update the BBAs based on the discounting rule (25);
Fusion Step: $m_{fusion} = ((m_1 \oplus m_2) \oplus m_3) \cdots \oplus m_M$;
Decision Step: Take as decision the maximum of belief mass of the
    singleton focal elements: $\theta^{*} = \arg\max_{\theta} m_{fusion}(\theta)$;
return Prediction of the labels $y^{D_t}$ of the target domain.


briefly presented. As in [9], [19], it is worth noting that
different domains here refer to different sensor positions in
cross-position HAR and to different persons in cross-person HAR
problems.


A. Data Set Description 


In the HAR field, the two datasets mentioned above, DSADS and
PAMAP2, have been widely used. The DSADS dataset
includes nineteen activities, each performed by eight
subjects. The raw data come from five
IMUs located on the torso (T), right arm (RA), left arm (LA),
right leg (RL), and left leg (LL). To facilitate the discussion,
in this article we only consider four subjects (Subject 1 to
4) and ten everyday activities (sitting, standing,
lying on back and right side, ascending and descending stairs,
standing still in an elevator, moving around in an elevator,
walking in a parking lot, and walking on a treadmill at a
speed of 4 km/h). For the PAMAP2 dataset, due to lack of
data, we only selected a subset of three people (Subject 1 to 3)
and four activities (lying, sitting, standing, and walking) for
our following discussions about cross-domain HAR. In
PAMAP2, the wearable IMUs were installed on
three different body positions: arm, chest, and leg.


B. Experimental Setup 


Similar to [22], [37], we do not use the original time series
data. Instead, we classify the unlabeled activity data based
on hand-crafted features. Specifically, we use the z-score [22], [37]
to standardize the data, and combine the data from the three
axes of one sensor by using the simple averaging method given
by [22], [37]. Then, we segment the data using a moving window with a
5 s window size and a 3 s step length. Afterwards, 27 time
and frequency features are extracted for a single sensor;
more details about the feature representation can be found in
[37]. Finally, the values of the extracted features are normalized
into the interval [-1, 1]. In this paper, we directly use these
processed activity data after feature extraction and normalization,
which can be downloaded from the link⁵.

In the comparison experiments, we compare our proposed w-MSDAER
with several baseline methods to illustrate the effectiveness of
our proposed algorithm. The baseline methods
include: traditional classifiers, namely 1-nearest neighbor (1-NN), support vector machine
(SVM), random forest (RF), and extreme learning machine
(ELM); single-source domain adaptation methods, namely
hierarchical transfer learning (STL), which performs marginal
distribution adaptation within a group, joint distribution adaptation (JDA),
balanced distribution adaptation (BDA), TNNAR and DaNN;
and multi-source domain adaptation methods, namely CoDATS,
ELM+DS, ELM+PCR5, BDA+DS, BDA+PCR5, JDA+DS,
JDA+PCR5, and JDA+WDS. Here, the seven last-mentioned multi-source
domain adaptations are manually generated according
to a very simple principle: for each domain, we first use
a baseline model such as ELM, BDA, or JDA to learn the
basic classifier, and then the outputs of these basic models
are combined by using the classical DS rule (1), the PCR5 rule
(2), or the weighted Dempster rule (WDS) in the decision-level
fusion. Besides, the hyperparameters of the classical domain adaptation
approaches were all determined in the same manner as
in previous references [11], [37]. For our proposed w-MSDAER,
we set the feature dimension $d = 30$, the number
of iterations to 10, and the regularization parameters to
$\eta = 0.1$, $\lambda = 10$, $\rho = 1.0$, with $p = 10$ neighbors.
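The standardization and segmentation steps of the preprocessing can be sketched as follows. This is only an illustration: the sampling rate is passed as a parameter because it is not stated here, the population standard deviation is an assumed choice for the z-score, and the function names are hypothetical. Feature extraction itself is not shown.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize a signal to zero mean and unit (population) variance."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # constant signal: nothing to scale
    return [(v - mu) / sd for v in values]


def sliding_windows(signal, rate_hz, window_s=5.0, step_s=3.0):
    """Split a signal into windows of window_s seconds, moving by step_s seconds.

    rate_hz is the sampling rate of the sensor (an input of this sketch);
    incomplete windows at the end of the signal are dropped.
    """
    win = int(round(window_s * rate_hz))
    step = int(round(step_s * rate_hz))
    return [signal[s:s + win] for s in range(0, len(signal) - win + 1, step)]


# Hypothetical 20-sample signal sampled at 1 Hz
windows = sliding_windows(list(range(20)), rate_hz=1)
standardized = zscore([1.0, 3.0])
```

With a 5 s window and a 3 s step, consecutive windows overlap by 2 s, so each part of the recording contributes to at most two feature vectors.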


C. Measure of Performances 


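In this paper, we use Accuracy to measure the performance
of w-MSDAER, which is defined by [38]

$$Accuracy = \frac{\sum_{\tilde{n}=1}^{n} \left(TP_{\tilde{n}} + TN_{\tilde{n}}\right)}{\sum_{\tilde{n}=1}^{n} \left(TP_{\tilde{n}} + TN_{\tilde{n}} + FP_{\tilde{n}} + FN_{\tilde{n}}\right)}, \tag{27}$$

where $\tilde{n}$ denotes the class index and $n$ is the number of classes.
$TP_{\tilde{n}}$, $TN_{\tilde{n}}$, $FP_{\tilde{n}}$ and $FN_{\tilde{n}}$ are respectively the True Positives (TP),
True Negatives (TN), False Positives (FP) and False Negatives
(FN) of class $\tilde{n}$.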


D. Results and discussions 


For the cross-position and cross-person HAR tasks, Tables I and
II show the classification results of the proposed w-MSDAER
compared to the classical single-source domain adaptation
methods. In these two tables, for convenience, we simply
use "source domain→target domain" to name the specific
cross-domain HAR tasks. For example, "RA→LA" means that
the source domain is the data from the right arm and the

⁵ https://github.com/jindongwang/activityrecognition


target domain is the data collected from the left arm in cross-position
HAR; "St1→St2" means that the source domain is
from subject 1 and the target domain is from subject 2 in the cross-person
HAR task. Table I shows that in the cross-position HAR
experiments, w-MSDAER is superior to the other methods in most
cases. In particular, compared with traditional models (such as
1-NN, SVM, RF, and ELM), w-MSDAER achieves higher
classification accuracy in most cases, which indicates that w-MSDAER
can guarantee a stable positive transfer in the cross-domain
HAR task. In comparison with other transfer learning
methods, especially JDA and TNNAR, the accuracy of w-MSDAER
is improved by more than 10% and 2% respectively.
This further shows that the multi-source weighted
domain adaptation method with evidential reasoning is an
effective strategy to solve the cross-position HAR problem.
Table II shows a similar comparison for
cross-person HAR. Another interesting observation is that
the performances of most methods on cross-position HAR are
lower than those on cross-person HAR. This is
mainly because less sensor data is involved in
cross-position HAR. Moreover, these classical domain adaptation
methods only consider one source domain to solve the transfer
learning tasks. In contrast, w-MSDAER performs multi-source
fused domain adaptation. Compared with using a single source
domain, w-MSDAER reduces the uncertainty at the final
decision layer and gives more accurate predictions. These
characteristics lead to the best average performance among the compared
single-source DA methods.

In addition, we have also compared our method with the
multi-source domain adaptation methods in multi-source cross-position
and cross-person tasks, and the results are shown in
Table III and Table IV. For convenience, we again use
"multi-source domains→target domain" to name the specific
multi-source cross-domain HAR tasks. For example, "T, RA,
RL→LA" means that the source domains are the data from
the torso, right arm, and right leg, and the target domain is the
data collected from the left arm in cross-position HAR. "St1,
St3→St2" means that the source domains are collected from
subject 1 and subject 3, and the target domain is from subject 2 in the
cross-person HAR task. Overall, the deep multi-source domain
adaptation methods (CoDATS and CoDATS-WS) are better
than the classical shallow multi-source domain adaptation
methods. This is mainly because the deep multi-source domain
adaptation methods can extract deep features with a
deep network to complete the distribution matching with the target
domain. However, our proposed w-MSDAER outperforms
CoDATS and CoDATS-WS on average. The main reason is that the
traditional multi-source domain adaptation methods (both shallow and
deep ones) do not evaluate the reliabilities of source domains
from multiple perspectives. This evaluation is the main innovation of
this paper, as pointed out in our main
contributions.

In order to clearly show the difference between w-MSDAER
and Liu's method in [19], we also use Liu's method [19] to
obtain the reliabilities of the involved domains for the final
discounting steps in cross-position and cross-person tasks. In

Table I
ACCURACY (%) COMPARISONS BETWEEN SINGLE-DOMAIN ADAPTATIONS AND W-MSDAER ON CROSS-POSITION HAR.

| Dataset | Task | 1-NN | SVM | RF | ELM | STL | JDA | BDA | TNNAR | DaNN | w-MSDAER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| DSADS | T→LA | 54.5 | 40.17 | 37 | 45.90 | 42.67 | 66.17 | 53.67 | 54.76 | 55.86 | |
| DSADS | RA→LA | 76.17 | 68.33 | 49.17 | 74.80 | 68.33 | 76 | 68.33 | 73.02 | 70.07 | 80.67 |
| DSADS | RL→LA | 42.33 | 30.17 | 37.5 | 44.82 | 38 | 65.5 | 48.67 | 48.84 | 49.82 | |
| DSADS | RA→T | 63 | 66.5 | 47.83 | 58.67 | 67.17 | 61.17 | 65 | 66.25 | 64.74 | |
| DSADS | LA→T | 57.17 | 50.17 | 38.83 | 49.17 | 51.33 | 59.83 | 58.17 | 66.34 | 54.54 | 69.33 |
| DSADS | RL→T | 40.17 | 42.83 | 41.83 | 40.17 | 50.67 | 45.67 | 47.67 | 53.79 | 55.53 | |
| DSADS | T→RA | 56 | 52.33 | 43.5 | 59.83 | 54.33 | 70.67 | 57.5 | 70.92 | 79.91 | |
| DSADS | LA→RA | 72.5 | 71.5 | 66.67 | 73.17 | 72.5 | 69.5 | 69.33 | 85.31 | 59.91 | 91.67 |
| DSADS | RL→RA | 47.17 | 38.33 | 47 | 45.83 | 53.67 | 66.83 | 49.5 | 62.24 | 60.48 | |
| DSADS | T→RL | 51.5 | 47.17 | 51.83 | 24.83 | 47.67 | 57.83 | 53.33 | 65.09 | 61.77 | |
| DSADS | RA→RL | 56.83 | 59.33 | 46.17 | 51.33 | 56.5 | 66 | 59.17 | 63.67 | 62.24 | 62.33 |
| DSADS | LA→RL | 55.67 | 50.83 | 56.83 | 42.33 | 51.83 | 51.67 | 55.17 | 62.78 | 65.35 | |
| PAMAP2 | Chest→Arm | 46.25 | 9.58 | 39.44 | 31.19 | 40.56 | 55.28 | 59.74 | 57.02 | 54.84 | |
| PAMAP2 | Leg→Arm | 38.75 | 31.81 | 27.36 | [illegible] | 37.08 | 43.56 | 53.14 | 53.20 | 55.67 | 61.94 |
| PAMAP2 | Arm→Chest | 34.03 | 26.53 | 29.44 | 39.27 | 36.11 | 51.16 | 51.82 | 54.13 | 54.43 | |
| PAMAP2 | Leg→Chest | 28.89 | 21.67 | 32.5 | 32.84 | 30.83 | 32.84 | 26.07 | 57.16 | 57.71 | 65.00 |
| PAMAP2 | Arm→Leg | 41.39 | 24.71 | 33.19 | 43.23 | 33.47 | 47.69 | 49.67 | 54.15 | 58.83 | |
| PAMAP2 | Chest→Leg | 35.42 | 22.92 | 35.83 | 34.82 | 35.28 | 47.19 | 39.6 | 51.28 | 50.74 | 71.39 |
| Average | | 49.87 | 41.93 | 42.32 | 46.07 | 48.22 | 57.47 | 53.64 | 61.10 | 59.64 | 71.76 |

w-MSDAER uses all source domains of a target domain jointly, so one value is reported per target domain.

Table II
ACCURACY (%) COMPARISONS BETWEEN SINGLE-DOMAIN ADAPTATIONS AND W-MSDAER ON CROSS-PERSON HAR.

| Dataset | Task | 1-NN | SVM | RF | ELM | STL | JDA | BDA | TNNAR | DaNN | w-MSDAER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| DSADS | St2→St1 | 57.17 | 48.5 | 54.33 | 66.33 | 52.83 | 57.33 | 74.17 | 76.23 | 69.02 | |
| DSADS | St3→St1 | 51.67 | 50.67 | 38 | 40 | 50.17 | 47.33 | 51.83 | 65.21 | 60.48 | 78.50 |
| DSADS | St4→St1 | 40.67 | 49.33 | 45.33 | 43.33 | 49 | 39.67 | 66.5 | 82.35 | 71.25 | |
| DSADS | St1→St2 | 58 | 56.67 | 47.83 | 59.67 | 56.5 | 59.5 | 70.33 | 80.44 | 79.91 | |
| DSADS | St3→St2 | 70.5 | 71.5 | 73.33 | 63.5 | 76.67 | 69.67 | 65.62 | 85.61 | 80.29 | 88.67 |
| DSADS | St4→St2 | 57.5 | 53.83 | 53.83 | 58.5 | 61.17 | 57 | 67 | 68.79 | 63.23 | |
| DSADS | St1→St3 | 50 | 47.67 | 43.33 | 48.5 | 48.83 | 41 | 54.17 | 60.37 | 55.42 | |
| DSADS | St2→St3 | 67.33 | 71.33 | 84.67 | 69.67 | 83.33 | 69 | 74.5 | 82.47 | 80.58 | 80.00 |
| DSADS | St4→St3 | 63.33 | 64.5 | 69.5 | 57 | 71 | 67.83 | 62.5 | 78.86 | 77.49 | |
| DSADS | St1→St4 | 46.17 | 47.83 | 48.17 | 42.33 | 49.83 | 31.35 | 47.17 | 69.02 | 69.13 | |
| DSADS | St2→St4 | 68.33 | 64.33 | 73.33 | 57.67 | 71 | 62 | 69.83 | 73.45 | 71.28 | 71.50 |
| DSADS | St3→St4 | 63 | 61.17 | 57.83 | 61 | 65 | 66 | 63.67 | 68.74 | 68.13 | |
| PAMAP2 | St2→St1 | 59.72 | 50.28 | 51.11 | 50.5 | 51.11 | 64.44 | 61.39 | 66.44 | 62.15 | |
| PAMAP2 | St3→St1 | 58.61 | 51.11 | 51.11 | 50 | 51.11 | 60.28 | 58.89 | 62.91 | 65.42 | 65.28 |
| PAMAP2 | St1→St2 | 61.01 | 52.41 | 51.65 | 50.63 | 52.66 | 70.38 | 64.81 | 70.39 | 65.27 | |
| PAMAP2 | St3→St2 | 51.9 | 82.03 | 93.67 | 73.92 | 83.04 | 66.33 | 65.57 | 85.49 | 80.24 | 78.99 |
| PAMAP2 | St1→St3 | 57.18 | 50.65 | 50.13 | 39.43 | 50.65 | 59.27 | 58.49 | 66.34 | 62.17 | |
| PAMAP2 | St2→St3 | 65.01 | 77.81 | 79.63 | 77.81 | 78.07 | 68.41 | 72.32 | 77.49 | 76.23 | 68.41 |
| Average | | 58.17 | 58.40 | 59.27 | 56.10 | 61.22 | 59.04 | 63.82 | 73.16 | 69.87 | 75.91 |
Table III
ACCURACY (%) COMPARISONS BETWEEN MULTI-DOMAIN ADAPTATIONS AND W-MSDAER ON CROSS-POSITION HAR.

| Dataset | Task | ELM+DS | ELM+PCR5 | BDA+DS | BDA+PCR5 | JDA+DS | JDA+PCR5 | JDA+WDS | CoDATS | CoDATS-WS | w-MSDAER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| DSADS | T, RA, RL→LA | 73.33 | 65.33 | 50.83 | 65.50 | 70.33 | 74.67 | 77.83 | 75.27 | 82.42 | 80.67 |
| DSADS | RA, LA, RL→T | 60.00 | 58.50 | 36.67 | 56.67 | 65.17 | 67.54 | 66.89 | 64.58 | 62.01 | 69.33 |
| DSADS | T, LA, RL→RA | 82.67 | 79.83 | 49.67 | 68.17 | 65.18 | 72.67 | 73.41 | 85.55 | 84.38 | 91.67 |
| DSADS | T, RA, LA→RL | 50.17 | 47.00 | 48.83 | 65.00 | 52.83 | 59.50 | 62.82 | 56.12 | 62.00 | 62.33 |
| PAMAP2 | Chest, Leg→Arm | 71.11 | 72.22 | 61.11 | 65.28 | 69.17 | 69.44 | 69.80 | 54.74 | 58.29 | 61.94 |
| PAMAP2 | Arm, Leg→Chest | 53.89 | 61.94 | 53.96 | 56.94 | 55.56 | 55.83 | 60.78 | 60.31 | 64.93 | 65.00 |
| PAMAP2 | Arm, Chest→Leg | 50.82 | 44.72 | 56.67 | 56.94 | 67.22 | 67.78 | 65.88 | 68.16 | 69.60 | 71.39 |
| Average | | 63.14 | 61.36 | 51.11 | 62.07 | 63.64 | 66.78 | 68.20 | 66.39 | 69.09 | 71.76 |
| Win/Total | | 0/7 | 1/7 | 0/7 | 1/7 | 0/7 | 0/7 | 0/7 | 0/7 | 1/7 | 4/7 |

Table IV
ACCURACY (%) COMPARISONS BETWEEN MULTI-DOMAIN ADAPTATIONS AND W-MSDAER ON CROSS-PERSON HAR.

| Dataset | Task | ELM+DS | ELM+PCR5 | BDA+DS | BDA+PCR5 | JDA+DS | JDA+PCR5 | JDA+WDS | CoDATS | CoDATS-WS | w-MSDAER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| DSADS | St2, St3, St4→St1 | 40.67 | 34.83 | 31.50 | 52.87 | 53.33 | 54.83 | 60.29 | 76.58 | 76.77 | 78.50 |
| DSADS | St1, St3, St4→St2 | 65.67 | 59.00 | 49.00 | 73.83 | 52.50 | 79.00 | 79.86 | 82.24 | 87.11 | 88.67 |
| DSADS | St1, St2, St4→St3 | 55.33 | 56.50 | 45.30 | 67.00 | 53.67 | 77.00 | 78.49 | 72.78 | 75.41 | 80.00 |
| DSADS | St1, St2, St3→St4 | 58.50 | 48.67 | 42.00 | 64.83 | 37.83 | 63.67 | 66.28 | 64.75 | 64.49 | 71.50 |
| PAMAP2 | St2, St3→St1 | 47.50 | 45.83 | 56.67 | 59.44 | 61.67 | 63.33 | 62.30 | 66.43 | 68.16 | 65.28 |
| PAMAP2 | St1, St3→St2 | 54.18 | 58.73 | 65.06 | 70.38 | 72.15 | 72.66 | 75.39 | 71.63 | 74.18 | 78.99 |
| PAMAP2 | St1, St2→St3 | 49.87 | 49.35 | 57.44 | 66.32 | 68.15 | 71.02 | 71.48 | 60.39 | 71.61 | 68.41 |
| Average | | 53.10 | 50.41 | 49.56 | 64.95 | 57.01 | 68.78 | 70.58 | 70.68 | 73.96 | 75.91 |
| Win/Total | | 0/7 | 0/7 | 0/7 | 0/7 | 0/7 | 0/7 | 0/7 | 0/7 | 2/7 | 4/7 |
Table V
COMPARISONS BETWEEN WEIGHT CALCULATION IN [19] AND W-MSDAER ON CROSS-POSITION (T,RA,RL→LA; RA,LA,RL→T) AND
CROSS-PERSON (ST2,ST3,ST4→ST1; ST1,ST3,ST4→ST2) TASKS FOR THE DSADS DATASET.

[Table body not recoverable from the source. Legible labels: rows for "Our Approach" with the Distance, Imprecision and Multi-Criteria variants; columns "Weights of Source Domains" and "Accuracy (%)" for each of the four tasks.]

Table VI
COMPARISONS BETWEEN WEIGHT CALCULATION IN [19] AND W-MSDAER ON
CROSS-POSITION (T,LA,RL → RA; T,RA,LA → RL) AND CROSS-PERSON
(ST1,ST2,ST4 → ST3; ST1,ST2,ST3 → ST4) TASKS FOR THE DSADS DATASET.

[Table body not recoverable from the source. For each of the four
tasks, the columns are "Weights of Source Domains" and "Accuracy (%)";
the rows are "Weight Calculation in [19]" and our approach with the
"Distance", "Imprecision" and "Multi-Criteria" strategies.]


order to realize the weight calculation of Liu’s method in [19], 
we follow the specific steps given in [19] to use the A-distance 
[39] to measure the distribution differences between source 
and target domains before and after distribution matching. 
Then, the derived distances are applied to calculate the weights 
of source domains for the final discounting steps. In addition,
we also discuss the performance of w-MSDAER when using
a single index or the multi-criteria indexes. In Tables V and
VI, Distance means that only the distance degree (13) is used
to evaluate the weights of BBAs in w-MSDAER;
Imprecision means that only the imprecision degree (14)
is used in w-MSDAER; Multi-Criteria means that both
indexes (13) and (14) are used in our method.




The results are given in Tables V and VI. In general,
w-MSDAER with the multi-criteria strategy performs best,
except on two sub-tasks (St1,St3,St4 → St2; T,LA,RL → RA).
The main reason is that, for each test sample, w-MSDAER
uses the multi-criteria strategy (BF-TOPSIS) to evaluate the
reliabilities (weights) of the corresponding BBAs derived from
the classifiers trained on the source domains.

To analyze the performance of w-MSDAER further, we
present the confusion matrix in Fig. 3. This figure shows the
confusion matrix of the RA, LA, RL → T task on the DSADS
dataset (subject 1). We find that w-MSDAER achieves
an accuracy of more than 90% for dynamic activities such
as ascending and descending stairs, standing still in an
elevator, moving around in an elevator, walking in a parking lot
and walking on a treadmill. However, there is substantial
misclassification among the static activities, which include
sitting, standing, and lying on the back and on the right side.
This can be explained by the fact that signals collected from
sensors located on RA, LA and RL often have similar features
for the static activities.

Figure 3. Confusion matrix of w-MSDAER on the RA, LA, RL → T task of
the Cross-Position DSADS dataset.

Figure 4. Ablation Study for Cross-Position and Cross-Person Tasks in
DSADS Dataset.


E. Ablation Studies for Cross-Position and Cross-Person Tasks 


Our proposed method, w-MSDAER, includes two main parts:
a local domain-invariant classifier and a decision-level
weighted fusion strategy based on BF-TOPSIS.
To further verify the effectiveness of our method, we design
ablation experiments that evaluate several variants of
w-MSDAER from multiple perspectives. On the one
hand, we keep the final fusion rule unchanged
and replace the local adaptive model. That is, we replace MEDA
in w-MSDAER with ELM, BDA, and JDA, respectively. From
Tables III and IV, the original w-MSDAER performs better than
the other variants. On the other hand, we keep MEDA as the base
model and change the decision-level fusion strategy and the
evaluation criteria in BF-TOPSIS. In this case, four variants
are considered: MEDA + AverageFusion, w-MSDAER (distance),
w-MSDAER (imprecision), and the original w-MSDAER.
MEDA + AverageFusion means that we first obtain several
classifiers from the multiple source domains based on MEDA,
and then combine the results of these classifiers by average
fusion; w-MSDAER (distance) and w-MSDAER (imprecision) mean
that we use only one criterion (the distance degree (13) or the
imprecision degree (14)) to evaluate the weights of the source
domains. The comparison results are shown in Fig. 4. In this
study, the proposed w-MSDAER outperforms all variants, which
demonstrates the effectiveness of our method.


F. Parameter analysis 


Similar to other state-of-the-art transfer learning methods
[9], [22], we also conducted a sensitivity analysis on the key
parameters of w-MSDAER.

1) Subspace dimension d and neighbor p: In this part,
the sensitivity to d and p was investigated through
experiments over a wide range of values, $d \in \{5, 10, \ldots, 45\}$ and
$p \in \{10, 20, \ldots, 100\}$. These parameters were varied
in two experiments: DSADS cross-position HAR (Fig. 5 (a)
and (b)) and PAMAP2 cross-person HAR (Fig. 6 (a) and (b)).
It can be observed that w-MSDAER is robust with regard
to different values of d and p. Besides, for the cross-position
HAR task, the sub-task T, LA, RL → RA reached a higher
accuracy than the others, and for cross-person HAR, St1,
St3 → St2 showed the best performance. The difference in
transfer effect between different positions and persons
further shows that it is difficult to achieve high-precision
activity recognition by relying on a single source domain.
Therefore, in the process of domain adaptation, it is necessary
to evaluate the reliabilities of all involved source domains.

2) Regularization parameter and iteration: We also
ran w-MSDAER with the regularization parameter $\lambda$
and the number of iterations each taking a wide range of values.
Specifically, the choices of these two important parameters
are: $\lambda \in \{0.1, 0.5, 1, 10, 100, 1000\}$ and
$\text{Iteration} \in \{1, 5, 10, 15, 20, 25, 30\}$. Similarly, we observed
that w-MSDAER achieves robust performance over a wide range
of parameter values in Fig. 5 (c) and (d), Fig. 6 (c) and (d).
Besides, for the cross-position HAR task, the worst accuracy
of the w-MSDAER algorithm is only 1.41% below its best
accuracy. As the number of iterations increases, the accuracy
neither decreases nor increases noticeably. These results
indicate that w-MSDAER is not very sensitive to the number
of iterations.

3) Time complexity: To evaluate the time complexity of
w-MSDAER, on the DSADS and PAMAP2 datasets, we
compared the running time of our method with classical
domain adaptation methods on cross-position and cross-person
HAR tasks. The running platform is Matlab™ on a personal
laptop, using an Intel Core i7-6810HQ CPU@3.40 GHz and
32.00 GB RAM. The running time is shown in Table VII. In
comparison, the running time of BDA and w-MSDAER
is significantly higher than that of the other methods. This
is mainly because w-MSDAER involves a reliability evaluation
and an integrated fusion process. To reduce the time complexity
of w-MSDAER, our future work will focus on the use of
faster fusion rules and simpler domain reliability evaluation
strategies.

Figure 5. Classification Accuracy for DSADS Cross-Position HAR w.r.t. d, p, and $\lambda$, respectively. (d) Convergence Analysis.

Figure 6. Classification Accuracy for PAMAP2 Cross-Person HAR w.r.t. d, p, and $\lambda$, respectively. (d) Convergence Analysis.

Table VII
TIME COMPLEXITY COMPARISON (s)

Dataset  Task            STL     JDA      BDA      w-MSDAER
DSADS    Cross Position  6.0650  12.3572  27.8737  57.3987
         Cross Person    5.9608  16.0091  30.1911  62.1064
PAMAP2   Cross Position  4.1261  1.7503   4.3576   5.5263
         Cross Person    4.7962  2.5645   5.4725   5.0670


V. CONCLUSION AND FUTURE WORKS 


In this paper, we propose a novel multi-source weighted
domain adaptation with evidential reasoning (w-MSDAER)
approach for cross-domain activity recognition. Compared to
existing works, w-MSDAER first learns domain-invariant
classifiers from multiple source domains. These classifiers
are then evaluated by BF-TOPSIS and fused with
the discounting PCR5 rule. Finally, we conducted extensive
experiments on two publicly available activity classification
datasets. The results show that the w-MSDAER algorithm is
superior to other advanced traditional and domain adaptation
algorithms. In future works, we will focus on researching
more robust evaluation methods for the reliability of source
domains. Besides, we will attempt more cautious decision-making
strategies [19] based on belief functions to assign an
unknown activity to a set of classes, and if possible,
we will also evaluate the improved versions of PCR rules of
combination that are currently under development.
Abstract—This paper presents a generic and flexible user-oriented
rusted toolbox for information fusion. This framework
implements common belief or belief-derived functions. It implements
combination rules in a generic way by means of rule definitions
based on referee functions. It implements two logical frameworks,
powersets and taxonomies, to define beliefs. The framework
is based on a multithreaded and asynchronous architecture,
which also makes it possible to deploy the software on multiple
machines. The architecture is highly modular and adaptive,
allowing simple definition and layout of the software using
editable configuration files. One of the objectives is to modularize
and compare both computation engines and user interfaces. The
Rust programming language is used to implement the framework
because of its speed, security and efficiency in maintaining the
library code.


Keywords: Rust, toolbox, belief functions, decision-making,
distance between BBAs.


I. INTRODUCTION 


The page of the Belief Functions and Applications Society¹
lists some software and toolboxes dedicated to the computation
of belief functions and combination of belief functions. These
toolboxes are generally implemented in the R, Python or Matlab
languages. The ibelief toolbox [1], implemented in the R language,
offers in particular a variety of rules and decision criteria.
Considering these already existing libraries, we need to
clarify our motivation for offering this new FURTIF² toolbox.
Our motivation is fourfold.


(a) Use the full power of a system programming language.
The performance of languages such as Matlab, Python
or R is significantly lower than that of a system language
such as C, Rust (both of which are official Linux
kernel languages [2]) or C++. The reasons for this are
manifold and depend on the languages involved (typing
strategy, use of garbage collection, interpreted versus
compiled code). Often, optimized language components
are based on fast system language implementations.
Typically, vectorization can offer better performance with
languages like R, Matlab and Python, but this limits
programming to vector data modeling. For our part, we
would like to have libraries that allow full freedom in the
implementation of data and methods. For this reason, we
need high-performance general-purpose languages such
as system programming languages. In this environment,
we propose two types of lattice on which to represent
belief information. The first is obviously the powerset,
which provides a general Boolean algebra representation
based on bit vectors. The second is the taxonomy, which
is a lattice without negation operators or operator
distributivity properties. The results of the combination rules are
significantly different for these two types of lattice. From
a propositional point of view, Boolean algebra allows a
finer representation of information than taxonomy, which
constrains knowledge to taxa. However, the use of a
taxonomy allows real control of the combinatorial explosion,
and makes more sense from the point of view of the
human operator. A vector representation of the taxonomy,
although we are approaching it, is not necessarily the
most efficient. For this reason, we need to be able to
implement dedicated structures while maintaining a very
high level of performance.

(b) Easy-to-implement parallelization capabilities. The
concern for parallelism is, of course, a consequence of
the need for performance. There are many aspects to
managing parallelism, not least synchronism or
asynchronism, and secure data sharing. We need a programming
language that facilitates this. In addition, we are looking
for an agnostic approach to setting up exchange channels
between processes. In particular, this agnosticism presents
a certain difficulty in reconciling single-machine and
multi-machine modes of operation.

(c) Easily expand the toolbox. One of our objectives is
to enable users to easily adapt and extend the scope
of the toolbox. In this perspective, the possibility of
creating and testing new rules guides our approach. We
should mention our work presented in [3], in which we
described referee functions, a generic and semantically
meaningful way of defining rules for combining belief
functions; based on this work, we proposed a (now
obsolete) Java library enabling these functionalities. This
kind of definition can indeed be combined with various
generic engines for effective fusion computation, enabling
the rule designer to limit development effort to rule
definition alone. Furthermore, as part of future
perspectives, we aim to make the software framework easily
extensible, typically with the possibility of including new
processing codes, new human-machine interfaces, and the
possibility of interacting with other information processing
paradigms. All of this, again, with the intention of
being user-oriented. Here again, asynchronous parallelism
capabilities are an interesting prospect for this objective.

(d) Maintainability. From our point of view, it is essential
that our toolbox is easy to maintain. With this in mind,
we need the tools of a modern programming language,
and its environment, to facilitate and automate this task.
These tools can be grammatical, based on high-level
language expressivity, or managerial, based on the sound
organization of internal or imported packages.


¹ https://bfasociety.org
² FURTIF being the acronym of Flexible User-oriented Rusted Toolbox for
Information Fusion.

In fact, there are very few system programming languages
that have the qualities of a high-level language, with
mechanisms for securing access to data, particularly with a view to
parallelism or asynchronism. The Rust language is among them.
For this reason, we have chosen the Rust language for our
toolbox project.


In the following sections, we proceed to: 

• a brief overview of belief functions and referee fusion
approaches,

• a brief introduction to the Rust language, mentioning
some of its qualities and components,

• details of the framework we have developed. In particular,
we will detail our approach to implementing taxonomies
as lattices. We will present the FURTIF toolbox in its
global context, then focus on the particular aspects related
to belief functions, and finally present future developments
of our work.


II. BELIEF FUNCTIONS 


We begin this section of the paper with a quick introduction 
to belief functions [4]-[6]. To begin with, we introduce the 
notion of bounded lattice, which is a fairly general logical 
support for the description of belief functions. Next, we 
introduce the notion of belief mass, and then some belief and 
other functions that can be derived from it. We then present 
various rules for combining beliefs. Finally, we introduce the 
notion of referee functions, which is a general formalism that 
allows combination rules to be defined independently of the 
implementation of rule computation processes. 


A. Bounded lattices 


A bounded lattice is a partially ordered set $(L, \leq, \bot, \top)$
with lower and upper bounds $\bot$ and $\top$ in which any pair
of elements $X, Y \in L$ has a greatest lower bound $X \wedge Y$ and
a least upper bound $X \vee Y$. It therefore follows that the lattice
induces two operators, $\wedge$ and $\vee$, which have interesting logical
properties for representing the conjunction or disjunction of
two propositions respectively. In a general bounded lattice, the
operators $\wedge$ and $\vee$ are associative and commutative. Bounded
lattices verify the absorption property
$X \wedge (X \vee Y) = X = X \vee (X \wedge Y)$ and it is possible to
retrieve the order relation $X \leq Y$ from the operators using the
properties $X \leq Y \iff X \wedge Y = X$
or $X \leq Y \iff X \vee Y = Y$. On the other hand, the operators
$\wedge$ and $\vee$ are not necessarily distributive, and a bounded lattice
does not have a negation operator in all generality.

1) Powerset: Powersets are well-known examples of
bounded lattices, and are used in the vast majority of belief
function literature, as well as in the libraries mentioned in
the introduction. We define a frame of discernment (FoD) $\Theta$,
which is the (finite) set of elementary propositions
characterizing the knowledge frame. The powerset
$2^\Theta = \{X \subseteq \Theta\}$ is the set of subsets of $\Theta$.
The powerset is a bounded lattice whose order relation is the
set inclusion $\subseteq$ and whose smallest and largest elements
are respectively the empty set $\emptyset$ and the total set $\Theta$.
The conjunction and disjunction operators are the set
intersection and union, $\cap$ and $\cup$ respectively. A powerset
is also a Boolean algebra, i.e. it is a distributive bounded
lattice (the operators $\cap$ and $\cup$ are distributive) which has a
negation operator $\neg X = \Theta \setminus X$, i.e. verifying
$X \cap \neg X = \emptyset$ and $X \cup \neg X = \Theta$.
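
As a rough illustration of the powerset seen as a Boolean algebra, the sketch below encodes the subsets of a small frame as bit vectors in a `u32`. The encoding, the function names and the three-element frame are assumptions made for this example; they are not the toolbox's actual API.

```rust
// Hypothetical sketch: a subset of a frame of discernment with `n` elements
// is encoded in the low `n` bits of a u32 (bit i set <=> element i is in the subset).

/// Largest element of the lattice: the whole frame.
fn top(n: u32) -> u32 {
    (1u32 << n) - 1
}

/// Conjunction: set intersection.
fn meet(x: u32, y: u32) -> u32 {
    x & y
}

/// Disjunction: set union.
fn join(x: u32, y: u32) -> u32 {
    x | y
}

/// Negation: complement within the frame.
fn neg(n: u32, x: u32) -> u32 {
    top(n) & !x
}

/// Order relation: x is included in y.
fn implies(x: u32, y: u32) -> bool {
    (x & y) == x
}

fn main() {
    let n = 3; // frame {a, b, c}
    let (a, b, c) = (0b001, 0b010, 0b100);
    let ab = join(a, b);
    // Complement laws of a Boolean algebra.
    assert_eq!(meet(ab, neg(n, ab)), 0);
    assert_eq!(join(ab, neg(n, ab)), top(n));
    // Distributivity of the intersection over the union.
    assert_eq!(meet(ab, join(b, c)), join(meet(ab, b), meet(ab, c)));
    assert!(implies(a, ab));
    println!("all powerset identities hold");
}
```

With this encoding every lattice operation is a single machine instruction, at the price of a frame limited to 32 elements in this sketch.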

2) Taxonomy: In our (simplified) definition, a taxonomy is
a tree in which each node or leaf is characterized by a symbol
representing a proposition (a taxon). To this tree, we add as a
descendant of each leaf the impossible taxon $\bot$ representing
a contradiction.


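A
├── AA
│   └── AAA
│       ├── AAAA
│       └── AAAB
└── AB
    ├── ABA
    ├── ABB
    │   ├── ABBA
    │   └── ABBB
    └── ABC

[Tree reconstructed from the taxon names; the impossible taxon below
each leaf is omitted.]

Figure 1. Example of taxonomy (taxon names are unimportant, except for
the sake of presentation).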


Figure 1 shows an example of taxonomy, where the set of
taxa is:

$$L = \{A, AA, AAA, AAAA, AAAB, AB, ABA, ABB, ABBA, ABBB, ABC, \bot\}$$


The greatest taxon of this taxonomy is $\top = A$ while the
smallest taxon is $\bot$. The disjunction between two taxa is the
closest common ancestor of both (or one of the taxa, if it is
an ancestor of the other). For example, we have the following
results:


$$ABA \vee ABBA = AB; \quad AB \vee \bot = AB; \quad AB \vee ABBA = AB$$


The conjunction between two taxa is the closest descendant
common to both (or one of the taxa, if it is a descendant of
the other). For example, we have the following results:


$$ABA \wedge ABBA = \bot; \quad AB \wedge \bot = \bot; \quad AB \wedge ABBA = ABBA$$


In practice, the conjunction of two non-impossible taxa is $\bot$
unless one of the taxa is a descendant of the other.


Considering these elements, it follows that a taxonomy is a
bounded lattice. On the other hand, a taxonomy has no negation
operator and is not distributive, since:


$$(ABA \wedge ABC) \vee (ABB \wedge ABC) = \bot \vee \bot = \bot,$$
while:
$$(ABA \vee ABB) \wedge ABC = AB \wedge ABC = ABC.$$
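
The two expressions above can be checked mechanically. The following is a minimal sketch, assuming the tree of Figure 1 can be rebuilt from the taxon names (each taxon's parent being its name minus the last letter); the `Taxon` and `Taxonomy` types are hypothetical and do not reflect the toolbox's actual data structures.

```rust
// Hypothetical sketch of the taxonomy of Figure 1 (illustrative types only).

#[derive(Clone, Copy, Debug, PartialEq)]
enum Taxon {
    Bottom,      // the impossible taxon
    Node(usize), // index of a taxon in the tree
}

struct Taxonomy {
    names: Vec<&'static str>,
    parent: Vec<Option<usize>>, // None for the root
}

impl Taxonomy {
    /// Assumption: the parent of a taxon is its name minus the last letter.
    fn figure_1() -> Self {
        let names = vec![
            "A", "AA", "AAA", "AAAA", "AAAB", "AB", "ABA", "ABB", "ABBA", "ABBB", "ABC",
        ];
        let parent: Vec<Option<usize>> = names
            .iter()
            .map(|n| {
                let prefix = &n[..n.len() - 1];
                names.iter().position(|p| *p == prefix)
            })
            .collect();
        Taxonomy { names, parent }
    }

    fn taxon(&self, name: &str) -> Taxon {
        Taxon::Node(self.names.iter().position(|n| *n == name).expect("unknown taxon"))
    }

    /// Path from a node up to the root, the node itself first.
    fn ancestors(&self, mut i: usize) -> Vec<usize> {
        let mut path = vec![i];
        while let Some(p) = self.parent[i] {
            path.push(p);
            i = p;
        }
        path
    }

    /// Disjunction: closest common ancestor (or the other taxon if one is Bottom).
    fn join(&self, x: Taxon, y: Taxon) -> Taxon {
        match (x, y) {
            (Taxon::Bottom, t) | (t, Taxon::Bottom) => t,
            (Taxon::Node(i), Taxon::Node(j)) => {
                let up_j = self.ancestors(j);
                let common = self.ancestors(i).into_iter().find(|a| up_j.contains(a));
                Taxon::Node(common.expect("the root is always a common ancestor"))
            }
        }
    }

    /// Conjunction: the lower taxon if one descends from the other, Bottom otherwise.
    fn meet(&self, x: Taxon, y: Taxon) -> Taxon {
        match (x, y) {
            (Taxon::Node(i), Taxon::Node(j)) if self.ancestors(i).contains(&j) => x,
            (Taxon::Node(i), Taxon::Node(j)) if self.ancestors(j).contains(&i) => y,
            _ => Taxon::Bottom,
        }
    }
}

fn main() {
    let t = Taxonomy::figure_1();
    let (aba, abb, abc) = (t.taxon("ABA"), t.taxon("ABB"), t.taxon("ABC"));
    // First expression: both conjunctions are impossible, so is their disjunction.
    let lhs = t.join(t.meet(aba, abc), t.meet(abb, abc));
    // Second expression: ABA v ABB = AB, and AB ^ ABC = ABC.
    let rhs = t.meet(t.join(aba, abb), abc);
    assert_eq!(lhs, Taxon::Bottom);
    assert_eq!(rhs, abc);
    println!("lhs = {:?}, rhs = {:?}", lhs, rhs);
}
```

The linear searches are acceptable for a tree of this size; a real implementation would presumably precompute depths or ancestor tables.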


3) Motivation for using taxonomies instead of powersets: 
The size of a powerset is exponential with respect to the size 
of the discernment frame. In contrast, the size of a taxonomy 
remains reasonable and controlled. This is particularly useful 
for avoiding the combinatorial explosion in the calculation of 
combination rules presented below. Another motivation is less 
technical: a taxonomy has an immediate operational meaning 
that a powerset does not. 


B. Belief mass 


A belief mass on the bounded lattice $(L, \leq, \bot, \top)$ is defined
as a function:
$$m : L \to \mathbb{R}_+,$$


which verifies:
$$\sum_{X \in L} m(X) = 1. \quad (1)$$
We will not discuss here the possibility and interpretation
of having a non-zero mass put on the contradiction, that is
$m(\bot) > 0$. Our library allows for this situation, but does not
interpret it: it is the user's usage that will give $m(\bot)$ its
semantics. In addition, certain belief functions derived from
masses and combination rules can assign a specific role to $\bot$
or $\top$.


C. Belief functions 


Certain belief functions of particular interest can be derived 
from belief masses, and this is implemented in the FURTIF 
toolbox we propose. We briefly present their definitions with- 
out going into further detail. 

1) Implicability, credibility, commonality, plausibility: Let a
belief mass $m$ be defined on the bounded lattice $(L, \leq, \bot, \top)$.
Implicability of $X \in L$ is defined as follows [7]:

$$b(X) = \sum_{Y : Y \leq X} m(Y).$$

Credibility of $X \in L$ is defined as follows:

$$\mathrm{Bel}(X) = \sum_{Y : \bot \neq Y \leq X} m(Y).$$

The transformation of implicability into credibility, and vice
versa, is immediate.

Commonality of $X \in L$ is defined as follows:

$$Q(X) = \sum_{Y : X \leq Y} m(Y).$$

Plausibility of $X \in L$ is defined as follows:

$$\mathrm{Pl}(X) = \sum_{Y : Y \wedge X \neq \bot} m(Y).$$

If the bounded lattice is a Boolean algebra with negation operator
$\neg$, it is well known that plausibility and implicability are dual
by the relation:

$$\text{If } L \text{ is a Boolean algebra, then } \mathrm{Pl}(X) = 1 - b(\neg X). \quad (2)$$

Under the assumption $m(\bot) = 0$, we obtain that $b = \mathrm{Bel}$ and
we find the well-known property, $\mathrm{Pl}(X) = 1 - \mathrm{Bel}(\neg X)$,
valid under this hypothesis.


The property (2) is irrelevant for a general bounded lattice. 
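
The four definitions above differ only in the condition placed on Y, which the sketch below makes explicit on a powerset. It assumes subsets are encoded as `u32` bit vectors and a belief mass is stored as a map over its focal elements; the `Mass` alias and all function names are hypothetical, not the toolbox's API.

```rust
use std::collections::BTreeMap;

// Hypothetical encoding: a subset of the frame is a bitmask, and a belief
// mass maps each focal element to its mass.
type Mass = BTreeMap<u32, f64>;

/// Sum of m(Y) over the focal elements Y satisfying `keep`.
fn sum_where(m: &Mass, keep: impl Fn(u32) -> bool) -> f64 {
    let mut total = 0.0;
    for (&y, &mass) in m {
        if keep(y) {
            total += mass;
        }
    }
    total
}

/// b(X): Y included in X, the empty set being allowed.
fn implicability(m: &Mass, x: u32) -> f64 {
    sum_where(m, |y| (y & x) == y)
}

/// Bel(X): Y included in X and Y non-empty.
fn credibility(m: &Mass, x: u32) -> f64 {
    sum_where(m, |y| y != 0 && (y & x) == y)
}

/// Q(X): X included in Y.
fn commonality(m: &Mass, x: u32) -> f64 {
    sum_where(m, |y| (y & x) == x)
}

/// Pl(X): Y intersects X.
fn plausibility(m: &Mass, x: u32) -> f64 {
    sum_where(m, |y| (y & x) != 0)
}

fn main() {
    // Frame {a, b, c}; m({a}) = 0.5, m({a, b}) = 0.3, m(Theta) = 0.2.
    let theta = 0b111;
    let mut m = Mass::new();
    m.insert(0b001, 0.5);
    m.insert(0b011, 0.3);
    m.insert(theta, 0.2);

    let x = 0b010; // X = {b}
    let not_x = theta & !x;
    println!(
        "b = {}, Bel = {}, Q = {}, Pl = {}",
        implicability(&m, x),
        credibility(&m, x),
        commonality(&m, x),
        plausibility(&m, x)
    );
    // Duality (2) on a Boolean algebra: Pl(X) = 1 - b(not X).
    assert!((plausibility(&m, x) - (1.0 - implicability(&m, not_x))).abs() < 1e-12);
}
```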

2) Equivalence: On the lattice $(L, \leq, \bot, \top)$, it is possible
to compute any of the functions $m$, $b$, $\mathrm{Bel}$, $Q$, $\mathrm{Pl}$ from any
of the functions $m$, $b$, $\mathrm{Bel}$, $Q$, see [7] for details.


If the bounded lattice is a Boolean algebra, then it is possible
to compute any of the functions $m$, $b$, $\mathrm{Bel}$, $Q$ from function
$\mathrm{Pl}$ thanks to duality (2). This property is false for a general
bounded lattice.

3) Pignistic probability: Given a mass function $m$ defined
on a powerset $2^\Theta$, the pignistic probability on $\Theta$ is computed
from $m$ by a cardinality ratio:

$$\mathrm{BetP}(\{\theta\}) = \sum_{X : \theta \in X} \frac{m(X)}{|X|}.$$

In the FURTIF toolbox, a similar definition of pignistic 
probability is proposed for taxonomies, based on a predefined 
weight on each leaf. Note, however, that this kind of general- 
ization is not suitable for all bounded lattices [9]. 
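
For the powerset case, the cardinality ratio can be sketched as follows. This is an illustration under stated assumptions only: subsets are `u32` bit vectors, $m(\emptyset) = 0$, and the function name `pignistic` is hypothetical. The weighted variant for taxonomies mentioned above is not covered.

```rust
use std::collections::BTreeMap;

/// BetP({theta_i}) for each element i of an n-element frame.
/// Subsets are bitmasks; m(empty set) = 0 is assumed so that the result sums to 1.
fn pignistic(m: &BTreeMap<u32, f64>, n: u32) -> Vec<f64> {
    let mut betp = vec![0.0; n as usize];
    for (&x, &mass) in m {
        let card = x.count_ones(); // |X|
        for i in 0..n {
            if (x & (1 << i)) != 0 {
                // Each element of X receives an equal share of m(X).
                betp[i as usize] += mass / card as f64;
            }
        }
    }
    betp
}

fn main() {
    // Frame {a, b, c}; m({a}) = 0.5, m({a, b}) = 0.3, m(Theta) = 0.2.
    let mut m = BTreeMap::new();
    m.insert(0b001, 0.5);
    m.insert(0b011, 0.3);
    m.insert(0b111, 0.2);
    let betp = pignistic(&m, 3);
    println!("BetP = {:?}", betp);
    assert!((betp.iter().sum::<f64>() - 1.0).abs() < 1e-12);
}
```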


D. Combination rules 


Let $(L, \leq, \bot, \top)$ be a bounded lattice. We assume several
belief masses, $m_1, \ldots, m_N$, defined on $L$ and obtained from
several sources of information. A key problem is to deduce
a fused belief mass from the various sources. The reference
combination rule is the Dempster-Shafer rule:

$$m_1 \oplus \cdots \oplus m_N(X) = \frac{1}{1-Z} \sum_{\substack{Y_{1:N} \in L^N \\ Y_1 \wedge \cdots \wedge Y_N = X \neq \bot}} \; \prod_{k=1:N} m_k(Y_k).$$

The normalization term 1 — Z fulfills condition (1). The value 
Z is a measure of conflict between sources. 


The conflict resulting from the combination of belief masses
is a problem that has led to a multiplication of viewpoints.
As a result, numerous rules have been proposed to deal with
the conflict differently. We will not go into them all, but
we will mention the following rules in particular, since we have
implemented them in our toolbox: the conjunctive rule, the
disjunctive rule, the Dubois & Prade rule [11], and the PCR6 rule
[10], which is an example of proportional conflict redistribution
rule [12].


E. Referee functions and combination rules 


The Referee function was introduced in the DSmT book 
3 in 2009 [12], but it followed some work on a probabilistic 
version of PCR5 [13]. A presentation of the notion can also be 
found in [3], as well as some more advanced thoughts in [8]. 




1) General: Let us assume a bounded lattice $(L, \leq, \bot, \top)$
and a sequence of belief masses $m_{1:N}$ defined on this lattice.
Let $Y_{1:N}$ be a sequence of propositions of $L$. Then the
conditional function $X \mapsto F(X|Y_{1:N}, m_{1:N})$ defined for any
proposition $X \in L$ is a referee function if it verifies:

$$F(\cdot|Y_{1:N}, m_{1:N}) \geq 0 \quad \text{and} \quad \sum_{X \in L} F(X|Y_{1:N}, m_{1:N}) \in [0, 1]. \quad (3)$$

Given a referee function, it is then possible to define a
combination rule, where the merged belief mass takes the
form:

$$\oplus[m_{1:N}|F](X) = \frac{1}{1-Z} \sum_{Y_{1:N} \in L^N} F(X|Y_{1:N}, m_{1:N}) \prod_{k=1:N} m_k(Y_k). \quad (4)$$

The normalization term $1 - Z$ fulfills condition (1). The value
$Z$ is a measure of conflict between sources for rule $\oplus[\cdot|F]$.


We showed in [3], [8] that referee functions can be used to 
define a large number of common rules and facilitate the 
design of new custom rules. From an implementation point 
of view, this formulation also makes it possible to distinguish: 

• rule definition, which, for example, is left to the user if
they wish to test customized rules. This is formalized by
$F(X|Y_{1:N}, m_{1:N})$ in (4),

• from the generic implementation of the
computation of this rule, which is formalized by
$\sum_{Y_{1:N} \in L^N} \prod_{k=1:N} m_k(Y_k)$ in (4).

Of course, the computation of the combination rule is the 
generic part that must be handled by the toolbox. 

2) Rules definitions: Given the logical proposition $P$, we
define $[P]$, the Iverson bracket [14] on $P$, by:

$$[P] = \begin{cases} 0 & \text{if } P \text{ is false,} \\ 1 & \text{if } P \text{ is true.} \end{cases}$$

See also the appendix A for a note on this notation.
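The toolbox implements the referee functions of the following
rules as standard:

a) Conjunctive rule:

$$F_{\wedge}(X|Y_{1:N}, m_{1:N}) = \Big[ X = \bigwedge_{k=1:N} Y_k \Big]$$

The rule produces no conflict, i.e. $Z = 0$, but can produce
non-zero mass on $\bot$.

b) Disjunctive rule:

$$F_{\vee}(X|Y_{1:N}, m_{1:N}) = \Big[ X = \bigvee_{k=1:N} Y_k \Big]$$

The rule produces no conflict.

c) Dempster-Shafer rule:

$$F_{DS}(X|Y_{1:N}, m_{1:N}) = \Big[ \bigwedge_{k=1:N} Y_k \neq \bot \Big] \Big[ X = \bigwedge_{k=1:N} Y_k \Big]$$

The rule produces conflict.

d) Dubois & Prade rule:

$$F_{DP}(X|Y_{1:2}, m_{1:2}) = [Y_1 \wedge Y_2 \neq \bot][X = Y_1 \wedge Y_2] + [Y_1 \wedge Y_2 = \bot][X = Y_1 \vee Y_2]$$

The rule produces no conflict.

e) PCR6 rule:

$$F_{PCR6}(X|Y_{1:N}, m_{1:N}) = \Big[ \bigwedge_{k=1:N} Y_k \neq \bot \Big] \Big[ X = \bigwedge_{k=1:N} Y_k \Big] + \Big[ \bigwedge_{k=1:N} Y_k = \bot \Big] \frac{\sum_{k=1:N} m_k(Y_k) [X = Y_k]}{\sum_{k=1:N} m_k(Y_k)}$$

The rule produces no conflict.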

f) PCR# rule: The referee function of the PCR# rule is
algorithmically defined in [3], [8]. This rule proportionally
redistributes the conflict over the largest possible consensuses
(a $k$-consensus is the non-empty intersection of $k$ propositions
$Y_i$). In comparison, PCR6 only redistributes on consensuses
of size 1 (single $Y_i$). The rule produces no conflict.
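
The separation between rule definition and generic computation can be sketched as follows. This is a brute-force illustration on a powerset under explicit assumptions: subsets are `u32` bit vectors, a referee function is a plain function returning the non-zero values of F, and the names `Mass`, `Referee`, `combine`, `dempster_shafer` and `pcr6` are hypothetical. It is not the toolbox's engine, and its cost grows exponentially with the number of sources.

```rust
use std::collections::BTreeMap;

type Mass = BTreeMap<u32, f64>; // bitmask-encoded subset -> mass (zero masses not stored)

/// A referee function: given Y_{1:N} and the masses m_k(Y_k), returns the
/// pairs (X, F(X | Y_{1:N}, m_{1:N})) whose value is non-zero.
type Referee = fn(&[u32], &[f64]) -> Vec<(u32, f64)>;

fn mass(pairs: &[(u32, f64)]) -> Mass {
    pairs.iter().copied().collect()
}

fn meet_all(ys: &[u32]) -> u32 {
    ys.iter().fold(u32::MAX, |acc, &y| acc & y)
}

/// Dempster-Shafer: keep the consensus if it is non-empty, reject otherwise.
fn dempster_shafer(ys: &[u32], _w: &[f64]) -> Vec<(u32, f64)> {
    let c = meet_all(ys);
    if c != 0 { vec![(c, 1.0)] } else { vec![] }
}

/// PCR6: keep a non-empty consensus, else share proportionally to m_k(Y_k).
fn pcr6(ys: &[u32], w: &[f64]) -> Vec<(u32, f64)> {
    let c = meet_all(ys);
    if c != 0 {
        return vec![(c, 1.0)];
    }
    let total: f64 = w.iter().sum();
    ys.iter().zip(w).map(|(&y, &wk)| (y, wk / total)).collect()
}

/// Generic part: enumerate all Y_{1:N}, weight each by the product of masses,
/// dispatch through the referee function, then normalize by 1 - Z.
/// (If the sources are in total conflict, 1 - Z = 0 and the result is undefined.)
fn combine(sources: &[Mass], referee: Referee) -> Mass {
    let mut tuples: Vec<(Vec<u32>, Vec<f64>)> = vec![(vec![], vec![])];
    for m in sources {
        let mut next = Vec::new();
        for (ys, ws) in &tuples {
            for (&y, &w) in m {
                let (mut ys2, mut ws2) = (ys.clone(), ws.clone());
                ys2.push(y);
                ws2.push(w);
                next.push((ys2, ws2));
            }
        }
        tuples = next;
    }
    let mut fused = Mass::new();
    for (ys, ws) in &tuples {
        let product: f64 = ws.iter().product();
        if product == 0.0 {
            continue;
        }
        for (x, f) in referee(ys, ws) {
            *fused.entry(x).or_insert(0.0) += f * product;
        }
    }
    let kept: f64 = fused.values().sum(); // this is 1 - Z
    fused.values_mut().for_each(|v| *v /= kept);
    fused
}

fn main() {
    // Frame {a, b}: a = 0b01, b = 0b10, Theta = 0b11.
    let m1 = mass(&[(0b01, 0.6), (0b11, 0.4)]);
    let m2 = mass(&[(0b10, 0.7), (0b11, 0.3)]);
    println!("DS:   {:?}", combine(&[m1.clone(), m2.clone()], dempster_shafer));
    println!("PCR6: {:?}", combine(&[m1, m2], pcr6));
}
```

Adding a rule in this sketch only requires writing a new function with the `Referee` signature; `combine` is left untouched, which is the point of the formalism.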


III. RUST PROGRAMMING LANGUAGE 


We feel it is important to outline the features of the Rust 
language that can help us achieve the four objectives we set 
out in the introduction. The aim here is not to give a formal, 
detailed description of the language, and we shall restrict 
ourselves to providing qualitative insights. Language learning 
resources are available at [17]. 


A. System programming 


To begin with, Rust is indeed a system language: Rust is one
of the two official languages of the Linux kernel, along with
C [2]; Microsoft has decided to use Rust code
to secure certain parts of the Windows operating system (OS)
[15]; and there is an OS, Redox, in an advanced development
phase, essentially based on the Rust language [16]. The Rust
language has thus won the support of major players in the
field of critical code development. From a practical point of
view, the motivations reported by these language users are
of several kinds. In particular, the language is recognized as
fast and optimized: it can be compiled as native code or as
WebAssembly code; it is based on direct memory management
and not on a garbage collector; it is based on a strong but
simple typing system; certain pointer characteristics enable
specific compiler optimizations. The language is recognized as
safe: pointers are characterized by borrowing semantics
that greatly consolidate memory management and prevent
memory usage violations. This memory security also makes
this language well suited to parallel and asynchronous
applications.

These features are of interest to us, as they offer the 
prospect of fast, massive computational tools. The possibility 
of parallelization also allows us to consider an increase in 
computing power. But it also makes it easier to build a set of 
modular tools that can be used to create complex applications 
that can be deployed on several processes or machines. 
B. Secure memory and resources management with Rust


Rust’s secure use of memory is not based on a garbage 
collector, but on direct access to memory through pointers, 
constrained by borrowing semantics and data lifetime seman- 
tics. This enables the compiler to ensure that memory and 
resources are used legally and securely at compile time. 


Borrowing semantics allow us to distinguish three states of
data:

• the process may have ownership of the data,

• the process may have a non-mutable pointer to the data,

• the process may have a mutable pointer to the data.

The semantics then guarantee either that there is a
single mutable access to the data, or that there are one or more
non-mutable accesses to the data. This ensures that processes
do not use memory in a conflicting way.
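
The three states listed above can be seen in a few lines of Rust. The sketch below is only an illustration; the functions `total` and `normalize` and the sample values are made up for the example.

```rust
/// Reads the data through a non-mutable borrow.
fn total(masses: &[f64]) -> f64 {
    masses.iter().sum()
}

/// Modifies the data in place through a mutable borrow.
fn normalize(masses: &mut [f64]) {
    let sum: f64 = masses.iter().sum();
    for m in masses.iter_mut() {
        *m /= sum;
    }
}

fn main() {
    // `masses` owns the data.
    let mut masses = vec![2.0, 1.0, 1.0];

    // Several non-mutable borrows may coexist.
    let (r1, r2) = (&masses, &masses);
    assert_eq!(total(r1), total(r2));

    // A single mutable borrow is accepted once the shared borrows are no longer used.
    normalize(&mut masses);
    assert_eq!(masses, vec![0.5, 0.25, 0.25]);

    // The following line is rejected at compile time if uncommented, because
    // `r` would still be in use while `masses` is mutably borrowed:
    // let r = &masses; normalize(&mut masses); println!("{:?}", r);
}
```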

Data lifetime semantics enable the compiler to trace 
throughout the code that the use of a data item is valid given its 
lifetime. Of course, this contributes to the automatic allocation 
and deallocation of memory, but it also makes it possible to 
condition memory usage in parallelized contexts. 


C. Type and trait 


The multi-paradigm design of Rust and of all its libraries,
including the standard ones, is based on two hierarchies: a data
hierarchy (Type) and a functional hierarchy (Trait).


1) Type hierarchy: The type hierarchy is the result of the 
type construction mechanism: 


• primitive types such as integers, reals, characters, ...
• structures made up of several sub-type fields and
pointers,

    struct MyStructure {
        field_1: u128,
        field_2: Type2,
        field_3: Type3,
    }

• labeled unions (Rust's enum), which enable several data
types to be used in the same memory field,
• other constructors such as arrays and tuples.
There are no classes in the object programming sense, so 
data structures remain simple. In contrast, Rust features an 
elaborate pattern matching system for data types. 
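
As a small illustration of labeled unions and pattern matching, the sketch below defines an enum with three variants and inspects it with `match`. The `Element` type and its variants are hypothetical and chosen only to echo the lattices discussed earlier; they are not types of the toolbox.

```rust
// Hypothetical type, for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Element {
    Bottom,        // contradiction
    Subset(u32),   // powerset element, encoded as a bit vector
    Taxon(String), // taxonomy element, identified by its name
}

/// Pattern matching selects a branch from the variant and from the data it carries.
fn describe(e: &Element) -> String {
    match e {
        Element::Bottom => "contradiction".to_string(),
        Element::Subset(0) => "empty set".to_string(),
        Element::Subset(bits) => format!("subset with {} element(s)", bits.count_ones()),
        Element::Taxon(name) => format!("taxon {}", name),
    }
}

fn main() {
    let elements = [
        Element::Bottom,
        Element::Subset(0b101),
        Element::Taxon("AB".to_string()),
    ];
    for e in &elements {
        println!("{}", describe(e));
    }
}
```

The compiler checks that the `match` covers every variant, so adding a variant to the enum forces every such function to be revisited.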


It is in data types that we concretely implement the
components of our toolbox. For example: the lattices,
Powerset and Taxonomy; the referee functions,
Disjunctive, Conjunctive, DempsterShafer,
DuboisPrade, Pcr6, PcrSharp; the engine for
computing combination rules.


2) Traits hierarchy: A trait in the Rust programming 
language is a collection of methods corresponding to a group 
of functionalities and properties. Some of the methods in 
a trait can be defined by default, while others are simply 
declared and must then be defined by any type implementing 
that trait. A trait is not a class or an abstract class in that it is 
not associated with any data. A trait can inherit another trait. 


As an example, in the toolbox we define the Lattice
and ComplementedLattice traits, whose incomplete and
simplified definitions are shown below:

    trait Lattice {
        // encoding type for the elements of the lattice
        // (the Copy + PartialEq bounds are assumed here so that
        // the default methods below compile)
        type Item: Copy + PartialEq;

        // hash code of the lattice
        fn lattice_hash(&self) -> u128;
        // least element of the lattice
        fn bottom(&self) -> Self::Item;
        // greatest element of the lattice
        fn top(&self) -> Self::Item;
        // greatest lower bound
        fn meet(&self, left: Self::Item, right: Self::Item)
            -> Self::Item;
        // least upper bound
        fn join(&self, left: Self::Item, right: Self::Item)
            -> Self::Item;
        // test if left and right are disjoint
        fn disjoint(&self, left: Self::Item, right: Self::Item)
            -> bool {
            self.meet(left, right) == self.bottom()
        }
        // test if left implies (i.e. is contained by) right
        fn implies(&self, left: Self::Item, right: Self::Item)
            -> bool {
            self.join(left, right) == right
        }
        // ...
    }

    trait ComplementedLattice: Lattice {
        // negation of a lattice element
        fn neg(&self, element: Self::Item) -> Self::Item;
    }

The Lattice trait describes the features and properties of a
bounded lattice. This trait declares the meet and join
methods, which correspond to the $\wedge$ and $\vee$ operators. It also
declares the bottom and top methods, which respectively
produce the minimum $\bot$ and maximum $\top$ elements of the
lattice. In addition, this trait provides a default implementation
of the implies and disjoint methods, which correspond
respectively to the lattice's order relation and to a disjointness
test on two lattice elements.

The ComplementedLattice trait describes the
features and properties of a Boolean algebra (distributive
bounded lattice with negation operator). We see that
ComplementedLattice inherits Lattice's features,
through the declaration ComplementedLattice:
Lattice, to which it adds the declaration of the neg
method.


As such, a trait represents only a set of features, but remains 
an abstraction that conveys no intrinsic data. A trait is useful 
for implementing certain functionalities in a generic way. 
The resulting code can be reused by future developers, for 
example, by implementing the trait on their own data types: 
only declarations not implemented by the trait need be 
implemented by the data type. In fact, to actually use a trait, 
you need to do so through a data type that implements the 
trait. 


Let us go on with this toolbox-inspired example. A data type
for powersets can be defined by (the actual definition is a little
more complex):

    struct Powerset {
        hash: u128,     // quasi-unique hash code of the lattice
        card_theta: u8, // cardinality of the frame of discernment
    }

This simplified definition of Powerset contains two
fields. The hash field is a quasi-unique code that identifies
the lattice with near certainty. This field is used
by the toolbox to check data consistency. The card_theta
field gives the value of $|\Theta|$, the cardinality of $\Theta$. The
Lattice and ComplementedLattice traits are then
implemented by the Powerset type as follows:


    impl Lattice for Powerset {
        type Item = u128;
        fn lattice_hash(&self) -> u128 { self.hash }
        fn bottom(&self) -> Self::Item { 0 }
        fn top(&self) -> Self::Item {
            // the parentheses are required, since `-` binds tighter
            // than `<<` in Rust (assumes card_theta < 128)
            (1 << self.card_theta) - 1
        }
        fn meet(&self, left: Self::Item, right: Self::Item)
            -> Self::Item {
            left & right
        }
        fn join(&self, left: Self::Item, right: Self::Item)
            -> Self::Item {
            left | right
        }
    }

    impl ComplementedLattice for Powerset {
        fn neg(&self, element: Self::Item) -> Self::Item {
            self.top() & !element
        }
    }

In this implementation, powerset elements are encoded by
the type Item = u128, i.e. an unsigned 128-bit integer.
Each bit of the integer represents an
element of $\Theta$. It is therefore possible to manage a frame of
discernment containing up to 128 elements.


The operators $\wedge$ and $\vee$ derive directly from the bitwise
Boolean operators & and | on integers. As expected, $\bot$
and $\top$ are respectively defined as 0 and $2^{|\Theta|} - 1$ in this
binary representation. The negation operator is defined by
self.top() & !element, i.e. a bitwise-not of the 128-bit
integer combined with a conjunction with $\top$.
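
The bitwise encoding described above can be tried out with the following self-contained sketch. It restates reduced versions of the two traits. The `Copy + PartialEq` bound on `Item` and the `checked_shl` form of `top` are assumptions added here so that the default methods compile and a 128-element frame does not overflow; they are not claimed to match the toolbox's actual code.

```rust
// Reduced traits following the simplified definitions in the text.
// The bound on `Item` is an assumption needed by the default methods.
trait Lattice {
    type Item: Copy + PartialEq;
    fn bottom(&self) -> Self::Item;
    fn top(&self) -> Self::Item;
    fn meet(&self, left: Self::Item, right: Self::Item) -> Self::Item;
    fn join(&self, left: Self::Item, right: Self::Item) -> Self::Item;
    // two elements are disjoint when their meet is the least element
    fn disjoint(&self, left: Self::Item, right: Self::Item) -> bool {
        self.meet(left, right) == self.bottom()
    }
    // left implies right when joining them gives right back
    fn implies(&self, left: Self::Item, right: Self::Item) -> bool {
        self.join(left, right) == right
    }
}

trait ComplementedLattice: Lattice {
    fn neg(&self, element: Self::Item) -> Self::Item;
}

// Powerset of a frame with `card_theta` elements, one bit per element.
struct Powerset {
    card_theta: u8,
}

impl Lattice for Powerset {
    type Item = u128;
    fn bottom(&self) -> u128 {
        0
    }
    fn top(&self) -> u128 {
        // 2^card_theta - 1; `checked_shl` returns None once the shift
        // reaches 128 bits, in which case every bit is set.
        1u128
            .checked_shl(self.card_theta as u32)
            .map_or(u128::MAX, |v| v - 1)
    }
    fn meet(&self, left: u128, right: u128) -> u128 {
        left & right
    }
    fn join(&self, left: u128, right: u128) -> u128 {
        left | right
    }
}

impl ComplementedLattice for Powerset {
    fn neg(&self, element: u128) -> u128 {
        self.top() & !element
    }
}

fn main() {
    let p = Powerset { card_theta: 3 };
    let a = 0b001; // singleton {a}
    println!("top = {:#05b}", p.top());
    println!("neg(a) = {:#05b}", p.neg(a));
    println!("a implies a|b: {}", p.implies(a, 0b011));
}
```

Note that writing `1 << n - 1` without parentheses would compute `1 << (n - 1)` in Rust, which is why the sketch avoids that form.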


The definition of type Taxonomy is more complex, and 
we will not go into it in this section. However, we can 
say that taxonomy elements are also encoded by a primitive 
type u128, but the encoding of both taxa and operators is 
different. The Taxonomy type implements Lattice, but not 
the ComplementedLattice. 


D. Rust and code maintainability 


Previously, we presented the basic and fundamental concepts
of traits and types in the Rust language. These concepts
are combined with other language features and tools to
facilitate library development and maintenance.

1) Conditional implementation: The grammar of the Rust 
language allows sophisticated designs in terms of generic 
programming. We will not go into this in detail, but we will 
nevertheless present an important consequence: Rust allows 
conditional implementations. More precisely, the designer of 
a library can have a set of traits automatically implemented 
by a user-defined type if certain conditions are met. 

Let us illustrate this with an example. 


We have defined two further traits, which are
IterableLattice and BeliefTransform.

Trait IterableLattice takes into account the lattice's graph
structure when exploring its elements. For example, this
trait provides methods for iterating lattice elements with
monotonicity properties with respect to the partial lattice
order. Its incomplete and simplified definition takes the form:

    trait IterableLattice: Lattice {
        type IntoIterUp: Iterator<Item = Self::Item>;
        type IntoIterDown: Iterator<Item = Self::Item>;
        // lattice iterator, non decreasing with inference
        fn bottom_to_top(&self) -> Self::IntoIterUp;
        // lattice iterator, non increasing with inference
        fn top_to_bottom(&self) -> Self::IntoIterDown;
    }

Trait IterableLattice contains type and method 
declarations that must be defined by the types implementing 
the trait. Types Powerset and Taxonomy thus implement 
IterableLattice. 


Trait BeliefTransform provides methods for the
transformation and inverse transformation of belief masses
or belief functions. Mass and belief function data are
carried by the generic type Assignment<Self::Item>,
which depends on the type parameter Self::Item. The
incomplete and simplified definition of BeliefTransform
takes the form:

    trait BeliefTransform where Self: IterableLattice {
        // mass to credibility transform
        fn mass_to_credibility(
            &self, mass: Assignment<Self::Item>
        ) -> Assignment<Self::Item> {
            // some code
        }
        // credibility to mass transform
        fn mass_from_credibility(
            &self, credibility: Assignment<Self::Item>
        ) -> Assignment<Self::Item> {
            // some code
        }
    }


Methods of BeliefTransform are all implemented, and 
it should be noted that this trait requires that the types 
implementing it (symbolized by Self) also implement 
IterableLattice. Actually, BeliefTransform uses methods 
of IterableLattice to explore the lattice. 


The definition of BeliefTransform is followed by the 
following piece of code: 


    impl<L> BeliefTransform for L where L: IterableLattice { }


This code means that any type L, which implements the trait 
IterableLattice, will automatically implement the trait 
BeliefTransform. This is an example of conditional 
implementation. 
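
The following sketch illustrates such a conditional (blanket) implementation on a reduced example. Several simplifications are assumptions made for brevity and do not reflect the toolbox's API: `bottom_to_top` returns a `Vec` instead of an associated iterator type, a plain `HashMap` stands in for `Assignment`, and the small `Powerset` type uses `u32` items.

```rust
use std::collections::HashMap;
use std::hash::Hash;

trait Lattice {
    type Item: Copy + Eq + Hash;
    fn bottom(&self) -> Self::Item;
    fn join(&self, left: Self::Item, right: Self::Item) -> Self::Item;
    fn implies(&self, left: Self::Item, right: Self::Item) -> bool {
        self.join(left, right) == right
    }
}

// Simplified: all lattice elements, listed in a non-decreasing order.
trait IterableLattice: Lattice {
    fn bottom_to_top(&self) -> Vec<Self::Item>;
}

// Fully implemented trait: it only relies on IterableLattice methods.
trait BeliefTransform: IterableLattice {
    // Bel(X) = sum of m(Y) over all Y != bottom such that Y implies X
    fn mass_to_credibility(
        &self,
        mass: &HashMap<Self::Item, f64>,
    ) -> HashMap<Self::Item, f64> {
        let bottom = self.bottom();
        self.bottom_to_top()
            .into_iter()
            .map(|x| {
                let bel: f64 = mass
                    .iter()
                    .filter(|(y, _)| **y != bottom && self.implies(**y, x))
                    .map(|(_, m)| *m)
                    .sum();
                (x, bel)
            })
            .collect()
    }
}

// Conditional implementation: every IterableLattice gets BeliefTransform.
impl<L> BeliefTransform for L where L: IterableLattice {}

// Small powerset, one bit per element of the frame (card_theta < 32).
struct Powerset {
    card_theta: u8,
}

impl Lattice for Powerset {
    type Item = u32;
    fn bottom(&self) -> u32 {
        0
    }
    fn join(&self, left: u32, right: u32) -> u32 {
        left | right
    }
}

impl IterableLattice for Powerset {
    fn bottom_to_top(&self) -> Vec<u32> {
        // numeric order is compatible with set inclusion
        (0..(1u32 << self.card_theta)).collect()
    }
}

fn main() {
    let p = Powerset { card_theta: 2 };
    let mass = HashMap::from([(0b01u32, 0.5), (0b11u32, 0.5)]);
    // mass_to_credibility is available although Powerset never names it
    let bel = p.mass_to_credibility(&mass);
    println!("Bel({{a}}) = {}, Bel({{a,b}}) = {}", bel[&0b01], bel[&0b11]);
}
```

`Powerset` only implements `Lattice` and `IterableLattice`; the blanket `impl` supplies `BeliefTransform` without any further code from the type's author.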


2) Macros: Basically, macros are used to modify program
code according to different contexts during the compilation
process. Rust has two macro systems [18]. In both cases,
Rust macros work at the level of the program's abstract
syntax tree. This is done in an advanced phase of code
text analysis, but before the compiled code is generated (in
comparison, C/C++ macros work at an early stage). In this
respect, Rust macros are a powerful metaprogramming tool,
enabling elaborate manipulation of the generated code. The
two Rust macro systems are:


• declarative macros, simpler to use but less powerful. The
metacode of these macros is integrated directly into the
program's source code. It is mostly based on pattern
matching for its expansion.

• procedural macros, more complex but very powerful,
which require the development and execution of a Rust
subprogram to analyze and modify the abstract syntax tree
of the main program.


While the design of macros is highly technical, the use of a 
macro by a third-party developer is relatively straightforward, 
making it a powerful tool for defining and maintaining 
libraries. 


To illustrate this, we give an example of a macro from our 
toolbox. As our toolbox can be implemented on several 
processes or machines, we need to control the consistency of 
data types exchanged between several processes. To this end, 
we have defined the HashedTypeDef trait, which must be 
implemented by all data types that can be exchanged between 
processes. The simplified definition of this trait is as follows: 


    trait HashedTypeDef {
        const TYPE_HASH: u128;
    }

We can see here that any type implementing
HashedTypeDef will be associated with the hash code
TYPE_HASH. TYPE_HASH must be almost unique for all
types defined with this trait. For this reason, this trait
must be implemented by a dedicated macro, which builds the
hash code at compile time in such a way as to ensure this
quasi-uniqueness property.


There is little point in presenting the implementation details 
of the macro in itself. However, it is interesting to see how 
it can be deployed by the designer of a data type. Let us 
consider type Assignment mentioned earlier. This data type 
must be exchanged between processes, and must therefore 
implement HashedTypeDef. To do this, all we need to do 
is mention the use of the macro when defining the structure: 


    use std::collections::HashMap;

    #[derive(HashedTypeDef)]
    // simplified definition of assignment
    pub struct Assignment<X> {
        assignment: HashMap<X, f64>,
        lattice_hash: u128,
    }


It follows that Assignment::<X>::TYPE_HASH will
then contain the constant value of the hash code of
type Assignment<X>. On the other hand, the fields
assignment and lattice_hash are not constant and
respectively represent the content of a belief mass or function
and the hash code of the lattice on which this mass or
function is defined.
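
To give an idea of how a compile-time type hash can be attached to a type, here is a minimal sketch. It is not the toolbox's procedural derive macro: it uses a declarative macro, and it hashes only the type name with a 128-bit FNV-1a function. The names `fnv1a_128`, `impl_hashed_type_def`, `SourceA` and `SourceB` are hypothetical, and the hashing scheme is an illustrative assumption.

```rust
trait HashedTypeDef {
    const TYPE_HASH: u128;
}

// 128-bit FNV-1a hash, usable in constant expressions. This choice is
// illustrative; the text does not describe the toolbox's hashing scheme.
const fn fnv1a_128(bytes: &[u8]) -> u128 {
    let mut hash: u128 = 0x6c62272e07bb014262b821756295c58d;
    let mut i = 0;
    while i < bytes.len() {
        hash ^= bytes[i] as u128;
        hash = hash.wrapping_mul(0x0000000001000000000000000000013b);
        i += 1;
    }
    hash
}

// Declarative macro: implements the trait by hashing the type's name.
macro_rules! impl_hashed_type_def {
    ($t:ident) => {
        impl HashedTypeDef for $t {
            const TYPE_HASH: u128 = fnv1a_128(stringify!($t).as_bytes());
        }
    };
}

struct SourceA;
struct SourceB;

impl_hashed_type_def!(SourceA);
impl_hashed_type_def!(SourceB);

fn main() {
    // The constants are computed by the compiler, not at run time.
    println!("SourceA: {:032x}", SourceA::TYPE_HASH);
    println!("SourceB: {:032x}", SourceB::TYPE_HASH);
}
```

A hash of the name alone would not detect a change in the fields of a type, which is one reason a procedural macro with access to the full definition is better suited to the purpose described in the text.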




3) Package management: cargo is Rust's package manager.
It tracks all the packages needed to compile an executable
or library, along with their version constraints and options. It
uses the rustc compiler for the various compilation phases
and interacts with the crates.io package repository. In fact,
the majority of Rust libraries of interest are available from
crates.io. The crates.io repository also offers rather efficient
package search facilities.


To use cargo, the project and its dependencies must be
described in Cargo.toml files. These descriptive files are
relatively user-friendly.


E. Paradigms 
Rust combines several programming paradigms:


• functional programming, with mutability control, rich
pattern matching and iterators,

• imperative and structured programming,

• object-oriented programming using traits; there is no
notion of classes in Rust,

• generic programming. Rust's generic programming and
metaprogramming capabilities are rich,

• parallel and concurrent programming, whose implementation
is made much safer by the language's memory safety
guarantees. In particular, Rust handles asynchronous
programming very well.


F. Multithread and asynchronism 


We need to be able to operate concurrently to give the toolbox
real flexibility in its interaction with external software and
environments. Fortunately, Rust supports different parallelism
paradigms.

The standard library offers tools to facilitate multithreaded
programming, including different communication channels between
threads, and precise management of thread-safe types
and data sharing (thread-safe reference counting pointer, mutex
and atomic types). In addition, a number of leading libraries
are available on crates.io to facilitate the implementation of
multithreaded parallelism.
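
As a minimal sketch of these standard-library tools, the example below spawns one thread per data source and gathers the results through a channel. The function name `fuse_from_readers` is hypothetical, and the component-wise mean is only a placeholder for a real combination rule; all input vectors are assumed to have the same length.

```rust
use std::sync::mpsc;
use std::thread;

// Each "reader" thread sends its vector through a channel; the caller
// plays the role of the fuser and averages the vectors component-wise.
fn fuse_from_readers(inputs: Vec<Vec<f64>>) -> Vec<f64> {
    let n = inputs.len();
    let (tx, rx) = mpsc::channel();
    let handles: Vec<_> = inputs
        .into_iter()
        .enumerate()
        .map(|(i, data)| {
            let tx = tx.clone();
            thread::spawn(move || tx.send((i, data)).expect("fuser hung up"))
        })
        .collect();
    // Drop the original sender so the channel closes once all readers end.
    drop(tx);

    // Store each message at its reader index, whatever the arrival order.
    let mut slots: Vec<Vec<f64>> = vec![Vec::new(); n];
    for (i, data) in rx {
        slots[i] = data;
    }
    for handle in handles {
        handle.join().expect("reader panicked");
    }

    // Placeholder fusion: component-wise mean of the received vectors.
    let len = slots.first().map_or(0, |v| v.len());
    (0..len)
        .map(|k| slots.iter().map(|v| v[k]).sum::<f64>() / n as f64)
        .collect()
}

fn main() {
    let fused = fuse_from_readers(vec![vec![0.5, 0.5], vec![1.0, 0.0]]);
    println!("{:?}", fused);
}
```

Ownership of each vector moves into its thread and then into the channel, so no lock is needed to share the data.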

Rust supports asynchronous programming using futures. 
The language itself incorporates the keywords async and 
await, which facilitate asynchronous programming using 
futures. The tokio library* [19], among others, offers various 
extensions and the availability of an asynchronous runtime 
environment. 


IV. DESCRIPTION OF THE TOOLBOX AND PERSPECTIVES 


In this section, we present the architecture and the components
already available in the toolbox, the components planned for the
near future, and the prospects that are foreseeable but not
yet planned. In addition, we present a simple example of
implementation in the distributed and concurrent framework
enabled by the tools.


A. Global context of the toolbox 


The toolbox is implemented around the Silx middleware we
developed. The belief function components can nevertheless be
used independently of Silx, as libraries called by other programs.
However, the role of the Silx middleware is to link the belief
function components to other components, enabling the belief
function tools to be used without compilation, or to potentially
interact with other tools and software.

* https://crates.io/crates/tokio

Figure 2. FURTIF toolbox: global context.


Figure 2 shows the overall architecture of the toolbox. We
can see that the Silx middleware (blue cylinder) integrates
hard-coded components, which comprise all the functionalities
embraced by the considered distribution of the toolbox (light
red arrow shape on the left). These components alone do not
constitute an application as such, but need to be arranged in a
computing network. To this end, Silx takes configuration files
as parameters, which are used to define the computation network
(green rectangle on the right). With Silx, the component
codes and the configuration files, the toolbox enables or will
enable the implementation and interaction of the following
functional components:


• belief function computation components,

• input/output components,

• command-line interface components,

• components defining domain-specific languages for belief
functions (currently being defined),

• future unplanned components.


We will now describe some of the middleware's features. To
begin with, Silx is based on the tokio library [19], a reference
Rust library for asynchronous concurrent programming
offering facilities for networked application design. On top of
the tokio infrastructure, we have built an environment which,
from user-friendly and editable descriptive parameter files,
launches asynchronous components and establishes communication
channels between these components. The communication
channels are also asynchronous and of different types.
In particular, they can be used to broadcast messages to a
set of recipients, or to obtain from a recipient a reply to
a request. These channels can operate on a single machine
or be established on several machines using network sockets.
The interface of these channels remains essentially the same
whether the context is single-machine or multi-machine. Last
but not least, Silx uses a common binary serialization format
for both communication contexts.

The serialization library we use is called rkyv [20], and
is a zero-copy deserialization framework. This means that
it is possible to directly access the fields of a serialized
structure without having to deserialize it: the result is an
almost negligible performance penalty, even when running
on a single machine. In addition, we have opted for
endian-agnostic number formats (i.e. the order in which the bytes of a
number are encoded remains the same, whatever the type of
microprocessor).
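
The idea of an endian-agnostic format can be illustrated with the standard library alone. The sketch below does not use rkyv and does not reproduce its wire format: it simply fixes a little-endian byte order for a hypothetical (element, mass) entry, so that the encoded bytes are identical on every processor. The function names and the 24-byte layout are assumptions made for the example.

```rust
// Encodes a lattice element and its mass with a fixed little-endian
// byte order: 16 bytes for the u128 element, then 8 bytes for the f64.
fn encode_entry(item: u128, mass: f64) -> [u8; 24] {
    let mut out = [0u8; 24];
    out[..16].copy_from_slice(&item.to_le_bytes());
    out[16..].copy_from_slice(&mass.to_le_bytes());
    out
}

// Decodes the entry; the result does not depend on the host byte order.
fn decode_entry(bytes: &[u8; 24]) -> (u128, f64) {
    let mut item = [0u8; 16];
    let mut mass = [0u8; 8];
    item.copy_from_slice(&bytes[..16]);
    mass.copy_from_slice(&bytes[16..]);
    (u128::from_le_bytes(item), f64::from_le_bytes(mass))
}

fn main() {
    let bytes = encode_entry(0b101, 0.5);
    println!("first byte = {}", bytes[0]);
    println!("decoded = {:?}", decode_entry(&bytes));
}
```

Because the byte order is part of the format rather than of the machine, two hosts with different native endianness can exchange such buffers without any conversion step being negotiated.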


B. Belief functions components 


Figure 3 describes all the components implemented or
planned for the manipulation and fusion of belief functions.

• Transformations between belief masses, implicabilities,
credibilities, plausibilities and commonalities are
implemented if lattice properties allow (dashed arrows if
implementation is constrained by lattice type).

• Several combination rules are implemented by means
of referee functions (Conjunctive, Disjunctive, Dempster-Shafer,
Dubois & Prade, PCR6, PCR#) and an exact but
discounted combination engine (the number of propositions
with a non-zero mass is limited). A Monte Carlo
combination rule is planned, as well as a generic arbitration
function that can be designed by the user. For
the user-designed referee function, work on domain-specific
languages is planned.

• Implementations of Dezert entropy, divergence and
distance [21], [22] are also planned.

Figure 3. FURTIF toolbox: belief functions components.


C. Taxonomy and Lattice implementation 


There are several possible approaches to encoding a taxonomy
element. One approach is to build two dictionaries that
map each pair $(X,Y)$ of elements of the taxonomy to $X \wedge Y$
and $X \vee Y$ respectively. The use of a dictionary based on a
hash function (HashMap) makes it possible to compute the
operators in constant time. However, the quadratic size of the
structures may be prohibitive for a large taxonomy.

Figures 4 and 5 describe a more efficient approach to
encoding the taxonomy and calculating the $\wedge$ and $\vee$ operators.
In fact, there is a direct encoding based on binary calculus of
taxa and operators when the taxonomy is in the form of a
binary tree. An example of this encoding is shown in Figure 5.
The situation is a little more complex when the taxonomy
is in non-binary form, and requires a prior transformation of
the taxonomy tree into a binary tree (Figure 4).
We will see that this transformation subsequently requires
the use of a dictionary of the same order of magnitude
as the taxonomy, in order to finalize the calculation of the
$\vee$ operator. This is perfectly fine in practice. Now let us get
down to the details:


• As shown in Figure 4, the taxonomy is rendered binary by
introducing fake taxa (taxa named *) at nodes containing
more than two children.

• Based on the resulting binary taxonomy tree, the encoding
is done by means of a pair of values $(d,c)$, respectively
depth and code, which are constructed as follows (see
Figure 5):

  - $d$ is simply the depth of the node within the tree. For
    example, 0 for root $\top$, 1 for its children, and so on.
  - $c$ is set to the maximal possible value for root $\top$.
  - $c$ is obtained from the code of the parent node by
    changing the $d$-th bit from the left (so there are only
    2 possibilities, 0 and 1).
  - $d$ is set to the maximal possible value for $\bot$ and $c$ is set
    to 0.

In practice, $(d,c)$ is encoded on the same integer (a
128-bit integer in our implementation, an 8-bit integer
in Figure 5). If the integer is encoded on $2^{\lambda}$ bits, then
we encode $d$ on $\lambda$ bits and $c$ on $2^{\lambda} - \lambda$ bits. Thus, the
maximum value for $d$ is $d_{\max} = 2^{\lambda} - 1$ and the maximum
value for $c$ is $c_{\max} = 2^{2^{\lambda} - \lambda} - 1$. In Figure 5, this gives
$\lambda = 3$, $d_{\max} = 7$ and $c_{\max} = \mathtt{0b11111}$.

The operators are then computed by means of simple binary
operations (match these results with Figure 5):

Computation of $\wedge$: The operator is symmetric and:
• If $d = d'$ and $c \neq c'$, then $(d,c) \wedge (d',c') = \bot$.
• If $d < d'$ and the first $d$ bits of $c$ and $c'$ are identical,
then $(d,c) \wedge (d',c') = (d',c')$.
• If $d < d'$ and the first $d$ bits of $c$ and $c'$ are not identical,
then $(d,c) \wedge (d',c') = \bot$.

Figure 5. Encoding binary taxonomy on 8 bits; $3 = \log_2(8)$ bits for depth; $5 = 8 - 3$ bits for code.

Computation of $\vee$: Consider the prefix common to $c$ and
$c'$. Denote by $d''$ the length of this prefix and by $c''$ this prefix
completed with 1's. Then $(d,c) \vee (d',c') = (d'',c'')$.

Case of $\bot$: Taxon $\bot$ is managed specifically.

At this stage, there is still the problem of fake taxa with regard
to operator $\vee$. The $\vee$ operation could very well result in a code
corresponding to a fake taxon. In Figure 5, for example, we
see that $(4,\mathtt{0b11011}) \vee (3,\mathtt{0b11111}) = (2,\mathtt{0b11111})$, which
is equivalent to writing $ABBB \vee ABC = *$. Of course, this
result is inadequate, and one has to go up the tree until
a real taxon is reached; in our example, this gives $(1,\mathtt{0b11111})$, i.e.
AB. To this end, our library uses a hash-code dictionary.
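
The rules above can be sketched on the 8-bit layout of Figure 5 as follows. This is an illustrative reading, not the library's code: depth and code are kept in two separate fields instead of being packed into one integer, the common prefix used by the join is capped at the smaller of the two depths (an assumption made so that the join of a taxon with itself or with an ancestor is well defined), and the dictionary that replaces fake taxa by real ones is omitted. All names are hypothetical.

```rust
const CODE_BITS: u32 = 5; // 8 - 3 bits for the code
const CODE_MAX: u8 = 0b11111;
const DEPTH_MAX: u8 = 7; // 2^3 - 1

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Taxon {
    d: u8, // depth in the binary tree
    c: u8, // path code, padded with 1's after the first d bits
}

const TOP: Taxon = Taxon { d: 0, c: CODE_MAX };
const BOTTOM: Taxon = Taxon { d: DEPTH_MAX, c: 0 };

// Mask selecting the first `d` bits (from the left) of a 5-bit code.
fn prefix_mask(d: u8) -> u8 {
    let d = (d as u32).min(CODE_BITS);
    (CODE_MAX >> (CODE_BITS - d)) << (CODE_BITS - d)
}

// Child of `parent`: the d-th bit from the left is kept at 1 or cleared.
fn child(parent: Taxon, bit: bool) -> Taxon {
    let d = parent.d + 1;
    assert!((d as u32) <= CODE_BITS, "tree too deep for a 5-bit code");
    let pos = CODE_BITS - d as u32;
    let c = if bit { parent.c } else { parent.c & !(1u8 << pos) };
    Taxon { d, c }
}

// Greatest lower bound: the deeper taxon if it lies below the other one.
fn meet(x: Taxon, y: Taxon) -> Taxon {
    if x == BOTTOM || y == BOTTOM {
        return BOTTOM;
    }
    let (upper, lower) = if x.d <= y.d { (x, y) } else { (y, x) };
    let mask = prefix_mask(upper.d);
    if (upper.c & mask) == (lower.c & mask) {
        lower
    } else {
        BOTTOM
    }
}

// Least upper bound: common prefix of the codes, completed with 1's.
fn join(x: Taxon, y: Taxon) -> Taxon {
    if x == BOTTOM {
        return y;
    }
    if y == BOTTOM {
        return x;
    }
    let diff = (x.c ^ y.c) & CODE_MAX;
    // number of identical leading bits within the 5-bit code
    let common = diff.leading_zeros() - (8 - CODE_BITS);
    let d = (common as u8).min(x.d).min(y.d);
    Taxon { d, c: x.c | (!prefix_mask(d) & CODE_MAX) }
}

fn main() {
    let ab = child(TOP, true);
    let star = child(ab, true); // fake taxon in the text's example
    let abc = child(star, true);
    let abbb = child(child(star, false), true);
    println!("join = {:?}", join(abbb, abc));
    println!("meet = {:?}", meet(ab, abc));
}
```

In the example of the text, the join of `abbb` and `abc` lands on the fake taxon at depth 2; a complete implementation would then look up the nearest real ancestor.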


D. A simple distributed and concurrent example 


Our long-term goal is to use the Silx middleware to make
the library of belief functions interact with a command-line
console or graphical user interfaces in association with
domain-specific languages. Only then will our toolbox be truly
user-oriented. We still have some work to do to achieve this
goal. In particular, we need to develop a large part of the user
interaction components.


Although we cannot demonstrate the functionality of the
toolbox in its entirety, we can illustrate its flexibility by showing
how the components integrate with the Silx middleware
through a toy example.
1) Toy example: We propose a multi-machine processing
architecture composed of the following processing nodes:

• a single-machine processing node, machine 0, consisting
of the following elementary processes:

  - the lattice component, which defines the lattice being
    worked on. It transmits the lattice definition
    through dedicated channels to the fuser, writer
    and reader_i components,

  - the fuser component, which performs the fusion
    computations. This component receives the lattice
    definition from component lattice, and vectors of belief
    masses from sources reader_1, reader_2 and
    reader_3. The component produces a result vector,
    which it sends to the writer component,

  - the writer component, which receives the fusion
    results from the fuser component and saves them,
    serialized, on the local disk,

  - the shutdown_0 component, which manages the
    shutdown of the computing node,

• three single-machine processing nodes, running on machines
$i \in \{1,2,3\}$, each with the following elementary
processes respectively:

  - the reader_i component, which reads and deserializes
    a vector of belief functions from the local disk
    of machine $i$ and sends it to the fuser component.
    This component obtains the lattice definition from
    component lattice,

  - the shutdown_i component, which manages the
    shutdown of the processing node.

The multi-machine processing architecture is illustrated in
Figure 6. This example thus includes 10 asynchronous processes
and 12 communication channels between processes.
Of course, this is an excessively rich implementation for
such a simple problem, but the aim here is to be illustrative.
These various communication channels can be intra-machine
or cross-machine (via network sockets). In this figure, these
communication channels are grouped by color: in blue, the 4
channels carrying the lattice definition; in green, the 3 channels
carrying the data to be fused; in black, the channel carrying the
fusion result; in red, the 4 channels triggering the shutdown
of the processing networks.

Figure 6. Example: processing graph.


In the medium term, computing networks will be set up using
graphical user interfaces. At present, the network is entirely
defined using yaml configuration files [23]. There are four sets
of configuration files:

• startup files: these files are defined for each machine (or
IP address) implemented.

• local processing network builders: these builders are
defined for each machine (or IP address) used.

• builders of processing components: these constructors
define all component parameters, including input/output
channels.

• communication channel definitions.

All these files are located in appropriate, well-organized
directories.


a) Startup files: Startup files are defined differently
depending on whether you are on the master machine or one
of the slave machines. For a slave machine, the startup file
indicates that the starter must listen to the master machine.
The master's IP address and port are labelled main, while
those of the slave machine are labelled this, as shown in
Figure 7. On the other hand, the master machine's startup file
contains all the information needed to set up the processing
networks of the master and slave machines, i.e.: the list of
configuration files of the local network builders, and the list of
communication channels (their names and configuration files).
In this way, the master machine's starter builds a complete plan
of the processing networks, which it transmits to the slave
machines, so that each machine then deploys the processing
networks according to the plans. Figure 8 shows one such
master configuration file.

    !Listener
    main: 127.0.0.1:8180
    this: 127.0.0.1:8181

Figure 7. Slave 1 startup file: slave_reader1.yaml.


b) Local processing network builders: The parameter 
files of local processing network builders simply list all the 
processing components that make them up, i.e. the list of 
their names associated with their respective configuration files. 
Figure 9 shows the configuration file for the master processing 
network. Figure 10 shows the configuration file for machine 
2’s slave processing network: for this application, the slave 
configuration files are essentially identical, apart from the 
indices. 

The communication channels do not appear in this list, but 
the way in which they connect to the processing components 
is specified in the configuration files for these components. 
Please note that, for reasons of simplicity, we have sometimes 
given the same name to the component and to one of its 
incoming or outgoing channels. Nevertheless, channel names 
are capitalized — e.g. Writer, Reader_i, Shutdown_i 
— whereas component names are lower-case — e.g. writer, 
reader_i, shutdown_i. 


    !Main
    builders:
      127.0.0.1:8180: !unloaded
        path: dsmtbook/builders/main_builder.yaml
      127.0.0.1:8181: !unloaded
        path: dsmtbook/builders/slave_reader1_builder.yaml
      127.0.0.1:8182: !unloaded
        path: dsmtbook/builders/slave_reader2_builder.yaml
      127.0.0.1:8183: !unloaded
        path: dsmtbook/builders/slave_reader3_builder.yaml
    flow:
      Lattice_0: !unloaded
        path: dsmtbook/channels/channel_lattice0.yaml
      Lattice_1: !unloaded
        path: dsmtbook/channels/channel_lattice1.yaml
      Lattice_2: !unloaded
        path: dsmtbook/channels/channel_lattice2.yaml
      Lattice_3: !unloaded
        path: dsmtbook/channels/channel_lattice3.yaml
      Reader_1: !unloaded
        path: dsmtbook/channels/channel_reader1.yaml
      Reader_2: !unloaded
        path: dsmtbook/channels/channel_reader2.yaml
      Reader_3: !unloaded
        path: dsmtbook/channels/channel_reader3.yaml
      Shutdown_0: !unloaded
        path: dsmtbook/channels/channel_shutdown0.yaml
      Shutdown_1: !unloaded
        path: dsmtbook/channels/channel_shutdown1.yaml
      Shutdown_2: !unloaded
        path: dsmtbook/channels/channel_shutdown2.yaml
      Shutdown_3: !unloaded
        path: dsmtbook/channels/channel_shutdown3.yaml
      Writer: !unloaded
        path: dsmtbook/channels/channel_writer.yaml
    main: 127.0.0.1:8180

Figure 8. Master startup file: main.yaml.


    net_size: 16
    named_servant:
      fuser: !unloaded
        path: dsmtbook/servants/servant_fuser.yaml
      lattice: !unloaded
        path: dsmtbook/servants/servant_lattice.yaml
      shutdown_0: !unloaded
        path: dsmtbook/servants/servant_shutdown0.yaml
      writer: !unloaded
        path: dsmtbook/servants/servant_writer.yaml
    ctrl_ch_capacity: 16

Figure 9. Master network builder: main_builder.yaml.


    net_size: 16
    named_servant:
      reader_2: !unloaded
        path: dsmtbook/servants/servant_reader2.yaml
      shutdown_2: !unloaded
        path: dsmtbook/servants/servant_shutdown2.yaml
    ctrl_ch_capacity: 16

Figure 10. Slave 2 network builder: slave_reader2_builder.yaml.


c) Component builders: Processing components are
parameterizable structures that implement computations specific
to their nature, and a glue that enables them to interact with
the Silx middleware. Implementing this glue is facilitated by
Silx macros, which will not be detailed here. The design
of such components is the responsibility of the programmer
using our library. However, the parameterization of these
components is the responsibility of the library user. In practice,
the component parameter file is derived from the serialization
(in yaml language) of its respective parameterizable structure.
Let us now take a closer look at the features of each of the
components presented here:


• Figure 11 describes the setup file, which determines
the computation of the fusing component, fuser.
The internal structure providing this definition is
DsmtbookFuserBuilder.


    servant: DsmtbookFuserBuilder
    referee: Pcr6
    channel_lattice: Lattice_0
    channels_reader:
      - Reader_1
      - Reader_2
      - Reader_3
    channel_writer: Writer
    channels_shutdown:
      - Shutdown_0
      - Shutdown_1
      - Shutdown_2
      - Shutdown_3

Figure 11. Component builder: servant_fuser.yaml.


The referee: Pcr6 field specifies that the fusion will
use the PCR6 rule.

The channel_lattice: Lattice_0 field specifies
that the Lattice_0 channel is used to receive the lattice
definition (from the lattice component).

The channel_writer: Writer field specifies that
the Writer channel will be used to transmit the result
of the calculation (to the writer component).

The field prefixed by channels_reader: lists the
channels, i.e. Reader_1, Reader_2 and Reader_3,
that the component listens to in order to receive
input data to be fused from components reader_1,
reader_2 and reader_3 respectively.

The field prefixed with channels_shutdown: lists
the channels to which to send the network shutdown
signal, once the fusion calculation is complete.

• Figure 12 shows the configuration of the lattice
component.

This component contains the taxonomy definition shown
in Figure 13.

This taxonomy is sent to components fuser, writer,
reader_1, reader_2 and reader_3 through channels
Lattice_0, Lattice_1, Lattice_2 and
Lattice_3.

• Figures 14 and 15 show the parameters of components
reader_1 and reader_3 respectively.

The reader_1 component reads input data to be fused
from the input_1.json file, deserializing it from the
json format. The reader_3 component reads input data
to be fused from the input_3.yaml file, deserializing
it from the yaml format.

Components reader_1 and reader_3 receive taxonomy
information through channels Lattice_1
and Lattice_3 respectively. The reader_1 and
reader_3 components send their input data to
the fuser component through the Reader_1 and
Reader_3 channels respectively.

The reader_2 component, not shown, is parameterized
in a similar way to the reader_3 component.

• Figure 16 shows the parameters of the writer component.

The writer component receives taxonomy information
through the Lattice_0 channel, and receives fused
data from the fuser component through the Writer
channel.

The writer component writes the fused data to the
output.yaml file, serializing it in yaml format.


    servant: DsmtbookLatticeBuilder
    channels_lattice:
      - Lattice_0
      - Lattice_1
      - Lattice_2
      - Lattice_3
    lattice: !Taxonomy
      taxonomy: !Node
        name: Object
        children:
          - !Node
            name: Air
            children:
              - !Leaf
                name: Airplane
                weight: 0.1
              - !Leaf
                name: UAV
                weight: 0.1
          - !Node
            name: Amphibian
            children:
              - !Leaf
                name: Hovercraft
                weight: 0.05
          - !Node
            name: Ground
            children:
              - !Leaf
                name: Bike
                weight: 0.15
              - !Leaf
                name: Car
                weight: 0.2
              - !Leaf
                name: Truck
                weight: 0.15
          - !Node
            name: Water
            children:
              - !Leaf
                name: Boat
                weight: 0.15
              - !Leaf
                name: Ship
                weight: 0.1

Figure 12. Component builder: servant_lattice.yaml.


d) Channel definitions: Figures 17, 18 and 19 show the 
configuration files for channels Lattice_0, Lattice_1l 
and Shutdown_3 respectively. The various configuration 
files have a common structure. First, the file begins with 
a label that defines the channel type: here !Broadcast 
and !NetBroadcast, which characterize a data broadcast 
within a single machine or between two machines (or IP 


Figure 13. Taxonomy encoded in servant_lattice.yaml.


servant: DsmtbookReaderBuilder 
channel_reader: Reader_1 
channel_lattice: Lattice_1 
serializer: Json 

file: dsmtbook/data/input_1.json 


Figure 14. Component builder: servant_reader1.yaml.


servant: DsmtbookReaderBuilder 
channel_reader: Reader_3 
channel_lattice: Lattice_3 
serializer: Yaml 

file: dsmtbook/data/input_3.yaml 


Figure 15. Component builder: servant_reader3.yaml. 


servant: DsmtbookWriterBuilder 
channel_writer: Writer 
channel_lattice: Lattice_0 
serializer: Yaml 

file: dsmtbook/data/output.yaml 


Figure 16. Component builder: servant_writer.yaml. 


!Broadcast
cluster: 127.0.0.1:8180
max_ping:
  secs: 0
  nanos: 50000000
data_type: 59f5bfc8-6116-0b9c-149b-50c0e4f645cb
size: 16
input:
  - lattice
output:
  - fuser
  - writer


Figure 17. Channel: channel_lattice0.yaml.

addresses). Next, the field preceded by data_type: contains 
a quasi-unique code for the type of data carried by the channel. 


This code is used as a security check when generating the 
processing networks. Typically, this code is the same for 


!NetBroadcast
max_ping:
  secs: 0
  nanos: 50000000
data_type: 59f5bfc8-6116-0b9c-149b-50c0e4f645cb
size: 16
input:
  - - 127.0.0.1:8180
    - - lattice
output:
  - - 127.0.0.1:8181
    - - reader_1


Figure 18. Channel: channel_lattice1.yaml.


!NetBroadcast
max_ping:
  secs: 0
  nanos: 50000000
data_type: 223a70f4-30a0-ee42-21a3-c2365df732c8
size: 16
input:
  - - 127.0.0.1:8180
    - - fuser
output:
  - - 127.0.0.1:8183
    - - shutdown_3


Figure 19. Channel: channel_shutdown3.yaml.


channels Lattice_0 and Lattice_1, as they transmit the 
same type of data. However, the code is different for channels 
Lattice_1 and Shutdown_3, as the shutdown signal is of 
a different type than the lattice definition. The fields preceded 
by input: and output: respectively list the components 
transmitting or listening on the channel. Note the difference 
between !Broadcast and !NetBroadcast channels. For
the !Broadcast channel, the IP address within which the 
channel is transmitting is indicated in the field preceded by 
cluster:. For the !NetBroadcast channel, there is a 
transmit IP address placed after input: and a receive address 
placed after output:. 


2) Processing results: In this presentation, we have limited 
the input and output data to sequences of two belief masses. 
We denote by $[m_i, m'_i]$ the input data produced by the reader_i
component and by $[m_\oplus, m'_\oplus]$ the fused data produced (row by
row) by the fuser component. In our example, the input
data is defined by:


$m_1(\text{Object}) = 0.2$, $m_1(\text{Air}) = 0.3$, $m_1(\text{Truck}) = 0.5$

$m'_1(\text{Object}) = 0.3$, $m'_1(\text{Ground}) = 0.4$, $m'_1(\text{Hovercraft}) = 0.3$

$m_2(\text{Object}) = 0.1$, $m_2(\text{UAV}) = 0.5$, $m_2(\text{Amphibian}) = 0.4$

$m'_2(\text{Object}) = 0.4$, $m'_2(\text{Car}) = 0.2$, $m'_2(\text{Water}) = 0.3$

$m_3(\text{Object}) = 0.2$, $m_3(\text{Ground}) = 0.2$, $m_3(\text{Bike}) = 0.6$

$m'_3(\text{Object}) = 0.4$, $m'_3(\text{Air}) = 0.4$, $m'_3(\text{Ship}) = 0.2$


We obtain the following fused data, by PCR6 rule as indicated 
in figure 11: 


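$m_\oplus(\text{Object}) = 0.06$, $m_\oplus(\text{Air}) = 0.07$, $m_\oplus(\text{Ground}) = 0.04$,
$m_\oplus(\text{Truck}) = 0.20$, $m_\oplus(\text{UAV}) = 0.22$, $m_\oplus(\text{Amphibian}) = 0.14$,
$m_\oplus(\text{Bike}) = 0.27$

$m'_\oplus(\text{Object}) = 0.20$, $m'_\oplus(\text{Air}) = 0.19$, $m'_\oplus(\text{Ground}) = 0.18$,
$m'_\oplus(\text{Water}) = 0.12$, $m'_\oplus(\text{Car}) = 0.10$, $m'_\oplus(\text{Hovercraft}) = 0.13$,
$m'_\oplus(\text{Ship}) = 0.08$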
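
The first fused row can be cross-checked with a few lines of Python. The sketch below is not the FURTIF implementation (which is written in Rust); it is a minimal illustration that assumes the taxonomy behaves as a tree in which two nodes intersect only when one is an ancestor of the other, and that applies the standard PCR6 redistribution to $m_1$, $m_2$ and $m_3$. The names PARENT, lineage, meet and pcr6 are hypothetical. Under these assumptions the result agrees with the reported values of $m_\oplus$ to within 0.01.

```python
from itertools import product
from math import prod

# Parent links of the taxonomy of Figure 12 (Object is the root).
PARENT = {
    "Air": "Object", "Amphibian": "Object", "Ground": "Object", "Water": "Object",
    "Airplane": "Air", "UAV": "Air", "Hovercraft": "Amphibian",
    "Bike": "Ground", "Car": "Ground", "Truck": "Ground",
    "Boat": "Water", "Ship": "Water",
}


def lineage(node):
    """Return the node followed by all of its ancestors up to the root."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def meet(a, b):
    """Tree intersection (assumption): the deeper node if comparable, else None."""
    if a is None or b is None:
        return None
    if a in lineage(b):
        return b
    if b in lineage(a):
        return a
    return None


def pcr6(sources):
    """Fuse a list of mass functions (dict: node -> mass) with PCR6."""
    fused = {}
    for combo in product(*(m.items() for m in sources)):
        nodes = [node for node, _ in combo]
        masses = [mass for _, mass in combo]
        joint = prod(masses)
        common = nodes[0]
        for node in nodes[1:]:
            common = meet(common, node)
        if common is not None:
            # Non-conflicting product: goes to the intersection.
            fused[common] = fused.get(common, 0.0) + joint
        elif sum(masses) > 0:
            # Conflicting product: shared in proportion to the masses involved.
            total = sum(masses)
            for node, mass in zip(nodes, masses):
                fused[node] = fused.get(node, 0.0) + joint * mass / total
    return fused


m1 = {"Object": 0.2, "Air": 0.3, "Truck": 0.5}
m2 = {"Object": 0.1, "UAV": 0.5, "Amphibian": 0.4}
m3 = {"Object": 0.2, "Ground": 0.2, "Bike": 0.6}
fused = pcr6([m1, m2, m3])
for node, mass in sorted(fused.items()):
    print(f"{node}: {mass:.3f}")
```

The same function can be applied to $[m'_1, m'_2, m'_3]$ to examine the second row.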


Figures 20 and 21 illustrate the contents of the serialized
data of $[m_1, m'_1]$ and $[m_2, m'_2]$ as stored in the
files input_1.json and input_2.yaml in JSON
and YAML formats respectively. Files input_3.yaml
and output.yaml are formed in a similar way to
input_2.yaml.


[
  [
    [ "Object", 0.2 ],
    [ "Air", 0.3 ],
    [ "Truck", 0.5 ]
  ],
  [
    [ "Object", 0.3 ],
    [ "Ground", 0.4 ],
    [ "Hovercraft", 0.3 ]
  ]
]


Figure 20. $[m_1, m'_1]$: input_1.json.


Figure 21. $[m_2, m'_2]$: input_2.yaml.


E. Perspectives 


Our toolbox is designed to be extended. Of course, global 
extensions can be envisaged, such as the addition of a 
graphical user interface, or interaction with other applications 
connected to the middleware. On the other hand, the belief 
function components are by no means exhaustive, and numer- 
ous completions are foreseeable, though not yet planned. 


V. CONCLUSION 


In this work, we presented the FURTIF toolbox for belief 
function fusion. This toolbox is implemented in Rust, both 
for performance reasons and to open up the possibility of 
integrating it into a distributed, multithreaded and concurrent 
environment. To this end, we have developed the middleware 
Silx in which FURTIF is integrated. It should be noted, 
however, that the library’s components can be used independently
of the middleware. At present, the toolbox is almost
complete, but lacks a few components that were planned but 
are still under development (e.g. domain-specific language; 
generic referee function; Monte Carlo for computing combi- 
nation rules; entropy, divergence and distance). Although not 
currently planned, the implementation of a graphical user in- 
terface, both for using the fusion library as well as for building 
and deploying computational networks, is a future goal of 
interest. More generally, we intend to use Silx middleware 
to support mixed implementations combining FURTIF with 
other information processing paradigms. 

The FURTIF toolbox will be available in open source at 
[24] and [25]. Also, updated information about this toolbox 
will be regularly posted on [26] for convenience. 

It is worth mentioning that it is possible to compile and link
one or more Rust source files written with the Matlab™ Data
API for Rust into a binary MEX file if necessary. This
possibility has not yet been tested for interfacing the FURTIF
toolbox with Matlab™.


APPENDIX 
A. Iverson bracket 


Although the Iverson bracket [14] is an elegant and concise 
notation, it is not ambiguous. The point to remember is that 
the context of Iverson’s bracket is propositional on the inside 
and numerical on the outside, as in [false] + 1, which has the 
value 0 + 1. Take the following example: 


$$[[1 + 3] \times 5, \; [[x = 3] \wedge [x < 0]] + 1]$$


We understand unambiguously that we have defined the row 
vector [20, 1]. Indeed: 


Purely numerical context: $[1 + 3] \times 5$ is another way of
writing $(1 + 3) \times 5 = 20$,

Purely propositional context: $[x = 3] \wedge [x < 0]$ is another
way of writing $(x = 3) \wedge (x < 0)$, which is false,

Propositional context inside and numerical context outside:
$[[x = 3] \wedge [x < 0]] + 1$ is $[\text{false}] + 1$, that is 1,

Matrix context: $[[1 + 3] \times 5, [[x = 3] \wedge [x < 0]] + 1]$ is $[20, 1]$.
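
The same reading can be mimicked in a few lines of Python, given here only as an illustrative sketch: the helper name iverson is hypothetical, and the value chosen for x is arbitrary since $(x = 3) \wedge (x < 0)$ is false for every x.

```python
def iverson(proposition: bool) -> int:
    """Iverson bracket: 1 if the proposition holds, 0 otherwise."""
    return 1 if proposition else 0


x = 3  # arbitrary: (x == 3) and (x < 0) can never both hold
row = [(1 + 3) * 5, iverson(x == 3 and x < 0) + 1]
print(row)  # [20, 1]
```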
Abstract—Ship classification, as an important problem in the
field of computer vision, has been the focus of research for various
algorithms over the past few decades. In particular, convolutional
neural networks (CNN) have become one of the most popular
models for ship classification tasks, especially in deep learning
methods. Currently, several classical methods use single-scale
features to tackle ship classification, without paying much
attention to the impact of multi-scale features. Therefore, this
paper proposes a multi-scale feature fusion ship classification
method based on evidence theory. In this method, multiple scales
of features are utilized to fuse feature maps of three different
sizes (40x40x256, 20x20x512, 10x10x1024), which are used to
perform ship classification tasks separately. Finally, the multi-scale
classification results are treated as pieces of evidence
and fused at the decision level using evidence theory to obtain 
the final classification result. Experimental results demonstrate 
that compared to classical classification networks, this method 
can effectively improve classification accuracy. 


Keywords: ship classification, multi-scale, evidence theory, 
feature fusion, deep learning. 


I. INTRODUCTION 


Image classification, as an important problem in the field of 
computer vision, aims to assign input images to predefined 
categories. Over the past few decades, significant progress 
has been made in image classification, especially with deep 
learning-based methods. CNN can automatically extract rich 
feature representations from input images and perform classi- 
fication using fully connected layers. Compared to traditional 
machine learning methods, deep learning approaches can learn 
more discriminative features automatically from data, leading 
to higher classification accuracy. Practical applications of 
image classification techniques have become relatively mature 
and have been widely used in various domains, such as 
visual recognition [1], medical image analysis [2], industrial 
quality inspection [3], agriculture [4]-[6], surveillance [7], and
autonomous driving [8]. 

However, due to the complex and diverse characteristics of 
image data and the variety of practical application scenarios, 
improving the accuracy of image classification further remains 
a challenging task. For instance, challenges persist in satellite 
remote sensing image classification [9]-[12] and fine-grained
image classification [13], [14]. 
For example, ship satellite remote sensing images present
specific challenges compared to traditional natural images in 
the image classification task [15]-[17]: 


1) Variations in ship size and shape: The appearance and 
shape of ships in satellite remote sensing images can be 
influenced by various factors such as distance, lighting 
conditions, and viewing angles. Therefore, ships of the 
same type may exhibit different sizes and shapes in 
different satellite remote-sensing images, making image 
classification difficult. 

2) Complexity of the background: Ship satellite remote 
sensing images often include complex backgrounds such 
as waves, clouds, and ports. These backgrounds can 
introduce interference in the classification of ships. 

3) Similarity: Ship satellite remote sensing images encompass
various types of ships, including different ship types
and purposes such as cargo ships, passenger ships, and 
fishing boats. However, apart from some specific ship 
types, most ship outlines exhibit an elongated shape 
with axis symmetry and a pointed bow, which can pose 
challenges for classification algorithms. 

4) Resolution: Ship satellite remote sensing images typically
have lower resolution compared to traditional
natural images. This can impact the extraction of fine- 
grained ship details and features, thus affecting the 
performance of classification algorithms. 

5) Data quality: Ship satellite remote sensing images are 
susceptible to natural factors such as lighting, weather 
conditions, and cloud cover, which can result in lower 
image quality. Issues like blurring, distortion, and occlu- 
sion can arise, affecting the accuracy of ship classifica- 
tion. 


Currently, most existing improvement methods for ship 
classification, which rely on CNN to automatically extract 
abstract features, mainly focus on modifying network 
structures, optimizing training strategies, or redesigning loss 
functions in an iterative manner. However, they overlook 
further processing of the classification results. 
In the case of fine-grained image classification, which
is different from general ship classification tasks, the main 
challenge lies in categorizing objects from closely related 
subcategories. These objects often exhibit subtle category 
differences, and the crucial information containing these dif- 
ferences is typically localized in small regions of the image. 
When extracting features using deep neural networks, smaller- 
sized features in the images may get diluted as the network 
deepens, thereby affecting the classification results [18]. Uti- 
lizing multi-scale feature fusion methods allows deep networks 
to learn small-sized features that may have been diluted 
due to network depth, thereby enhancing the accuracy of 
classification. Therefore, solely focusing on network structure 
or loss function improvements may pose challenges in further 
enhancing classification performance. 


In the CNN-based methods, initially, researchers focused 
on deepening the network structure to improve classification 
performance and address issues arising from deeper networks 
in order to enhance the classification network. Later, attention 
shifted towards better feature propagation or utilizing detailed 
features to strengthen classification performance. For example, 
attention mechanisms were introduced to emphasize more 
discriminative features [19], or multiple feature extraction net- 
works were used in combination with extracted feature maps to 
complement missing features [20]. Knowledge distillation was 
also employed to transfer detailed image features to smaller 
primary networks, resulting in improved performance for the 
classification network [21]. However, the approaches above 
added additional complexity to the network structures in order 
to better extract features. 

This paper proposes a multi-scale ship classification net- 
work that applies evidence theory to decision-level fusion to 
break free from the improvement loop mentioned earlier and 
enhance classification accuracy from a different perspective. 
Three main modules are utilized in this method to ensure 
better classification accuracy: (1) Multi-scale output module 
of the feature extraction network; (2) Pyramid feature fusion 
module; (3) Decision-level fusion module based on evidence 
theory. The first two parts focus on improving accuracy using 
network structures, while the final part emphasizes optimizing 
classification performance using the final probability distribu- 
tion information. 

To validate the feasibility of this method, experiments were 
conducted on a traditional natural image dataset and a remote- 
sensing image dataset for fine-grained ship classification. 
Several comparisons were made with classical classification 
methods. The experimental results demonstrate that the proposed
method, E-FPN, achieves better classification accuracy
and consistency than classical classification methods.
The main contributions of this paper are as follows: 


1) To address the issue of information loss during the fea- 
ture extraction process, feature-level fusion is performed 
by selecting feature maps of different depths from the 
backbone feature extraction network. This fusion aims 
to supplement the lost information. 


2) The classification results from multiple scales are further 
fused at the decision level using fusion rules based 
on evidence theory. The different classification results 
are treated as pieces of evidence, and the differences 
in probability distributions are utilized to optimize the 
classification results. 


The remaining sections of this paper are organized as
follows. Section II provides a review of related works. Section
III introduces the relevant background knowledge. Section 
IV presents the overall network structure of the E-FPN. 
Section V provides detailed explanations of the experimental 
setup, including parameter settings, experimental procedures, 
and parameter discussions. Finally, in Section VI, the paper 
concludes with a summary and discusses future research 
directions. 


II. RELATED WORK 


At the algorithmic level, deep learning-based image classi- 
fication methods can be divided into two categories based on 
different feature extractors. The first category is CNN-based 
image classification methods, which have achieved remarkable 
breakthroughs in the past decade based on modern deep learn- 
ing techniques. Krizhevsky et al. introduced rectified linear 
units (ReLU) in convolutional neural networks to achieve 
nonlinearity and used the Dropout technique to mitigate over- 
fitting and learn more complex objects [22]. Karen Simonyan 
and Andrew Zisserman improved upon AlexNet by stacking 
3x3 convolutions and deepening the network structure to 
enhance classification accuracy [23]. However, as the networks 
became deeper, issues such as network degradation, vanishing 
gradients, and exploding gradients emerged. To address these 
problems, Kaiming He et al. adopted Batch Normalization
(BN) in place of Dropout to mitigate vanishing
and exploding gradients. They also introduced residual
connections to address network degradation [24]. Saining Xie et al.
introduced Inception on top of ResNet, transforming single- 
path convolutions into multi-path convolutions with multiple 
branches [25]. Gao Huang et al. proposed DenseNet in 2017, 
which connects each layer with all previous layers in a feed- 
forward fashion to alleviate the vanishing gradient problem 
and enhance feature propagation [26]. Tsung-Yu Lin et al. used
two feature extractors to extract features from input images 
and combined them using a bilinear pooling function before 
performing classification to compensate for the features lost 
by a single feature extractor (B-CNN) [27]. To fully exploit 
the small features that can differentiate different categories, 
Jianlong Fu et al. proposed RA-CNN, which focuses the 
classification operation on regions with differentiating features 
using a recurrent attention projection mechanism [28]. 

To enhance the classification accuracy of CNN-based classification
networks on satellite remote sensing images, Linqing
Huang et al. proposed a classification method that converts 
images in the dataset into different color spaces and trains 
separate CNNs on each color space. Finally, the output results 
of each classifier are fused using evidence theory [29]. Yue 
Chen et al. presented a method called Destruction and Construction
Learning (DCL), which disrupts and shuffles input
images to emphasize local detailed features. They employ a 
region alignment network to restore the image layout and learn 
semantic information from local regions, thereby strengthening 
the connections between neighboring regions [30]. Heliang 
Zheng et al. introduced a technique that extracts precise atten- 
tion maps to highlight target regions with rich detailed features 
at a high resolution. They also employ knowledge distillation 
to transfer image detail features to the main network for image 
classification [31]. 

The second category is based on the vision transformer
[32]. Just as CNNs have dominated computer vision, transformers
have dominated the field of Natural Language Processing (NLP)
in the past decade. Initially, when transformers were introduced
to computer vision, they were primarily used to extract global
contextual information from images, but their performance was
not satisfactory. In the past two years, there have been breakthroughs
in using large-scale pretraining on transformer-based
classification networks, which have surpassed CNNs
in traditional image domains. Examples of such
networks include Vision Transformer (ViT) [33] and Shifted
Window Transformer (SWIN-Transformer) [34].

In recent years, advancements in ship classification al- 
gorithms have involved various improvement approaches in 
academic research. For instance, Chen et al. employed a 
contrastive learning method to replace classical classification 
techniques. They designed a loss function to separate different 
categories and bring together similar ones [35]. Zhang et al. 
adopted a combination of traditional feature extraction meth- 
ods and modern abstract feature extraction methods to enhance 
the representation capability of ship features [18]. Guo et al. 
utilized shape-aware feature extraction techniques, allowing 
the feature extraction process to better align with the distinc- 
tive spindle-shaped appearance of ships [36]. Building upon 
the bilinear pooling method, Zhang et al. made improvements 
to make it more suitable for ship classification tasks [37]. 
Additionally, Jahan et al. employed knowledge distillation and 
class balancing methods to achieve ship classification in SAR 
ship images [38].


III. PRELIMINARIES
A. Cross Stage Partial Darknet (CSPDarkNet)


CSPDarkNet [39] can be divided into five main parts: Focus, 
Dark2, Dark3, Dark4, and Dark5, in sequential order. The 
Focus module focuses on aggregating the width and height 
information of the image into the channel information by 
subsampling the image’s pixel values. The structures of Dark2 
to Dark4 are well demonstrated in Figure 1, where each 
Dark part consists of a BaseConv layer and a CSPLayer. 
Each BaseConv layer consists of a convolutional layer, a 
BatchNorm2d layer, and an activation function. The entire 
CSPLayer can be viewed as a residual module, where one 
side of the residual branch passes through the BaseConv layer 
once, while the other side goes through n Bottleneck units 
after the BaseConv layer. The two parts are then concatenated 


and subjected to another BaseConv operation. The structure
of the Bottleneck unit, as shown in Figure 1, involves a 1x1
and a 3x3 convolution for the main branch, while the residual
branch remains unchanged, and the two parts are finally added
together. The Dark5 part is slightly different from the previous
three parts. It introduces a Spatial Pyramid Pooling (SPP-Bottleneck)
module between the BaseConv and CSPLayer,
which utilizes max pooling with three different kernel sizes
to extract features and then combines them to increase the
network’s receptive field. Its structure is depicted in Figure 2.
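
To make the Focus step concrete, the following pure-Python sketch shows the space-to-depth slicing that moves width and height information into the channel dimension. It is an illustration only, not the network code: the function name focus_slice is hypothetical, the image is represented as nested lists (channels x height x width), the ordering of the four sub-grids is an assumption, and the convolution that follows the slicing in the actual module is omitted.

```python
def focus_slice(image):
    """Space-to-depth: C x H x W nested lists (H, W even) -> 4C x H/2 x W/2."""

    def subgrid(channel, dy, dx):
        # Keep every second row and column, starting at offsets (dy, dx).
        return [row[dx::2] for row in channel[dy::2]]

    out = []
    # Assumed ordering: top-left, bottom-left, top-right, bottom-right.
    for dy, dx in ((0, 0), (1, 0), (0, 1), (1, 1)):
        out.extend(subgrid(channel, dy, dx) for channel in image)
    return out


# One 4x4 channel holding the values 0..15.
image = [[[4 * r + c for c in range(4)] for r in range(4)]]
sliced = focus_slice(image)
print(len(sliced), len(sliced[0]), len(sliced[0][0]))  # 4 2 2
```

Every pixel of the input is preserved; only its position changes, from a spatial location to a channel index.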
Figure 1. CSPDarkNet network structure.

Figure 2. SPP-Bottleneck network structure.


B. Feature Pyramid Networks (FPN)


In convolutional networks, deep layers are more responsive 
to semantic features, while shallow layers are more responsive 
to image details. In image classification tasks, it has been 
validated by Karen Simonyan and others that deeper networks 
have a positive impact on image classification. However, 
deep convolutional layers tend to lose fine-grained details. 
Therefore, the FPN [40] model can be used to fuse features 
from shallow and deep layers, allowing the deep layers to 
complement the information lost during multiple convolutional 
operations, which is beneficial for subsequent classification 
tasks. The FPN structure is illustrated in Figure 3. 


Figure 3. FPN network structure.


C. Evidence Theory


Evidence theory, established by Dempster and Shafer,
represents propositions using mathematical sets [41]. Unlike
probability theory, which considers only single elements, evidence
theory allows for multiple elements within a set. The
theory is characterized by its ability to handle ambiguity and to
perform imprecise reasoning at different levels of abstraction.
It can differentiate between ignorance and equiprobability, 
enabling better representation of uncertain propositions. Ev- 
idence theory simulates the normal human thinking process, 
where one observes and collects information before synthe- 
sizing it from various aspects to make judgments and obtain 
results for a given problem. 

In the Dempster-Shafer (DS) evidence theory, the sample
space composed of all propositions is defined as a discernment
framework, denoted as $\Theta$. It is a set comprising a group of
mutually exclusive and collectively exhaustive propositions
representing all possible answers to a given question. Let
the discernment framework be defined as $\Theta =
\{\theta_1, \theta_2, \ldots, \theta_n\}$, where $\theta_1, \theta_2, \ldots, \theta_n$ is a set of basic
hypotheses that are mutually exclusive, i.e., $\theta_i \cap \theta_j = \emptyset$ for $i \neq j$,
$i, j = 1, 2, \ldots, n$. The power set of $\Theta$ is the set of all its subsets and is
denoted as $2^\Theta$.

Basic Probability Assignment (BPA) refers to the process of
calculating the basic probabilities for each piece of evidence in
the discernment framework $\Theta$. This process is accomplished
using the basic probability assignment function, denoted as
the mass function $m(\cdot)$, which reflects the degree of belief or
confidence in a proposition. The mass function satisfies the
following properties:


$$m: 2^\Theta \to [0,1], \tag{1}$$


$$m(\emptyset) = 0, \qquad \sum_{A \in 2^\Theta} m(A) = 1. \tag{2}$$


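In evidence theory, the uncertainty of evidence can be
quantified using the belief function $\mathrm{Bel}(A)$ and the plausibility
function $\mathrm{Pl}(A)$. The definitions and the relationship between
the belief and plausibility functions are as follows, where $\bar{A}$
denotes the complement of $A$ in $\Theta$:


$$\mathrm{Bel}(A) = \sum_{B \subseteq A} m(B), \tag{3}$$

$$\mathrm{Pl}(A) = \sum_{B \cap A \neq \emptyset} m(B) \tag{4}$$

$$\phantom{\mathrm{Pl}(A)} = 1 - \mathrm{Bel}(\bar{A}). \tag{5}$$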
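
As a small numerical illustration of (3)-(5), the sketch below represents focal elements as Python frozensets and checks that the plausibility of a set equals one minus the belief of its complement. The frame {a, b, c}, the mass values, and the function names bel and pl are illustrative assumptions, not taken from the experiments.

```python
def bel(m, a):
    """Belief: total mass of the non-empty focal elements included in a."""
    return sum(v for b, v in m.items() if b and b <= a)


def pl(m, a):
    """Plausibility: total mass of the focal elements intersecting a."""
    return sum(v for b, v in m.items() if b & a)


theta = frozenset({"a", "b", "c"})
m = {
    frozenset({"a"}): 0.5,
    frozenset({"a", "b"}): 0.3,
    theta: 0.2,  # mass left on the whole frame (ignorance)
}

A = frozenset({"b"})
print(bel(m, A), pl(m, A), 1 - bel(m, theta - A))
```

For A = {b}, no focal element is included in A, so the belief is 0, while the two focal elements {a, b} and the whole frame intersect A, giving a plausibility of 0.5.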


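The Dempster-Shafer fusion rule for two BPAs is given by


$$m(A) = [m_1 \oplus m_2](A) =
\begin{cases}
0, & A = \emptyset, \\
\dfrac{\sum_{B \cap C = A} m_1(B)\, m_2(C)}{1 - K}, & A \neq \emptyset,
\end{cases} \tag{6}$$

with

$$K = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C) < 1. \tag{7}$$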


$K$ represents the conflict coefficient, which describes
the magnitude of conflict between items of evidence: a higher
value of $K$ indicates a greater degree of conflict between
the items of evidence. The term $\frac{1}{1-K}$ serves as a normalization factor. For the
combination of multiple items of evidence, the calculation
follows a similar approach. Multiple belief functions can
be combined using the orthogonal sum to generate a new mass
function, denoted as $m_1 \oplus m_2 \oplus m_3 \oplus \cdots \oplus m_n$. If this
combination exists, the order of calculation does not affect the
result, since the rule is commutative and associative.

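Suppose there are $n$ pieces of evidence $E_1, E_2, \ldots, E_n$,
with their corresponding basic belief assignment functions
$m_1, m_2, \ldots, m_n$, and focal elements $A_1, A_2, \ldots, A_n$, within
the given discernment framework. The classical Dempster’s
combination rule for these pieces of evidence can be defined as follows:


$$m(A) =
\begin{cases}
\dfrac{\sum_{\cap A_i = A} \prod_{1 \le i \le n} m_i(A_i)}{1 - k}, & A \neq \emptyset, \\
0, & A = \emptyset,
\end{cases} \tag{8}$$


$$k = \sum_{\cap A_i = \emptyset} \; \prod_{1 \le i \le n} m_i(A_i). \tag{9}$$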


The classical Dempster’s combination rule is susceptible 
to paradoxes [42], and there are several classic paradoxical 
situations: 


1) Conflict of evidence: When the basic belief assignment
functions of multiple evidence sources exhibit strong
conflicts, the fusion process may lead to highly unreasonable
results and even fail to generate a consistent
synthesis (complete conflict, i.e., $k = 1$).

2) One-vote veto: If one piece of evidence assigns a
basic belief mass of 0 to a specific proposition, the
fused mass of that proposition will be 0 regardless of
the masses assigned by the other pieces of evidence.
This reflects the limitation of the DS fusion rule in
allocating conflict properly. For example, assume the evidence
$E_1$: $m_1(a) = 0.999$, $m_1(b) = 0.001$, $m_1(c) = 0$; $E_2$:
$m_2(a) = 0$, $m_2(b) = 0.001$, $m_2(c) = 0.999$. Using the
formula, the results are $m(a) = m(c) = 0$ and
$m(b) = 1$. Clearly, the results are unreasonable.

3) Poor robustness: Even when the changes in the basic
belief assignment values of the focal elements in the
evidence are minimal, the synthesized results can be
completely different. For example, modifying the evidence
$E_1$ in the previous example to $m_1(a) = 0.998$,
$m_1(b) = 0.001$, $m_1(c) = 0.001$, the synthesized result
shows $m(b) = 0.001$, contrary to the previous result.
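
The one-vote veto and robustness examples above can be reproduced with a direct implementation of the two-source rule (6)-(7). The sketch below is a minimal illustration, with focal elements stored as frozensets; the function name dempster and the tolerance used to detect total conflict are assumptions of this sketch.

```python
from itertools import product


def dempster(m1, m2):
    """Combine two BPAs (dict: frozenset -> mass) with Dempster's rule.

    Returns the fused BPA and the conflict coefficient K.
    """
    joint = {}
    conflict = 0.0
    for (b, v1), (c, v2) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            joint[inter] = joint.get(inter, 0.0) + v1 * v2
        else:
            conflict += v1 * v2
    if 1.0 - conflict <= 1e-12:
        raise ValueError("complete conflict (K = 1): the rule is undefined")
    return {a: v / (1.0 - conflict) for a, v in joint.items()}, conflict


a, b, c = frozenset("a"), frozenset("b"), frozenset("c")

# One-vote veto: each source gives mass 0 to the other's preferred class.
e1 = {a: 0.999, b: 0.001, c: 0.0}
e2 = {a: 0.0, b: 0.001, c: 0.999}
veto, k_veto = dempster(e1, e2)
print({"".join(s): round(v, 3) for s, v in veto.items()}, round(k_veto, 6))

# Poor robustness: a change of 0.001 in E1 reverses the decision.
e1_modified = {a: 0.998, b: 0.001, c: 0.001}
robust, k_robust = dempster(e1_modified, e2)
print({"".join(s): round(v, 3) for s, v in robust.items()}, round(k_robust, 6))
```

In the first case almost all of the product mass is conflicting, and the normalization transfers it entirely to b; in the second case nearly all fused mass moves to c.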

The Dezert-Smarandache Theory (DSmT) has made improvements
to address the aforementioned issues. One of these
improvements is the Proportional Conflict Redistribution rule
no. 5 (PCR5) [43], which reduces the generation of unreasonable
results caused by significant conflicts between items of
evidence compared to the DS fusion method. Additionally,
weights can be assigned to the outputs of the FPN before
performing the fusion operation to mitigate conflicts. In the
PCR5 fusion rule, each partial conflicting mass is proportionally
redistributed to the focal elements involved in it, enabling a more reasonable
fusion of two sources of evidence with high conflicts. The
PCR5 fusion method for two BPAs is described as follows:


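$$m_{12}^{PCR5}(A) = m_{12}^{Conj}(A) + \sum_{\substack{X \in 2^\Theta \\ X \cap A = \emptyset}} \left[ \frac{m_1(A)^2\, m_2(X)}{m_1(A) + m_2(X)} + \frac{m_2(A)^2\, m_1(X)}{m_2(A) + m_1(X)} \right], \tag{10}$$

where

$$m_{12}^{Conj}(A) = \sum_{B \cap C = A} m_1(B)\, m_2(C). \tag{11}$$


Here, $m_1$ and $m_2$ represent the two items of
evidence; $A$, $B$, $C$ and $X$ denote focal elements contained in the
evidence; $m_{12}^{Conj}(A)$ represents the non-conflicting product,
and the sum in (10) represents the share of all
conflicting products involving $A$ that is allocated to $A$.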
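
A direct transcription of (10)-(11) for two sources is sketched below, applied to the highly conflicting pair $E_1$, $E_2$ used earlier. It is an illustration under the assumption that focal elements are stored as frozensets; the function name pcr5 is hypothetical. Unlike Dempster’s rule, the fused result keeps most of the mass on a and c, which the two sources respectively support.

```python
from itertools import product


def pcr5(m1, m2):
    """Combine two BPAs (dict: frozenset -> mass) with the PCR5 rule."""
    fused = {}
    for (b, v1), (c, v2) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            # Conjunctive (non-conflicting) part, equation (11).
            fused[inter] = fused.get(inter, 0.0) + v1 * v2
        elif v1 + v2 > 0:
            # Partial conflict v1 * v2 is split in proportion to v1 and v2.
            fused[b] = fused.get(b, 0.0) + v1 * v1 * v2 / (v1 + v2)
            fused[c] = fused.get(c, 0.0) + v2 * v2 * v1 / (v1 + v2)
    return fused


a, b, c = frozenset("a"), frozenset("b"), frozenset("c")
e1 = {a: 0.999, b: 0.001, c: 0.0}
e2 = {a: 0.0, b: 0.001, c: 0.999}
fused = pcr5(e1, e2)
for focal, mass in fused.items():
    print("".join(focal), f"{mass:.6f}")
```

Since the two shares of each partial conflict add up to the conflicting product itself, the fused masses still sum to one without any normalization step.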

The weighting calculation method used in the experiment
follows the approach proposed by Zhunga Liu et al. [44],
which adds a weight to the prefused data by calculating the
difference between two BPAs. The mass values corresponding
to the two classifiers indicate the likelihood of the corresponding
class being true, with higher values suggesting a higher
probability. The collection of all classes judged as true can be
represented as follows:

$$\Phi_i = \left\{ A \;\middle|\; \frac{m_i(A)}{\max_{B \in \Omega} m_i(B)} > \lambda \right\}. \tag{12}$$


Here, $\Phi_i$ represents the set of classes judged as true by classifier $i$, and $\lambda$ denotes
a threshold set between 0 and 1. When the ratio of the
mass value corresponding to a class to the maximum mass
value in that BPA exceeds the threshold, it is considered
that the class may also be true. This approach increases
tolerance for differences between the two classification results
while retaining information that is beneficial for the final
classification result. The calculation method for the difference
between the two BPAs is as follows:


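$$K =
\begin{cases}
0, & \text{if } \Phi_1 \cap \Phi_2 \neq \emptyset, \\
\sqrt{\max\limits_{A \in \Omega} m_1(A) \, \max\limits_{B \in \Omega} m_2(B)}, & \text{if } \Phi_1 \cap \Phi_2 = \emptyset.
\end{cases} \tag{13}$$
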
The weight can be represented as follows: 


w=1-K. (14) 


The weights are used to discount the two BPAs using 
Shafer’s discounting operation, aiming to reduce conflicts 
between the two classifiers: 


$$\begin{cases}
m'_i(A) = w \cdot m_i(A), & \forall A \in \Omega, \\
m'_i(\Omega) = 1 - w.
\end{cases} \tag{15}$$


By employing the operation of adding weights, it is possible 
to reduce the negative impact of the classifier with lower 
classification ability on the final fusion results when there is 
significant conflict between the two classifiers. This, in turn, 
enhances the accuracy of the ultimate fusion outcome. 
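
The thresholding in (12) and the discounting in (15) can be sketched as follows. This is an illustrative example only: the class names, mass values, threshold and weight are made up, the function names plausible_classes and discount are hypothetical, and the weight w is passed in directly rather than computed from (13)-(14). The discounting function uses the general form of Shafer’s operation, which reduces to (15) when the classifier output puts no mass on $\Omega$.

```python
def plausible_classes(m, lam):
    """Set of classes whose mass ratio to the maximum exceeds lam, as in (12)."""
    top = max(m.values())
    return {cls for cls, mass in m.items() if mass / top > lam}


def discount(m, w, frame="Omega"):
    """Shafer's discounting: scale every class mass by w, move the rest to the frame."""
    out = {cls: w * mass for cls, mass in m.items() if cls != frame}
    out[frame] = 1.0 - w + w * m.get(frame, 0.0)
    return out


# Hypothetical softmax outputs of two classification heads.
m1 = {"cargo": 0.7, "tanker": 0.2, "fishing": 0.1}
m2 = {"cargo": 0.1, "tanker": 0.3, "fishing": 0.6}

phi1 = plausible_classes(m1, lam=0.5)
phi2 = plausible_classes(m2, lam=0.5)
print(phi1, phi2, phi1 & phi2)

discounted = discount(m1, w=0.8)
print(discounted)
```

Here the two sets of plausible classes are disjoint, which is the situation in which a non-zero difference and hence a weight below one would be applied before fusion.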


IV. METHODOLOGY


Most existing CNN-based models only utilize features or 
scales from the final stage as the ultimate classification features,
making them single-scale classification models. However,
shallow-level features of the network contain more detailed
information. Neglecting them
can lead to decreased classification accuracy
for similar or small objects during the classification process.
Particularly when the image resolution is low, shallow-level 
features can retain more information and reduce the risk of 
feature loss. To better utilize the features of shallow-level 
networks, this paper proposes a method that uses multi-scale 
features and employs the Feature Pyramid Network (FPN) to 
fuse features from different scales. The fusion of multiple 
classification results is achieved using the fusion rules of 
evidence theory. This approach enables the model to learn 
abstract features at different levels of abstraction on different 
scales, thereby improving the model’s classification accuracy 
and enhancing decision-making capabilities. Consequently, the 
E-FPN consists of three main components: the feature extrac- 
tion network, the FPN feature fusion part, and the decision- 
level fusion based on evidence theory. Specifically, the feature 
extraction part is responsible for extracting abstract features 
from images, the FPN feature fusion part combines features 
from different scales, and the decision-level fusion part, based 
on evidence theory, integrates the classification results from 
multiple scales into the final classification result. Figure 4 
illustrates the overall network structure of the proposed method 
in this paper. The feature extraction part utilizes the backbone 
network structure of YOLOX, using Darknet53 as the main 
network for extracting image features. Darknet53 combines the 
characteristics of ResNet and uses a residual network to ensure 
that the gradient problem caused by excessively deep networks 
is avoided during feature representation. From Figure 4, it 
can be observed that the Dark3, Dark4, and Dark5 parts 
of the Darknet53 feature extraction network output feature
maps of three different dimensionalities. These feature maps 
contain features of the objects to be classified at three different 
scales, and all three scales of feature maps are involved 
in the final classification decision step. In other words, by 
utilizing feature maps from different depths of the network 
for multi-scale feature fusion, better feature representation 
capability is ensured. For this feature extraction network, the 
chosen image input size is set to 320x320. As the network
layers increase, the input image dimension transitions from
320x320 to 40x40 for Dark3, 20x20 for Dark4, and 10x10
for Dark5. Considering the depth of the feature extraction 
network, choosing larger or smaller image input sizes would 
result in insufficient feature extraction or feature loss, which 
is not conducive to classification decision-making. In fact, 
selecting an appropriate input size is also consistent with 
mature CNN models, such as VGG and ResNet. 
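
The sizes quoted above are consistent with downsampling factors of 8, 16, and 32 at Dark3, Dark4, and Dark5. The following is a minimal sketch under that assumption; the strides are inferred from the 320x320 to 40x40, 20x20, and 10x10 figures rather than stated explicitly, and the helper name `feature_map_sizes` is hypothetical. It shows the side length of each output stage for a given square input.

```python
# Assumed downsampling factors, inferred from 320 -> 40, 20, 10 in the text.
STRIDES = {"Dark3": 8, "Dark4": 16, "Dark5": 32}


def feature_map_sizes(input_size: int) -> dict:
    """Return the spatial side length of each output stage for a square input."""
    return {stage: input_size // stride for stage, stride in STRIDES.items()}


sizes_320 = feature_map_sizes(320)  # the input size used for E-FPN
sizes_224 = feature_map_sizes(224)  # the input size used for the baselines
```

Under these assumed strides, a 224x224 input would leave only a 7x7 map at Dark5, which illustrates why very small inputs leave little spatial detail at the deepest scale.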

By extracting features at different stages of the feature ex- 
traction network, different scales of feature maps are obtained, 
capturing information at three different scales. However, di- 
rectly performing classification operations on these feature 
maps is not sufficient. Although the shallow-level feature 
information can propagate to deeper layers in the network, 
it may get diluted during convolutional operations, leading 
to the neglect of detailed information in the resulting deep- 
level features and a decrease in classification accuracy. From 
Figure 4, it can be observed that feature maps from different 
stages or scales participate in the object classification task. 
Therefore, the classification results obtained from feature maps 
at different scales will affect the final classification accuracy. 
It is necessary to enhance the classification accuracy of feature 
maps at different scales involved in the classification task as 
much as possible. To address this issue, this paper introduces 
a multi-scale feature fusion method. This method allows the 
deep-level network to learn detailed feature information from 
the shallow-level network, while the shallow-level network can 
learn abstract feature information from the deep-level network, 
thus improving the feature representation capability. With this 
approach, each scale of the feature map can learn richer 
information, leading to better classification accuracy in the 
subsequent classification process and ultimately improving the 
final classification accuracy. Subsequent experiments demon- 
strate that using the multi-scale feature fusion method can 
improve the accuracy of object classification. It achieves better 
classification results compared to using single-scale feature- 
based classification methods. 

In light of the above, this paper employs Feature Pyramid 
Network (FPN) to perform feature-level fusion of the three 
feature maps obtained from the backbone network, aiming to 
complement the diluted detailed features during the feature 
extraction process. By obtaining three feature maps with the 
same input dimension, classification operations are separately 
performed on two of the feature maps, resulting in two sets 
of classification results. In this paper, the evidence theory is 
used to fuse the classification results from different scales. The 
evidence theory can handle uncertainty and incomplete
information by combining multiple pieces of evidence to improve
classification accuracy. The multi-scale output classification 
results are treated as distinct sources of evidence, which are 
fused at the decision level using evidence theory to obtain 
the final classification result. Specifically, the classification 
results obtained from feature maps of different scales can be 
regarded as different sources of evidence, and the obtained 
classification results can be seen as probability distributions 
where each element represents the probability value of a
corresponding class. The maximum probability value in such a
distribution therefore cannot by itself be taken as the target
class, as other high probability values may correspond to the
correct class as well. Hence, the obtained
multiple probability distributions can serve as references from 
different aspects, rather than being definitive classification 
results. The use of the evidence theory enables the integration 
of the probability distributions obtained from different scales 
as different pieces of evidence, and through analyzing the 
differences between these pieces of evidence, a new proba- 
bility distribution is derived as the classification result. This 
classification method resembles the decision-making process 
of human experts, who analyze and study information from 
multiple sources to make an informed judgment, resulting 
in a relatively accurate answer. Subsequent experiments have 
demonstrated that fusing the multi-scale classification results 
using the evidence theory can further improve the accuracy of 
ship classification, validating the effectiveness and applicabil- 
ity of the evidence theory in ship classification. 

In this paper, the input images to the network are set 
to a size of 320x320 in order to retain detailed features 
in the images. Various image augmentation techniques, such 
as random horizontal flipping, occlusion, and cropping, are 
applied to augment the dataset and enhance the network’s 
performance. The input network used is CSPDarkNet, where 
the images are processed through the Focus module to extract 
a value for every other pixel, resulting in four feature maps 
that are then combined together. This process reduces the 
width and height information of the image while increasing the 
number of channels. This reduces the number of parameters 
and improves the network’s performance while minimizing the 
loss of original information. 


$$I_1 = \mathrm{concat}\big(X[0::2,\,0::2],\; X[1::2,\,0::2],\; X[0::2,\,1::2],\; X[1::2,\,1::2]\big) \tag{16}$$

Among them, the input image $X$ undergoes a slicing
operation denoted as $X[\cdot]$, where every other pixel value is
extracted to obtain four feature maps. The concatenation
operation concat() is then applied to combine these four feature
maps. After the Focus operation, the size of the resulting
feature maps becomes 160x160x12.
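
The slicing performed by the Focus module can be illustrated with a short NumPy sketch. It assumes a channel-last image layout and one particular ordering of the four slices; both are assumptions made for illustration, and the function name `focus_slice` is hypothetical rather than taken from the paper.

```python
import numpy as np


def focus_slice(x: np.ndarray) -> np.ndarray:
    """Space-to-depth slicing: (H, W, C) -> (H/2, W/2, 4C).

    Every other pixel is taken along height and width at four different
    offsets, and the four sub-images are stacked along the channel axis.
    """
    top_left = x[0::2, 0::2, :]
    bottom_left = x[1::2, 0::2, :]
    top_right = x[0::2, 1::2, :]
    bottom_right = x[1::2, 1::2, :]
    return np.concatenate(
        [top_left, bottom_left, top_right, bottom_right], axis=-1
    )


image = np.zeros((320, 320, 3), dtype=np.float32)  # placeholder RGB image
focused = focus_slice(image)                       # shape (160, 160, 12)
```

No pixel value is discarded by this operation: the width and height are halved while the channel count is multiplied by four, which matches the 160x160x12 size stated above.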

Figure 4. E-FPN network structure.

After the Focus module, the feature extraction stage follows,
consisting of Dark2, Dark3, Dark4, and Dark5. The Dark5
part includes the SPPBottleneck module, which applies pooling
layers with different kernel sizes to the image to increase
the network’s receptive field and extract more features. In
this study, the SPPBottleneck module utilizes pooling kernels
of sizes 5x5, 9x9, and 13x13. The feature maps obtained
from Dark3, Dark4, and Dark5, denoted as $I_3$, $I_4$, and $I_5$, are
chosen as the outputs of the feature extraction network. The
sizes of these feature maps are 40x40x256, 20x20x512, and
10x10x1024, respectively. Subsequently, these three feature
maps are fed into the FPN network for feature-level fusion.
In the fusion stage, the FPN layer takes the three feature 
maps with different dimensions and performs upsampling 
and downsampling operations to integrate the features from 
multiple scales, enriching the information within the feature 
maps at different scales. 


$$I'_i = \mathrm{concat}\big(f(I_i),\, g(I_i)\big), \tag{17}$$

$$f(I_i) = W I_i, \tag{18}$$

$$g(I_i) = \mathrm{DownSampling}\big(\mathrm{UpSampling}(f(I_i))\big). \tag{19}$$


In the formulas above, $f(I_i)$ represents a convolutional
operation applied to the feature map, while $g(I_i)$ indicates the
process of upsampling the feature map, followed by fusion
with a shallow-level feature map, and then downsampling.
Finally, the resulting feature map is concatenated with the
feature map processed through the $f(I_i)$ operation to obtain
the final feature map used for classification. During the
upsampling and downsampling process, the combined feature
map is further integrated using the CSPLayer. This results
in three feature maps ($I'_3$, $I'_4$, $I'_5$) with the same dimensions
as the corresponding inputs. Among these, the feature maps
corresponding to the Dark4 and Dark5 dimensions ($I'_4$, $I'_5$) are
selected for the classification process. The classification
component consists of a BaseConv, two convolutional layers, and
three linear layers. In the linear layers, the flattened feature
maps are sequentially reduced to dimensions of 256, 64, and 10,
where the parameter 10 represents the number of classes for
classification. The softmax() activation function is applied
to obtain the probability distributions ($m_1$ and $m_2$) for the
output feature maps corresponding to the Dark4 and Dark5
scales. These probability distributions from the two scales are
considered as evidence sources for decision-level fusion.

$$\mathrm{softmax}(z_i) = \frac{e^{z_i}}{\sum_{c=1}^{C} e^{z_c}}, \tag{20}$$

where $z_i$ represents the $i$-th output value, and $C$ represents the
number of outputs, which is the number of classes.
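
The feature-level fusion of Equations (17)-(19) and the softmax of Equation (20) can be sketched in NumPy as follows. This is a toy illustration under several assumptions that are not taken from the paper: the convolution $f$ is reduced to a 1x1 convolution, upsampling is nearest-neighbour, downsampling is 2x2 average pooling (standing in for the learned layers and the CSPLayer), fusion with the shallow map is done by channel concatenation, and the channel counts are kept small. All function names are hypothetical.

```python
import numpy as np


def conv1x1(x, weight):
    """f(I): a 1x1 convolution, i.e. a linear map over channels.

    x has shape (H, W, C_in); weight has shape (C_in, C_out).
    """
    return x @ weight


def upsample2x(x):
    """Nearest-neighbour upsampling by a factor of 2 in height and width."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


def downsample2x(x):
    """2x2 average pooling, a stand-in for the learned downsampling layers."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))


def fuse_level(deep, shallow, weight):
    """Toy version of Eqs. (17)-(19) for one deep map and one shallower map."""
    f = conv1x1(deep, weight)                                    # Eq. (18)
    merged = np.concatenate([upsample2x(f), shallow], axis=-1)   # fuse with shallow map
    g = downsample2x(merged)                                     # Eq. (19)
    return np.concatenate([f, g], axis=-1)                       # Eq. (17)


def softmax(z):
    """Eq. (20), with the usual max-subtraction for numerical stability."""
    e = np.exp(z - np.max(z))
    return e / e.sum()


rng = np.random.default_rng(0)
deep = rng.normal(size=(10, 10, 8))       # toy channel count, not 1024
shallow = rng.normal(size=(20, 20, 4))    # toy channel count, not 512
fused = fuse_level(deep, shallow, rng.normal(size=(8, 6)))
probs = softmax(np.array([2.0, 1.0, 0.1]))
```

The fused map keeps the spatial size of the deep input (10x10 here) while its channels now carry information from the shallower scale, which is the property the text relies on.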

Although the feature maps between different dimensions
complement each other with feature information through the
FPN operation, the probability distribution results obtained
from different-sized feature maps still exhibit variations after
classification. Hence, the differences in information between
these two probability distributions can be utilized to optimize
the classification results. Treating these two probability
distributions as two sources of evidence, denoted as $m_1$ and $m_2$,
the DS fusion rule is employed to merge them. Initially, the
conflict coefficient $K$, which represents the degree of
dissimilarity between the two pieces of evidence, is computed using
Equation (7) based on $m_1$ and $m_2$. Subsequently, Equation
(10) is applied to fuse the probability values of each class in
$m_1$ and $m_2$, resulting in a unique classification result. During
the fusion process, the probability values corresponding to
classes with relatively higher degrees of credibility in the
probability distribution are accentuated, while the probability
values corresponding to other classes are attenuated. If a
scenario arises where two probability values in the distribution
are similar, indicating hesitation between two classes, this
method can leverage the differential information from the other
probability distribution to make decisions, thereby enhancing
the reliability of the final classification result. This approach
provides a more reliable classification outcome than either
individual result before fusion. The pseudocode for the E-FPN is
outlined in Algorithm 1.
Algorithm 1: The method processing of an image.
Input: A ship image X
begin
    Do abstract feature extraction
        I_1 = X ⊗ Focus
        I_2 = I_1 ⊗ Darknet2
        I_3 = I_2 ⊗ Darknet3
        I_4 = I_3 ⊗ Darknet4
        I_5 = I_4 ⊗ Darknet5
    End
    Do FPN feature fusion
        I'_4, I'_5 = f_FPN(I_3, I_4, I_5)
    End
    Do Classification
        I'_4 = flatten(I'_4)
        I'_5 = flatten(I'_5)
        m_1 = softmax{FC(I'_4)}
        m_2 = softmax{FC(I'_5)}
    End
    Do Decision fusion
        result = PCR5(m_1, m_2)
    End
Output: Classification tensor result.


In this case, I_1, I_2, I_3, I_4, I_5 represent the outputs of each
stage in the backbone network, DarknetN with N ∈ {2, 3, 4, 5}
represents different parts of the backbone network, f_FPN(·)
refers to the feature fusion operation, and K represents the
conflict coefficient between evidence. flatten(·) denotes the
operation of flattening the feature map, FC(·) represents the
classification operation, and softmax{·} maps the obtained
classification results to the range [0,1].
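
The decision-fusion step can be sketched for two probability distributions defined over the same set of mutually exclusive classes. The text describes the DS rule, while Algorithm 1 names PCR5, so both are shown. This is a minimal sketch, assuming the standard Dempster rule and the standard two-source PCR5 rule restricted to singleton classes; Equations (7) and (10) of the paper may differ from these forms (for instance through weighting), and the function names are hypothetical.

```python
import numpy as np


def dempster(m1, m2):
    """Dempster's rule for two Bayesian mass functions on singleton classes.

    Returns the fused distribution and the conflict coefficient K.
    """
    m1 = np.asarray(m1, dtype=float)
    m2 = np.asarray(m2, dtype=float)
    agreement = m1 * m2               # mass on which both sources agree
    k = 1.0 - agreement.sum()         # total conflicting mass
    if np.isclose(k, 1.0):
        raise ValueError("sources are in total conflict; rule is undefined")
    return agreement / (1.0 - k), k


def pcr5(m1, m2):
    """Two-source PCR5 for Bayesian mass functions on singleton classes.

    Each partial conflict m1(i) * m2(j), i != j, is split between classes
    i and j in proportion to m1(i) and m2(j).
    """
    m1 = np.asarray(m1, dtype=float)
    m2 = np.asarray(m2, dtype=float)
    a = m1[:, None]                   # m1(i) along rows
    b = m2[None, :]                   # m2(j) along columns
    denom = a + b
    zeros = np.zeros_like(denom)
    # Share of the conflict (i from source 1, j from source 2) returned to i.
    to_i = np.divide(a ** 2 * b, denom, out=zeros.copy(), where=denom > 0)
    # Share of the same conflict returned to j.
    to_j = np.divide(b ** 2 * a, denom, out=zeros.copy(), where=denom > 0)
    np.fill_diagonal(to_i, 0.0)       # i == j is agreement, not conflict
    np.fill_diagonal(to_j, 0.0)
    return m1 * m2 + to_i.sum(axis=1) + to_j.sum(axis=0)


m1 = [0.6, 0.3, 0.1]                  # illustrative values only
m2 = [0.5, 0.4, 0.1]
fused_ds, conflict = dempster(m1, m2)
fused_pcr5 = pcr5(m1, m2)
```

Dempster's rule discards the conflicting mass and renormalizes, whereas PCR5 redistributes each partial conflict to the two classes involved, so the PCR5 result stays defined even when the two sources disagree strongly.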


V. EXPERIMENTS
A. Dataset 


In this section, to validate the effectiveness of the E-FPN 
and compare it with other image classification algorithms, two 
datasets, CIFAR-10 and FGSCR-10, were used.

CIFAR-10 is a small-scale dataset used for general object 
recognition. It consists of 10 classes of RGB images, with 
6,000 images per class. The dataset is divided into a training 
set of 50,000 images and a test set of 10,000 images. The 
images have a size of 32x32 pixels. This dataset is used to 
evaluate the classification performance of traditional natural 
images. 

The FGSCR-42 dataset is a publicly available dataset for 
fine-grained ship classification in remote sensing images. It 
contains 42 classes with a total of 9,320 images, and the 
images have varying resolutions. For the experiments in this 
section, we selected 10 classes with a larger number of 
image samples, resulting in a total of 5,220 images. This 
dataset is used to evaluate the classification performance in 
the context of remote sensing images and fine-grained object 
classification. The composition and sample images are shown 
in Table I and Figure 5, respectively. 


Table I
SHIP IMAGE CATEGORY.
Category                          | Train | Test
Arleigh_Burke-class_destroyer     | 290   | 290
Cargo_ship                        | 189   | 189
Civil_yacht                       | 389   | 388
Container_ship                    | 228   | 227
Medical_ship                      | 161   | 161
Nimitz-class_aircraft_carrier     | 277   | 276
San_Antonio-class_transport_dock  | 160   | 159
Ticonderoga-class_cruiser         | 304   | 303
Towing_vessel                     | 389   | 389
Wasp-class_assault_ship           | 227   | 226


B. Experimental Parameter Settings


In this study, we compare the classic classification
algorithms ResNet50, ResNeXt50, VGG19, and VGG16,
along with the fine-grained image classification algorithms
B-CNN and DCL, against E-FPN to evaluate its effectiveness.
For the classic classification algorithms, the image
size was uniformly adjusted to 224x224. Data augmentation
techniques, including random horizontal flipping, random
occlusion, and random cropping, were applied to the dataset
images. The initial learning rate was set to 0.0001, and the
training batch size, weight decay, and decay epoch were set
to 64, 0.1, and 50, respectively. The Adam optimizer was
selected, and the cross-entropy loss function was employed
for calculating the loss. In the proposed method, to preserve
more image feature information, the dataset images were
uniformly resized to 320x320 while keeping the remaining
parameters consistent with the aforementioned settings. This
was done to evaluate the effectiveness of E-FPN in terms of
classification performance, by comparing it with the baseline
models. Further details regarding the metrics and evaluation
will be presented in the following sections. The experiments
were conducted using the GPU resource A5000-24G.


C. Evaluation Indices 


In this experiment, overall accuracy (OA) and the Kappa 
statistic were employed as evaluation metrics to assess the 
classification performance of the models. The details are as 
follows: 

1) OA: Overall Accuracy is defined as the ratio of correctly
classified samples to the total number of samples. The
calculation method is as follows:

$$\mathrm{OA} = \frac{1}{N}\sum_{i=1}^{N} f(i), \tag{21}$$

where $N$ represents the total number of image samples
in the dataset and $f(i)$ represents whether the classification
of the $i$-th sample is correct. If the classification is
correct, the value of $f(i)$ is 1; otherwise, it is 0.

2) The Kappa coefficient is used for consistency testing
and can also be used to measure classification accuracy.
Its calculation is based on the confusion matrix. The
calculation method is as follows:

$$\mathrm{Kappa} = \frac{p_o - p_e}{1 - p_e}, \tag{22}$$

where $p_o$ represents the ratio of the sum of correctly
classified samples in each class to the total number
of samples, which corresponds to the overall accuracy.
Assuming the true number of samples in each class
is denoted as $a_1, a_2, \ldots, a_C$, the predicted number of
samples in each class is denoted as $b_1, b_2, \ldots, b_C$, and
the total number of samples is $n$, then $p_e$ can
be expressed as follows:

$$p_e = \frac{a_1 \times b_1 + a_2 \times b_2 + \cdots + a_C \times b_C}{n \times n}. \tag{23}$$

The calculation result of Kappa falls within [−1, 1], but it
typically lies in [0, 1]. It can be categorized into five levels
to represent different levels of agreement: [0.0, 0.20] slight
agreement, [0.21, 0.40] fair agreement, [0.41, 0.60] moderate
agreement, [0.61, 0.80] substantial agreement, and [0.81, 1]
almost perfect agreement.

Figure 5. FGSCR-10 Image Examples.
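
Equations (21)-(23) can be computed directly from a confusion matrix, as in the following minimal sketch. It assumes that rows hold the true classes and columns the predicted classes; the small matrix is made up purely for illustration, and the function names are hypothetical.

```python
import numpy as np


def overall_accuracy(cm):
    """Eq. (21): correctly classified samples divided by all samples."""
    cm = np.asarray(cm, dtype=float)
    return np.trace(cm) / cm.sum()


def kappa(cm):
    """Eqs. (22)-(23): Kappa from a confusion matrix (rows = true classes)."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    p_o = np.trace(cm) / n             # observed agreement, equal to OA
    a = cm.sum(axis=1)                 # true number of samples per class
    b = cm.sum(axis=0)                 # predicted number of samples per class
    p_e = float(a @ b) / (n * n)       # agreement expected by chance
    return (p_o - p_e) / (1.0 - p_e)


# Made-up two-class confusion matrix for illustration.
cm = [[5, 1],
      [2, 4]]
oa_value = overall_accuracy(cm)
kappa_value = kappa(cm)
```

For this toy matrix, 9 of 12 samples lie on the diagonal and the chance agreement is (6 x 7 + 6 x 5) / 144 = 0.5, so Kappa is noticeably lower than OA. This is why both metrics are reported.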


D. Performance Evaluation 


In this section, the effectiveness of the proposed method is 
evaluated by comparing it with classical image classification 
networks on the CIFAR-10 and FGSCR-10 datasets. The 
validation results are shown in Table II and Table III, where
bold typeface represents the best while underlining represents
the second-best.


Table II
COMPARED WITH CLASSICAL NETWORK OA.
Method     | FGSCR-10 | CIFAR-10
ResNet50   | 0.9677   | 0.9320
ResNeXt50  | 0.9631   | 0.9319
VGG16      | 0.9685   | 0.9330
VGG19      | 0.9405   | 0.9451
B-CNN      | 0.9663   | 0.9242
DCL        | 0.9731   | 0.9504
E-FPN      | 0.9804   | 0.9478


Table III
COMPARED WITH CLASSICAL NETWORK KAPPA.
Method     | FGSCR-10 | CIFAR-10
ResNet50   | 0.9638   | 0.9220
ResNeXt50  | 0.9681   | 0.9327
VGG16      | 0.9573   | 0.9424
VGG19      | 0.9336   | 0.9390
B-CNN      | 0.9621   | 0.9157
DCL        | 0.9693   | 0.9449
E-FPN      | 0.9776   | 0.9450


Tables II and III use two metrics, OA and Kappa, to compare
four classical classification networks and two fine-grained
image classification networks with E-FPN. On the CIFAR-10
dataset, E-FPN achieved an OA of 94.78% and a Kappa value
of 0.945, obtaining the second-best and best scores,
respectively. This demonstrates the effectiveness of the E-FPN
on the traditional natural image dataset.

In the FGSCR-10 dataset, the proposed method achieved
an OA of 98.04% and a Kappa value of 0.9776. Compared
to the other four classical methods, the E-FPN showed an
improvement in OA ranging from 1.19 to 3.99 percentage
points (relative to VGG16 and VGG19 in Table II,
respectively) and an improvement in the Kappa metric ranging
from 0.0095 to 0.044. When compared with the other two
fine-grained image classification algorithms, E-FPN also
achieved excellent results, with the highest OA and Kappa values.

Through the experiments on the two datasets, it can be 
observed that all algorithms showed similar performance on 
the CIFAR-10 dataset, and in some cases, B-CNN even 
exhibited lower accuracy compared to the baseline model. This 
could be attributed to the low resolution of the images in this 
dataset, as certain algorithmic improvements may not perform 
as effectively under such conditions. 


In the FGSCR-10 dataset, the performance of the proposed
method surpassed that of the other four baseline models. This
may be due to the fact that the FGSCR-10 dataset involves
fine-grained classification targets. After feature extraction by
the backbone network, the E-FPN utilizes the FPN method to
fuse features at different scales, which allows the feature maps
at the three scales to complement one another’s details. Finally,
the classification results of different feature maps are fused 
using the evidence theory-based decision-level fusion method, 
further correcting the classification results. For example, when 
an image is misclassified, its correct classification has a 
probability value that is close to the probability value of 
the current misclassification. When another set of probability 
distributions is fused, the probability value corresponding 
to the correct classification is also large. After fusion, the 
probability value of the correct classification may become the 
largest, resulting in the final correct result. As a result, the 
proposed method demonstrates an advantage over the other 
methods in the FGSCR-10 dataset. Compared to the other two 
fine-grained image classification algorithms, our proposed E- 
FPN outperforms B-CNN and DCL. This may be attributed to 
the effective extraction of object’s fine-grained features using 
our multi-scale approach, and the decision-level fusion enables 
comprehensive analysis of classification results from different 
perspectives. 

In terms of the Kappa metric, all classification methods
achieved a value exceeding 0.90 on both datasets,
indicating a level of consistency considered “almost perfect.”
Compared to the other four baseline models, E-FPN exhib- 
ited further improvement in this metric, signifying enhanced 
classification accuracy for each class and its general ap- 
plicability. Additionally, when compared to the fine-grained 
image recognition algorithms (B-CNN and DCL), E-FPN also 
shows improvement in terms of Kappa. Furthermore, Figure 6 
provides a detailed visualization of the classification results for 
each class, demonstrating the proposed method’s performance 
in terms of confusion matrices on both the CIFAR-10 and 
FGSCR-10 datasets. There are very few dark areas outside 
the diagonal, indicating a reduced number of misclassifica- 
tions. This visual representation intuitively demonstrates the 
effectiveness of E-FPN. 


Table IV presents the number of parameters, FLOPs 
(floating-point operations), and inference time for the seven 
models. 


Table IV
COMPARISON OF THE NUMBER OF PARAMETERS AND FLOPS.
Method     | Params (M) | FLOPs (G) | Inference time (ms)
ResNet50   | 23.53      | 4.13      | 35.854
ResNeXt50  | 23         | 3.82      | 35.418
VGG16      | 134.33     | 15.52     | 33.658
VGG19      | 139.62     | 19.96     | 34.574
B-CNN      | 17.34      | 61.93     | 49.243
DCL        | 23.57      | 16.53     | 48.165
E-FPN      | 79.22      | 3.58      | 45.224


It can be observed that VGG16 and VGG19 have significantly
higher numbers of parameters and FLOPs compared
to the other baseline models. This is likely due to their
deeper network architectures and the utilization of numerous
convolutional layers. On the other hand, ResNet50 and
ResNeXt50 have smaller numbers of parameters and FLOPs.
This reduction can be attributed to the utilization of residual
structures, which help reduce network depth and complexity.
Among the five methods (E-FPN and the four classical
baselines), E-FPN has a higher number of parameters than
ResNet50 and ResNeXt50 but a lower number than VGG16 and
VGG19. However, its FLOPs are the lowest among the five
methods, indicating relatively low computational cost when
performing the classification task. This is because the proposed
method introduces an additional FPN network while the
backbone network adopts the residual approach to reduce its
depth. Among the fine-grained image classification models,
E-FPN has the highest number of parameters, suggesting higher
storage requirements. However, its FLOPs remain the lowest,
indicating that, compared to the other six models, E-FPN
requires fewer floating-point operations during the inference
phase, making it suitable for deployment on mobile and edge
devices. In terms of inference speed, all three fine-grained
image recognition models, including E-FPN, require more
inference time than the four baseline models. However, among
the fine-grained image recognition models, the inference time
of E-FPN is lower than that of the other two (B-CNN and
DCL). This demonstrates the advantage of E-FPN in terms of
inference speed.

Additionally, the DS fusion method used in the E-FPN
incurs minimal additional computational cost for the network.
As a result, the increase in network parameters is relatively
small, and the FLOPs are the lowest among all models.

By comparing the experimental results from the two afore- 
mentioned tables, it can be concluded that the E-FPN is 
effective on both traditional natural image datasets and fine- 
grained remote sensing image datasets. In the description of 
the FPN network structure, it was mentioned that three feature 
maps of different dimensions were utilized, but during the 
final decision-level fusion, only the results from the deeper 
two scales of feature maps were selected for fusion. In the 
following, we will discuss the impact of choosing different 
dimension feature maps for decision-level fusion on the final 
results. The results of these experiments are presented¹ in
Table V and Table VI. 


Table V
FUSION OF RESULTS FROM DIFFERENT SCALES IN CIFAR-10 DATASET.
Dataset  | Dark3 | Dark4 | Dark5 | OA     | Kappa
CIFAR-10 | ✓     | ×     | ×     | 0.9374 | 0.9304
CIFAR-10 | ×     | ✓     | ×     | 0.9438 | 0.9375
CIFAR-10 | ×     | ×     | ✓     | 0.9516 | 0.9462
CIFAR-10 | ✓     | ✓     | ×     | 0.9431 | 0.9367
CIFAR-10 | ×     | ✓     | ✓     | 0.9492 | 0.945
CIFAR-10 | ✓     | ×     | ✓     | 0.948  | 0.9422
CIFAR-10 | ✓     | ✓     | ✓     | 0.9478 | 0.942

¹ The symbol “✓” indicates the usage of feature maps at that scale, while
“×” indicates their exclusion.
Figure 6. The confusion matrices of E-FPN on CIFAR-10 and FGSCR-10 are presented (axes: true label versus predicted label). (a) shows the confusion matrix obtained on the CIFAR-10 dataset, while (b) shows the confusion matrix obtained on the FGSCR-10 dataset.


Table VI
FUSION OF RESULTS FROM DIFFERENT SCALES IN FGSCR-10 DATASET.
Dataset  | Dark3 | Dark4 | Dark5 | OA     | Kappa
FGSCR-10 | ✓     | ×     | ×     | 0.9773 | 0.9746
FGSCR-10 | ×     | ✓     | ×     | 0.9773 | 0.9746
FGSCR-10 | ×     | ×     | ✓     | 0.9773 | 0.9746
FGSCR-10 | ✓     | ✓     | ×     | 0.978  | 0.975
FGSCR-10 | ×     | ✓     | ✓     | 0.9804 | 0.978
FGSCR-10 | ✓     | ×     | ✓     | 0.9804 | 0.9478
FGSCR-10 | ✓     | ✓     | ✓     | 0.9804 | 0.978


In this experiment, different combinations of feature maps 
were fused for each dataset, and the impact of pairwise fusion 
of different feature maps on the final results was compared. 
The last line represents the results obtained by fusing all three 
feature maps together. Dark3, Dark4, and Dark5 represent the 
probability distributions of the classification results from the 
FPN fused outputs of the backbone network. In Table V, 
the CIFAR-10 dataset was used. It can be observed that 
before decision-level fusion, the OA gradually improves as 
the network layers deepen. However, after fusion, the OA is 
lower than the OA of the Dark5 output result. Among the fused 
results, the fusion of Dark4 and Dark5 achieves the highest OA 
of 94.92%. Furthermore, its Kappa value is superior to the 
other three results, being 0.945. Preliminary analysis suggests 
that this may be due to significant conflicts in the probability 
values among certain categories before fusion, resulting in an 
unreasonable probability distribution after fusion, thus leading 
to incorrect fusion results. Further investigation of this issue 
will be discussed in subsequent sections. 


Table VI displays the results obtained from the FGSCR-10
dataset. Before fusion, Dark3, Dark4, and Dark5 have the same
classification OA of 97.73%. However, the OA improves after fusion. The
fusion of Dark3 and Dark4 achieves an OA of 97.8%, while 
the fusions of Dark4 and Dark5, Dark3 and Dark5, and all 
three (Dark3, Dark4, and Dark5) have an OA of 98.04%. The 
performance of the Kappa index is consistent with the OA, 
with the fusion of Dark3 and Dark4 resulting in a Kappa 
value of 0.975, while the other three fusions all have a Kappa 
value of 0.978. By comparing the results before and after 
fusion, it can be observed that the samples correctly classified 
by Dark3, Dark4, and Dark5 are not entirely the same, and 
in the probability distributions of misclassified samples, the 
probability values for the correct class are close to those of the 
misclassified class. Therefore, after fusion, some misclassified 
samples are corrected, resulting in an improvement in the final 
OA of the results. 

According to Table VI, it can be observed that the highest 
OA results after fusion are obtained by combining Dark5 
with other parts, and these results are superior to the results 
obtained by fusing Dark3 and Dark4. It can be seen that 
the results obtained from the deeper parts of the network 
have a more reliable probability distribution. However, the 
results obtained by fusing all three parts together show a slight 
decrease compared to the fusion of Dark4 and Dark5. This 
may be due to the fact that during fusion, the probability 
values of the correct class and the misclassified class for all 
three inputs are very close, and since Dark3 has classification 
errors, the final result was not corrected to the correct class 
during fusion, resulting in a decrease in OA. Therefore, in the 
experiment, this study chooses to fuse Dark4 and Dark5 for
the fusion process. 

The E-FPN in this paper consists of three parts: the feature 
extraction network, the FPN network, and the decision fusion 
part. During the training process, the cross-entropy loss values
of the three outputs from FPN are summed to calculate the 
overall loss value. Specifically, the obtained loss values in the 
network are referred to as loss_0, loss_1, and loss_2. However, 
for the final decision, only the output results from Dark4 and 
Dark5 are selected for fusion. Therefore, the next step is to 
explore the impact of the loss_0 value obtained from Dark2
on the classification performance and the effect of using FPN 
for fusion at the feature level. 
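
The training objective described above, the sum of the cross-entropy losses of the three FPN outputs, can be sketched as follows. This is a minimal single-sample NumPy illustration rather than the training code: the function names are hypothetical, and which scale each of loss_0, loss_1, and loss_2 belongs to is left unspecified here.

```python
import numpy as np


def cross_entropy(logits, label):
    """Cross-entropy of one sample, computed from raw logits via log-softmax."""
    shifted = logits - np.max(logits)
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]


def total_loss(logits_per_scale, label):
    """Overall loss: loss_0 + loss_1 + loss_2, one term per FPN output."""
    return sum(cross_entropy(z, label) for z in logits_per_scale)


# Three uninformative outputs over 10 classes, for illustration only.
uniform_logits = [np.zeros(10) for _ in range(3)]
loss = total_loss(uniform_logits, label=3)
```

Dropping one term from the sum corresponds to the ablation in which loss_0 is removed: the corresponding output then receives no direct classification supervision.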

According to the experimental results in Table VII, on the 
CIFAR-10 dataset, the removal of loss_0 slightly improves the 
OA to 95.13%. This may be because FPN has multiple output 
classification results, and adding loss_0 during the training
process may lead to oscillation and decision risk. Additionally,
the OA gap between Dark4 and Dark5 is very small, and their
Kappa values are similar. Without using FPN for feature-level 
fusion, the OA is lower in both cases, and the fused OA and 
Kappa values are also lower compared to the two cases with 
FPN. This indicates that the classification results of the shallow 
layers may have a negative impact on decision-making and 
fusion in datasets with clear image features. 


Table VII
THE ABLATION EXPERIMENTS OF E-FPN ON THE CIFAR-10 DATASET. (DARK3, DARK4, AND DARK5 REPRESENT THE CLASSIFICATION OA ON THREE DIFFERENT SCALES.)
CIFAR-10     | DarkNet+FPN+loss_0 | DarkNet+FPN | DarkNet+loss_0
Dark3        | 0.9374             | -           | 0.8865
Dark4        | 0.9438             | 0.9401      | 0.9344
Dark5        | 0.9516             | 0.9511      | 0.9428
E-FPN OA     | 0.9492             | 0.9513      | 0.944
E-FPN Kappa  | 0.945              | 0.945       | 0.9377


However, the positive impact of the shallow layers in 
the feature-level fusion should not be ignored, as shown in 
Table VIII. 


Table VIII
THE ABLATION EXPERIMENTS OF E-FPN ON THE FGSCR-10 DATASET. (DARK3, DARK4, AND DARK5 REPRESENT THE CLASSIFICATION OA ON THREE DIFFERENT SCALES.)
FGSCR-10     | DarkNet+FPN+loss_0 | DarkNet+FPN | DarkNet+loss_0
Dark3        | 0.9773             | -           | 0.9605
Dark4        | 0.9773             | 0.9743      | 0.972
Dark5        | 0.9773             | 0.9754      | 0.9735
E-FPN OA     | 0.9804             | 0.975       | 0.9781
E-FPN Kappa  | 0.978              | 0.972       | 0.9754


When conducting experiments on the FGSCR-10 dataset, 
it was found that adding loss_0 and using FPN for feature- 
level fusion resulted in higher OA and Kappa values compared 
to not using FPN or not adding loss_0, achieving 98.04%
and 0.978, respectively. This indicates that the classification
performance for each class object in the dataset is excellent. 
Under the conditions of removing FPN and removing loss_0, 
the OA gap between Dark4, Dark5, and the fused result 
is small. However, it can be observed that the OA of the 
fused result is better than the individual results. As mentioned 
earlier, although the outputs of the shallow network can have a 
negative impact on the final decision-level fusion, the features 
learned by the shallow network still have a positive influence 
on the classification results in the feature-level fusion process. 

Based on the experiments and discussions, it can be con- 
cluded that using the FPN structure and training the shallow 
network for classification improves the classification perfor- 
mance on the fine-grained remote sensing image dataset. 
The FPN structure complements the detailed features lost 
in the deep network. Since the FPN structure used in the 
paper involves the fusion of information from three layers, 
adding loss_0 for classification training in the top layer of
the network can facilitate the learning of more useful feature 
information, further enhancing the feature fusion effect. The 
results in Table VIII indicate that employing the feature-level 
fusion method helps improve the classification performance of 
fine-grained remote sensing image classification and further 
enhances the classification performance after decision-level 
fusion. 

The experimental parameter section in this paper mentions 
that, unlike the four other classification methods used in the 
comparative experiments, the image input size for the E-FPN 
in this paper is 320x320, while the four classical classification 
methods use an image input size of 224x224. The purpose 
of this choice is to preserve more image feature information. 
However, it should be noted that a larger input image size 
does not necessarily guarantee better performance. Tables IX 
and X present a comparison of the impact of different input 
image sizes on the classification performance. 


Table IX
INPUT IMAGES OF DIFFERENT SIZES IN CIFAR-10. (DARK3, DARK4, AND
DARK5 REPRESENT THE CLASSIFICATION OA ON THREE DIFFERENT
SCALES.)

CIFAR-10       640x640   320x320   224x224   160x160
Dark3          0.9161    0.9374    0.9171    0.9012
Dark4          0.9164    0.9438    0.9277    0.9123
Dark5          0.9242    0.9516    0.9341    0.9169
E-FPN OA       0.9248    0.9492    0.9348    0.9174
E-FPN Kappa    0.9164    0.945     0.9275    0.9082


In the experiments comparing the impact of different input
image sizes on classification performance, the image size of
224x224, which is the same as that of the other four classical
algorithms, was selected. Additionally, scaled versions of
640x640 and 160x160 were used. Based on the data in Tables IX
and X, it was found that the input image size of 320x320
achieved the best performance in terms of classification OA and
Kappa value. Furthermore, in both tables, as the input image
size decreased from large to small, the classification OA
initially increased and then decreased. Therefore, it is not
necessarily true that a larger input image size leads to better
performance, and the appropriate size should be chosen based on
the specific circumstances.


Table X
INPUT IMAGES OF DIFFERENT SIZES IN FGSCR-10. (DARK3, DARK4, AND
DARK5 REPRESENT THE CLASSIFICATION OA ON THREE DIFFERENT
SCALES.)

FGSCR-10       640x640   320x320   224x224   160x160
Dark3          0.9758    0.9773    0.9551    0.9677
Dark4          0.9746    0.9773    0.9605    0.9674
Dark5          0.9654    0.9773    0.9635    0.9616
E-FPN OA       0.9693    0.9804    0.9628    0.9658
E-FPN Kappa    0.9655    0.978     0.9582    0.9616

Table IX presents the influence of different input image sizes 
on classification performance, revealing that the classification 
OA for each size increases with the depth of the network. 
Except for the 320x320 size group, all other size groups
exhibit an increase in classification OA after decision-level 
fusion. However, the final accuracy remains lower than that 
of the 320x320 size group. Regarding the Kappa index, the 
320x320 size group still performs the best. These data indicate 
that for traditional natural image datasets, which have easily 
discernible image features, adequate feature extraction enables 
effective classification, necessitating only the selection of an 
appropriate input image size. 

Table X demonstrates the impact of different input image
sizes on classification performance in the FGSCR-10 dataset.
In contrast to Table IX, Table X does not show a consistent
increase in classification OA with network depth. In the
640x640 and 160x160 size groups, classification OA declines
as network depth increases. This may be due to excessively
large or small feature maps that fail to effectively propagate
relevant features in FPN feature fusion. For the 160x160 size
group, the small image size may lead to the loss of crucial
detail features, resulting in reduced classification OA. This
could also result in significant conflicts between the generated
probability distributions, making it difficult to correct
misclassifications during decision-level fusion and ultimately
decreasing the OA of the fused results. In the 320x320 size
group, Dark3, Dark4, and Dark5 exhibit higher classification OA
than in the other groups. Although these three scales have the
same classification accuracy (0.9773), decision-level fusion
further enhances the OA (0.9804). These data demonstrate that
the proposed classification method, when applied to fine-grained
remote sensing image datasets, benefits from using
appropriately sized input images. This enables the extraction
of abstract features while retaining some detailed features,
facilitating subsequent image classification operations.

In the previous sections, we discussed the network
architecture and input image data. In Section II, the limitations
of the DS fusion method were mentioned, specifically the issue
of unreasonable fusion results when significant conflicts exist
between two pieces of input evidence. To overcome this problem,
this paper adopts the PCR5 fusion method and uses the Shafer
discounting method to weight the evidence, reducing conflicts
between the pieces of input evidence. The obtained results are
compared with those of the DS fusion method.

Table XI and Table XII present the OA and Kappa values
obtained using three different fusion rules on the CIFAR-10
dataset. The DS fusion rule is the fusion rule adopted in this
paper, PCR5 is the proportional conflict redistribution method
mentioned in Section II of this paper, and wPCR5 refers to
adding weights to the probability distributions before applying
the PCR5 fusion rule, using the Shafer discounting method
to discount the evidence and reduce conflicts between input
data. From Table XI, it can be observed that the OA of the
Dark3xDark4, Dark3xDark5, and Dark4xDark5 combinations
under the DS fusion rule and the PCR5 fusion rule is almost
indistinguishable. However, for the Dark4xDark5 combination,
the OA decreases slightly when using the PCR5 rule compared to
the DS rule (0.9489 versus 0.9492). After applying the wPCR5
fusion rule, the OA improves compared to both the DS rule and
the PCR5 rule for all three combinations. This behavior may be
attributed to the already high classification OA before fusion,
which indicates a relatively small conflict between the
probability distributions of the two inputs. The PCR5 fusion
rule primarily aims to mitigate the impact of conflicts on
fusion results and prevent the generation of unreasonable
output values. By adding weights and employing the PCR5 rule,
conflicts between the two inputs can be further reduced,
leading to better results. The Kappa values generally exhibit a
similar pattern to the OA results. The wPCR5 rule yields
slightly better results than the DS and PCR5 rules, but the
improvement is marginal, while there is little difference
between the DS rule and the PCR5 rule.
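
The three rules compared above can be sketched as follows. This is
a minimal illustration, not the authors' implementation: it uses
the standard two-source forms of the DS rule, the PCR5 rule, and
Shafer discounting, with BBAs stored as dictionaries from
frozensets to masses. The input masses and the discounting factors
0.9 and 0.6 are hypothetical; the weighting scheme that the paper
derives in its Section II is not reproduced here.

```python
from itertools import product


def _conjunctive(m1, m2):
    """Split the products m1(X)*m2(Y) into agreeing and conflicting parts."""
    agree, conflicts = {}, []
    for (x, a), (y, b) in product(m1.items(), m2.items()):
        if a * b == 0.0:
            continue
        z = x & y
        if z:
            agree[z] = agree.get(z, 0.0) + a * b
        else:
            conflicts.append((x, a, y, b))
    return agree, conflicts


def ds_rule(m1, m2):
    """DS rule: discard the conflicting mass and renormalize the rest."""
    agree, _ = _conjunctive(m1, m2)
    total = sum(agree.values())  # equals 1 - K
    if total == 0.0:
        raise ValueError("sources are in total conflict")
    return {x: v / total for x, v in agree.items()}


def pcr5_rule(m1, m2):
    """PCR5 for two sources: each conflicting product m1(X)*m2(Y) is
    split between X and Y in proportion to m1(X) and m2(Y)."""
    fused, conflicts = _conjunctive(m1, m2)
    for x, a, y, b in conflicts:
        fused[x] = fused.get(x, 0.0) + a * a * b / (a + b)
        fused[y] = fused.get(y, 0.0) + b * b * a / (a + b)
    return fused


def shafer_discount(m, alpha, frame):
    """Shafer discounting: scale masses by alpha, move the rest to the frame."""
    out = {x: alpha * v for x, v in m.items() if x != frame}
    out[frame] = alpha * m.get(frame, 0.0) + (1.0 - alpha)
    return out


# Hypothetical class-probability outputs of two classifier branches.
frame = frozenset("ABC")
A, B, C = (frozenset(c) for c in "ABC")
m1 = {A: 0.7, B: 0.2, C: 0.1}
m2 = {A: 0.1, B: 0.8, C: 0.1}

fused_ds = ds_rule(m1, m2)
fused_pcr5 = pcr5_rule(m1, m2)
# "wPCR5"-style fusion: discount first (hypothetical factors), then PCR5.
fused_wpcr5 = pcr5_rule(shafer_discount(m1, 0.9, frame),
                        shafer_discount(m2, 0.6, frame))
```

Note that PCR5 iterates over every conflicting pair of focal
elements, which is consistent with the remark that its cost grows
with the number of classes.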


Table XI
PRECISION COMPARISON OF DIFFERENT DECISION-LEVEL FUSION
METHODS IN CIFAR-10.

CIFAR-10 OA          Dark3xDark4   Dark3xDark5   Dark4xDark5
E-FPN with DS        0.9431        0.948         0.9492
E-FPN with PCR5      0.9432        0.9479        0.9489
E-FPN with wPCR5     0.9442        0.9508        0.9509


Table XII
KAPPA COMPARISON OF DIFFERENT DECISION-LEVEL FUSION METHODS
IN CIFAR-10.

CIFAR-10 Kappa       Dark3xDark4   Dark3xDark5   Dark4xDark5
E-FPN with DS        0.9367        0.9422        0.945
E-FPN with PCR5      0.9368        0.9421        0.9432
E-FPN with wPCR5     0.938         0.945         0.945


Table XIII and Table XIV compare the OA and Kappa
values on the FGSCR-10 dataset. Similar to the results
obtained on the CIFAR-10 dataset, the DS fusion rule and
the PCR5 fusion rule yield nearly identical results. However,
for the Dark3xDark5 combination with higher conflicts, the
PCR5 rule slightly outperforms the DS rule (0.9812 versus
0.9804). When using the wPCR5 rule, the performance is slightly
worse than with the previous two rules. The same trend is
observed in the Kappa values. In the case of fine-grained remote
sensing image datasets, the probability values of each class
in the classification distributions are close, making it difficult
to compute favorable weights, as mentioned in Section II.
Consequently, the weighting approach weakens the confidence
of certain correctly classified classes during discounting
operations, resulting in suboptimal final results. Regarding the
Kappa values, there is little difference among the three fusion
methods.


Table XIII
PRECISION COMPARISON OF DIFFERENT DECISION-LEVEL FUSION
METHODS IN FGSCR-10.

FGSCR-10 OA          Dark3xDark4   Dark3xDark5   Dark4xDark5
E-FPN with DS        0.978         0.9804        0.9804
E-FPN with PCR5      0.978         0.9812        0.9804
E-FPN with wPCR5     0.977         0.9796        0.98


Table XIV
KAPPA COMPARISON OF DIFFERENT DECISION-LEVEL FUSION METHODS
IN FGSCR-10.

FGSCR-10 Kappa       Dark3xDark4   Dark3xDark5   Dark4xDark5
E-FPN with DS        0.975         0.978         0.978
E-FPN with PCR5      0.976         0.978         0.978
E-FPN with wPCR5     0.975         0.977         0.9776


Based on the analysis above, it can be observed that the DS
fusion rule and the PCR5 fusion rule yield almost identical
results on both datasets. The wPCR5 method performs slightly
better than the previous two methods on traditional natural
image datasets but slightly worse on fine-grained remote
sensing image datasets. Additionally, the computational
complexity of PCR5 and wPCR5 is higher than that of the DS rule,
and the complexity increases more noticeably with a larger
number of classes to be classified. Therefore, when there is no
significant conflict between the two probability distributions,
the DS fusion rule is chosen in this paper.

In the previous experiments, it was mentioned that in the
fine-grained remote sensing image dataset, the method of
adding weights to reduce conflicts between evidence actually
weakened the credibility of some correctly classified results.
In the process of calculating the weights, a threshold was
set for the ratio between the mass values of each class and
the maximum mass value to preserve the differences between
the two classification results. In the previous experiments, a
threshold of 0.5 was set. The impact of the threshold value on
the OA and Kappa value after fusion can be seen in Table XV
and Table XVI, where the first column gives the threshold
chosen for calculating weights.

Tables XV and XVI show the influence of threshold values
ranging from 0.1 to 0.9 on the classification OA and
consistency in the two datasets. In the CIFAR-10 dataset, as the
threshold value increases from 0.1 to 0.9, the classification
OA gradually rises from 94.93% to 95.17%, an improvement of
0.24 percentage points. The Kappa value increases from 0.9436
to 0.9463. In this dataset, when the threshold value increases,
it filters out categories with lower probability values in the
probability distributions, retaining other potential options
for correct classification. This preserves some differences
between classifiers as complementary information, which
benefits subsequent fusion operations.


Table XV
COMPARISON OF DIFFERENT THRESHOLDS UNDER CIFAR-10.

Threshold   OA       Kappa
0.1         0.9493   0.9436
0.2         0.9497   0.9441
0.3         0.95     0.9444
0.4         0.9501   0.9445
0.5         0.9509   0.945
0.6         0.9516   0.9462
0.7         0.9516   0.9462
0.8         0.9517   0.9463
0.9         0.9517   0.9463


Table XVI
COMPARISON OF DIFFERENT THRESHOLDS UNDER FGSCR-10.

Threshold   OA       Kappa
0.1         0.98     0.9776
0.2         0.9796   0.9771
0.3         0.9796   0.9771
0.4         0.9796   0.9771
0.5         0.98     0.9776
0.6         0.98     0.9776
0.7         0.98     0.9776
0.8         0.98     0.9776
0.9         0.9796   0.9771

In the FGSCR-10 dataset, changing the threshold value from
0.1 to 0.9 has almost no impact on the classification OA and
Kappa values. This indicates that the threshold value has little
effect on the fusion results in this dataset. Table XVII displays
partial probability distributions generated by Dark5. It can be
observed that the reason for this phenomenon is that one class
in the probability distribution before fusion has a significantly
high probability value, and the ratios of the other probabilities
to it might be lower than 0.1. Consequently, the variation of the
threshold value does not affect the final result.
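
The ratio test behind this observation can be sketched as follows.
This is only an illustration under stated assumptions: the
function name classes_above_ratio is hypothetical, the sample uses
only the entries that are legible in Table XVII, and the way the
paper turns the retained classes into discounting weights (its
Section II) is not reproduced.

```python
def classes_above_ratio(masses, threshold):
    """Return the classes whose mass is at least `threshold` times
    the largest mass in the distribution."""
    top = max(masses.values())
    return [name for name, value in masses.items()
            if value / top >= threshold]


# Entries of one Dark5 output distribution, as listed in Table XVII.
dark5_sample = {
    "Arleigh_Burke-class_destroyer": 0.98235,
    "Cargo_ship": 0.00001,
    "Civil_yacht": 0.0,
    "Container_ship": 0.0,
    "Medical_ship": 0.0,
    "Nimitz-class_aircraft_carrier": 0.0,
    "San_Antonio-class_transport_dock": 0.0,
}

# The same single class survives for every threshold between 0.1 and 0.9,
# so changing the threshold cannot change anything downstream.
kept = {t: classes_above_ratio(dark5_sample, t) for t in (0.1, 0.5, 0.9)}
```

With such a peaked distribution, every other ratio is far below
0.1, which matches the explanation given above.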

Based on the experiments, it can be concluded that the
threshold value has almost no impact on the classification
OA in the FGSCR-10 dataset. In the CIFAR-10 dataset used in
this experiment, setting a higher threshold value allows for the
rational use of differences between classifiers, obtaining
complementary information and thereby improving the OA of the
classification results.


VI. CONCLUSIONS 


This study proposes a feature fusion and decision fusion
method that combines FPN with evidence theory to improve
classification accuracy. The effectiveness of this method is
validated on both traditional natural image datasets and
fine-grained remote sensing image classification datasets. For
the fine-grained remote sensing image dataset, FPN is used
for feature-level fusion to recover the detailed features of the
shallow layers that are lost in the deep network.
Simultaneously, evidence theory is applied to modify the
generated probability distributions. In the experimental
section, the network architecture and parameters of this method
are discussed, and the impact of different fusion rules on the
final classification accuracy is compared. The experimental
results demonstrate that selecting appropriate sizes of input
images and using both feature-level fusion and decision-level
fusion can effectively improve classification accuracy.
Additionally, reducing conflicts between different classifier
results through the addition of weights contributes to the
enhancement of classification results in certain cases.


Table XVII
PARTIAL PROBABILITY DISTRIBUTION UNDER THE FGSCR-10
WITH E-FPN.

Category                            Mass
Arleigh_Burke-class_destroyer       0.98235
Cargo_ship                          0.00001
Civil_yacht                         0
Container_ship                      0
Medical_ship                        0
Nimitz-class_aircraft_carrier       0
San_Antonio-class_transport_dock    0
Ticonderoga-class_cruiser           0.0
Towing_vessel
Wasp-class_assault_ship

The proposed E-FPN method still has some issues that need
to be optimized. For instance, as demonstrated in Tables II, III,
and IV in Section V-D of the paper, E-FPN currently does not
achieve a significant improvement over the other three
fine-grained image classification algorithms in the ship
fine-grained classification task. Furthermore, when compared to
the baseline models on the CIFAR-10 dataset, the improvement
of our proposed method was not significant. We believe this
is due to the small image resolution in this dataset, where
the use of multi-scale features might not effectively
extract and fuse large-scale and small-scale features, so that
the advantages of multi-scale features are not fully exploited.
Additionally, E-FPN has a higher number of parameters than
other algorithms, which demands significant storage resources
when deployed, and this limitation requires optimization in
future work.

Moreover, the current usage of E-FPN involves the classi- 
fication of single, complete images, which poses significant 
challenges when encountering scenarios with multiple objects 
or complex background environments in the image. 

Future work should focus on applying this method to 
different feature extraction networks and exploring its general- 
izability. Additionally, further research should explore detail- 
oriented feature extraction and fusion methods to replace the 
fusion of entire feature maps, aiming to reduce the complexity 
and number of parameters of the method. Simultaneously, it 
is important to explore methods that prioritize the object’s 
location in the image to mitigate interference caused by 
background objects in the classification process.

Abstract: In this paper, we propose a real experiment for
building and realizing the physical combination of basic belief
assignments associated with two independent, informative, and
equi-reliable sources of information, according to the famous
Zadeh’s example. This experiment is based on a particular
electronic circuit box, called Z-box, which makes it possible to
observe and check the fusion result experimentally. Our
experimental results clearly invalidate the fusion result
obtained by Dempster-Shafer’s rule of combination and show that
it is physically possible to consider, in a natural fusion
process, two independent and equi-reliable sources of evidence
at the same time, even if they appear as highly conflicting in
Shafer’s sense.


Keywords: belief function, Zadeh’s example, Z-box experi- 
ment, information fusion, Dempster’s rule. 


I. INTRODUCTION 


Dempster-Shafer Theory (DST), introduced by Shafer in
1976 [1], offers an elegant theoretical framework for modeling
epistemic uncertainty and for combining distinct bodies of
evidence collected from different sources. In DST, the
combination (fusion) of several distinct sources of evidence is
done with the Dempster-Shafer (DS)¹ rule of combination, which
corresponds to the normalized conjunctive consensus operator
[1], assuming that the sources are not in total conflict². Since
1976, DST has been used in many fields of applications,
including information fusion, pattern recognition, decision
making, etc., but it has also been seriously criticized by some
authors [2]-[12].

Despite this widespread use, starting from Zadeh’s criticism
[2]-[4], many questions have arisen about the validity and the
consistency of this theory when combining uncertain and
conflicting evidence expressed as basic belief assignments
(BBAs). Zadeh’s “paradox” [2] is the first example where the DS
rule gives an apparently counter-intuitive result in a highly
conflicting case. Another very interesting example showing the
counter-intuitive behavior of the DS rule in some very low
conflicting cases has been discovered recently and discussed by
the authors in [11].


¹Although the rule has been proposed originally by Dempster, we call it
Dempster-Shafer rule because it has been widely promoted by Shafer in DST.

²Otherwise DS rule is mathematically not defined because of 0/0
indeterminacy.


Since the publication of Zadeh’s example, many researchers
and engineers [5]-[9], [14] working in applications with belief
functions have observed and admitted that the DS rule is
problematic for evidence combination, especially when the
sources of evidence are highly conflicting. A more recent
detailed discussion on the validity of the DS rule can be found
in [10]-[12]. It is worth noting that the discussion of the
choice of semantics for the justification of a rule of
combination is not the purpose of this paper. We just want to
revisit and discuss here Zadeh’s well-known emblematic example
from a purely physical standpoint, because we are very concerned
with fusion in real applications, especially for defense and
security.

This paper was inspired by our curiosity to revisit Zadeh’s
example on the basis of a real experiment, in order to become
aware of the authentic physical fusion process (validated
by Nature’s physical laws) and to understand how this
emblematic example is actually “resolved” by Nature. In this
paper, we propose a real experiment for generating BBAs from
physical quantities that are consistent with the BBA inputs
given in Zadeh’s example, and that can be fused automatically
by a purely natural phenomenon. Our paper shows that in this
Z-box experiment, Dempster’s rule of combination is
inconsistent with the physical (fusion) law of Nature and thus
cannot be used to predict the experimental results. Our
experiment can be reproduced and verified by any reader who
wants to check the validity of our results by him/herself. In
this experiment, we have considered and generated two
independent Bayesian BBAs that are equi-reliable and fit
Zadeh’s BBA inputs, we let Nature combine them physically, and
we just observe what happens. Even if Zadeh’s two Bayesian BBAs
appear as highly conflicting (in Shafer’s sense), we have shown
that it is nevertheless possible to make a physical experiment
in which each source provides a BBA as chosen by Zadeh. This is
possible because each source has only a partial knowledge of
the state of the world.

In this work, we have designed a simple physical
experiment in which the fusion procedure is governed only
by the physical law of Nature. All fusion rules aim to
obtain good and reasonable fusion results. We do think that
using such a physical experiment for testing the DS rule (a
type of fusion rule) makes sense and is rational, and our
results indicate that the DS rule does not agree with the
physical (natural) fusion process. To certify that the DS
fusion rule is undoubtedly valid and really useful in practical
applications, it should be proved valid through an undisputed
experimental protocol and tested on real experiments, and not
claimed valid from specific justifications conditioned by
particular choices of semantics that have been disputed for more
than four decades in the scientific community. The choice of a
semantic interpretation of fusion, although interesting, is not
our major concern here. So far, and to the authors’ knowledge,
there is no undisputed experimental protocol proving that
Dempster’s rule is valid, even if Shafer proposed a random-code
interpretation of belief functions (BF) in [13]. It is also
worth recalling that the DS rule is not a generalization of
Bayesian inference because, even when BBAs are Bayesian, DS and
Bayes rules become incompatible as soon as the a priori is
truly informative (i.e. it is neither vacuous nor uniform), as
it is in the vast majority of practical cases; see [12] and
references therein for justifications with examples. That is
why it is vain (in our opinion) to search for a real, valid, and
general physical experiment validating the DS rule in the
general context of belief functions.

After a brief recall of the basics of DST and Zadeh’s
example, we will present in detail our Z-box experiment and
discuss its results in the next sections.


II. BASICS OF DST 


Let $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$ be a frame
of discernment of a problem under consideration containing $n$
distinct exclusive and exhaustive elements $\theta_i$,
$i = 1, \ldots, n$. A basic belief assignment³ (BBA)
$m(.) : 2^\Theta \to [0,1]$ is a mapping from the power set of
$\Theta$ (i.e. the set of subsets of $\Theta$), denoted
$2^\Theta$, to $[0,1]$, that must satisfy the following
conditions: 1) $m(\emptyset) = 0$, i.e. the mass of the empty
set (impossible event) is zero; 2) $\sum_{X \in 2^\Theta} m(X) = 1$,
i.e. the mass of belief is normalized to one. The quantity
$m(X)$ represents the mass of belief exactly committed to $X$.
An element $X \in 2^\Theta$ is called a focal element if and
only if $m(X) > 0$. The set
$\mathcal{F}(m) = \{X \in 2^\Theta | m(X) > 0\}$ of all focal
elements of a BBA $m(.)$ is called the core of the BBA. The
vacuous BBA characterizing the full ignorance is defined by
$m_v(.) : 2^\Theta \to [0,1]$ such that $m_v(X) = 0$ if
$X \neq \Theta$, and $m_v(\Theta) = 1$.

From any BBA $m(.)$, the belief function $Bel(.)$ and the
plausibility function $Pl(.)$ are defined for all
$X \in 2^\Theta$ as
$Bel(X) = \sum_{Y \in 2^\Theta | Y \subseteq X} m(Y)$ and
$Pl(X) = \sum_{Y \in 2^\Theta | X \cap Y \neq \emptyset} m(Y)$.
$Bel(X)$ and $Pl(X)$ are classically interpreted as lower and
upper bounds of an unknown subjective probability $P(.)$, and
one has the following inequality satisfied:
$Bel(X) \leq P(X) \leq Pl(X)$, $\forall X \in 2^\Theta$. In DST,
the combination (fusion) of several distinct sources of evidence
is done with the DS rule of combination, which corresponds to
the normalized conjunctive consensus operator [1], assuming that
the sources are not in total conflict⁴. The DS combination of
two independent BBAs $m_1(.)$ and $m_2(.)$ is defined by
$m_{DS}(\emptyset) = 0$, and for all
$X \in 2^\Theta \setminus \{\emptyset\}$ by

$$m_{DS}(X) = \frac{1}{1 - K_{12}} \sum_{\substack{X_1, X_2 \in 2^\Theta \\ X_1 \cap X_2 = X}} m_1(X_1) m_2(X_2), \qquad (1)$$

where

$$K_{12} \triangleq \sum_{\substack{X_1, X_2 \in 2^\Theta \\ X_1 \cap X_2 = \emptyset}} m_1(X_1) m_2(X_2) \qquad (2)$$

defines the so-called conflict between the two sources of
evidence characterized by the BBAs $m_1(.)$ and $m_2(.)$.


³Also called a belief mass function (BMF) by some authors, or a basic
probability assignment (BPA) by Shafer.

⁴Otherwise DS rule is mathematically not defined because of 0/0
indeterminacy.
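
As a small illustration of the definitions of this section, the
belief and plausibility functions can be computed directly from a
BBA. The sketch below is illustrative only: a BBA is stored as a
dictionary from frozensets to masses, the helper names is_bba, bel
and pl are our own, and the example masses are hypothetical.

```python
def is_bba(m, tol=1e-9):
    """Check m(empty set) = 0, masses in [0, 1], and total mass equal to 1."""
    return (m.get(frozenset(), 0.0) == 0.0
            and all(0.0 <= v <= 1.0 for v in m.values())
            and abs(sum(m.values()) - 1.0) <= tol)


def bel(m, x):
    """Bel(X): total mass of the non-empty focal elements included in X."""
    return sum(v for y, v in m.items() if y and y <= x)


def pl(m, x):
    """Pl(X): total mass of the focal elements that intersect X."""
    return sum(v for y, v in m.items() if y & x)


# Hypothetical BBA on the frame {a, b, c}.
m = {
    frozenset("a"): 0.5,
    frozenset("ab"): 0.3,
    frozenset("abc"): 0.2,
}

ab, c = frozenset("ab"), frozenset("c")
bounds_ab = (bel(m, ab), pl(m, ab))   # Bel and Pl of {a, b}
bounds_c = (bel(m, c), pl(m, c))      # Bel and Pl of {c}
```

For every subset the first value never exceeds the second, in
agreement with the inequality between Bel, P and Pl recalled
above.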


III. ZADEH’S EXAMPLE


The famous Zadeh’s example considers two doctors
examining a patient who suffers from either meningitis (A),
concussion (B) or brain tumor (C). The frame of discernment
is chosen as $\Theta = \{A, B, C\}$ and it is assumed to be
exhaustive and exclusive. Both doctors agree in their low
expectation of a tumor, but disagree on the likely cause and
provide the following diagnoses, described by the BBAs
$m_1(.)$ and $m_2(.)$ satisfying


$$m_1(A) = 0.90, \quad m_1(B) = 0.00, \quad m_1(C) = 0.10, \qquad (3)$$
$$m_2(A) = 0.00, \quad m_2(B) = 0.90, \quad m_2(C) = 0.10. \qquad (4)$$


If one combines the two BBAs using the DS rule of combination,
the following counter-intuitive final conclusion is obtained:


$$m_{DS}(A) = 0.0, \quad m_{DS}(B) = 0.0, \quad m_{DS}(C) = 1.0. \qquad (5)$$


The conclusion drawn on the basis of the DS rule is that the
patient surely has a brain tumor, because it is the only
diagnosis that both doctors agree on, even though the two
experts (doctors) agree that a tumor is unlikely and are in
almost full contradiction about the other causes of the disease.
What is even more questionable is that the same conclusion (the
patient surely has a brain tumor) would be obtained regardless
of the probabilities associated with the other possible
diagnoses. This very simple but interesting example shows the
limitations of the practical use of DST for automated reasoning
and has been widely discussed in the literature [2]-[12].
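
The computation of Zadeh’s example can be checked with a few lines
of code. The sketch below is a minimal illustration restricted to
Bayesian BBAs (masses on singletons only); the function name
ds_bayesian is our own, and the second pair of BBAs is a
hypothetical variant used to show that the conclusion does not
depend on how small the mass of C is.

```python
def ds_bayesian(m1, m2):
    """DS rule for two Bayesian BBAs defined on the same singletons.

    Singletons are pairwise disjoint, so only the products m1(x)*m2(x)
    survive the conjunction. Returns the fused BBA and the conflict K12.
    """
    agree = {x: m1[x] * m2[x] for x in m1}
    total = sum(agree.values())  # equals 1 - K12
    if total == 0.0:
        raise ValueError("total conflict: DS rule is undefined (0/0)")
    return {x: v / total for x, v in agree.items()}, 1.0 - total


# Zadeh's BBAs, Eqs. (3) and (4).
m1 = {"A": 0.90, "B": 0.00, "C": 0.10}
m2 = {"A": 0.00, "B": 0.90, "C": 0.10}
fused, k12 = ds_bayesian(m1, m2)

# Hypothetical variant: both doctors consider the tumor even less likely.
m1_var = {"A": 0.999, "B": 0.0, "C": 0.001}
m2_var = {"A": 0.0, "B": 0.999, "C": 0.001}
fused_var, k12_var = ds_bayesian(m1_var, m2_var)
```

In both cases all the fused mass ends up on C, as in Eq. (5),
while the conflict is 0.99 for Zadeh’s values and even closer to
one in the variant.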

A more emblematic and interesting example, involving
possibly low conflicting sources, has been discovered recently
and discussed in [10]-[12]. It corresponds to the case where
the two equi-reliable doctors’ reports concern the following
BBAs satisfying $m_1(A) = a$, $m_1(A \cup B) = 1 - a$ and
$m_2(A \cup B) = b_1$, $m_2(C) = 1 - b_1 - b_2$,
$m_2(A \cup B \cup C) = b_2$, with parameters
$0 < a, b_1, b_2 < 1$. It is easy to verify that the conflict
given by (2) is equal to
$K_{12} = m_1(A) m_2(C) + m_1(A \cup B) m_2(C) = 1 - b_1 - b_2$.
Surprisingly, this conflict does not impact the DS fusion
result (it can be very high or very low), because in this new
example one always has $m_{DS}(.) = m_1(.)$. This result is also
abnormal and counter-intuitive because the second source
$m_2(.)$ (the second doctor’s diagnosis) does not count at all
in the DS fusion process, even if $m_2(.)$ is not vacuous (it is
informative) and truly conflicting with the first doctor’s
diagnosis $m_1(.)$.
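
This behavior can be verified numerically. The sketch below is
only illustrative: it implements the DS rule of Eqs. (1) and (2)
for BBAs stored as dictionaries from frozensets to masses, and the
values of a, b_1 and b_2 are hypothetical choices made to span
high, medium and low conflict.

```python
from itertools import product


def ds_rule(m1, m2):
    """DS rule of Eqs. (1)-(2); returns the fused BBA and the conflict."""
    agree, conflict = {}, 0.0
    for (x, a), (y, b) in product(m1.items(), m2.items()):
        z = x & y
        if z:
            agree[z] = agree.get(z, 0.0) + a * b
        else:
            conflict += a * b
    total = sum(agree.values())  # equals 1 - K12
    if total == 0.0:
        raise ValueError("total conflict: DS rule is undefined (0/0)")
    return {x: v / total for x, v in agree.items()}, conflict


A, AB = frozenset("A"), frozenset("AB")
C, ABC = frozenset("C"), frozenset("ABC")


def doctors(a, b1, b2):
    """Build the two BBAs of the example for given parameters."""
    m1 = {A: a, AB: 1.0 - a}
    m2 = {AB: b1, C: 1.0 - b1 - b2, ABC: b2}
    return m1, m2


a = 0.6  # hypothetical value
results = []
for b1, b2 in [(0.05, 0.05), (0.3, 0.2), (0.5, 0.49)]:
    m1, m2 = doctors(a, b1, b2)
    fused, k12 = ds_rule(m1, m2)
    results.append((k12, fused))
```

The conflict varies from about 0.9 down to about 0.01 across the
three parameter sets, yet the fused BBA keeps the masses a on A
and 1 - a on A ∪ B each time.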
IV. A REAL Z-BOX EXPERIMENT


In this section, we propose an electronic circuit (called
Z-box scheme), shown in Fig. 1, to generate BBAs according
to Zadeh’s example and to test experimentally the physical
fusion of these BBAs.


Figure 1. Z-box Scheme.


It is clear that this scheme can be easily extended to build
and combine more than three Bayesian sources of evidence
as well, which is out of the scope of this paper. This scheme
uses a simple 6 V battery as the circuit’s only power
supply. The switches SW1 and SW5 are used to obtain two
independent sub-circuits, in order to realize two independent
sources of information for the purpose of our task. Three
simple linear potentiometers ($P_1$, $P_2$, $P_3$) and three
switches (SW2, SW3, and SW4) are used to establish the first
source (sub-circuit 1), and three potentiometers ($P_4$, $P_5$,
$P_6$) and three switches (SW6, SW7, and SW8) are used for the
second source (sub-circuit 2). Each of these two sources
of information provides its relative truth, established on
its own knowledge only, through the specific tuning of its
set of potentiometers. Three white Light Emitting Diodes
(LEDs: $LED_A$, $LED_B$, and $LED_C$) are used as light
indicators. The light intensity is proportional to the current
through the LEDs. We are concerned with the answer to the
question: which LED emits the light with the strongest
intensity? Our frame is
$\Theta = \{A \triangleq LED_A, B \triangleq LED_B, C \triangleq LED_C\}$.
The Z-box experiment consists of three main steps: 1) tuning
source no. 1 (sub-circuit 1) to generate the BBA $m_1(.)$;
2) tuning source no. 2 (sub-circuit 2) to generate the BBA
$m_2(.)$; and 3) the physical fusion of the two BBAs. These
steps are described below and illustrated in Figures 2-4.


Step 1: Tuning the first source (sub-circuit 1) according to
Fig. 2. Only the upper branch of the circuit is active, with the
following settings:


• Switch SW1 is closed and switch SW5 is open.

• Switches SW2 and SW4 are closed. Switch SW3 is
left open, providing a zero current through $LED_B$:
$I_1(LED_B) = 0.0$ mA.

• The potentiometers ($P_1$, $P_3$) are tuned to provide the
following current values through the LEDs:
$I_1(LED_A) \approx 32.5$ mA, $I_1(LED_B) = 0.0$ mA and
$I_1(LED_C) \approx 3.6$ mA, where the index 1 is used to
denote the first source of information.


Figure 2. Step 1 of the experiment: setting the BBA $m_1(.)$.


Step 2: Tuning the second source (sub-circuit 2) according to
Fig. 3. Only the lower branch of the circuit is active, with the
following settings:


• Switch SW1 is open and switch SW5 is closed.

• Switches SW7 and SW8 are closed. Switch SW6 is
left open, providing a zero current through $LED_A$:
$I_2(LED_A) = 0.0$ mA.

• The potentiometers ($P_5$, $P_6$) are tuned to provide the
following current values through the LEDs:
$I_2(LED_A) = 0.0$ mA, $I_2(LED_B) \approx 32.5$ mA, and
$I_2(LED_C) \approx 3.6$ mA, where the index 2 is used to
denote the second source of information.


Figure 3. Step 2 of the experiment: setting the BBA $m_2(.)$.

Step 3: Both branches of the circuit are active at the same
time for making the physical fusion. More precisely, we set
the switches SW2, SW3 and SW4 and tune the potentiometers
$P_1$, $P_2$ and $P_3$ according to Step 1, and we set the
switches SW6, SW7 and SW8 and tune the potentiometers $P_4$,
$P_5$ and $P_6$ according to Step 2. The switches SW1 and SW5
are closed to implement the fusion of the sources as shown in
Fig. 4.


Figure 4. Step 3 of the experiment: the (physical) fusion of BBAs.

At this step, one gets:


$$I_1(LED_A) \approx 32.5 \text{ mA}, \quad I_1(LED_B) = 0.00 \text{ mA}, \quad I_1(LED_C) \approx 3.6 \text{ mA}, \qquad (6)$$

and

$$I_2(LED_A) = 0.00 \text{ mA}, \quad I_2(LED_B) \approx 32.5 \text{ mA}, \quad I_2(LED_C) \approx 3.6 \text{ mA}. \qquad (7)$$


The total current intensities are respectively equal to


$$I_{1,total} = \sum_{i \in \{A,B,C\}} I_1(LED_i) \approx 36.1 \text{ mA},$$
$$I_{2,total} = \sum_{i \in \{A,B,C\}} I_2(LED_i) \approx 36.1 \text{ mA}.$$


Fig. 5 shows the different LED current values obtained
in each step during the experiment’s time duration of 5 sec.
In the left subplots of Fig. 5 (result of step 1), one sees that
the current through $LED_A$ is 9 times higher than the current
through $LED_C$, while the current through $LED_B$ is almost
zero, whereas in the middle subplots of Fig. 5 (result of step
2), one sees that the current through $LED_B$ is 9 times higher
than the current through $LED_C$, while the current through
$LED_A$ is almost zero. The observed results make perfect
sense. Because the light intensity is proportional to the
current through the LEDs, the same proportions are valid for
the intensity of the light emitted from the LEDs. One sees
that these settings fit the input BBAs of Zadeh’s example,
because after the normalization of the current values one has
the following masses of belief in the origin of the strongest
light emission:


$$m_1(A) = \frac{I_1(LED_A)}{I_{1,total}} \approx 0.9, \quad m_1(B) = \frac{I_1(LED_B)}{I_{1,total}} = 0.0, \quad m_1(C) = \frac{I_1(LED_C)}{I_{1,total}} \approx 0.1, \qquad (8)$$

and

$$m_2(A) = \frac{I_2(LED_A)}{I_{2,total}} = 0.0, \quad m_2(B) = \frac{I_2(LED_B)}{I_{2,total}} \approx 0.9, \quad m_2(C) = \frac{I_2(LED_C)}{I_{2,total}} \approx 0.1. \qquad (9)$$
The results of steps 1 and 2 show that both of the
sources (corresponding to the 1st and 2nd sub-circuits), taken
independently, are able to make a correct physical assessment
of the real physical situation. The right subplots of Fig.
5 (result of step 3) show the real physical fusion results
simulated with MicroSim DesignLab 8 [18], as shown through
the screen copy given in Fig. 6. Here we use the index
12 to denote that both sources (sub-circuits) are active.
The observed current intensities are $I_{12}(LED_A) \approx 32.5$
mA, $I_{12}(LED_B) \approx 32.5$ mA, and $I_{12}(LED_C) \approx 6.9$
mA. After the normalization of $I_{12}(\cdot)$, we finally get the
combined BBA $m_{12}(\cdot)$ over the frame of discernment $\Theta \triangleq
\{A, B, C\}$, given by $m_{12}(A) \triangleq I_{12}(LED_A)/I_{12,total} \approx
0.45$, $m_{12}(B) \triangleq I_{12}(LED_B)/I_{12,total} \approx 0.45$, and
$m_{12}(C) \triangleq I_{12}(LED_C)/I_{12,total} \approx 0.10$, where $I_{12,total} \triangleq
I_{12}(LED_A) + I_{12}(LED_B) + I_{12}(LED_C) \approx 71.9$ mA.

Clearly, the observed fact is that after the real physical
fusion, the current through $LED_A$ is equal to the current
through $LED_B$, and both are approximately 5 times higher
than the current through $LED_C$. The experimental fusion
result does not fit with the predicted result based on DS rule
(5), even though in this experiment both BBA inputs match the
medical experts’ opinions as in Zadeh’s example, and they are
considered to be in high “conflict” according to the classical
interpretation in DST. This result brings to light the fact that
the DS rule result (5) is not consistent in this experiment with
what the physical fusion system provides. This real Z-box
experiment supports Zadeh’s intuition about the non-adequate
behavior of DS rule, and the counter-intuitive decisions that
can be drawn from it. Stated otherwise, the natural physical
fusion does not follow DS rule of combination. In fact, the
notion of “conflict”, which plays an important role when
manipulating belief functions, is questionable, since it appears
quite artificial in physics (in natural phenomena). Conflict
does, however, play a main role in decision-making in human
reasoning. The way in which the total or partial conflicts are
managed by Shafer’s evidential reasoning is incompatible with
this simple physical experiment.

It is worth noting that the physical fusion of the sources of
Zadeh’s example is consistent with the simple averaging rule,
and (relatively) consistent with the PCR6 fusion rule [17] (Vol.
2), which provides in this example $m_{PCR6}(A) = 0.486$,
$m_{PCR6}(B) = 0.486$, and $m_{PCR6}(C) = 0.028$. Contrary
to DS rule, PCR6 is fully consistent with the averaging
rule for estimating frequentist probabilities in binary random
experiments, see [19] for details with examples.

Figure 5. LEDs current values for source 1, source 2, both sources (by physical fusion).

Figure 6. Screen copies of MicroSim schematics and its physical fusion result.
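
The comparison between the rules on Zadeh’s example can be checked numerically. The following Python sketch is only an illustration under stated assumptions: it handles BBAs whose focal elements are all singletons (as in this example), it uses the two-source form of PCR6, and the function names are hypothetical and do not come from the paper.

```python
def normalize(currents):
    """Turn LED currents (mA) into masses by dividing by the total current."""
    total = sum(currents.values())
    return {led: value / total for led, value in currents.items()}


def dempster_singletons(m1, m2):
    """Dempster's rule when every focal element is a singleton.

    Fails with a division by zero if the two sources are in total conflict.
    """
    agree = {x: m1[x] * m2[x] for x in m1}
    kept = sum(agree.values())  # this is 1 - K
    return {x: v / kept for x, v in agree.items()}


def average(m1, m2):
    """Simple averaging rule."""
    return {x: 0.5 * (m1[x] + m2[x]) for x in m1}


def pcr6_singletons(m1, m2):
    """Two-source PCR6 on singletons: each partial conflict m1(x)*m2(y),
    x != y, is sent back to x and y proportionally to m1(x) and m2(y)."""
    fused = {x: m1[x] * m2[x] for x in m1}
    for x in m1:
        for y in m2:
            a, b = m1[x], m2[y]
            if x != y and a + b > 0:
                fused[x] += a * a * b / (a + b)
                fused[y] += a * b * b / (a + b)
    return fused


zadeh_1 = {"A": 0.9, "B": 0.0, "C": 0.1}
zadeh_2 = {"A": 0.0, "B": 0.9, "C": 0.1}
# Currents observed in step 3 of the experiment (both sub-circuits active).
physical = normalize({"A": 32.5, "B": 32.5, "C": 6.9})

print("Dempster:", dempster_singletons(zadeh_1, zadeh_2))
print("Average: ", average(zadeh_1, zadeh_2))
print("PCR6:    ", pcr6_singletons(zadeh_1, zadeh_2))
print("Physical:", physical)
```

With these inputs, Dempster’s rule puts all the mass on C, whereas averaging, PCR6 and the normalized step-3 currents all keep A and B tied and far above C.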


V. CONCLUSIONS 


In this paper, a real experimental method for building basic
belief assignments associated with two independent, informative,
and equireliable sources of information, following the
emblematic Zadeh’s example, has been presented. It is based
on a particular electronic circuit box (called Z-box), which makes
it possible to observe and check the fusion result experimentally.
Zadeh’s intuition about the non-adequate behavior of DS
rule and the counter-intuitive decisions obtained on its basis
is perfectly defended by Nature through this experiment. A
similar experiment, called the Z-aquarium experiment, can also be
done with fluids (with a container filled with water) instead of
an electronic circuit, but it is more complex to set up and
has not been reported in this paper. Our conclusion is that
Dempster-Shafer Theory does not agree with the physical
fusion process, at least for a situation that fits with Zadeh’s
example. The more general question of the validity of DST
(especially when subjective beliefs are considered) was not
the purpose of this paper because this question has already
been addressed in detail in our previous research works listed
in the references.

Abstract—The theory of belief functions is a very appealing
theory for uncertainty modeling and reasoning which has been
widely used in information fusion. However, when the cardinality
of the frame of discernment and the number of focal elements
are large, the fusion of belief functions generally has a
high computational complexity. To circumvent this difficulty,
many methods have been proposed to implement the
combination rules more efficiently and to approximate basic belief
assignments (BBAs) by simpler ones in order to reduce the number
of focal elements involved in the fusion process. In this paper, we
present a novel principle for approximating a BBA by removing the
more redundant focal elements of the original BBA. Two methods
based on this principle are presented (using batch and recursive
implementations). Numerical examples, simulations and related
analyses are provided to illustrate and evaluate the performance
of this new BBA approximation method.

Index Terms—Evidence theory; belief functions; basic belief 
assignment; approximation. 


I. INTRODUCTION 


The original theory of belief functions, also known as
Dempster-Shafer Theory (DST) [1], has been widely used in
information fusion, pattern recognition and decision making
due to its advantages in representing uncertain information and
partial knowledge. However, computational complexity is
one of its drawbacks [2], especially when combining sources of
evidence expressing their BBAs with respect to large frames
of discernment (FoD). The computational complexity of
evidence combination is strongly affected by the cardinality
of the FoD and the number of focal elements of the BBAs of
the sources to combine.

To reduce the computational complexity of evidence combination,
various approaches have been proposed, which generally
fit within the following two categories:

a) Efficient implementations for performing exact computations
of the chosen rule of combination. For example,
an optimal algorithm for Dempster’s rule of combination
was proposed by Kennes [3]. The works of Barnett [4] and of
Shafer and Logan [5] are also representative of this category.

b) Approximation or simplification of BBAs. For example, the
$k$-$l$-$x$ approach [6], the summarization approach [7], the
D1 approximation [8], inner and outer approximations
[9], Monte-Carlo based approximation [10], etc., remove
focal elements and redistribute the corresponding mass
assignments. In our previous works, we also proposed
the hierarchical proportional redistribution approach [11]
and optimization-based BBA approximations [12].
The work presented in this paper focuses on the reduction
of the computational cost of evidence combination thanks to BBA
approximations. In the aforementioned works of category b),
the different methods remove some focal elements
according to some criteria, typically based either on their mass
values or on their cardinalities. We think that mass values
or focal element cardinalities alone are not enough for selecting the
focal elements to remove in order to obtain a good BBA approximation.
We propose a novel approach using the notion of focal
element redundancy. The relatively redundant focal elements
should be removed and the relatively non-redundant ones
should be retained. To quantify this notion of redundancy, we
use the average distance between a given focal element and
all the other focal elements. A smaller average distance means
that the given focal element carries information similar to that
of the others, i.e., it is more redundant and should
be removed first. The user can preset the desired number of
remaining focal elements (and hence the number of removed focal
elements). Two removal procedures (a batch mode
and an iterative mode) are proposed in the sequel, followed
by re-normalization or redistribution. Numerical examples,
simulations and related analyses are provided to show the
rationality and interest of these novel BBA approximation
approaches.


II. BASICS OF BELIEF FUNCTIONS 
The theory of belief functions was developed by
Shafer [1] in 1976 from early works of Dempster. In DST,
the elements of the frame of discernment (FoD) $\Theta$ are mutually
exclusive and exhaustive. A basic belief assignment (BBA),
also called a mass function, is a mapping $m(\cdot) : 2^\Theta \to [0,1]$
satisfying $m(\emptyset) = 0$ and

$$
\sum_{A \in 2^\Theta} m(A) = 1 \tag{1}
$$

If $m(A) > 0$, $A$ is called a focal element of the BBA $m(\cdot)$.
In DST, the combination of two distinct bodies of evidence
(BOEs) $m_1(\cdot)$ and $m_2(\cdot)$ is done using Dempster’s rule as
follows. $\forall A \in 2^\Theta$:

$$
m(A) = (m_1 \oplus m_2)(A) =
\begin{cases}
0, & \text{if } A = \emptyset \\
\dfrac{1}{1-K} \sum\limits_{A_i \cap B_j = A} m_1(A_i)\, m_2(B_j), & \text{if } A \neq \emptyset
\end{cases}
\tag{2}
$$

where $K = \sum_{A_i \cap B_j = \emptyset} m_1(A_i)\, m_2(B_j)$ is the total conflicting
mass assignment, which is discarded by normalization in
Dempster’s rule. It can be seen from Eq. (2) that Dempster’s
rule is both commutative and associative. Dempster’s rule
has been seriously criticized for its counter-intuitive behaviors
both in high conflicting and low conflicting situations [13],
and other rules of combination have been developed in the
literature; see [14] for details. These modified or refined
combination rules focus on suppressing the counter-intuitive
behaviors of Dempster’s rule. However, like Dempster’s rule,
they all have to face the problem of high computational
complexity with the increase of the FoD’s cardinality and the
number of focal elements.
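
To make the source of this cost concrete, Eq. (2) can be sketched in a few lines of Python. This is a minimal illustration, assuming that a BBA is stored as a dictionary mapping `frozenset` focal elements to masses (a representation chosen here, not prescribed by the paper); the function name `dempster` is hypothetical.

```python
from itertools import product


def dempster(m1, m2):
    """Dempster's rule of Eq. (2) for BBAs stored as {frozenset: mass}.

    Returns the combined BBA and the total conflicting mass K.
    """
    joint = {}
    conflict = 0.0  # K in Eq. (2)
    # One product per pair of focal elements: |F1| x |F2| set intersections.
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            joint[common] = joint.get(common, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b
    if not joint:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {a: mass / (1.0 - conflict) for a, mass in joint.items()}, conflict


# Illustrative BBAs over the frame {a, b, c} (values chosen for this sketch).
m1 = {frozenset("a"): 0.6, frozenset("ab"): 0.3, frozenset("abc"): 0.1}
m2 = {frozenset("b"): 0.5, frozenset("abc"): 0.5}
fused, K = dempster(m1, m2)
print(K)      # conflicting mass, here from {a} and {b} only
print(fused)
```

The double loop over focal elements is what grows with the number of focal elements of each source, which is why removing focal elements before the combination reduces the cost.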

To reduce the computational cost of the combination of BBAs
and make the fusion process tractable, we can, as a first strategy,
switch to simpler rules of combination or try to develop
efficient implementations of sophisticated rules; as a second
strategy, approximate the original BBAs to combine by
simpler BBAs with fewer focal elements of smaller cardinalities;
or mix both strategies. In this paper, we focus
on the second strategy devoted to BBA approximation, which
is more intuitive for humans to grasp [15].


III. EXISTING BBA APPROXIMATION APPROACHES 


Some existing BBA approximation approaches are briefly
recalled in this section for the purpose of comparison with
the novel methods proposed in this paper.

1) $k$-$l$-$x$ method [6]: This approach was proposed
by Tessem in 1993. The simplified BBA is obtained by

• keeping no less than $k$ focal elements;

• keeping no more than $l$ focal elements;

• deleting the masses which are no greater than $x$.

In $k$-$l$-$x$, all original focal elements are sorted according
to their mass values in decreasing order. Then, the first $p$
focal elements are chosen such that $k \le p \le l$ and such
that the sum of the mass assignments of these first $p$ focal
elements is no less than $1 - x$. The removed mass values
are redistributed to the remaining focal elements by a classical
normalization procedure.
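
The selection rule above can be sketched as follows. This is a hedged reading of the description, not Tessem’s reference implementation: focal elements are added in decreasing order of mass until at least $k$ are kept and the kept mass reaches $1 - x$, or until $l$ are kept. The function name, the small numerical tolerance and the illustrative masses are assumptions of this sketch.

```python
def klx_approximation(bba, k, l, x):
    """Keep between k and l of the heaviest focal elements, stopping early
    once the kept mass reaches 1 - x, then renormalize the kept masses."""
    ranked = sorted(bba.items(), key=lambda item: item[1], reverse=True)
    kept, kept_mass = [], 0.0
    for focal, mass in ranked:
        # Small tolerance so that sums such as 0.5 + 0.3 + 0.1 count as 0.9.
        enough = len(kept) >= k and kept_mass >= 1.0 - x - 1e-12
        if len(kept) >= l or enough:
            break
        kept.append((focal, mass))
        kept_mass += mass
    return {focal: mass / kept_mass for focal, mass in kept}


# Illustrative masses for five focal elements, identified by labels only.
bba = {"A1": 0.5, "A2": 0.3, "A3": 0.10, "A4": 0.05, "A5": 0.05}
print(klx_approximation(bba, k=3, l=3, x=0.1))
```

With $k = l = 3$ and $x = 0.1$, the two lightest focal elements are dropped and the three kept masses are divided by 0.9.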

2) Summarization method [7]: This method is similar to
$k$-$l$-$x$ and also keeps the focal elements having the highest
mass values. The mass values of the focal elements to remove are
accumulated and assigned to their union set. Suppose $k$
is the desired number of focal elements in the approximated
BBA $m_S(\cdot)$ of a given BBA $m(\cdot)$. Let $M$ be the set of $k - 1$
focal elements with the highest mass values in $m(\cdot)$. Then
$m_S(\cdot)$ is obtained from $m(\cdot)$ by

$$
m_S(A) =
\begin{cases}
m(A), & \text{if } A \in M \\
\sum\limits_{A' \subseteq A,\ A' \notin M} m(A'), & \text{if } A = A_0 \\
0, & \text{otherwise}
\end{cases}
\tag{3}
$$

where $A_0$ is determined by

$$
A_0 \triangleq \bigcup_{A' \notin M,\ m(A') > 0} A' \tag{4}
$$
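
Eqs. (3) and (4) translate into a short routine. The sketch below assumes BBAs stored as `{frozenset: mass}` dictionaries; the function name `summarize` is hypothetical, and the composition chosen for the second focal element in the sample BBA is an assumption made only to have a complete input.

```python
def summarize(bba, k):
    """Summarization of Eqs. (3)-(4): keep the k - 1 heaviest focal elements
    and move the mass of all the others onto their union A_0."""
    ranked = sorted(bba.items(), key=lambda item: item[1], reverse=True)
    kept = dict(ranked[:k - 1])  # the set M
    removed = ranked[k - 1:]
    if removed:
        union = frozenset().union(*(focal for focal, _ in removed))  # A_0
        # If A_0 happens to be in M already, its mass is simply increased.
        kept[union] = kept.get(union, 0.0) + sum(mass for _, mass in removed)
    return kept


bba = {
    frozenset({"t1", "t2"}): 0.5,
    frozenset({"t2", "t3", "t4"}): 0.3,  # composition assumed for this sketch
    frozenset({"t3"}): 0.10,
    frozenset({"t3", "t4"}): 0.05,
    frozenset({"t4", "t5"}): 0.05,
}
print(summarize(bba, k=3))
```

Here the three lightest focal elements are replaced by their union {t3, t4, t5}, which receives their accumulated mass; no renormalization is needed because no mass is discarded.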


3) D1 method [8]: Let $m(\cdot)$ be the original BBA to
approximate. $m_S(\cdot)$ denotes the approximated BBA and the
desired number of focal elements is $k$. Let $M$ be the set of
$k - 1$ focal elements with the highest mass values in $m(\cdot)$ and
$M^-$ be the set including all the other focal elements of $m(\cdot)$.
The basic idea of the D1 method is to keep all the members
of $M$ as focal elements of $m_S(\cdot)$ and to assign the mass
values of the focal elements in $M^-$ among the focal elements
in $M$ according to the following procedure.

Given a focal element $A \in M^-$, find in $M$ all the supersets
of $A$ to form the collection $M_A$. If $M_A$ is not empty, the mass
value of $A$ is uniformly assigned among the focal elements
with smallest cardinality in $M_A$. When $M_A$ is empty, then
construct $M'_A$ as

$$
M'_A = \{B \in M \mid |B| \ge |A|,\ B \cap A \ne \emptyset\} \tag{5}
$$

Then, if $M'_A$ is not empty, $m(A)$ is assigned among the focal
elements with smallest cardinality in $M'_A$. The value assigned
to a focal element $B$ depends on the value of $|B \cap A|$. Such
a procedure is executed iteratively until all $m(A)$ have been
assigned to the focal elements in $M$.

If $M'_A$ is empty, there are two possible cases:

1) If the total set $\Theta \in M$, the sum of mass values of the
focal elements in $M^-$ will be added to $\Theta$;

2) If $\Theta \notin M$, then set $\Theta$ as a focal element of $m_S(\cdot)$ and
assign the sum of mass values of the focal elements in
$M^-$ to $m_S(\Theta)$.

More details on the D1 method with examples can be found in [8].


The basic principle of these three previous BBA approximation
approaches is to remove the focal elements having
smaller mass values because they are deemed unimportant.
Besides these methods, there exist other works on BBA
approximation, for example Denœux’s inner and outer
approximations [9], Grabisch’s k-additive BBA approximation
[16], and our previous works based on hierarchical proportional
redistribution (HPR) [11] and optimization-based BBA
approximations [12]. In these methods, the aim is to remove
the focal elements with larger cardinalities because they generally
bring more computational cost in the fusion process (see the
related references for details).


IV. NEW BBA APPROXIMATIONS USING THE PRINCIPLE 
OF FOCAL ELEMENT REDUNDANCY 


As briefly shown in the previous section, the existing
BBA approximation approaches remove some focal
elements by eliminating those with smaller mass values, or
with larger cardinalities. Although these methods have some
rational justification, mass values or cardinalities alone are not
enough, in our opinion, for judging which focal elements should
be removed in a BBA approximation. We consider that
it is quite hazardous (risky) to deem focal elements having
small mass values unimportant. It may also be dangerous
to remove the focal elements with large cardinality, justified
only by the high computational cost they may cause in
the fusion process. So, we should be cautious when adopting a
BBA approximation technique. We agree that focal elements
that are considered unimportant must be removed
first in an approximation method. However, the mass values of
focal elements are not enough for judging their importance.
A more solid index (criterion) should be found to estimate
the importance of a focal element to keep. Because the very
redundant focal elements can reasonably be considered
unimportant and the relatively non-redundant focal elements
can reasonably be considered important, we first define the
degree of non-redundancy of a focal element. From
this degree of non-redundancy, we can then develop new BBA
approximation methods, as will be shown.


A. Degree of non-redundancy of focal elements 


Suppose a BBA $m(\cdot)$ has $l$ focal elements. A distance
between focal elements $A_i$ and $A_j$ proposed by Denœux [9]
is defined as

$$
\delta_\cap(A_i, A_j) = m(A_i) \cdot |A_i| + m(A_j) \cdot |A_j|
- [m(A_i) + m(A_j)] \cdot |A_i \cap A_j| \tag{6}
$$

If a focal element $A_i$ has the smallest average distance to the
other focal elements $A_j \subseteq \Theta$, $j \ne i$, then $A_i$ shares the most
common information with the other focal elements, i.e., $A_i$ is the
most redundant. Therefore, we can define the degree of
non-redundancy based on the average distance between a focal
element and the others. First, we calculate the distance matrix
for all the focal elements of $m(\cdot)$ as

$$
Mat_{FE} =
\begin{bmatrix}
\delta_\cap(A_1, A_1) & \delta_\cap(A_1, A_2) & \cdots & \delta_\cap(A_1, A_l) \\
\delta_\cap(A_2, A_1) & \delta_\cap(A_2, A_2) & \cdots & \delta_\cap(A_2, A_l) \\
\vdots & \vdots & \ddots & \vdots \\
\delta_\cap(A_l, A_1) & \delta_\cap(A_l, A_2) & \cdots & \delta_\cap(A_l, A_l)
\end{bmatrix}
$$

It should be noted that $\delta_\cap(A_i, A_i) = 0$ and $\delta_\cap(A_i, A_j) =
\delta_\cap(A_j, A_i)$, where $i, j = 1, \ldots, l$. Hence, it is not necessary to
calculate all the elements of $Mat_{FE}$ because the matrix is
symmetric.

We define the degree of non-redundancy of the focal element
$A_i$ by

$$
nRd(A_i) \triangleq \frac{1}{l-1} \sum_{j=1,\ j \ne i}^{l} \delta_\cap(A_i, A_j) \tag{7}
$$

The larger the $nRd(A_i)$ value, the larger the non-redundancy (the
smaller the redundancy) of $A_i$. The smaller the $nRd(A_i)$ value, the
smaller the non-redundancy (the larger the redundancy) of $A_i$.

Based on the focal element redundancy, i.e., using the
degree of non-redundancy in (7), we propose two new BBA
approximation methods described in the next subsections,
where the more non-redundant focal elements are retained
and the more redundant ones are removed.
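
Eqs. (6) and (7) can be evaluated directly from a BBA. The Python sketch below assumes a `{frozenset: mass}` representation and hypothetical function names. The sample BBA is built so that its pairwise distances agree with the distance matrix printed in Example 1; the composition of its second focal element and the masses 0.5 and 0.3 are assumptions inferred from that matrix, not values quoted from Table I.

```python
def delta_cap(a, b, m):
    """Distance of Eq. (6) between focal elements a and b of the BBA m."""
    return m[a] * len(a) + m[b] * len(b) - (m[a] + m[b]) * len(a & b)


def non_redundancy(m):
    """Degree of non-redundancy of Eq. (7) for every focal element of m."""
    focal = list(m)
    return {
        a: sum(delta_cap(a, b, m) for b in focal if b != a) / (len(focal) - 1)
        for a in focal
    }


A1 = frozenset({"t1", "t2"})
A2 = frozenset({"t2", "t3", "t4"})  # composition assumed for this sketch
A3 = frozenset({"t3"})
A4 = frozenset({"t3", "t4"})
A5 = frozenset({"t4", "t5"})
bba = {A1: 0.5, A2: 0.3, A3: 0.10, A4: 0.05, A5: 0.05}

nrd = non_redundancy(bba)
for name, focal in zip(("A1", "A2", "A3", "A4", "A5"), (A1, A2, A3, A4, A5)):
    print(name, round(nrd[focal], 4))
```

A focal element that overlaps heavily with many others gets small distances and therefore a small $nRd$ value, which marks it as a candidate for removal.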


B. Batch approximation method 


Let $m(\cdot)$ denote the original BBA to approximate, with $l$
focal elements. In the approximation, we want to keep $k < l$
focal elements. First, we propose a BBA approximation with
batch processing, which means that the number of focal
elements is reduced from $l$ to $k$ in one processing cycle as
follows.

• Step 1: Calculate $Mat_{FE}$, and for each $A_i$, $i =
1, \ldots, l$, compute its non-redundancy value $nRd(A_i)$;

• Step 2: Sort all the focal elements in descending order
according to the values of $nRd(A_i)$;

• Step 3: Remove the $l - k$ bottom focal elements;

• Step 4: Normalize the mass values of the remaining
$k$ focal elements and output the approximated BBA
$m_S(\cdot)$.
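
The four steps above can be sketched on top of a precomputed distance matrix. In this illustration, focal elements are identified by their row index, the function names are hypothetical, and the sample matrix is the one printed in Example 1; the masses 0.5 and 0.3 of the first two focal elements are inferred from that matrix rather than quoted from Table I.

```python
def non_redundancy(dist, alive):
    """Eq. (7) restricted to the focal elements whose indices are in `alive`."""
    return {
        i: sum(dist[i][j] for j in alive if j != i) / (len(alive) - 1)
        for i in alive
    }


def batch_approximation(masses, dist, k):
    """Steps 1-4: rank once by nRd, keep the top k, renormalize their masses."""
    alive = list(range(len(masses)))
    nrd = non_redundancy(dist, alive)                          # Step 1
    ranked = sorted(alive, key=lambda i: nrd[i], reverse=True)  # Step 2
    keep = ranked[:k]                                          # Step 3
    total = sum(masses[i] for i in keep)
    return {i: masses[i] / total for i in sorted(keep)}, nrd   # Step 4


MAT_FE = [
    [0.00, 1.10, 1.10, 1.10, 1.10],
    [1.10, 0.00, 0.60, 0.30, 0.65],
    [1.10, 0.60, 0.00, 0.05, 0.20],
    [1.10, 0.30, 0.05, 0.00, 0.10],
    [1.10, 0.65, 0.20, 0.10, 0.00],
]
MASSES = [0.5, 0.3, 0.10, 0.05, 0.05]  # first two values inferred, see above

approx, nrd = batch_approximation(MASSES, MAT_FE, k=3)
print({i: round(v, 4) for i, v in nrd.items()})
print({i: round(v, 4) for i, v in approx.items()})
```

On this input, the two focal elements with the smallest $nRd$ values (indices 2 and 3, i.e., $A_3$ and $A_4$) are dropped and the remaining masses are divided by 0.85.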


C. Iterative approximation method 


In this method, we iteratively remove the most redundant
focal element (the one with the smallest $nRd$ value) in each cycle
until $k$ focal elements remain. This method consists of the
following steps:

• Step 1: Calculate $Mat_{FE}$ and $nRd$ for each $A_i$, $i =
1, \ldots, l$;

• Step 2: Sort all the focal elements in descending order
according to their values of $nRd(A_i)$;

• Step 3: Remove the bottom focal element $A_r$;

• Step 4: If the number of remaining focal elements is larger
than $k$, recalculate $nRd(A_i)$ for $i = 1, \ldots, l$, $i \ne r$, and go
to Step 2. Otherwise, go to Step 5;

• Step 5: Normalize the mass values of the remaining
$k$ focal elements and output the approximated BBA
$m_S(\cdot)$.

For this iterative method, the degrees of non-redundancy are
recalculated in each cycle after removing a focal element in
the previous cycle. That is to say, in each cycle, only the
non-redundancy of the currently remaining focal elements is
considered.
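
The difference with the batch mode is only that $nRd$ is recomputed over the surviving focal elements after each removal. The sketch below, with hypothetical function names and focal elements identified by their row index, applies both selections to the distance matrix printed in Example 2 to show that they can disagree.

```python
def iterative_keep(dist, k):
    """Remove the focal element with the smallest nRd, recompute nRd over the
    survivors, and repeat until k focal elements remain."""
    alive = list(range(len(dist)))
    while len(alive) > k:
        nrd = {
            i: sum(dist[i][j] for j in alive if j != i) / (len(alive) - 1)
            for i in alive
        }
        alive.remove(min(alive, key=nrd.get))
    return alive


def batch_keep(dist, k):
    """Rank once by nRd and keep the k most non-redundant focal elements."""
    n = len(dist)
    nrd = [sum(row) / (n - 1) for row in dist]  # diagonal entries are zero
    ranked = sorted(range(n), key=lambda i: nrd[i], reverse=True)
    return sorted(ranked[:k])


MAT_FE_2 = [
    [0.0000, 0.4258, 0.1780, 0.5319, 0.1662],
    [0.4258, 0.0000, 0.2477, 0.2477, 0.1662],
    [0.1780, 0.2477, 0.0000, 0.4080, 0.3325],
    [0.5319, 0.2477, 0.4080, 0.0000, 0.3325],
    [0.1662, 0.1662, 0.3325, 0.3325, 0.0000],
]

print("batch keeps    ", batch_keep(MAT_FE_2, 3))
print("iterative keeps", iterative_keep(MAT_FE_2, 3))
```

After the surviving indices are known, the approximated BBA is obtained by renormalizing the corresponding masses, exactly as in the batch mode.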


D. Illustrative examples 

Here we provide a simple numerical example to illustrate
the implementation of some available BBA
approximation approaches alongside our two new
methods.

Example 1: Let us consider the BBA $m(\cdot)$ defined over the FoD
$\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4, \theta_5\}$ listed in Table I.

TABLE I
FOCAL ELEMENTS AND MASS VALUES OF $m(\cdot)$

Focal Elements    Mass values

1) Using $k$-$l$-$x$ method [6]: Here $k$ and $l$ are set to
3, and $x$ is set to 0.1. The focal elements $A_4 = \{\theta_3, \theta_4\}$ and
$A_5 = \{\theta_4, \theta_5\}$ are removed without violating the constraints
in $k$-$l$-$x$. The remaining total mass value is $1 - 0.05 -
0.05 = 0.9$. Then, all the remaining focal elements’ mass
values are divided by 0.9 to accomplish the normalization. The
approximated BBA $m_S^{klx}(\cdot)$ obtained by the $k$-$l$-$x$ method is
listed in Table II, where $A'_i$, $i = 1, 2, 3$ are the focal elements
of $m_S^{klx}(\cdot)$.

TABLE II
$m_S^{klx}(\cdot)$ OBTAINED USING $k$-$l$-$x$


Focal Elements Mass values 


0.3333 


2) Using summarization method [7]: Here $k$ is set to 3.
According to the summarization method, the focal elements
$A_3 = \{\theta_3\}$, $A_4 = \{\theta_3, \theta_4\}$ and $A_5 = \{\theta_4, \theta_5\}$ are removed,
and their union $\{\theta_3, \theta_4, \theta_5\}$ is generated as a new focal element
with mass value $m(\{\theta_3\}) + m(\{\theta_3, \theta_4\}) + m(\{\theta_4, \theta_5\}) = 0.2$.
The approximated BBA $m_S^{Sum}(\cdot)$ is listed in Table III below.

TABLE III
$m_S^{Sum}(\cdot)$ OBTAINED USING SUMMARIZATION

Focal Elements    Mass values

$\{\theta_3, \theta_4, \theta_5\}$


3) Using D1 method [8]: Here $k$ is still 3. It can be
obtained that $A_1$, $A_2$ belong to $M$, and $A_3$, $A_4$, $A_5$ belong to
$M^-$. The focal element $A_1 = \{\theta_1, \theta_2\}$ has empty intersection
with the focal elements in $M^-$; therefore, its mass value is
unchanged. In $M$, $A_2$ is the unique superset of $A_3$ and $A_4$;
therefore, $m(A_3) + m(A_4) = 0.10 + 0.05$ is added to its
original mass value. $A_2$ also covers half of $A_5$; therefore,
$m(A_5)/2 = 0.025$ is further added to the mass of $A_2$.
Finally, the remaining mass value is assigned to the total set $\Theta$.
The approximated BBA $m_S^{D1}(\cdot)$ is listed in Table IV.

TABLE IV
$m_S^{D1}(\cdot)$ OBTAINED USING D1

Focal Elements    Mass values


4) Using Denœux inner approximation [9]: Because this
method uses the focal element distance in Eq. (6), we also
apply it in this example for comparison. With the inner
approximation method, the pair of focal elements with the smallest
distance is removed, and their intersection is set as
the supplemented focal element, whose mass value is the
sum of the two removed focal elements’ mass values. Such
a procedure is repeated until the desired number of focal
elements is reached. The results at each step are listed in Table
V.

TABLE V
BBAS OBTAINED USING INNER APPROXIMATION


Mass values 


A= {01,05,01 


As we can see in Table V, it generates the empty set as a
focal element, which is not allowed in the classical
Dempster-Shafer evidence theory under the closed-world assumption.

5) Using the redundancy-based batch approximation
method: The desired number of remaining focal elements is set
to $k = 3$. We first calculate the distance matrix $Mat_{FE}$ and we get

$$
Mat_{FE} =
\begin{array}{c|ccccc}
 & A_1 & A_2 & A_3 & A_4 & A_5 \\ \hline
A_1 & 0 & 1.10 & 1.10 & 1.10 & 1.10 \\
A_2 & 1.10 & 0 & 0.60 & 0.30 & 0.65 \\
A_3 & 1.10 & 0.60 & 0 & 0.05 & 0.20 \\
A_4 & 1.10 & 0.30 & 0.05 & 0 & 0.10 \\
A_5 & 1.10 & 0.65 & 0.20 & 0.10 & 0
\end{array}
$$

Based on this matrix, the degree of non-redundancy for each
focal element of $m(\cdot)$ can be obtained. It is listed in Table
VI.

TABLE VI
NON-REDUNDANCY FOR DIFFERENT FOCAL ELEMENTS

Focal Elements    $nRd$

Since $A_3$ and $A_4$ at the bottom have the two smallest $nRd$
values, they correspond to the two focal elements with the lowest
non-redundancy, i.e., the highest redundancy. Therefore, they
are removed and their mass values are redistributed thanks
to the classical normalization step. The approximated BBA
$m_S(\cdot)$ is listed in Table VII.


TABLE VII
$m_S(\cdot)$ OBTAINED USING THE BATCH APPROXIMATION BASED ON
REDUNDANCY

Focal Elements    Mass values

0.0588

6) Using the redundancy-based iterative approximation
method: The number of remaining focal elements is still set
to $k = 3$, so that two focal elements have to be removed. In
the iterative mode, only one focal element is removed in each
cycle, thus two cycles are needed.

In cycle I, the degrees of non-redundancy are the same as
listed in Table VI. Then, the focal element $A_4$ is removed in
the first cycle.

In cycle II, recalculate $nRd$ for $A_i$, $i = 1, \ldots, 5$, $i \ne 4$,
according to $nRd(A_i) = \frac{1}{3} \sum_{j \ne i,\ j \ne 4} \delta_\cap(A_i, A_j)$. The results
are

$$
\begin{aligned}
nRd(A_1) &= 1.1000, & nRd(A_2) &= 0.7833,\\
nRd(A_3) &= 0.6333, & nRd(A_5) &= 0.6500
\end{aligned}
$$

Then, $A_3$ is removed in this cycle due to its lowest $nRd$
value (the highest redundancy among the remaining focal
elements). The approximated BBA obtained using the iterative
method is the same as the one listed in Table VII. It should
be noted that the batch approximation and the iterative
approximation will not always output the same results, as
shown in the next example.


Example 2: Suppose that the FoD is $\Theta = \{\theta_1, \theta_2, \theta_3\}$. The BBA
to approximate is listed in Table VIII, and the desired number
of remaining focal elements is $k = 3$.

TABLE VIII
FOCAL ELEMENTS AND MASS VALUES OF $m(\cdot)$

Focal Elements    Mass values

The distance matrix $Mat_{FE}$ is

$$
Mat_{FE} =
\begin{array}{c|ccccc}
 & A_1 & A_2 & A_3 & A_4 & A_5 \\ \hline
A_1 & 0 & 0.4258 & 0.1780 & 0.5319 & 0.1662 \\
A_2 & 0.4258 & 0 & 0.2477 & 0.2477 & 0.1662 \\
A_3 & 0.1780 & 0.2477 & 0 & 0.4080 & 0.3325 \\
A_4 & 0.5319 & 0.2477 & 0.4080 & 0 & 0.3325 \\
A_5 & 0.1662 & 0.1662 & 0.3325 & 0.3325 & 0
\end{array}
$$

The degrees of non-redundancy of the focal elements are

$$
\begin{aligned}
nRd(A_1) &= 0.3255, \quad nRd(A_2) = 0.2719, \quad nRd(A_3) = 0.2916,\\
nRd(A_4) &= 0.3800, \quad nRd(A_5) = 0.2494
\end{aligned}
$$

With the batch approximation, the focal elements $A_2$ and $A_5$
are removed. After normalization, we get the approximated
BBA listed in Table IX.

TABLE IX
$m_S(\cdot)$ OBTAINED USING THE BATCH APPROXIMATION BASED ON
REDUNDANCY

Focal Elements    Mass values

With the iterative approximation method, the degrees of
non-redundancy obtained at Cycle I are also

$$
\begin{aligned}
nRd^{I}(A_1) &= 0.3255, \quad nRd^{I}(A_2) = 0.2719, \quad nRd^{I}(A_3) = 0.2916,\\
nRd^{I}(A_4) &= 0.3800, \quad nRd^{I}(A_5) = 0.2494
\end{aligned}
$$

The iterative approximation first removes the focal element
$A_5$ because it has the smallest $nRd$ value. Then we recalculate
the $nRd$ values for $A_1$, $A_2$, $A_3$, and $A_4$, which gives us

$$
\begin{aligned}
nRd^{II}(A_1) &= 0.3786, & nRd^{II}(A_2) &= 0.3071,\\
nRd^{II}(A_3) &= 0.2779, & nRd^{II}(A_4) &= 0.3959
\end{aligned}
$$

At Cycle II, the focal element $A_3$, having the smallest $nRd$ value, is
removed. After normalization, we get the approximated BBA
$m_S(\cdot)$ using the iterative approximation as listed in Table X,

TABLE X
$m_S(\cdot)$ OBTAINED USING THE ITERATIVE APPROXIMATION BASED ON
REDUNDANCY

Focal Elements    Mass values

which is different from the result of Table IX obtained using the batch
approximation.


V. COMPARATIVE ANALYSIS 


In this section, we present simulation results to compare
the different BBA approximation approaches in terms of
computational cost and average closeness to the original BBA.
A BBA approximation with lower computational cost and
greater closeness is preferred. To measure
the closeness or the dissimilarity between different BBAs, a
distance measure between BBAs is used. In this work, we use
Jousselme’s distance [17] because it remains one of the most
widely used distances of evidence. This distance is defined as

$$
d_J(m_1, m_2) \triangleq \sqrt{\frac{1}{2}\,(m_1 - m_2)^T\, Jac\, (m_1 - m_2)} \tag{8}
$$

where $Jac$ is the so-called Jaccard’s weighting matrix whose
elements $J_{ij} = Jac(A_i, B_j)$ are defined by

$$
Jac(A_i, B_j) = \frac{|A_i \cap B_j|}{|A_i \cup B_j|} \tag{9}
$$

A BBA $m(\cdot)$ here can be considered as a column vector
according to the geometric interpretation of the theory of
belief functions [18]. There are also other types of distance
of evidence [18]. We choose to use Jousselme’s distance of
evidence in this paper because it has been proved to be a
strict distance metric [19].
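
Eqs. (8) and (9) can be evaluated without building the full $2^{|\Theta|}$-dimensional vectors, because only the focal elements of either BBA contribute to the difference. The sketch below assumes BBAs stored as `{frozenset: mass}` dictionaries with non-empty focal elements; the function name `jousselme` is hypothetical.

```python
from math import sqrt


def jousselme(m1, m2):
    """Jousselme's distance of Eq. (8) with the Jaccard weights of Eq. (9)."""
    focal = list(set(m1) | set(m2))  # subsets where m1 - m2 may be non-zero
    diff = [m1.get(a, 0.0) - m2.get(a, 0.0) for a in focal]
    quad = 0.0
    for a, da in zip(focal, diff):
        for b, db in zip(focal, diff):
            quad += da * db * len(a & b) / len(a | b)  # Jac(a, b)
    return sqrt(max(quad, 0.0) / 2.0)  # clamp tiny negative rounding errors


certain_a = {frozenset("a"): 1.0}
certain_b = {frozenset("b"): 1.0}
vague_ab = {frozenset("ab"): 1.0}
print(jousselme(certain_a, certain_a))  # identical BBAs
print(jousselme(certain_a, certain_b))  # disjoint categorical BBAs
print(jousselme(certain_a, vague_ab))   # nested focal elements
```

The division by the size of the union is also why a mass on the empty set cannot be handled: the Jaccard weight of the empty set with itself would be 0/0.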

Our comparative analysis is based on a Monte Carlo
simulation using $M = 200$ random runs. In the $j$-th simulation
run, the BBA to approximate $m(\cdot)$ is randomly generated
and the different approximation results $\{m_S^i(\cdot)\}$ are obtained
using the different approximation approaches, where $i$
denotes the $i$-th BBA approximation approach. We calculate
the computation time of the original evidence combination
$m(\cdot) \oplus m(\cdot)$ with Dempster’s rule, and the computation
time of Dempster’s combination of each approximated BBA
$m_S^i(\cdot) \oplus m_S^i(\cdot)$. As stated before, there are many available
BBA approximation approaches. Here we only compare our
proposed approaches with the $k$-$l$-$x$ method, the D1 method and the
summarization method, because with these methods the
number of remaining focal elements can be preset and the empty set
is never considered as a valid focal element (contrary to the
inner approximation method, which would cause trouble in
the comparisons because Jousselme’s distance cannot
be computed if mass is allowed on the empty set, since
$|\emptyset| = 0$).

In our simulations, the cardinality of the FoD $\Theta$ is set to
3. In each random generation, there are 7 focal elements in the
original BBA to approximate. The number of remaining focal
elements for all the approaches used here is set to 6, 5, 4, 3,
and 2. The random generation of BBAs is based on Algorithm 1
[18] below.


TABLE XI
ALGORITHM 1: RANDOM GENERATION OF BBA.

Input: $\Theta$: Frame of discernment;
       $N_{max}$: Maximum number of focal elements
Output: $m$: BBA

Generate $P(\Theta)$, which is the power set of $\Theta$;
Generate a random permutation of $P(\Theta)$ → $R(\Theta)$;
Generate an integer between 1 and $N_{max}$ → $l$;
FOReach of the first $l$ elements of $R(\Theta)$ do
    Generate a value within [0,1] → $m_i$, $i = 1, \ldots, l$;
END
Normalize the vector $m = [m_1, \ldots, m_l]$ → $m'$;
$m(A_i) = m'_i$;
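
Algorithm 1 can be sketched in Python as follows. This is an illustration under explicit assumptions: the empty set is excluded from the candidate subsets so that $m(\emptyset) = 0$ holds, the raw values are drawn in (0, 1] so that every selected subset is a genuine focal element, and the function name and the `rng` parameter are hypothetical.

```python
import random
from itertools import combinations


def random_bba(frame, n_max, rng=random):
    """Random BBA over `frame` with at most `n_max` focal elements."""
    items = sorted(frame)
    # Non-empty subsets of the frame (the empty set must carry no mass).
    subsets = [
        frozenset(combo)
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]
    rng.shuffle(subsets)                               # random permutation
    count = rng.randint(1, min(n_max, len(subsets)))   # number of focal elements
    raw = [1.0 - rng.random() for _ in range(count)]   # values in (0, 1]
    total = sum(raw)
    return {a: w / total for a, w in zip(subsets[:count], raw)}


rng = random.Random(0)  # seeded generator, only to make the sketch repeatable
bba = random_bba({"t1", "t2", "t3"}, n_max=7, rng=rng)
for focal, mass in bba.items():
    print(sorted(focal), round(mass, 4))
```

For a frame of cardinality 3 there are exactly 7 non-empty subsets, which matches the number of focal elements used in the simulations when all of them are selected.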


The average distance values over 200 runs between the
original BBA and the approximated BBAs obtained using the
different approaches, for the different numbers of remaining focal
elements, are shown in Fig. 1. The average (over all runs and
all numbers of remaining focal elements) computation time
and distance are shown in Table XII.


TABLE XII
COMPARISONS BETWEEN DIFFERENT BBA APPROXIMATIONS IN TERMS
OF TIME AND CLOSENESS

Approaches              Time (ms)
Batch-redundancy        0.1026
Iterative-redundancy    0.1059
$k$-$l$-$x$                   0.1073
D1                      0.1039
Summarization           0.1034

Fig. 1. Comparisons between different approximations in terms of the distance
of evidence.


As we can see in Fig. 1 and in Table XII, all the methods have
an average computation time of around 0.1 ms, which is about
half of the original average computation time of
0.2011 ms. It means that all the methods reduce the
computational cost substantially. Our new BBA approximation
approaches based on focal element redundancy output BBAs
which are closer to the original one than those of the
other approaches. This means that our proposed approximation
approaches output BBAs which are the most faithful and incur
the least loss of information when compared with the other
approaches. So, based on this evaluation using
two criteria, computation time and closeness to
the original BBA, our comparative analysis shows that our
new methods perform better. The iterative version (having
the smallest average distance) performs better than the batch
version.


VI. CONCLUSION 


The degree of non-redundancy of focal elements has been defined,
and based on it two novel BBA approximation methods have
been proposed in this paper: a batch version and
an iterative version. Our Monte Carlo simulation results show
that these new methods reduce the computational cost to a
level comparable with that of the other available approaches; at the
same time, the approximated BBAs obtained using our new
approaches are on average closer to the original BBA, which
means a smaller loss of information in the approximation
procedure.

In our future work, further theoretical analyses of the
definition of focal element (non-)redundancy
are needed; based on them, we will also attempt to design
new measures of focal element redundancy and compare
them with the one used in this paper. Besides
the computation time and the distance of evidence used in this
paper, we will explore more comprehensive evaluation criteria
for BBA approximation approaches, and test other distance
measures of evidence [20] in our proposed approaches. This
is crucial for the design of more effective approximations.

Abstract—The transformation of a belief function into a probability
is one of the most important and common ways of making decisions
under the framework of evidence theory. In this paper,
we focus on the evaluation of such probability transformations,
which is crucial for their proper application and for the design of
new ones. Shannon entropy or the probabilistic information content
(PIC) measure is traditionally used in evaluating probability
transformations. The transformation having the lowest entropy
or highest PIC is considered the best one. This standpoint is
questioned in this paper by comparing a probability transformation
based on uncertainty minimization with other available
probability transformations. It is shown experimentally that entropy
or PIC alone is not comprehensive enough to evaluate a probability
transformation. To make a comprehensive evaluation, some new approaches
are proposed based on the joint use of PIC and the distance of evidence
according to value-based and rank-based fusion. A
pattern classification application oriented evaluation approach
for probability transformations is also proposed. Some desired
properties of probability transformations are also discussed.
Experimental results and related analysis are provided to show
the rationality of the new evaluation approaches.

Index Terms—Evidence theory, Probability transformation,
Probabilistic information content, Entropy, Decision-making.


I. INTRODUCTION 


Dempster-Shafer theory (DST), also known as the theory 
of belief functions [1], [2], provides a way to reason with 
imprecise, uncertain and incomplete information. DST can 
distinguish “unknown” and “imprecision” and provides a 
method to fuse different evidences by using the commutative 
and associative Dempster’s rule of combination. That is why 
the DST is widely used in information fusion. There are, 
however, some drawbacks [3], [4] of the DST, e.g., counter- 
intuitive combination results, high computational cost, and 
lack of evaluation criteria. So some modified models were 
proposed, e.g., the transferable belief model (TBM) [3] and 
Dezert-Smarandache theory (DSmT) [4]. 

The final goal of uncertainty reasoning is usually decision- 
making. To make decision easier, the mass assignment for a 
compound focal element is usually assigned to each singleton 
by a probability transformation. The probability transformation 
aims to approximate a basic belief assignment (BBA) by 
a probabilistic measure. The pursuit of efficient probability 
transformations has attracted great attention in recent years 
and many probability transformations have been proposed [4]- 
[13]. 

The most well-known probability transformation is the pig- 
nistic probability transformation (PPT) [4] in TBM. PPT maps 
a belief defined on subsets to a probability measure defined 
on singletons, based on which a classical decision under 
probabilistic framework can be readily applied. PPT uses equal 
weights when splitting mass assignments of the compound 
focal elements and redistributing them to singletons included 
in them. Other modified probability transformations were also 
proposed [5]-[13], which assign the mass assignments of 
compound focal elements to singletons according to some ratio 
constructed from the available information (e.g. the belief and 
the plausibility). Typical examples include Sudano’s probabili- 
ties [8] and Cuzzolin’s intersection probability [12], etc. Under 
the DSmT framework, other probability transformations called 
DSmP [9] and HDSmP [13] were proposed. DSmP takes into 
account both the masses and the cardinality of focal elements 
in the proportional redistribution process and HDSmP is a 
hierarchical version of DSmP. They can also be used in the 
DST framework. 

To compare all the available transformations for the purpose 
of appropriate application and design of new transforma- 
tions, evaluation is required. In almost all the existing works 
on probability transformations, Shannon entropy or its dual, 
Probabilistic Information Content (PIC) criterion, is used to 
evaluate them. Definitely, less uncertainty should be preferred 
for decision-making. However, is the probability measure 
generated from a belief function with less uncertainty always 
rational or beneficial for decision-making? To answer this, 
i.e., to illustrate the irrationality of the over-emphasis of PIC 
or entropy, another probability transformation based on a 
constrained entropy minimization [14] is used and analyzed 
through examples. When using entropy or PIC for evaluation, 
the probability measure with the least uncertainty seems the 
best one. Unfortunately, some risky and unexpected results 
may also be obtained. Reference [14] shows that neither entropy nor PIC 
is a comprehensive measure. 

Comprehensive evaluation of probability transformation is 
desired and motivates this paper. PIC only emphasizes the 
clarity of the transformed probability, which is only from the 
aspect of the clarity of decision-making. On the other hand, the 
transformed probability should be consistent with the original 
belief function in some sense meaning that comprehensive 
evaluation should also consider the fidelity of the transformed 
probability to the original BBA. A higher degree of fidelity 
means less loss of information caused by the probability 
transformation. Our comprehensive evaluation aims to strike 
a balance between clarity and fidelity, i.e., a probability 
with higher clarity and higher fidelity should be preferred. 
Then, how to quantify the degree of fidelity? The distance of 
evidence [15] is used to measure the dissimilarity between two 
BBAs. Since the probability can be considered as a particular 
BBA, we can simply use the distance of evidence [15] between 
the original BBA and the transformed probability to quantify 
the degree of fidelity (smaller distance means higher degree 
of fidelity). So, in this paper we evaluate the probability 
transformations jointly by PIC and the distance of evidence. 
This joint use of the two criteria is implemented by using value 
based fusion (via the values of PIC and distance) and rank 
based fusion (via the ranks of the values of PIC and distance). 
We also propose an application-oriented evaluation approach 
for probability transformations, such as the application of 
pattern classification. Besides the evaluation criteria, some 
desired properties (qualitative evaluations) of probability trans- 
formations are also helpful. In [16], some desired properties 
of probability transformations were proposed and analyzed 
including upper and lower bound consistency and combination 
consistency. In this paper, some new desired properties of 
a probability transformation are also proposed. This paper 
extends our previous ideas briefly introduced in [14], where 
we preliminarily pointed out that entropy or PIC is not enough 
to evaluate a probability transformation. However, in [14], 
comprehensive evaluation approaches were not proposed; they are the 
main contribution of this paper. 

The rest of this paper is organized as follows. In Sec- 
tion II, evidence theory is briefly introduced. The decision- 
making methods in evidence theory including belief based 
approaches and probability transformations are briefly sum- 
marized in Section III. The definitions and pertinent analysis 
of the commonly used probability transformations are given 
in Section IV. In Section V, the evaluation of the probability 
transformation is discussed. The irrationality of using entropy 
alone as an evaluation criterion is clearly shown by simple 
examples. In Section VI, we propose to evaluate a probability 
transformation based on two criteria (PIC and distance). The 
joint use of them is implemented either directly at their values, 
or at their ranks. Some supporting examples are provided in 
Section VII. In Section VIII, an application-oriented evaluation 
approach is proposed. In Section IX, some desired properties 
of probability transformations are proposed and analyzed. 
Conclusions are drawn in Section X. 


II. BASICS OF EVIDENCE THEORY 


In Dempster-Shafer theory [2], the elements in the frame 
of discernment (FOD) $\Theta$, which is a discrete finite set, are 
mutually exclusive and exhaustive. Let $2^\Theta$ be the power set 
of the FOD. The function $m : 2^\Theta \to [0,1]$ defines a basic 
belief assignment (BBA), also called a mass function, which 
satisfies: 

$$\sum_{A \subseteq \Theta} m(A) = 1, \quad m(\emptyset) = 0. \qquad (1)$$

Then, the belief function and the plausibility function are 
defined as in (2) and (3), respectively, $\forall A \in 2^\Theta$: 

$$\mathrm{Bel}(A) = \sum_{B \subseteq A} m(B), \qquad (2)$$

$$\mathrm{Pl}(A) = \sum_{A \cap B \neq \emptyset} m(B), \qquad (3)$$

where Bel(A) and Pl(A) can be interpreted as the lower and 
the upper bounds of the probability P(A). 
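
As a minimal sketch of (2) and (3), the following Python snippet computes Bel and Pl for a BBA stored as a dictionary from focal elements to masses. The representation (`frozenset` keys, labels `t1`, `t2`) and the function names `bel` and `pl` are illustrative assumptions, not part of the original paper.

```python
def bel(m, a):
    """Belief of a: total mass of the focal elements contained in a (Eq. 2)."""
    return sum(v for b, v in m.items() if b <= a)


def pl(m, a):
    """Plausibility of a: total mass of the focal elements intersecting a (Eq. 3)."""
    return sum(v for b, v in m.items() if b & a)


# Illustrative BBA on the FOD {t1, t2}; masses sum to 1 and m(empty set) = 0.
m = {
    frozenset({"t1"}): 0.3,
    frozenset({"t2"}): 0.1,
    frozenset({"t1", "t2"}): 0.6,
}

a = frozenset({"t1"})
print(bel(m, a), pl(m, a))  # Bel({t1}) is about 0.3, Pl({t1}) is about 0.9
```

For any probability P compatible with this BBA, P({t1}) lies in the interval [Bel({t1}), Pl({t1})].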
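Dempster’s rule of combination, which is used to fuse n 
distinct¹ bodies of evidence (BOEs), is: 

$$m(A) = \begin{cases} 0, & A = \emptyset, \\ \dfrac{\sum_{\cap A_i = A} \prod_{1 \le i \le n} m_i(A_i)}{\sum_{\cap A_i \neq \emptyset} \prod_{1 \le i \le n} m_i(A_i)}, & A \neq \emptyset, \end{cases} \qquad (4)$$

where $m_1, m_2, \ldots, m_n$ are n BBAs. 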
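
The rule in (4) can be sketched for the two-source case as follows. This is only an illustration under assumed conventions (BBAs as dictionaries keyed by `frozenset`, the hypothetical function name `dempster`); combining n sources can be done by applying it repeatedly, since the rule is associative.

```python
from itertools import product


def dempster(m1, m2):
    """Combine two BBAs (dict: frozenset -> mass) with Dempster's rule (Eq. 4, n = 2)."""
    joint = {}
    for (a, va), (b, vb) in product(m1.items(), m2.items()):
        c = a & b
        joint[c] = joint.get(c, 0.0) + va * vb
    # Mass that falls on the empty set is the conflict; it is normalized away.
    conflict = joint.pop(frozenset(), 0.0)
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {c: v / (1.0 - conflict) for c, v in joint.items()}


# Illustrative BBAs on the FOD {t1, t2}.
m1 = {frozenset({"t1"}): 0.2, frozenset({"t1", "t2"}): 0.8}
m2 = {frozenset({"t2"}): 0.5, frozenset({"t1", "t2"}): 0.5}

m12 = dempster(m1, m2)
for focal, mass in m12.items():
    print(sorted(focal), round(mass, 4))
```

Here the conflict is 0.2 × 0.5 = 0.1, so the remaining products are divided by 0.9.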

Distances of evidence [15], [17] measure the dissimilarity 
between BOEs. One of the most commonly used distances of 
evidence is Jousselme’s distance $d_J(\cdot,\cdot)$ [15]: 

$$d_J(m_1, m_2) = \sqrt{\frac{1}{2} (m_1 - m_2)^T \, \mathrm{Jac} \, (m_1 - m_2)}, \qquad (5)$$

where the element $J_{ij} = \mathrm{Jac}(A_i, B_j)$ of Jaccard’s weighting 
matrix Jac is defined as: 

$$\mathrm{Jac}(A_i, B_j) = \frac{|A_i \cap B_j|}{|A_i \cup B_j|}. \qquad (6)$$

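For example, two BBAs are defined over the FOD $\Theta = \{\theta_1, \theta_2\}$: 

$m_1(A_1) = 0.2$, $m_1(A_2) = 0.8$, 

$m_2(B_1) = 0.5$, $m_2(B_2) = 0.5$, 

where 

$A_1 = \{\theta_1\}$, $A_2 = \{\theta_1, \theta_2\}$, 

$B_1 = \{\theta_2\}$, $B_2 = \{\theta_1, \theta_2\}$. 

We have 

$\mathrm{Jac}(A_1, B_1) = |\emptyset| / |\{\theta_1, \theta_2\}| = 0$; 

$\mathrm{Jac}(A_2, B_1) = |\{\theta_2\}| / |\{\theta_1, \theta_2\}| = 0.5$; 

$\mathrm{Jac}(A_1, B_2) = |\{\theta_1\}| / |\{\theta_1, \theta_2\}| = 0.5$; 

$\mathrm{Jac}(A_2, B_2) = |\{\theta_1, \theta_2\}| / |\{\theta_1, \theta_2\}| = 1$. 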
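
The distance in (5) for the two BBAs of this example can be evaluated with a short sketch such as the one below. The dictionary representation and the names `jaccard` and `jousselme` are assumptions made for illustration; only the non-empty focal elements of the two BBAs are enumerated, which is sufficient because the difference vector is zero elsewhere.

```python
from math import sqrt


def jaccard(a, b):
    """Jaccard index |a & b| / |a | b| of two non-empty sets (Eq. 6)."""
    return len(a & b) / len(a | b)


def jousselme(m1, m2):
    """Jousselme's distance between two BBAs (dict: frozenset -> mass), Eq. 5."""
    focal = list(set(m1) | set(m2))
    diff = [m1.get(f, 0.0) - m2.get(f, 0.0) for f in focal]
    quad = sum(
        di * jaccard(fi, fj) * dj
        for fi, di in zip(focal, diff)
        for fj, dj in zip(focal, diff)
    )
    return sqrt(max(0.5 * quad, 0.0))  # guard against tiny negative round-off


t1, t2 = "theta1", "theta2"
m1 = {frozenset({t1}): 0.2, frozenset({t1, t2}): 0.8}
m2 = {frozenset({t2}): 0.5, frozenset({t1, t2}): 0.5}

d = jousselme(m1, m2)
print(round(d, 4))  # about 0.3808, i.e. sqrt(0.145)
```

A Bayesian BBA (a pmf) can be passed in the same way, which is how the distance between a transformed probability and the original BBA is obtained later in the paper.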

Although there are other distance definitions for belief 
functions, they either have some limitations or are not strict 
distance metrics [18]. Jousselme’s distance has been proved to 
be a strict distance metric [19]. 

The aim of evidential reasoning is usually decision-making. 
Several decision-making approaches in evidence theory are 
briefly reviewed next. 


III. DECISION-MAKING IN EVIDENCE THEORY 


There are two major types of decision-making approaches 
under the evidence theory framework: directly using belief 
functions [20], [21] and using probability transformations of 
belief functions [22]. 


¹ i.e., cognitively independent. 

A. Decision-making using belief functions 


There exist three main decision-making rules using Bel and 
Pl. 

1) Max Bel: One chooses the proposition A with the 
maximum Bel(A). Bel(·) describes the lowest trust degree of 
a given proposition. So it is also called pessimistic decision- 
making in DST [20]. 

2) Max Pl: One chooses the proposition A with the maximum 
Pl(A). Pl(·) describes the highest trust degree of a given 
proposition. So it is also called optimistic decision-making in 
DST [20]. 

3) Joint use of Bel and Pl: Bel and Pl measure the degree 
of trust of a given proposition from two points of view. So it 
is not comprehensive to make a decision based on only one of 
them. An extension is the “final belief” defined below [21]. 


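$$FB(A) = \mathrm{Bel}(A) + \alpha \, (\mathrm{Pl}(A) - \mathrm{Bel}(A)), \qquad (7)$$


where $\alpha = \mathrm{Bel}(A) / (\mathrm{Bel}(A) + \mathrm{Bel}(\bar{A}))$ and $\bar{A}$ denotes the complement of A. The proposition A 
with the maximum FB(A) is preferred. 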

Note that the proposition A can be either a singleton or a 
compound proposition (containing more than one singleton). 


B. Decision-making using probability transformations 


Probability-based decision rules are the mainstream of 
decision-making based on evidence theory [21], because the 
two-level reasoning and decision structure (i.e., the credal 
and pignistic levels) proposed by Smets in his TBM is quite 
appealing. In this type of decision-making approach, the belief 
function (or BBA, plausibility function) is first transformed into a 
probability measure P, and then the decision is made 
as $\theta^* = \arg\max_{\theta_i} P(\theta_i)$, where $\theta_i$ is a singleton of the FOD. 


As we will see next, the probability transformation is crucial 
for this type of decision-making. 


IV. PROBABILITY TRANSFORMATIONS 


A probability transformation is a mapping $PT : \mathrm{Bel}_\Theta \to 
P_\Theta$, where $\mathrm{Bel}_\Theta$ is a belief function defined on $\Theta$ and $P_\Theta$ is 
a probability measure (in fact a probability mass function, 
pmf) defined on $\Theta$. Major probability transformations (PTs) 
are summarized below. 

1) Pignistic transformation: The classical pignistic proba- 
bility was proposed in TBM framework [3], which is a subjec- 
tive and non-probabilistic interpretation of evidence theory. At 
the credal level of TBM, beliefs are entertained [3], combined 
and updated. While at the pignistic level, decisions are made 
by applying the pignistic probability transformation (PPT). 

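Suppose that the FOD is $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$ in the sequel. The 
PPT [3] for singletons is defined as: 

$$\mathrm{BetP}(\theta_i) = \sum_{B \in 2^\Theta,\ \theta_i \in B} \frac{m(B)}{|B|}. \qquad (8)$$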
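
A minimal sketch of (8), assuming the BBA is stored as a dictionary from `frozenset` focal elements to masses (the function name `betp` and the labels are illustrative):

```python
def betp(m, fod):
    """Pignistic probability (Eq. 8): split each mass equally among its singletons."""
    p = {t: 0.0 for t in fod}
    for focal, mass in m.items():
        for t in focal:
            p[t] += mass / len(focal)
    return p


# BBA with m({t1}) = 0.3, m({t2}) = 0.1, m({t1, t2}) = 0.6.
m = {
    frozenset({"t1"}): 0.3,
    frozenset({"t2"}): 0.1,
    frozenset({"t1", "t2"}): 0.6,
}

p = betp(m, ["t1", "t2"])
print({t: round(v, 4) for t, v in p.items()})  # about {'t1': 0.6, 't2': 0.4}
```

The mass 0.6 on the compound element is split into two equal parts of 0.3, which gives BetP = (0.6, 0.4) for this BBA.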


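PPT is designed according to an idea similar to uncertainty 
maximization [14]. In PPT, masses are assigned uniformly to 
the different singletons involved. 

2) Sudano’s probabilities: Sudano [5]-[7] proposed the probability 
transformation proportional to plausibilities (PrPl) [5], the 
probability transformation proportional to beliefs (PrBel) [5], the 
probability transformation proportional to all plausibilities 
(PraPl) [5], the hybrid probability transformation (PrHyb) [6], and 
an iterative version of probability transformation (PrScP) [5]. 

They are defined by different types of mappings as follows: 

$$\mathrm{PrPl}(\theta_i) = \mathrm{Pl}(\{\theta_i\}) \cdot \sum_{Y \in 2^\Theta,\ \theta_i \in Y} \frac{m(Y)}{\sum_{\theta_j \in Y} \mathrm{Pl}(\{\theta_j\})}, \qquad (9)$$

$$\mathrm{PrBel}(\theta_i) = \mathrm{Bel}(\{\theta_i\}) \cdot \sum_{Y \in 2^\Theta,\ \theta_i \in Y} \frac{m(Y)}{\sum_{\theta_j \in Y} \mathrm{Bel}(\{\theta_j\})}, \qquad (10)$$

$$\mathrm{PraPl}(\theta_i) = \mathrm{Bel}(\{\theta_i\}) + \frac{1 - \sum_j \mathrm{Bel}(\{\theta_j\})}{\sum_j \mathrm{Pl}(\{\theta_j\})} \cdot \mathrm{Pl}(\{\theta_i\}), \qquad (11)$$

$$\mathrm{PrHyb}(\theta_i) = \mathrm{PraPl}(\theta_i) \cdot \sum_{Y \in 2^\Theta,\ \theta_i \in Y} \frac{m(Y)}{\sum_{\theta_j \in Y} \mathrm{PraPl}(\theta_j)}, \qquad (12)$$

$$\mathrm{PrScP}(\theta_i) = \sum_{Y \in 2^\Theta,\ \theta_i \in Y} \frac{\mathrm{PrScP}(\theta_i)}{\sum_{\theta_j \in Y} \mathrm{PrScP}(\theta_j)} \cdot m(Y). \qquad (13)$$

Note that the iterative PrScP should be initiated by some other 
transformation. 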

3) Cobb-Shenoy’s normalization of plausibility: This probability 
transformation is defined as the normalized plausibility 
function of singletons [8]: 

$$\mathrm{PnPl}(\theta_i) = \frac{\mathrm{Pl}(\{\theta_i\})}{\sum_j \mathrm{Pl}(\{\theta_j\})}. \qquad (14)$$


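4) Cuzzolin’s intersection probability: From a geometric 
interpretation of evidence theory, an intersection probability 
measure [12] was proposed using the proportional repartition 
of the total nonspecific mass ($\mathrm{TNSM} = \sum_{A \in 2^\Theta,\ |A| > 1} m(A)$) 
for each contribution of the nonspecific masses involved: 

$$\mathrm{CuzzP}(\theta_i) = m(\{\theta_i\}) + \frac{\mathrm{Pl}(\{\theta_i\}) - m(\{\theta_i\})}{\sum_j \left( \mathrm{Pl}(\{\theta_j\}) - m(\{\theta_j\}) \right)} \cdot \mathrm{TNSM}. \qquad (15)$$

5) DSmP: The $\mathrm{DSmP}_\epsilon(\theta_i)$ [9] can be directly obtained by: 

$$\mathrm{DSmP}_\epsilon(\theta_i) = m(\{\theta_i\}) + (m(\{\theta_i\}) + \epsilon) \cdot \sum_{X \in 2^\Theta,\ \theta_i \subset X,\ |X| \ge 2} \frac{m(X)}{\sum_{Y \in 2^\Theta,\ Y \subset X,\ |Y| = 1} m(Y) + \epsilon \cdot |X|}. \qquad (16)$$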
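
The redistribution in (16) can be sketched as below for the DST case. The dictionary representation and the function name `dsmp` are illustrative assumptions; with `eps = 0` the denominator vanishes whenever all singletons inside a compound focal element have zero mass, which this sketch does not handle.

```python
def dsmp(m, fod, eps=0.001):
    """DSmP (Eq. 16) for a BBA given as dict: frozenset -> mass."""
    p = {}
    for t in fod:
        single = m.get(frozenset({t}), 0.0)
        total = single
        for x, mass_x in m.items():
            if len(x) >= 2 and t in x:
                # Sum of singleton masses inside x, plus eps * |x|.
                denom = sum(m.get(frozenset({s}), 0.0) for s in x) + eps * len(x)
                total += (single + eps) * mass_x / denom
        p[t] = total
    return p


m = {
    frozenset({"t1"}): 0.3,
    frozenset({"t2"}): 0.1,
    frozenset({"t1", "t2"}): 0.6,
}

print({t: round(v, 4) for t, v in dsmp(m, ["t1", "t2"], eps=0.0).items()})
print({t: round(v, 4) for t, v in dsmp(m, ["t1", "t2"], eps=0.001).items()})
```

For this BBA the mass 0.6 is shared in the ratio 0.3 : 0.1 when eps = 0, giving approximately (0.75, 0.25); a small positive eps moves the result slightly towards the uniform split.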
In DSmP, both the mass assignments and the cardinality of focal 
elements are used in the proportional redistribution. DSmP 
improves on Sudano’s, Cuzzolin’s 
and BetP, in that DSmP makes a more judicious redistribution 
of the ignorance masses to the singletons involved. DSmP 
works for both DST and DSmT. 

6) HDSmP: HDSmP is a hierarchical version of DSmP 
(see [13] for details). When the masses of the focal elements 
with the same cardinality are all zero, $\mathrm{HDSmP}_0$ cannot be computed 
due to its hierarchical nature. Therefore, the parameter 
$\epsilon$ is necessary to improve the applicability of HDSmP [13]. 

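7) PrBP1: The proportional transformation hypothesis used 
in PrBP1 [11] assumes that the masses are distributed proportionally 
to the product of $\mathrm{Bel}(\theta_i)$ and $\mathrm{Pl}(\theta_i)$ among the 
singleton elements $\theta_i \in Y$ with $Y \subseteq \Theta$: 

$$\mathrm{PrBP1}(\theta_i) = \sum_{Y,\ \theta_i \in Y} \frac{\mathrm{Bel}(\theta_i)\,\mathrm{Pl}(\theta_i)}{\sum_{\theta_j \in Y} \mathrm{Bel}(\theta_j)\,\mathrm{Pl}(\theta_j)} \cdot m(Y). \qquad (17)$$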

8) PrBP2: The PrBP2 [11] assumes that the masses are 
distributed proportionally to some given parameters $s_i = 
\mathrm{Bel}(\theta_i) / (1 - \mathrm{Pl}(\theta_i))$ or $s_i = \mathrm{Pl}(\theta_i) / (1 - \mathrm{Bel}(\theta_i))$: 

$$\mathrm{PrBP2}(\theta_i) = \sum_{Y,\ \theta_i \in Y} \frac{s_i}{\sum_{\theta_j \in Y} s_j} \cdot m(Y). \qquad (18)$$

A probability transformation outputs a Bayesian BBA (having 
only singleton focal elements) corresponding to a given (non- 
Bayesian) BBA. That is why the probability transformation is 
also called the Bayesian transformation. A Bayesian BBA is 
not a probability measure, but if m(·) is a Bayesian BBA, then 
its corresponding Bel(·) and Pl(·) coincide with a probability 
measure, i.e., Bel(·) = Pl(·) = P(·). Following tradition, it 
is still called the “probability transformation” in this paper. 


V. QUESTIONING OF TRADITIONAL EVALUATION OF 
PROBABILITY TRANSFORMATION 


The evaluation of different probability transformations is 
important for analysis and their applications. It is also impor- 
tant for the design of new transformations. In this section, 
we will provide some comments on traditional evaluation 
approaches for probability transformations. 


A. Traditional Evaluation approaches for probability transfor- 
mation 


Qualitative evaluation approaches were proposed. In [13], 
three desired properties of a probability transformation are 
introduced, including: 

1) p-consistency: A probability transformation PT is p- 
consistent (probability consistent) if PT(m) = m for any 
Bayesian BBA m. 

2) ULB-consistency: A probability transformation is ULB- 
consistent (upper and lower bound consistent) if its resulting 
transformed probability P = PT(m) satisfies $\mathrm{Bel}(X) \le 
P(X) \le \mathrm{Pl}(X)$. 

3) Combination-consistency: Combination-consistency 
means that we obtain the same result whether we combine 
two BBAs $m_1$ and $m_2$ using the combination rule first and 
perform the probability transformation afterwards, or apply the 
probability transformation to both input BBAs $m_1$ and $m_2$ first and 
combine the results afterwards. It is defined through the commutation 
property of the combination rule and the probability transformation. 
It is difficult to satisfy, and PnPl [8] is the only transformation known 
to the authors that satisfies it when using Dempster’s rule 
of combination. 

There also exist some quantitative metrics measuring the 
strength of a critical decision based on a probability measure: 

1) Normalized Shannon Entropy: Suppose that $P(\theta)$ is a 
pmf, where $\theta \in \Theta$, $|\Theta| = N$. An evaluation criterion for the 
transformed pmf is as follows [11]: 

$$E_H = \frac{-\sum_{\theta \in \Theta} P(\theta) \log_2(P(\theta))}{\log_2 N}, \qquad (19)$$

i.e., the ratio of the Shannon entropy [23] to the maximum Shannon 
entropy. Clearly $E_H$ is normalized. The larger (smaller) 
$E_H$ gets, the larger (smaller) the degree of uncertainty gets. 
When $E_H = 0$, one singleton proposition has probability 1 
and the others have zero probability. Therefore, the agent 
or system can make a decision without error if the probability 
P(·) corresponds to the real probability of the events. When 
$E_H = 1$, it is unlikely to make a correct decision, because the $P(\theta)$ 
are equal for all $\theta \in \Theta$, i.e., one has a uniform pmf. 

2) Probabilistic Information Content: The Probabilistic 
Information Content (PIC) criterion [5] is the dual of the 
normalized Shannon entropy. The PIC value of a pmf obtained 
from a probability transformation indicates the total knowledge 
available to make a correct decision: 

$$\mathrm{PIC}(P) = 1 + \frac{1}{\log_2 N} \cdot \sum_{\theta \in \Theta} P(\theta) \log_2(P(\theta)). \qquad (20)$$

Obviously, $\mathrm{PIC} = 1 - E_H$. A PIC value of zero indicates that 
the knowledge to make a correct decision is not informative 
enough (all propositions have equal probabilities, i.e., one has 
the maximal entropy). 
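
The two criteria in (19) and (20) are straightforward to compute. The sketch below is a minimal illustration; the function names are hypothetical and the usual convention 0 · log 0 = 0 is assumed for zero probabilities.

```python
from math import log2


def normalized_entropy(p):
    """E_H of Eq. 19 for a pmf given as a list of probabilities (0*log 0 := 0)."""
    h = -sum(x * log2(x) for x in p if x > 0.0)
    return h / log2(len(p))


def pic(p):
    """Probabilistic information content of Eq. 20, i.e. 1 - E_H."""
    return 1.0 - normalized_entropy(p)


print(pic([0.5, 0.5]))                 # uniform pmf: PIC is 0
print(pic([1.0, 0.0]))                 # deterministic pmf: PIC is 1
print(round(pic([0.9, 0.1]), 4))       # about 0.531
print(round(pic([0.5625, 0.4375]), 4)) # about 0.0113
```

The closer the pmf is to uniform, the closer its PIC is to zero.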

Less uncertainty means that the corresponding probability 
transformation result is more helpful in making a decision. 
According to such an idea, the probability transformation 
should attempt to enlarge the belief differences among all 
the propositions and thus to achieve a clearer decision result. 
Is this rational? Is the degree of uncertainty always a judicious 
criterion for evaluating a probability transformation for decision-making 
purposes? If this is true, a probability transformation approach 
based on direct uncertainty minimization should be the best 
choice. Is that true? In the next section, we examine the 
legitimacy of using uncertainty degree as a criterion to evaluate 
a probability transformation. 


B. Probability transformation based on uncertainty minimiza- 
tion 

As mentioned above, the “best” probability transformation 
can be obtained by directly minimizing $E_H$ (or equivalently 
maximizing PIC) as follows: 

$$\min_{\{P(\theta) \,|\, \theta \in \Theta\}} \left\{ -\sum_{\theta \in \Theta} P(\theta) \log_2(P(\theta)) \right\} \quad \text{s.t.} \quad \begin{cases} \mathrm{Bel}(B) \le P(B) \le \mathrm{Pl}(B), & \forall B \subseteq \Theta, \\ 0 \le P(\theta) \le 1, & \forall \theta \in \Theta, \\ \sum_{\theta \in B} P(\theta) = P(B), \end{cases} \qquad (21)$$

where the objective function is the Shannon entropy and 
the constraints are the ULB-consistency and the properties of a 
probability. Given a belief function, the solution of (21) is 
guaranteed to have the least uncertainty and is thus seemingly 
preferable in decision-making. This so-called “best” 
transformation is denoted by $\mathrm{Un}_{\min}$. 

Clearly, the problem of finding a minimum-entropy pmf 
does not have a unique solution in general. We use the Quasi- 
Newton method followed by a global optimization algorithm 
[24] to solve (21) to alleviate the effect of the local extremum 
problem. Other optimization algorithms [25], [26] can also 
be used, e.g., Genetic Algorithm (GA) and Particle Swarm 
Optimization (PSO). 


C. Analysis of probability transformation based on uncer- 
tainty minimization 
To compare different probability transformations, the follow- 
ing two examples drawn from [6] and [11] are considered, 
where PIC is used for evaluation. 

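Example 1: For the FOD $\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4\}$, the corresponding 
BBA is as follows: 

$m(\{\theta_1\}) = 0.16$, $m(\{\theta_2\}) = 0.14$, $m(\{\theta_3\}) = 0.01$, 
$m(\{\theta_4\}) = 0.02$, $m(\{\theta_1, \theta_2\}) = 0.20$, 
$m(\{\theta_1, \theta_3\}) = 0.09$, $m(\{\theta_1, \theta_4\}) = 0.04$, 
$m(\{\theta_2, \theta_3\}) = 0.04$, $m(\{\theta_2, \theta_4\}) = 0.02$, 
$m(\{\theta_3, \theta_4\}) = 0.01$, $m(\{\theta_1, \theta_2, \theta_3\}) = 0.10$, 
$m(\{\theta_1, \theta_2, \theta_4\}) = 0.03$, $m(\{\theta_1, \theta_3, \theta_4\}) = 0.03$, 
$m(\{\theta_2, \theta_3, \theta_4\}) = 0.03$, $m(\Theta) = 0.08$. 


TABLE I 
PROBABILITY TRANSFORMATION RESULTS FOR EXAMPLE 1. 

[Table body not recoverable from the source; columns: $\theta_1$, $\theta_2$, $\theta_3$, $\theta_4$, PIC.] 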


Based on the probability transformations defined in (8)- 
(18) and (21), respectively, the BBA can be transformed 
into different probabilities as illustrated in Table I. Their 
corresponding PICs can be calculated using (20), and are 
also listed in Table I. $\mathrm{Un}_{\min}$ provides the maximum PIC, 
as expected. 


Example 2: For the FOD $\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4\}$, the corresponding 
BBA is as follows: 

$m(\{\theta_1\}) = 0.05$, $m(\{\theta_2\}) = 0.00$, $m(\{\theta_3\}) = 0.00$, 
$m(\{\theta_4\}) = 0.00$, $m(\{\theta_1, \theta_2\}) = 0.39$, 
$m(\{\theta_1, \theta_3\}) = 0.19$, $m(\{\theta_1, \theta_4\}) = 0.18$, 
$m(\{\theta_2, \theta_3\}) = 0.04$, $m(\{\theta_2, \theta_4\}) = 0.02$, 
$m(\{\theta_3, \theta_4\}) = 0.01$, $m(\{\theta_1, \theta_2, \theta_3\}) = 0.04$, 
$m(\{\theta_1, \theta_2, \theta_4\}) = 0.02$, $m(\{\theta_1, \theta_3, \theta_4\}) = 0.03$, 
$m(\{\theta_2, \theta_3, \theta_4\}) = 0.03$, $m(\Theta) = 0.00$. 

By using the probability transformations defined in (8)-(18) 
and (21), respectively, we can transform the BBA into different 
probabilities as illustrated in Table II. Their corresponding 
PICs can be calculated using (20), and are also listed in 
Table II. In this example, the masses of some singletons are 
zero, so some probability transformations cannot be applied, 
as shown in Table II, where N/A means “Not available”. 
The notations $\mathrm{DSmP}_0$, $\mathrm{DSmP}_{0.001}$, $\mathrm{HDSmP}_0$ and $\mathrm{HDSmP}_{0.001}$ 
mean that the value of the parameter $\epsilon$ in the DSmP and HDSmP 
transformations is set to 0 and 0.001, respectively. 

From the experimental results in Tables I and II, the pmf 
obtained from the proposed $\mathrm{Un}_{\min}$ approach has the least 
uncertainty (and thus the greatest PIC) when compared with 
the others. That is, the difference among all propositions of the 
existing probability transformation approaches can be further 
enlarged, which is seemingly helpful for more consolidated 
and clearer decision-making. 


TABLE II 
PROBABILITY TRANSFORMATION RESULTS FOR EXAMPLE 2. 

Method | $\theta_1$ | $\theta_2$ | $\theta_3$ | $\theta_4$ | PIC 
PrBel | N/A due to 0 value of singletons 
PrScP | N/A due to 0 value of singletons 
PrBP1 | N/A due to 0 value of singletons 
$\mathrm{DSmP}_0$ | N/A due to 0 value of singletons 
$\mathrm{HDSmP}_0$ | N/A due to 0 value of singletons 
BetP | 0.4600 | 0.2550 | 0.1535 | 0.1317 | 0.0910 

[The remaining rows of this table are not recoverable from the source.] 


Remark: However, there exist serious deficiencies associated 
with $\mathrm{Un}_{\min}$, as shown in the following example. 

Example 3: The BBA defined on the FOD $\Theta = \{\theta_1, \theta_2\}$ is: 

$m(\{\theta_1\}) = 0.3$, $m(\{\theta_2\}) = 0.1$, $m(\{\theta_1, \theta_2\}) = 0.6$. 

The experimental results of different approaches are listed 
in Table III. Is the probability transformation based on PIC 
maximization (i.e., entropy minimization) rational? 




TABLE III 
PROBABILITY TRANSFORMATION RESULTS FOR EXAMPLE 3. 

Method | $\theta_1$ | $\theta_2$ | PIC 
PnPl | 0.5625 | 0.4375 | 0.0113 
CuzzP | 0.6000 | 0.4000 | 0.0291 
BetP | 0.6000 | 0.4000 | 0.0291 
PrPl | 0.6375 | 0.3625 | 0.0553 
PraPl | 0.6375 | 0.3625 | 0.0553 
PrHyb | 0.6825 | 0.3175 | 0.0984 
$\mathrm{DSmP}_0$ | 0.7500 | 0.2500 | 0.1887 
$\mathrm{DSmP}_{0.001}$ | 0.7493 | 0.2507 | 0.1875 
$\mathrm{HDSmP}_0$ | 0.7500 | 0.2500 | 0.1887 
$\mathrm{HDSmP}_{0.001}$ | 0.7485 | 0.2515 | 0.1864 
PrBel | 0.7500 | 0.2500 | 0.1887 
PrScP | 0.7500 | 0.2500 | 0.1887 
PrBP1 | 0.7765 | 0.2235 | 0.2334 
PrBP2 | 0.8400 | 0.1600 | 0.3657 
$\mathrm{Un}_{\min}$ | 0.9000 | 0.1000 | 0.5310 


In this simple example, all of the mass 0.6 committed to 
$\{\theta_1, \theta_2\}$ is redistributed only to the singleton $\{\theta_1\}$ when the 
$\mathrm{Un}_{\min}$ transformation is used in order to achieve the maximum 
PIC. 


It can also be shown that for $\mathrm{Un}_{\min}$, the mass assignment 
$m(\{\theta_1, \theta_2\}) > 0$ is always completely redistributed to 
$\{\theta_1\}$ as long as $m(\{\theta_1\}) > m(\{\theta_2\})$ in order to achieve the 
maximum PIC. 


This is also true in situations where the difference 
between the masses of singletons is very small, as demonstrated 
by the following BBA defined on the FOD $\Theta = \{\theta_1, \theta_2\}$: 

$m(\{\theta_1\}) = 0.1000001$, $m(\{\theta_2\}) = 0.1$, $m(\{\theta_1, \theta_2\}) = 
0.7999999$. 


In this case, $m(\{\theta_1\})$ is almost the same as 
$m(\{\theta_2\})$ and there is no specific reason to obtain a very 
high probability for $\theta_1$ and a small one for $\theta_2$. Therefore, 
the decision based on the result of the $\mathrm{Un}_{\min}$ transformation 
appears to be very risky or dogmatic. In some applications, 
a decision has to be made and we cannot avoid making one 
(good or bad). However, when the time to make a decision is 
not too limited or a rejection decision is permitted, it is better to 
collect more observations (information) or to make a rejection 
rather than to take a high risk of making an erroneous decision. So 
the criterion of uncertainty minimization, which can bring such 
risky results, is not always judicious for evaluating a probability 
transformation for decision-making purposes. Furthermore, 
when we use $\mathrm{Un}_{\min}$, there are also some other problems. See 
the next example for details. 
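
The behavior described above can be reproduced with a small sketch for a two-element FOD. It does not use the Quasi-Newton or global optimization procedure mentioned earlier; instead it relies on the assumption that, for two hypotheses, the feasible set of (21) is the interval Bel({θ1}) ≤ P(θ1) ≤ Pl({θ1}) and that the Shannon entropy, being concave, attains its minimum at an endpoint of that interval. The function names are illustrative.

```python
from math import log2


def entropy(p):
    """Shannon entropy in bits, with 0*log 0 := 0."""
    return -sum(x * log2(x) for x in p if x > 0.0)


def unmin_binary(m1, m2, m12):
    """Minimum-entropy pmf on {t1, t2} under Bel <= P <= Pl.

    m1 = m({t1}), m2 = m({t2}), m12 = m({t1, t2}).
    Candidates are the two interval endpoints: P(t1) = Pl({t1}) or P(t1) = Bel({t1}).
    """
    candidates = [(m1 + m12, m2), (m1, m2 + m12)]
    return min(candidates, key=entropy)


print(unmin_binary(0.3, 0.1, 0.6))              # about (0.9, 0.1), as in Table III
print(unmin_binary(0.1000001, 0.1, 0.7999999))  # again about (0.9, 0.1)
```

Even when the two singleton masses differ only in the seventh decimal place, the whole compound mass goes to the slightly heavier singleton.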


Example 4: The BBA defined on the FOD $\Theta = \{\theta_1, \theta_2, \theta_3\}$ 
is: 

$m(\{\theta_1, \theta_2\}) = m(\{\theta_2, \theta_3\}) = m(\{\theta_1, \theta_3\}) = 1/3$. 

Using $\mathrm{Un}_{\min}$, we can obtain six different pmfs yielding 
the same minimal entropy, which are listed as follows: 

$P(\{\theta_1\}) = 1/3$, $P(\{\theta_2\}) = 2/3$, $P(\{\theta_3\}) = 0$; 
$P(\{\theta_1\}) = 1/3$, $P(\{\theta_2\}) = 0$, $P(\{\theta_3\}) = 2/3$; 
$P(\{\theta_1\}) = 0$, $P(\{\theta_2\}) = 1/3$, $P(\{\theta_3\}) = 2/3$; 
$P(\{\theta_1\}) = 0$, $P(\{\theta_2\}) = 2/3$, $P(\{\theta_3\}) = 1/3$; 
$P(\{\theta_1\}) = 2/3$, $P(\{\theta_2\}) = 1/3$, $P(\{\theta_3\}) = 0$; 
$P(\{\theta_1\}) = 2/3$, $P(\{\theta_2\}) = 0$, $P(\{\theta_3\}) = 1/3$. 

It is clear that the problem of finding a pmf with the minimal 
entropy does not have a unique solution in general. How, then, 
should a unique one be chosen? In this example, different admissible 
pmfs yield different decision results. This is a serious problem 
for decision-making. 
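
The non-uniqueness in Example 4 can be checked numerically. The sketch below only verifies that the six listed pmfs satisfy Bel(B) ≤ P(B) ≤ Pl(B) for every subset B, that they share the same entropy, and that they lead to different decisions; it does not itself prove that this entropy is the minimum. All names are illustrative.

```python
from itertools import combinations, permutations
from math import log2

fod = ("t1", "t2", "t3")
# Example 4: each pair of hypotheses carries mass 1/3.
m = {frozenset(pair): 1.0 / 3.0 for pair in combinations(fod, 2)}


def bel(a):
    return sum(v for b, v in m.items() if b <= a)


def pl(a):
    return sum(v for b, v in m.items() if b & a)


def admissible(p, tol=1e-9):
    """Check Bel(B) <= P(B) <= Pl(B) for every non-empty subset B of the FOD."""
    for k in range(1, len(fod) + 1):
        for subset in combinations(fod, k):
            a = frozenset(subset)
            pa = sum(p[t] for t in a)
            if not (bel(a) - tol <= pa <= pl(a) + tol):
                return False
    return True


def entropy(p):
    return -sum(x * log2(x) for x in p.values() if x > 0.0)


# The six pmfs are the permutations of (2/3, 1/3, 0) over (t1, t2, t3).
pmfs = [dict(zip(fod, perm)) for perm in permutations((2.0 / 3.0, 1.0 / 3.0, 0.0))]
entropies = [entropy(p) for p in pmfs]
decisions = {max(p, key=p.get) for p in pmfs}

print(all(admissible(p) for p in pmfs))   # all six satisfy the bounds
print(max(entropies) - min(entropies))    # entropies coincide (up to round-off)
print(sorted(decisions))                  # every hypothesis wins for some pmf
```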

From all the above examples, we conclude that the max- 
imization of PIC (or minimization of Shannon entropy) is 
not satisfactory for evaluation. Therefore, more comprehen- 
sive evaluation methods are needed, which is an open and 
challenging problem. 


VI. A NEW BI-CRITERIA SOLUTION FOR PROBABILITY 
TRANSFORMATION EVALUATION 


To design a single criterion for the comprehensive evalua- 
tion of probability transformations is always difficult. Jointly 
using multiple criteria is one option meaning that besides 
entropy or PIC, another measure, which describes some other 
aspects of a probability transformation, may be incorporated 
for evaluation. 

The level of PIC characterizes the clarity of a given pmf. 
Indeed, higher PIC (lower entropy) means that the pmf tends 
to concentrate on a specific hypothesis of the FOD, which 
makes the decision easier for the decider. Also, the pmf 
is transformed from a given BBA representing the original 
information source. If the obtained pmf is in some sense closer 
to the original BBA, it will be preferred. This is because such 
a pmf has a high degree of fidelity to the original BBA (i.e., 
less information is lost in the transformation). Clarity 
and fidelity can always be balanced. So by adding one more 
criterion representing the degree of fidelity to the evaluation 
measure, it will be more comprehensive. How to characterize 
the closeness between the obtained pmf and the original BBA? 
The answer is to use the distance (or dissimilarity) of evidence 
[15]. In summary, we attempt to propose bi-criteria evaluation 
approaches by jointly using the distance of evidence and PIC. 

By treating the pmf obtained from some probability trans- 
formation as a special BBA (not strict), a distance of evidence 
can be used to describe the dissimilarity between the pmf and 
the original BBA. We use the distance of evidence together 
with PIC (or entropy) as the elements of a two-tuple: 


$$(\mathrm{PIC}(P), d_J(P, m)), \quad \text{or} \quad (\mathrm{Entropy}(P), d_J(P, m)), \qquad (22)$$


where P is the transformed probability (i.e., pmf) and m is 
the original BBA. 

A two-tuple can provide more comprehensive information; 
however, how should its two elements be used jointly to evaluate a probability 
transformation? A larger PIC represents greater clarity and 
a smaller $d_J$ represents greater fidelity. Over-emphasizing 
any single criterion is not preferred. As mentioned above, if we 
want to choose a better probability transformation approach, 
there should be a tradeoff between the two criteria. Here, we 
propose two approaches to jointly use the two criteria. The first 
one is directly based on arithmetic operations over PIC and 
$d_J$. The second one first sorts the probability transformations to 
be evaluated according to PIC and $d_J$, respectively, to obtain 
two corresponding ranks. Then, the two ranks are fused 
through some rank fusion rule to obtain a comprehensive rank 
for evaluation. 


A. Comprehensive evaluation with value based fusion 


Suppose that there are N pmfs denoted by $P_1, P_2, \ldots, P_N$, 
which are obtained from N different probability transformations. 
Their entropies are $Ent(1), Ent(2), \ldots, Ent(N)$ and their $d_J$'s 
(between each pmf and the original BBA) are $d(1), d(2), \ldots, d(N)$. To 
jointly use these two criteria, we calculate the comprehensive 
scores for different probability transformations as follows: 


$$C_{joint}(P_i) = \alpha \cdot Ent'(i) + (1 - \alpha) \cdot d'(i), \qquad (23)$$


where $i = 1, \ldots, N$ and $\alpha$ denotes a weight representing the 
degree of preference on entropy. Distances (and entropies) 
usually take different values depending on the probability 
transformation. Therefore, to be consistent, we need to first 
normalize them by their ranges as: 


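$$Ent'(i) = \frac{Ent(i) - \min(\mathbf{Ent})}{\max(\mathbf{Ent}) - \min(\mathbf{Ent})}, \qquad d'(i) = \frac{d(i) - \min(\mathbf{d})}{\max(\mathbf{d}) - \min(\mathbf{d})}, \qquad (24)$$

where the vector $\mathbf{d} = [d(1), d(2), \ldots, d(N)]$ and the vector 
$\mathbf{Ent} = [Ent(1), Ent(2), \ldots, Ent(N)]$ are the vectors 
of distances and entropies corresponding to the probability 
transformations $P_1, P_2, \ldots, P_N$. 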

Smaller entropy (larger PIC, i.e., greater clarity) and smaller 
distance (greater fidelity) are desired. Then, by sorting the 
values of $C_{joint}(P_i)$ in ascending order, we can obtain the 
rank as: 


$$\Lambda_C = (r_C(P_1), r_C(P_2), \ldots, r_C(P_N)). \qquad (25)$$


The probability transformations with the best rank, i.e., those 
having the smallest rank value, are preferred. 
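
The value based fusion can be sketched as follows. Two points are assumptions of this sketch rather than statements of the paper: "normalizing by the range" is read as min-max normalization, and the entropy and distance values in the usage example are made-up numbers chosen only to illustrate the trade-off. The function names are hypothetical.

```python
def minmax(values):
    """Min-max normalization to [0, 1] (assumed reading of 'normalize by range')."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def joint_scores(entropies, distances, alpha=0.5):
    """C_joint = alpha * Ent' + (1 - alpha) * d' for each transformation (Eq. 23)."""
    e, d = minmax(entropies), minmax(distances)
    return [alpha * ei + (1.0 - alpha) * di for ei, di in zip(e, d)]


def ranks(scores):
    """Rank 1 goes to the smallest score (ascending order, Eq. 25)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    r = [0] * len(scores)
    for pos, i in enumerate(order, start=1):
        r[i] = pos
    return r


# Hypothetical values for three transformations (illustration only).
entropies = [0.97, 0.81, 0.47]   # lower is clearer
distances = [0.20, 0.25, 0.50]   # lower is more faithful

scores = joint_scores(entropies, distances, alpha=0.4)
print([round(s, 4) for s in scores])
print(ranks(scores))  # [2, 1, 3]: the middle transformation gives the best trade-off
```

In this toy case the transformation with the lowest entropy is ranked last because it is also the farthest from the original BBA.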


B. Comprehensive evaluation with rank based fusion 


Let us consider N pmfs denoted by $P_1, P_2, \ldots, P_N$, which 
are obtained from N different probability transformations. 
A comprehensive rank can then be obtained from the rank 
based fusion (rank fusion for short) [27], [28] implemented in the 
following steps. 

• Step 1: Obtain the PIC-based rank 

Sort the pmfs in descending order according to their PICs 
(because a higher PIC value is desired). Then the 
rank of all the pmfs is: 

$$\Lambda_{PIC} = (r_{PIC}(P_1), r_{PIC}(P_2), \ldots, r_{PIC}(P_N)). \qquad (26)$$


• Step 2: Obtain the distance-based rank 


Sort the pmfs in ascending order according to the 
Jousselme distances ($d_J$) (because a smaller $d_J$ is 
desired). Then the rank for all the pmfs can be obtained 
as: 


$$\Lambda_d = (r_d(P_1), r_d(P_2), \ldots, r_d(P_N)). \qquad (27)$$


• Step 3: Obtain the global rank by a rank fusion 
To find the joint (or comprehensive) rank of $\Lambda_{PIC}$ and 
$\Lambda_d$, a rank fusion is applied as: 


$$\Lambda_f = f(\Lambda_{PIC}, \Lambda_d), \qquad (28)$$

where f is a rank fusion rule and $\Lambda_f$ is: 

$$\Lambda_f = (r_f(P_1), r_f(P_2), \ldots, r_f(P_N)). \qquad (29)$$


The probability transformations with the best rank, i.e., 
those having the smallest rank, are preferred. 
The selection of the rank fusion rule is crucial, and is 
discussed below. 


1) Min rule:

$$r_f(P_k) = \min(r_{PIC}(P_k), r_d(P_k)), \quad \forall k = 1, \ldots, N; \qquad (30)$$

2) Max rule:

$$r_f(P_k) = \max(r_{PIC}(P_k), r_d(P_k)), \quad \forall k = 1, \ldots, N; \qquad (31)$$

3) Arithmetic averaging rule:

$$r_f(P_k) = w_1 \cdot r_{PIC}(P_k) + w_2 \cdot r_d(P_k), \quad \forall k = 1, \ldots, N, \qquad (32)$$

where $w_1$, $w_2$ are weights for the two different ranks to be
fused.
4) Optimization rule: The optimization based rank fusion
rule is:

$$A_f = \arg\min_{A} \sum_{j=1}^{L} d_r(A, A_j), \qquad (33)$$

where $A_1, \ldots, A_L$ are the L different ranks to fuse and $d_r(\cdot, \cdot)$
is a distance between two ranks that will be presented in the
sequel.

Here we have two ranks $A_{PIC}$ and $A_d$. The above equation
could be rewritten as:

$$A_f = \arg\min_{A} \left[ d_r(A, A_{PIC}) + d_r(A, A_d) \right]. \qquad (34)$$

$d_r(\cdot, \cdot)$ could be any rank distance including the footrule
distance [29], the Kendall distance [30], [31] and the Spearman
distance [32] as introduced below.

Suppose that $A_1$, $A_2$ are two ranks. Let $X = \{x_1, x_2, \ldots, x_N\}$
be a set of items to be ranked. $A_j(i)$ is
the rank associated with the item $x_i$, where $j = 1, 2$ and
$i = 1, 2, \ldots, N$.
1) Footrule distance:

$$F(A_1, A_2) = \sum_{i=1}^{N} |A_1(i) - A_2(i)|. \qquad (35)$$

2) Kendall distance:

$$K(A_1, A_2) = \sum_{\{i,j\} \in P} \bar{K}_{i,j}(A_1, A_2), \qquad (36)$$

where P is the set of the unordered pairs of distinct items in
X and

$$\bar{K}_{i,j}(A_1, A_2) = \begin{cases} 0, & \text{if } x_i \text{ and } x_j \text{ are in the same order in } A_1 \text{ and } A_2, \\ 1, & \text{if } x_i \text{ and } x_j \text{ are in the reverse order in } A_1 \text{ and } A_2. \end{cases}$$


3) Spearman distance:

$$\rho(A_1, A_2) = 1 - \frac{6 \sum_{i=1}^{N} (A_1(i) - A_2(i))^2}{N(N^2 - 1)}. \qquad (37)$$

Clearly, $\rho \in [-1, 1]$. $\rho = 1$ means a total positive correlation
between the ranks and $\rho = -1$ means a total negative one.

Here $A_1$ and $A_2$ could be $A_{PIC}$ and $A_d$, respectively.
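As a minimal sketch, the three rank measures in (35)-(37) can be written directly from their definitions. The function names are illustrative, and the ranks are assumed to be given as equal-length lists where position i holds the rank of item $x_i$.

```python
from itertools import combinations


def footrule(r1, r2):
    """Footrule distance of Eq. (35): sum of absolute rank differences."""
    return sum(abs(a - b) for a, b in zip(r1, r2))


def kendall(r1, r2):
    """Kendall distance of Eq. (36): number of item pairs ordered oppositely."""
    return sum(
        1
        for i, j in combinations(range(len(r1)), 2)
        if (r1[i] - r1[j]) * (r2[i] - r2[j]) < 0
    )


def spearman(r1, r2):
    """Spearman measure of Eq. (37); it lies in [-1, 1] for untied ranks."""
    n = len(r1)
    squared = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * squared / (n * (n * n - 1))


a1 = [1, 2, 3, 4]
a2 = [1, 3, 4, 2]
f_val = footrule(a1, a2)   # 0 + 1 + 1 + 2
k_val = kendall(a1, a2)    # pairs (x2, x4) and (x3, x4) are reversed
s_val = spearman(a1, a2)   # 1 - 6 * 6 / 60
```

Note that the footrule and Kendall values grow with disagreement, whereas the Spearman value in (37) decreases, so an optimizer built on (37) would have to account for that direction.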


C. An illustrative example of rank fusion 

Suppose that $X = \{x_1, x_2, x_3, x_4\}$ corresponds to a set
of possible choices in a decision-making problem. $A_1 = [1, 2, 3, 4]$
and $A_2 = [1, 3, 4, 2]$ are two ranks provided by two
experts for X.

When using different rank fusion rules, the results (denoted
by $A_f$) are as follows.

1) Min rule:

$$A_f = [\min(1, 1), \min(2, 3), \min(3, 4), \min(4, 2)] = [1, 2, 3, 2].$$

This rule has a tie.

2) Max rule:

$$A_f = [\max(1, 1), \max(2, 3), \max(3, 4), \max(4, 2)] = [1, 3, 4, 4].$$

This rule also has a tie.

3) Arithmetic averaging rule (both weights are equal to 0.5):

$$A_f = [0.5 \cdot 1 + 0.5 \cdot 1, \; 0.5 \cdot 2 + 0.5 \cdot 3, \; 0.5 \cdot 3 + 0.5 \cdot 4, \; 0.5 \cdot 4 + 0.5 \cdot 2] = [1, 2.5, 3.5, 3].$$

We encounter non-integer rank values here. This does not
matter, because only the relative value of a rank is of interest.
Therefore, we obtain the final result as [1, 2, 4, 3], which is the
rank obtained by ordering [1, 2.5, 3.5, 3] in ascending order.

4) Optimization rule: Here we use the footrule distance. Suppose
that $A_f = [r_f(1), r_f(2), r_f(3), r_f(4)]$. We try to find an
$A_f$ which minimizes

$$|r_f(1) - 1| + |r_f(2) - 2| + |r_f(3) - 3| + |r_f(4) - 4| + |r_f(1) - 1| + |r_f(2) - 3| + |r_f(3) - 4| + |r_f(4) - 2|.$$

By using the optimization rank fusion rule in (33), one gets
$A_f = [1, 2, 4, 3]$.
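The four fusion rules of this example can be checked with a short script. It is a sketch only: the helper names are hypothetical and the optimization rule is solved by exhaustive search over all 4! permutations, which is an assumption of convenience that scales only to small N. With the footrule distance the minimum cost in this example is reached by more than one rank; [1, 2, 4, 3] is among the minimizers.

```python
from itertools import permutations


def footrule(r1, r2):
    """Footrule distance of Eq. (35)."""
    return sum(abs(a - b) for a, b in zip(r1, r2))


def to_ranks(scores):
    """Turn fused scores into ranks 1..N by ascending order (no ties assumed)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return ranks


a1 = [1, 2, 3, 4]
a2 = [1, 3, 4, 2]

fused_min = [min(x, y) for x, y in zip(a1, a2)]          # min rule, Eq. (30)
fused_max = [max(x, y) for x, y in zip(a1, a2)]          # max rule, Eq. (31)
averaged = [0.5 * x + 0.5 * y for x, y in zip(a1, a2)]   # Eq. (32), w1 = w2 = 0.5
fused_ave = to_ranks(averaged)

# Optimization rule, Eq. (34): exhaustive search over every candidate rank.
cost = {c: footrule(c, a1) + footrule(c, a2) for c in permutations(range(1, 5))}
best_cost = min(cost.values())
minimizers = sorted(c for c, v in cost.items() if v == best_cost)
```

Because several candidates share the minimum cost, a practical implementation needs a tie-breaking convention; none is specified in the text, so this sketch simply lists all minimizers.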


VII. EXPERIMENTS FOR THE BI-CRITERIA EVALUATION 
APPROACH 


In this section, we examine the previous Examples 1-3 
using the new bi-criteria evaluation approach. 


A. Example 1 revisited


Table IV shows the evaluation results of different probability
transformations (the initial pmf for PrScP used here is BetP),
their distances and PIC's, their ranks obtained using the two
criteria ($A_{PIC}$ and $A_d$), and the joint rank using value based
fusion ($A_{value}$). Table IV also provides the evaluation results
using rank fusion, where $A_{min}$ denotes the fused rank using the
min rule; $A_{max}$ denotes the fused rank using the max rule; $A_{ave}$
denotes the fused rank using the arithmetic averaging rule; $A_{opt}$
denotes the fused rank using optimization. The weight for
value based fusion is $\alpha = 0.5$ while the weights for arithmetic
averaging rank fusion are $w_1 = w_2 = 0.5$. In optimization-based
rank fusion, the distance used is the Spearman distance
in (37) due to its quadratic form, which is mathematically more
tractable for optimization. The comparisons among evaluation
results of different criteria are also shown in Fig. 1. Note that
in all experiments here, a smaller rank value represents a higher
rank.

In Table IV there are ties. Our strategy for handling a
tie is as follows. When a tie happens, the alternatives in the
tie are assigned the same rank. The rank of the closest
following alternative is then increased by the number
of alternatives in the tie. For example, in Table IV, the PIC's of
PrBel and $DSmP_0$ are the same, so their ranks are both 6. The
closest following alternative in terms of PIC is $HDSmP_{0.001}$,
so its rank becomes 8. That is, there is no rank 7 here because
rank 6 appeared twice.
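The tie strategy described above corresponds to what is often called standard competition ranking. A minimal sketch follows; the function name and the PIC values are hypothetical and chosen only to show the skipped rank.

```python
def competition_ranks(values, descending=True):
    """Tied values share a rank; the next rank is skipped by the size of the tie."""
    ordered = sorted(values, reverse=descending)
    # list.index returns the first position of a value in the sorted list,
    # so every member of a tie receives the same (smallest) rank.
    return [ordered.index(v) + 1 for v in values]


# Hypothetical PIC values: the second and third alternatives are tied.
pic_values = [0.40, 0.31, 0.31, 0.30]
pic_ranks = competition_ranks(pic_values, descending=True)
```

Here the two tied alternatives both receive rank 2 and the next one receives rank 4, so rank 3 does not appear, mirroring the missing rank 7 in Table IV.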

From Table IV, although the pmf obtained from $Un_{min}$ has
the maximum PIC (thus it seems to be the best choice), it
also has the maximum $d_J$. Therefore, it is not the best choice
according to $d_J$. The joint evaluation results show that $Un_{min}$
is not preferred. The bi-criteria evaluation appears more natural
and helpful than PIC alone.

Also, PnPl has the minimum $d_J$. Thus according to $A_d$, it
is the best. But PnPl has the lowest PIC, which is not good
for making a clear or solid decision. From this angle, it is the
worst. As we can see, the evaluation based on PIC or $d_J$ alone
is not satisfactory. In Table IV, PnPl has obtained the worst
score based on our proposed bi-criteria evaluation approach. This
new evaluation approach can assure a "good" score for both
elements of the two-tuple and meanwhile it can also counteract
the exaggeration of a single factor.

It can be seen from the experimental results listed in
Table IV and Fig. 1 that, although there are some differences
among the different evaluation approaches, BetP, PraPl, DSmP and
HDSmP all perform well in this example: they strike
a good balance between PIC and $d_J$, i.e., between clarity and
fidelity.

TABLE IV
EVALUATIONS OF PROBABILITY TRANSFORMATION RESULTS OF BBA IN EXAMPLE 1 USING DIFFERENT CRITERIA.

| PIC | $d_J$ | $A_{PIC}$ | $A_d$ | $A_{value}$ | $A_{min}$ | $A_{max}$ | $A_{ave}$


par oose [oases [a] is | 7 | 4 | 15 | 
Guar [oor [o2sr| a [3s > m | s | 2 | a _ 
pe || 006 | o2a2 | 3 [1] | 1 | 0 [1] 
Part || o1007 [ome |p [2] 2 [3 | s [7 
priors | oa | [sf ou | 9 | « | 5, 
Perigo [ozo | o2se9 | 0 [6 | [am | 5 | s_ 
PeBel_ J osi00 | owes 6 [9 | «6 | | 2 | 3 | 
psnp I osi00 [ora | 6 fo | «6 ||? | 3_ 
DSmPo.001 [0.3058 | 0.2801 | 9 | 7{[ 8 | 4 [ 2 | 5 | 
HSMP) || ost61 | o2a7 [| 5 [uw] s [9 | « | s_| 
AIDSmPo.onr I] 03064 [02807 |e | 8] 9 | sp 1 ps 
per _fpoxaa7 [ors | 4 [a] 4 [7 | «| 3] 
perros [oms7 [3 [wf 1 | s | | 3] 
Fig. 1. Evaluation results for Example 1 using different criteria.


B. Example 2 revisited


The parameters are the same as in the revisited Example 1. The
results are shown in Table V and Fig. 2.

In this experiment, we can see that four probability transformations
(PrBel, PrBP1, $HDSmP_0$, and $DSmP_0$) cannot provide
results due to the zero value for singletons. This shows that
they are not robust and place more requirements on the original
BBA. DSmP and HDSmP have the parameter $\epsilon$, which can
counteract this negative effect.

From the comparisons among the remaining eleven probability
transformations in Table V, we see that although there
are some differences among the different evaluation approaches,
the exaggerated transformations that over-emphasize only one
criterion, e.g., PnPl and $Un_{min}$, do not perform that well. The
bi-criteria evaluation appears more natural and helpful than
using PIC alone.


C. Example 3 revisited

The results are shown in Table VI and Fig. 3.

In this experiment, those transformations (PrPl, PrHyb,
PraPl, $DSmP_{0.001}$, $HDSmP_{0.001}$) that strike a good balance
between clarity and fidelity always perform well under the different
evaluation approaches.

The above results show that DSmP and HDSmP can always 
generate a probability measure with less uncertainty. At the 
same time, this is not too risky, i.e., they can achieve a better 
tradeoff between PIC and risk in decision-making. 

We prefer the evaluation approaches using rank fusion over
those using value based fusion, because rank fusion depends
only on the ordering of the alternatives and not on the values
themselves. Rank fusion is therefore not sensitive to the
ranges of the different criteria and, in this sense, it is relatively
more robust. Although we added a normalization step to the
value based fusion to counteract the sensitivity to the value
ranges, this sensitivity cannot be eliminated, only
suppressed to some extent.

Among all the evaluation approaches using rank fusion, we
prefer the arithmetic averaging and the optimization based
ones. This is because they are moderate, i.e., neither too
pessimistic (like min rule) nor too optimistic (like max rule).


VIII. APPLICATION-ORIENTED EVALUATION OF 
PROBABILITY TRANSFORMATIONS 


Performance evaluation approaches for decision-making
applications, such as classification, are available.
Therefore, we can use the evaluation of decision-making
under the evidence theory framework to indirectly evaluate
probability transformations, which is illustrated in Example 5.

Example 5: In this example, a pattern classification application
is considered. We consider only three classes of
samples in this example, which are illustrated in Fig. 4. The
2D-dataset is artificially generated. Samples of each class are
uniformly distributed around three different centers. The abscissa
and ordinate in Fig. 4 represent the two feature dimensions of
each sample.

The classifier used in this example is the K-nearest neighbor
(K-NN) [33]. For each test sample, the output of the classifier
is represented by a BBA. The corresponding BBA for each test
sample is generated as follows.

a) The class space is C = {1, 2, 3}. For a test sample, find
its K nearest neighbors. Among the nearest neighbors, calculate
the ratio of the samples belonging to each class as follows:

$$P(i) = \frac{k(i)}{\sum_{j=1}^{3} k(j)}, \qquad (38)$$

where $P(i)$ represents the ratio of class i and $k(i)$ represents
the number of samples belonging to class i in the K nearest
neighbors, $i = 1, 2, 3$. Obviously, $K = \sum_{j=1}^{3} k(j)$.

b) For the two classes s and t ($s, t \in \{1, 2, 3\}$, $s \neq t$) with
the top two values of $k(i)$, $i = 1, 2, 3$, the corresponding mass
assignments are generated according to [34]:

$$m(\{i\}) = P(i), \quad \forall i = s, t. \qquad (39)$$

The remaining mass is assigned to the total set $\Theta$:

$$m(\Theta) = 1 - m(\{s\}) - m(\{t\}). \qquad (40)$$


For example, for a test sample x, among its 7 nearest
neighbors, 4 belong to class 1, 2 belong to class 2, and one
belongs to class 3. The class distribution is then P(1) =
4/7, P(2) = 2/7, P(3) = 1/7. The dominant class is class
1 and class 2 is in second place. The corresponding BBA
is m({1}) = 4/7, m({2}) = 2/7 and m({1, 2, 3}) = 1/7.
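The BBA construction in (38)-(40) can be sketched as follows. The function name is hypothetical, exact fractions are used only to make the example easy to check, and breaking ties between equally frequent classes by label order is an assumption that the text does not specify.

```python
from fractions import Fraction


def bba_from_neighbors(counts):
    """counts maps a class label to its number of samples among the K neighbors."""
    k_total = sum(counts.values())
    ratios = {c: Fraction(n, k_total) for c, n in counts.items()}   # Eq. (38)
    # The two classes with the largest counts (ties broken by label: assumption).
    top_two = sorted(counts, key=lambda c: (-counts[c], c))[:2]
    bba = {frozenset([c]): ratios[c] for c in top_two}              # Eq. (39)
    theta = frozenset(counts)
    bba[theta] = 1 - sum(bba.values())                              # Eq. (40)
    return bba


# The worked example of the text: 7 neighbors, 4 / 2 / 1 per class.
m = bba_from_neighbors({1: 4, 2: 2, 3: 1})
```

The mass of the third class is not assigned to its singleton but moved to the whole class space, which is how the ignorance term arises.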

There are 200 samples for each class, for a total of 600
samples. In each experiment cycle, 100 samples are randomly
selected from each class for training (300
training samples in total) and the remaining samples are used
for testing (300 test samples in total).

For a probability transformation PT, the decision result will
be class $i_1$ if

$$i_1 = \arg\max_{j} PT(j), \quad i_2 = \arg\max_{j \neq i_1} PT(j), \qquad (41)$$

and

$$PT(i_1) - PT(i_2) \geq \tau, \qquad (42)$$

where $\tau$ is the threshold for decision-making. If (42) is not
satisfied, the test sample is rejected. The threshold $\tau$ is selected from


{0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0}.
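The decision rule with rejection in (41)-(42) can be sketched as below. The function name and the probability values are hypothetical; a return value of None stands for a rejected sample.

```python
def decide_with_rejection(pt, tau):
    """pt maps each class to its probability; returns the class or None if rejected."""
    ordered = sorted(pt, key=pt.get, reverse=True)
    i1, i2 = ordered[0], ordered[1]          # best and second-best class, Eq. (41)
    # Accept the best class only if its margin reaches the threshold, Eq. (42).
    return i1 if pt[i1] - pt[i2] >= tau else None


# Hypothetical transformed probabilities for one test sample.
pt = {1: 0.6, 2: 0.3, 3: 0.1}
accepted = decide_with_rejection(pt, tau=0.2)
rejected = decide_with_rejection(pt, tau=0.5)
```

A larger threshold rejects more samples, which is why a transformation that produces sharper probabilities obtains a lower rejection rate.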


For each of these tested thresholds, the above experiment
procedure is repeated 100 times to calculate the average performance
of the different probability transformations². Based on the
simulation results, the average rejection rates (averaged over all
threshold values) of the different probability transformations are
shown in Fig. 5.

As we can see from Fig. 5, the rejection rate of $Un_{min}$ is
the minimum one. This is because $Un_{min}$ emphasizes
clarity. Therefore, the difference between different alternatives
is relatively large. We propose to use the "rejection-error" curve
shown in Fig. 6 to evaluate probability transformations.


The abscissas and ordinates of the points on the rejection-error
curves are respectively the average rejection rate values and the
average error rate values for the different probability transformations
at each threshold $\tau$. For any probability transformation at
each threshold value, the average rejection rate and the average
error rate are the means over the 100 repeated simulations.
The deviations of the rejection rate and error rate of each
probability transformation at different thresholds are listed in
Tables VII and VIII, respectively.

From the results in Fig. 6, it can be seen that given the
same rejection rate, the classification error rate of $Un_{min}$ is
always the highest. The decision results based on $Un_{min}$ are
the worst. Although $Un_{min}$ has the least uncertainty degree
and the minimum rejection rate, it is not the winner. The
smaller rejection rate comes at the price of a higher classification
error rate. The rejection-error curves can be used as a comprehensive
and indirect application-oriented evaluation approach
for probability transformations.

The performance of the other probability transformations (except
for $Un_{min}$) is similar when using application-oriented
performance evaluation (rejection-error curves).


IX. DESIRED PROPERTIES OF PROBABILITY 
TRANSFORMATIONS 


In this section, we discuss some desired properties of 
probability transformations. 


A. Order preservation 


It is preferred that a probability transformation can maintain
some order before and after the transformation, e.g., the order
of the uncertainty degree. Given N BBAs $m_1, \ldots, m_N$, we can
obtain the rank or order of their uncertainty degree according


² The probability transformations $HDSmP_0$, PrBP1, PrBP2 are not included
in our simulation because they cannot be computed when zero masses occur.

TABLE V
EVALUATIONS OF PROBABILITY TRANSFORMATION RESULTS OF BBA IN EXAMPLE 2 USING DIFFERENT CRITERIA.

| PIC | $d_J$ | $A_{PIC}$ | $A_d$ | $A_{value}$ | $A_{min}$ | $A_{max}$ | $A_{ave}$ | $A_{opt}$

PrBel | N/A due to 0 value of singletons
$DSmP_0$ | N/A due to 0 value of singletons
$HDSmP_0$ | N/A due to 0 value of singletons
PrBP1 | N/A due to 0 value of singletons
proms ose] nm [a] 9 | 7 [ww] qn 
GunP || oom [oss | [sf 7 [3 | « | [a 
pa? [ooo osm fs fry s | 1 | 4 ~1 [oa 
part || oso [osm | 9 [2[« [3 | « | 2 | 2 
prt _fpo2en [osm | 7 fs | 1 | 9 [2-3 | 3 
Parib |[ o208 [osu |—s_|7 > 3 [9 | 2 [3 [3 
DimPo.o; JPost [oases 2 Pio | 3 | 8 [3 [3 
HDSmPo oor [ostos [oas72[ 3 | 9 | 1 [| s | « | 3 | 3 
psc [ose [oa | a fe] 4 [7 [4 ]s3 3 
pear? || oar [osm | 6 | «| 2 | [1 [3 | 3 
Unnin Jovan [owes | 1 [fe [1 [wo | 3 | 3 
[Figure: bar chart titled "Bi-criteria Fusion" comparing PnPl, CuzzP, BetP, PraPl, PrPl, PrHyb, $DSmP_{0.001}$, $HDSmP_{0.001}$, PrScP, PrBP2 and $Un_{min}$ under the criteria PIC, Distance, Value, Min, Max, Average and Opt.]

Fig. 2. Evaluation results for Example 2 using different criteria.


TABLE VI 
EVALUATION OF PROBABILITY TRANSFORMATION RESULTS OF BBA IN EXAMPLE 3 USING DIFFERENT CRITERIA. 


| PIC | $d_J$ | $A_{PIC}$ | $A_d$ | $A_{value}$ | $A_{min}$ | $A_{max}$ | $A_{ave}$


pa poo foasl = fs] 6 | s | [a | 
Guat loo [os00[ is _ [1 fm 1 | [1 
a 
part [| oosss [osm [ mf sf 2 [5 | s | 1] 
pe oosss [osm 1 [sf 2 7s | 8 [7] 
Perio [Looms [osm | 0 [of 1 | 3] 3 | 9] 
Pabet ors? [oxsa[_s_[w] 7 | 9 | 3 | 3] 
psnPo [orss7 [oxssa|_s [ww] 7 | 9 | 3 | 3] 
DanPowr pow oss fs fe, s |» | 1 | 9, 
HDSmPo fp oise7 fossa | 5 pw] 7 |» [> [3 
AIDSnPo wn [[o0ss3 [03023 9 [7] 4 | «| > | 9 _| 
pase [Loser [oases pw {7 [9 | 3 | 3 _ 
pep? [oasa[omsi | 3 [| | s | | 9 | 
parr ose? [ose | 2 [wf om | «| o | 9 | 
Unnin J ossio | oa | 1 [as | | 1 | [9 
[Figure: bar chart titled "Bi-Criteria Fusion".]

Fig. 3. Evaluation results for Example 3 using different criteria.


TABLE VII
DEVIATIONS OF THE REJECTION RATE OF EACH PROBABILITY TRANSFORMATION AT DIFFERENT $\tau$.


HDSmPo.001 


F [BaP | Una [ DSmPoon | PoP] DSmPo | PPL] Piel | PaPl | HOPI | Con? | 


mPofof o [oto fofofolfo]fo_ 
0.1 [-onaoe [ oors0 [ose amass | o7360 | oowos [00360 | ooan6 | O.osea | O0A06 | 
02 [oor [ 00332 | oor o0ss6 | oot | oot [oom | oti | oar | ooa0r | 
03 [00374 [ 0086 [otis oosi2 | cose | 00811 [ooais | ooai | oats [00374 
04 [002s [ 0085s | 0.030 [ o02se | 0.0330 | o.0309 [ 00330 | 0.0309 [0.0308 [00276 | 
05 [00265 [ o0s20 | 0.082 [ om26r | o0ase | ome [ 02s | orra [0.0282 | 00265 | 
06 [0009 [ oa2a7 [oma [aoie | oat | o0ane [oar | ore | 0.0n14 [00200 
07 | ooaos [ o02is [oars [ona0e | 00213 | o0ars | o0ai | oozrs [0.0213 | 00208 
ox [Loorss [ o0i92 | o.o1 [ooiss | 00192 | oois2 [00192 | a0192 | 0.0192 | 00185 
09 [0016s [ ooi67 | ois? f ooi6s | ooi67 | oor? | 00167 | oo167 | 0017 | o0165 | 


0.0406 
0.0412 
0.0387 
0.0298 
0.0274 
0.0209 
0.0208 
0.0195 
0.0165 
0.0131 
TABLE VIII
DEVIATIONS OF THE ERROR RATE OF EACH PROBABILITY TRANSFORMATION AT DIFFERENT $\tau$.


[_BetP | Unvnin | DSmPo.oo: | PaPl | DSmPy | PrPl_| PrBel | PraPl_| HybPl 
|| 0.0204 0.0204 0.0204 0.0204 | 0.0204 | 0.0204 | 0.0204 | 0.0204 | 0.0204 | 0.0204 


[0.0099-[0.0236 [0.01 [0.0071 | 0.0144 | O.OTIS | 0.0144 | 0.0118 | 0.0118 | 0.0094 


[0.0059 | 0.0140 [0.0080 | 0.0057_|0.0085_| 0.0070 | 0.0085_|_0.0070_| 0.0080 | 0.0059 | 
[-0.0036-| 0.0075 0.0039 | 0.0027_| 0.0039 | 0.0039 | 0.0039 | 0.0039 | 0.0039 | 0.0036 | 
[-0.0005_| 0.0008 [0.0008 — | 0.0005_| 0.0008 | 0.0008 | 0.0008 [0.0008 | 0.0008 [0.0005 | 
[3e-17_[ 0.0005 [0:0005_—_|3e-17_| 0.0005 _| 0.0005 | 0.0005 [0.0005 | 0.0005 [_3e-17_| 
[3e17_[3e17 [3e17 | 3e17 | 3e17_| 3e-17 | 3e17 | 3e17 | 3e-17 | 3el7 | 
[3et7 [ett [sett | et [set | set? | 3et7 | set? | 317 | 3ei7 | 
Fig. 4. The samples for classification.


to some uncertainty measure in evidence theory as introduced
in the sequel:

$$Rank_m = [r_m(1), \ldots, r_m(N)].$$

After applying the probability transformations, the rank of
$P_1, \ldots, P_N$ in terms of uncertainty degree in probability theory³
is:

$$Rank_P = [r_P(1), \ldots, r_P(N)].$$

It is very hard to completely maintain the order after
the transformation. However, the degree of accordance or
consistency between $Rank_m$ and $Rank_P$, i.e., the degree of
order preservation, can be used to evaluate different probability
transformations. Such a degree of order preservation can be
defined using the distance between ranks as introduced before.
A lower degree of order preservation indicates more distortion or loss
of information in the procedure of probability transformation.

When calculating the degree of order preservation, we need
to specify an uncertainty measure of belief functions. Some
uncertainty measures in evidence theory are as follows.

1) Non-specificity: Non-specificity [35] is defined as:

$$N(m) = \sum_{A \subseteq \Theta} m(A) \log_2 |A|. \qquad (43)$$

2) Confusion: Confusion [36] is defined using the BBA m
and the Bel in the spirit of entropy as:

$$Conf(m) = -\sum_{A \subseteq \Theta} m(A) \log_2 (Bel(A)). \qquad (44)$$

3) Dissonance: Dissonance [37] is defined using the BBA
m and the Pl in the spirit of entropy as:

$$Diss(m) = -\sum_{A \subseteq \Theta} m(A) \log_2 (Pl(A)). \qquad (45)$$


³ Here we only consider Shannon's entropy as the ranking criterion for
convenience.

4) Aggregate Uncertainty measure (AU): Let Bel be a
belief measure on the FOD $\Theta$. The AU [38] associated with
Bel is measured by:

$$AU(Bel) = \max_{\mathcal{P}_{Bel}} \Big[ -\sum_{\theta \in \Theta} p_\theta \log_2 p_\theta \Big], \qquad (46)$$

where the maximum is taken over all probability distributions
that are consistent with the given belief function. $\mathcal{P}_{Bel}$ consists
of all probability distributions $\langle p_\theta \mid \theta \in \Theta \rangle$ satisfying:

$$\begin{cases} p_\theta \in [0, 1], \ \forall \theta \in \Theta, \\ \sum_{\theta \in \Theta} p_\theta = 1, \\ Bel(A) \leq \sum_{\theta \in A} p_\theta \leq 1 - Bel(\bar{A}), \ \forall A \subseteq \Theta. \end{cases} \qquad (47)$$


AU is an aggregated total uncertainty (ATU) measure. 

AU satisfies all the requirements of uncertainty measure [39] 
including probability consistency, set consistency, value range, 
sub-additivity and additivity for the joint BPA in Cartesian 
space. AU also has the drawbacks [3] of high computing 
complexity, high insensitivity to the changes of evidence, etc. 

5) Ambiguity Measure (AM): Let m be a BBA defined over
the FOD $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$. AM (ambiguity measure) [40]
is defined as:

$$AM(m) = -\sum_{\theta \in \Theta} BetP_m(\theta) \log_2 (BetP_m(\theta)), \qquad (48)$$

where $BetP_m(\theta) = \sum_{\theta \in B \subseteq \Theta} m(B)/|B|$ is the pignistic
probability. Jousselme et al. [40] declared that the AM satisfies
the requirements of uncertainty measure and at the same time
it overcomes the defects of AU, but in fact AM does not satisfy
the sub-additivity [41]. Moreover in [39], AM has been proved
to be logically non-monotonic under some conditions.

6) Contradiction Measure (CM): The contradiction measure
[42] is defined as:

$$CM(m) = \sum_{X \in \mathcal{X}} m(X) \cdot d_J(m, m_X), \qquad (49)$$

where $\mathcal{X}$ denotes the set of all focal elements of m and $d_J$ is
Jousselme's distance. There exists $CM \in [0, 1]$.


B. Simulation of degree of order-preservation 

Our simulation consists of the following steps: 

• Step 1: Randomly generate 10 BBAs and calculate the
degree of uncertainty in each BBA to generate $Rank_m$;

• Step 2: Apply the transformation using N types of PTs.
We can obtain various $Rank_{P_i}$, $i = 1, \ldots, N$;

• Step 3: Calculate the distance ($d_p(i)$) between $Rank_m$
and each $Rank_{P_i}$, $i = 1, \ldots, N$;

• Step 4: Repeat Steps 1 - 3 a hundred times. Calculate
the average distance and the corresponding standard
deviation as follows:


$$d_{p\_m}(i) = \frac{1}{100} \sum_{t=1}^{100} d_p^{(t)}(i), \qquad (50)$$

$$d_{p\_std}(i) = \sqrt{\frac{1}{100} \sum_{t=1}^{100} \left( d_p^{(t)}(i) - d_{p\_m}(i) \right)^2}, \qquad (51)$$

where $d_p^{(t)}(i)$ denotes the distance obtained in the t-th repetition.

The PTs with smaller $d_{p\_m}(i)$ and $d_{p\_std}(i)$ are preferred.

Fig. 5. Comparison of average rejection rate over all thresholds among different probability transformations.

Fig. 6. Rejection-error curves comparison of different probability transformations.


TABLE IX
Algorithm 1. RANDOM GENERATION OF BBA.


Input: $\Theta$: Frame of discernment;
       $N_{max}$: Maximum number of focal elements
Output: m: BBA

Generate $P(\Theta)$, which is the power set of $\Theta$;
Generate a random permutation of $P(\Theta)$ → $R(\Theta)$;
Generate an integer between 1 and $N_{max}$ → k;
FOR each of the first k elements $A_i$ of $R(\Theta)$ do
    Generate a value within [0, 1] → $m_i$, $i = 1, \ldots, k$;
END
Normalize the vector $m = [m_1, \ldots, m_k]$ → $m'$;
$m(A_i) = m'_i$;
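Algorithm 1 can be sketched in Python as follows. Two choices here are assumptions of this sketch rather than statements of the algorithm: the empty set is excluded from the candidate focal elements, and the random values are drawn from (0, 1] so that no focal element receives a zero mass. The function name and the `rng` parameter are illustrative.

```python
import random
from itertools import combinations


def random_bba(theta, n_max, rng=random):
    """Random BBA over the frame theta with at most n_max focal elements."""
    elements = sorted(theta)
    # Non-empty subsets only: excluding the empty set is an assumption.
    subsets = [
        frozenset(c)
        for r in range(1, len(elements) + 1)
        for c in combinations(elements, r)
    ]
    rng.shuffle(subsets)                              # random permutation R(theta)
    k = rng.randint(1, min(n_max, len(subsets)))      # number of focal elements
    raw = [1.0 - rng.random() for _ in range(k)]      # values in (0, 1]
    total = sum(raw)
    return {a: v / total for a, v in zip(subsets[:k], raw)}   # normalized masses


# A seeded generator makes the sketch reproducible.
m = random_bba({"a", "b", "c"}, n_max=4, rng=random.Random(0))
```

Passing a `random.Random` instance keeps repeated simulation runs reproducible without touching the global random state.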


In Step 1, BBAs are generated using Algorithm 1 [17] in
Table IX. All six types of uncertainty measures introduced
above are used in the simulations. The distance between
$Rank_m$ and $Rank_P$ used is the Kendall distance in (36). The
evaluation results are shown in Tables X, XI and Fig. 7.


From Tables X, XI, and Fig. 7, $Un_{min}$ provides the worst result
(the lowest degree of order preservation). Although the
degrees of order change based on the different uncertainty measures
are all listed in Tables X, XI, and Fig. 7, we prefer to use the
ones based on AU. The reasons are as follows. First, among all
the uncertainty measures used here, only AU is a strict total
uncertainty measure, which can describe both the discord and
the non-specificity in a body of evidence. Second, as we can see
from the subfigure using AM, BetP's degree of order change
is zero. However, this is not meaningful, because AM is
defined based on BetP. Therefore, AM is partial (or non-neutral)
when BetP is also included in the evaluation. AU is relatively more
appropriate to be used here. According to the subfigure based
on AU, $Un_{min}$ is the worst. HDSmP and DSmP are also not
that good in terms of order preservation. The other probability
transformations perform similarly to each other in terms of order
preservation.

Even AU is still not absolutely impartial (or neutral), 
because AU is designed also based on some probability 
transformation (maximization of entropy). If such an en- 
tropy maximization-based probability transformation is also 
included for evaluation, it will be partial (or non-neutral). So 
some new uncertainty measure for BBA not related to prob- 
ability transformation is needed for an impartial evaluation. 
We think that the contradiction measure in (49) is a good 
attempt. It is an uncertainty measure which is not based on a 
probability transformation, although its strictness (satisfying 
the requirements of the uncertainty measure) still deserves 
further research. 


X. CONCLUSIONS 


In this paper, we focus on the evaluations of probability
transformations of a belief function. The existing transformations
are briefly reviewed and compared. Our experimental
results and analysis show that the PIC criterion alone is insufficient
to truly measure the quality of a probability transformation.
A compromise between fidelity and clarity is achieved by
the joint use of PIC and the distance of evidence. We have
also proposed an application-oriented evaluation approach for

Fig. 7. Evaluation in terms of the degree of order change.


TABLE X 
EVALUATION OF DEGREE OF ORDER CHANGE (AVERAGE VALUE). 


Unmin 


amor [owe | 0 | ooo [00m [ose | _o2srr_[ _ozmsa | 0.60 


TABLE XI 
EVALUATION OF DEGREE OF ORDER CHANGE (STANDARD DEVIATION). 


Unmin 
0.1271 
probability transformations. Furthermore, we have evaluated
probability transformations by their robustness to preserve the
uncertainty order of the original BBAs. The simulation results
show that our proposed evaluation approaches are able to make
rational comparison of different probability transformations.
Future work includes the development of general and direct
measures of uncertainty of a BBA, which do not depend on
the choice of probability transformations. This is important for
the property of uncertainty order preservation.

Note that the evaluations for the issues in evidence theory 
(like the probability transformations, the evidence combina- 
tion, the determination of BBA, etc) lack solid theoretical 
foundation so far. In the future, we will attempt to propose 
more rational and useful criteria for probability transforma- 
tions and try to establish more theoretically sound evaluation 
approaches for probability transformations, which are impor- 
tant and challenging problems. 
Abstract: In this paper we propose a new general method for
decision-making under uncertainty based on the belief interval
distance. We show through several simple illustrative examples
how this method works and its ability to provide reasonable
results.


Keywords: belief functions, decision-making, distance be- 
tween BBAs. 


I. INTRODUCTION 


Dempster-Shafer Theory (DST), also known as the Mathematical
Theory of Evidence or the Theory of Belief Functions
(BF), was introduced by Shafer in 1976 [1] based on
Dempster's previous works [2]. This theory offers an elegant
theoretical framework for modeling uncertainty, and provides
a method for combining distinct bodies of evidence collected
from different sources. Over more than three decades,
DST has been used in many applications, in fields including
information fusion, pattern recognition, and decision making
[3]. Although belief functions are very appealing for modeling
epistemic uncertainty, the two main questions
related to them remain open:


1) How to combine efficiently several independent belief 
functions? 
This open question is out of the scope of this paper and 
it has been widely disputed by many experts [4]-[14]. In 
this short paper, we focus on the second question below. 

2) How to take a final decision from a belief function? 
This second question is also very crucial in many 
problems involving epistemic uncertainty where the final 
step (after beliefs elicitation, and beliefs combination) is 
to make a decision. 


In the sequel, we assume that the reader is familiar with
the Dempster-Shafer Theory of belief functions [1] and its notations.
Due to space restrictions, we will not recall the definitions
of the basic belief assignment $m(\cdot)$, belief $Bel(\cdot)$ (also
called credibility by some authors), and plausibility
$Pl(\cdot)$ functions defined over a given finite discrete frame of
discernment (FoD) $\Theta$. For any focal element X of the powerset
of $\Theta$, denoted by $2^\Theta$, the interval $BI(X) = [Bel(X), Pl(X)]$
is called the belief interval of X. Its length $Pl(X) - Bel(X)$
characterizes the uncertainty on X (also called ambiguity in
[15]). This paper is organized as follows. In section 2, we
recall the common decision-making techniques used so far to
make a decision from belief functions. In section 3 we recall
the new distance measure based on belief intervals, and we
present a new general method for decision-making with belief
functions. Finally, examples of this new approach are given in
section 4, with concluding remarks in section 5.


II. CLASSICAL DECISION-MAKING USING BF 


We assume a given FoD $\Theta = \{\theta_1, \ldots, \theta_n\}$ and a given BBA
$m(\cdot)$ defined on $2^\Theta$. We want to make a decision from $m(\cdot)$. It
consists in choosing a particular element of the FoD that solves
the problem under consideration, which is represented by the
set of potential solutions (choices) $\theta_i$, $i = 1, \ldots, n$. How to
do this in an effective manner is the fundamental question of
decision-making under epistemic uncertainty. Many decision-making
criteria have been proposed in the literature. Some
advanced techniques developed in the 1990s [15]-[19] have
not been widely used so far in the BF community, probably
because of their complexity of implementation. In this section,
we only briefly present the simplest ones that are frequently used.


1) Decision based on maximum of credibility:
This decision-making scheme is the so-called prudent
(or pessimistic) scheme. It consists in choosing the element
of the FoD $\Theta$ that has the maximum of credibility.
In other terms, one will decide $\hat{\theta} = \theta_{i^\star}$ with¹

$$\theta_{i^\star} = \arg\max_{\theta_i} Bel(\theta_i). \qquad (1)$$

2) Decision based on maximum of plausibility:
On the contrary, if we prefer to adopt a more optimistic
(less prudent) decision-making attitude, one will choose
the element of the FoD $\Theta$ that has the maximum of
plausibility. In other terms, one will decide $\hat{\theta} = \theta_{i^\star}$ with

$$\theta_{i^\star} = \arg\max_{\theta_i} Pl(\theta_i). \qquad (2)$$

3) Decision based on maximum of probability:
Usually decision-makers prefer to adopt a more balanced
decisional attitude making a compromise between the
aforementioned pessimistic and optimistic attitudes. For
this, the BBA $m(\cdot)$ is transformed into a subjective
probability measure $P(\cdot)$ compatible with the belief
interval $[Bel(\cdot), Pl(\cdot)]$, and one will choose the element
of the FoD $\Theta$ that has the maximum of probability. In
other terms, one will decide $\hat{\theta} = \theta_{i^\star}$ with

$$\theta_{i^\star} = \arg\max_{\theta_i} P(\theta_i). \qquad (3)$$

In practice, many probabilistic transformations are available
to approximate (or transform) a BBA $m(\cdot)$ into a
probability measure $P(\cdot)$, for example the pignistic
transformation [20], the plausibility transformation [21],
the DSmP transformation and other ones presented in
[22], etc.

¹ The notation with hat indicates the decision taken. Here $\hat{\theta}$ specifies that
the decision taken is only a singleton of $\Theta$.


Of course, in case of multiple maximum values, no decision
can be clearly drawn. Usually if only one decision must be
made, a random sample between elements $\theta_i$ generating the
maximal decision-making criterion value is used to make a
unique final decision $\hat{\theta}$. Another more prudent decision scheme
is to use the disjunction of all elements generating the maximal
decision-making criterion value, to provide a less specific final
decision (if it is allowed for the problem under concern).
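
The three classical rules and the tie handling described above can be sketched as follows. This is only a minimal illustration: the helper names (`bel`, `pl`, `betp`, `decide`) and the example masses are assumptions made here, and the pignistic transformation is used as one possible choice of $P(\cdot)$ among those cited.

```python
def bel(m, x):
    """Credibility of x: total mass of the focal elements included in x."""
    return sum(v for f, v in m.items() if f <= x)

def pl(m, x):
    """Plausibility of x: total mass of the focal elements intersecting x."""
    return sum(v for f, v in m.items() if f & x)

def betp(m, theta):
    """Pignistic probability of the singleton theta."""
    return sum(v / len(f) for f, v in m.items() if theta in f)

def decide(frame, criterion, tol=1e-12):
    """Singletons maximising the criterion; several are returned on a tie."""
    scores = {t: criterion(t) for t in frame}
    best = max(scores.values())
    return {t for t, s in scores.items() if best - s <= tol}

# Hypothetical normal BBA on the frame {A, B}, focal elements as frozensets.
frame = {"A", "B"}
m = {frozenset("A"): 0.8, frozenset("B"): 0.1, frozenset("AB"): 0.1}

by_bel = decide(frame, lambda t: bel(m, frozenset([t])))   # rule (1)
by_pl = decide(frame, lambda t: pl(m, frozenset([t])))     # rule (2)
by_betp = decide(frame, lambda t: betp(m, t))              # rule (3)

# A symmetric BBA produces a tie; the prudent answer is the disjunction.
m_tie = {frozenset("A"): 0.5, frozenset("B"): 0.5}
tie = decide(frame, lambda t: bel(m_tie, frozenset([t])))
disjunction = frozenset(tie)
print(by_bel, by_pl, by_betp, sorted(disjunction))
```

When a single element is required in the tie case, a random draw among the returned elements would replace the disjunction.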


Our main criticism about using these decision-making
schemes is that they do not use the whole information contained
in the original BBA, which is in fact expressed by the
whole belief interval. The pessimistic attitude uses only the
credibility values, whereas the optimistic attitude uses only the
plausibility values. The compromise attitude based on criterion
(3) requires a particular choice of probabilistic transformation,
which is often disputed by users. Making a decision from the
$P(\cdot)$ measure is theoretically not satisfactory at all because
the transformation is lossy: we cannot retrieve $m(\cdot)$ from
$P(\cdot)$ when some focal elements of $m(\cdot)$ are not singletons. In
the next section, we propose a better justified decision scheme
based on the belief interval distance [23], [24].


III. DECISION-MAKING USING BELIEF INTERVAL 
DISTANCE 
In our previous works [23], [24], we have defined a Euclidean
belief interval distance between two BBAs $m_1(\cdot)$
and $m_2(\cdot)$ defined on the powerset of a given FoD
$\Theta = \{\theta_1, \ldots, \theta_n\}$ as follows

$$d_{BI}(m_1, m_2) \triangleq \sqrt{N_c \cdot \sum_{X \in 2^\Theta} d_W^2(BI_1(X), BI_2(X))}, \tag{4}$$

where $N_c = 1/2^{n-1}$ is a normalization factor to have
$d_{BI}(m_1, m_2) \in [0,1]$, and $d_W(BI_1(X), BI_2(X))$
is the Wasserstein’s distance [25] between belief
intervals $BI_1(X) \triangleq [Bel_1(X), Pl_1(X)] = [a_1, b_1]$ and
$BI_2(X) \triangleq [Bel_2(X), Pl_2(X)] = [a_2, b_2]$. More specifically,

$$d_W([a_1, b_1], [a_2, b_2]) = \sqrt{\left[\frac{a_1 + b_1}{2} - \frac{a_2 + b_2}{2}\right]^2 + \frac{1}{3}\left[\frac{b_1 - a_1}{2} - \frac{b_2 - a_2}{2}\right]^2}. \tag{5}$$


In [23], we have proved that $d_{BI}(x, y)$ is a true distance
metric because it satisfies the properties of non-negativity
($d(x,y) \geq 0$), non-degeneracy ($d(x,y) = 0 \Leftrightarrow x = y$),
symmetry ($d(x,y) = d(y,x)$), and the triangle inequality
($d(x,y) + d(y,z) \geq d(x,z)$), for any BBAs $x$, $y$ and $z$ defined
on $2^\Theta$. The choice of Wasserstein’s distance in the $d_{BI}$ definition
is justified by the fact that Wasserstein’s distance is a true
distance metric and it fits well with our needs because we
have to compute a distance between $[Bel_1(X), Pl_1(X)]$ and
$[Bel_2(X), Pl_2(X)]$.
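
As a quick numerical check of (4) and (5), consider $\Theta = \{A, B\}$ (so $N_c = 1/2$), a BBA $m_1$ with $m_1(A) = 0.8$, $m_1(B) = 0.1$, $m_1(A \cup B) = 0.1$, and a BBA $m_2$ with $m_2(A) = 1$. The masses of $m_1$ are those of $m_3$ in Table I; the step-by-step layout below is only an illustrative sketch. The empty set contributes nothing since its belief interval is $[0, 0]$ for both BBAs.

```latex
\begin{align*}
BI_1(A) &= [0.8, 0.9], & BI_2(A) &= [1, 1], &
  d_W^2 &= (0.85 - 1)^2 + \tfrac{1}{3}(0.05 - 0)^2 \approx 0.02333,\\
BI_1(B) &= [0.1, 0.2], & BI_2(B) &= [0, 0], &
  d_W^2 &= (0.15 - 0)^2 + \tfrac{1}{3}(0.05 - 0)^2 \approx 0.02333,\\
BI_1(A \cup B) &= [1, 1], & BI_2(A \cup B) &= [1, 1], &
  d_W^2 &= 0,
\end{align*}
\[
d_{BI}(m_1, m_2) = \sqrt{\tfrac{1}{2}\,(0.02333 + 0.02333 + 0)} \approx 0.1528 .
\]
```

This agrees with the minimal distance 0.1528 listed for $m_3$ in Table II.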


For notation convenience, we denote $m_X$ the categorical
BBA having only $X$ as focal element, where $X \neq \emptyset$ is
an element of the powerset of $\Theta$. More precisely, $m_X$ is
the particular (categorical) BBA defined by $m_X(X) = 1$
and $m_X(Y) = 0$ for any $Y \neq X$. Such basic BBA plays
an important role in our new decision scheme because its
corresponding belief interval reduces to the degenerate interval
$[1, 1]$ which represents the certainty on $X$. The basic principle
of the new decision scheme we propose is very simple and
intuitively makes sense. It consists in selecting as the final
decision (denoted by $\hat{X}$) the element of the powerset for which
the belief interval distance between the BBA $m(\cdot)$ and $m_X$,
$X \in 2^\Theta \setminus \{\emptyset\}$, is the smallest one¹. Therefore, take as the final
decision $\hat{X}$ given by

$$\hat{X} = \arg\min_{X \in 2^\Theta \setminus \{\emptyset\}} d_{BI}(m, m_X), \tag{6}$$

where $d_{BI}(m, m_X)$ is computed according to (4), $m(\cdot)$ is the
BBA under test and $m_X(\cdot)$ the categorical BBA focused on
$X$ defined above.


This decision scheme is very general in the sense that the
decision making can be done on any type of element² of
the power-set $2^\Theta$, and not necessarily only on the elements
(singletons) of the FoD (see examples in the next section).
This method not only provides the final decision $\hat{X}$ to make,
but also it evaluates how good this decision is with respect
to its alternatives if we define the quality indicator $q(\hat{X})$ as
follows

$$q(\hat{X}) \triangleq 1 - \frac{d_{BI}(m, m_{\hat{X}})}{\sum_{X \in 2^\Theta \setminus \{\emptyset\}} d_{BI}(m, m_X)}. \tag{7}$$

One sees that the quality indicator $q(\hat{X})$ of the decision $\hat{X}$
made will become maximum (equal to one) when the distance
between the BBA $m(\cdot)$ and $m_{\hat{X}}$ is zero, which means that the
BBA $m(\cdot)$ is focused in fact only on the element $\hat{X}$. The higher
$q(\hat{X})$ is, the more confident in the decision $\hat{X}$ we should be.

¹This simple principle has also been proposed by Essaid et al. [26] using
Jousselme’s distance.
²empty set excluded.


Of course, if a decision must be made with some
extra constraint³ defined by one (or several) condition(s),
denoted $c(X)$, then we must take into account $c(X)$ in Eq.
(6), that is $\hat{X} = \arg\min_{X \in 2^\Theta \setminus \{\emptyset\} \text{ s.t. } c(X)} d_{BI}(m, m_X)$,
and also in the derivation of the quality indicator by taking
$\sum_{X \in 2^\Theta \setminus \{\emptyset\} \text{ s.t. } c(X)} d_{BI}(m, m_X)$ as denominator in (7).


Theoretically any other strict distance metric, for instance
Jousselme’s distance [27]-[29], could be used instead of
$d_{BI}(\cdot,\cdot)$. We have chosen the $d_{BI}$ distance because of its ability
to provide good and reasonable behavior [23] as will be
shown. When there exists a tie between multiple decisions
$\{\hat{X}_j, j > 1\}$, then the prudent decision corresponding to
their disjunction $\hat{X} = \cup_j \hat{X}_j$ should be preferred (if allowed),
otherwise the final decision $\hat{X}$ is made by a random selection
of elements $\hat{X}_j$.
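
The decision rule (6), the quality indicator (7) and the optional constraint $c(X)$ can be sketched in a few lines. The function names (`subsets`, `interval`, `d_bi`, `decide`), the dictionary-of-frozensets encoding of a BBA and the guard for a zero denominator are assumptions made for this illustration; the two BBAs are $m_2$ and $m_3$ of Table I.

```python
from itertools import combinations
from math import sqrt

def subsets(frame):
    """Non-empty subsets of the frame (the empty set adds nothing to d_BI)."""
    items = sorted(frame)
    return [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]

def interval(m, x):
    """Belief interval [Bel(x), Pl(x)] of x for the BBA m."""
    bel = sum(v for f, v in m.items() if f <= x)
    pl = sum(v for f, v in m.items() if f & x)
    return bel, pl

def d_bi(m1, m2, frame):
    """Euclidean belief interval distance, Eqs. (4)-(5)."""
    nc = 1.0 / 2 ** (len(frame) - 1)
    total = 0.0
    for x in subsets(frame):
        a1, b1 = interval(m1, x)
        a2, b2 = interval(m2, x)
        mid = (a1 + b1) / 2 - (a2 + b2) / 2
        rad = (b1 - a1) / 2 - (b2 - a2) / 2
        total += mid ** 2 + rad ** 2 / 3
    return sqrt(nc * total)

def decide(m, frame, constraint=lambda x: True, tol=1e-9):
    """Decision(s) of Eq. (6), minimal distance, and quality of Eq. (7)."""
    dist = {x: d_bi(m, {x: 1.0}, frame)
            for x in subsets(frame) if constraint(x)}
    dmin = min(dist.values())
    ties = [x for x, d in dist.items() if d - dmin <= tol]
    denom = sum(dist.values())
    quality = 1.0 if denom == 0 else 1.0 - dmin / denom  # guard is an assumption
    return ties, dmin, quality

frame = {"A", "B"}
m2 = {frozenset("A"): 0.5, frozenset("B"): 0.5}
m3 = {frozenset("A"): 0.8, frozenset("B"): 0.1, frozenset("AB"): 0.1}

ties_free, dmin_free, q_free = decide(m2, frame)
ties_single, dmin_single, q_single = decide(
    m3, frame, constraint=lambda x: len(x) == 1)
print(ties_free, round(dmin_free, 4), round(q_free, 4))
print(ties_single, round(dmin_single, 4), round(q_single, 4))
```

When `ties` holds several elements, their union gives the prudent disjunctive decision, and a random draw gives a single one.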


IV. EXAMPLES AND COMPARISONS 


In this section we present several examples when the
cardinality of the FoD $|\Theta|$ is only 2 and 3 because it is easier
to see whether the decision-making results make sense or not.
We compare and discuss decisions only made with the belief
interval distance $d_{BI}$ and Jousselme’s distance $d_J$ because the
other lossy decision schemes do not exploit both credibility
and plausibility values. The examples corresponding to cases
where the BBA $m(\cdot)$ is focused on a single element $X$ of
$2^\Theta$ are not presented because one trivially gets $\hat{X} = X$ using
either $d_{BI}$ or $d_J$ distances.


The next tables present several BBAs from which a decision
has to be made. By convention, and since we work
with normal BBAs satisfying $m(\emptyset) = 0$, the empty set
is not included in the tables. The rows for $d_{BI}^{\min}(m_i, m_X)$
and for $d_J^{\min}(m_i, m_X)$ list the minimal values obtained for
$d_{BI}(m_i, m_X)$ and $d_J(m_i, m_X)$. The rows for $\hat{X}^{d_{BI}}$ and for
$\hat{X}^{d_J}$ list the decision(s) $\hat{X}$ made when using $d_{BI}(m_i, m_X)$
and $d_J(m_i, m_X)$ respectively. The rows for $q(\hat{X}^{d_{BI}})$ and
$q(\hat{X}^{d_J})$ list the quality indicators of decision(s) made using
$d_{BI}(m_i, m_X)$ and $d_J(m_i, m_X)$ respectively. Depending on
the BBA, it is possible to have multiple decisions $\{\hat{X}_j\}$ in
case of a tie. If a tie occurs, either a random sampling of $\{\hat{X}_j\}$
must be drawn, or (if allowed) the disjunction of decisions $\hat{X}_j$
is preferred. In the next subsections, we present results in the
constraint-free case (i.e. $c(X) = \emptyset$), as well as when the decisions
are restricted to be singletons (i.e. $c(X) = $ “$|X| = 1$”).


A. Examples with $\Theta = \{A, B\}$


Table I shows the decisions made when there is no constraint
on the cardinality of the decision $\hat{X}$.

One sees that methods based on min of $d_{BI}(m, m_X)$ and
on min of $d_J(m, m_X)$ yield the same reasonable decisions
in almost all cases. With $m_2$, one has multiple decisions
$\hat{X}^{d_J} \in \{A, B, A \cup B\}$ with quality 0.6667 when using $d_J$,
which is a bit surprising in our opinion because there is a real
tie between $A$ and $B$. Consequently, the decision $A \cup B$ should
be preferred when there is no constraint on the cardinality
of decisions. For this $m_2$ case, one gets a unique decision
$\hat{X}^{d_{BI}} = A \cup B$ with a better quality 0.776, which seems more
reasonable. We see also that all minimal distance values
obtained with $d_{BI}$ are less than (or equal to, in case $m_1$) the minimal
values obtained with $d_J$. In fact, when the mass function is
distributed symmetrically, it is naturally expected that no real
decision can be easily taken (as illustrated for BBA’s $m_2(\cdot)$ and
$m_5(\cdot)$ in Table I). Here, the decision $A \cup B$ for BBA’s $m_2(\cdot)$
and $m_5(\cdot)$ can be interpreted as a no proper decision, in the
sense that $A \cup B$ is the whole universe of discourse, hence we
are merely selecting anything (and discarding nothing). Such
kind of no proper decision may however be very helpful in
some fusion systems because it warns that input information
is not rich enough, and that one needs more information to
take a proper decision (by including more sensors or more
experts reports in the system for instance). For a symmetrical
mass function, the decision drawn from the new proposed
decision rule is consistent with what we can reasonably expect,
because to make a proper decision we will always need to
introduce some possibly arbitrary additional constraints.

³for instance, making a choice only among the singletons of $2^\Theta$.
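
For comparison, the same decision principle can be run with Jousselme’s distance. Its definition is not restated in this text; the sketch below assumes the usual form $d_J(m_1, m_2) = \sqrt{\tfrac{1}{2}(m_1 - m_2)^T D\,(m_1 - m_2)}$ with $D(X, Y) = |X \cap Y| / |X \cup Y|$, and the helper names are illustrative only. With $m_2$ of Table I it reproduces the three-way tie discussed above.

```python
from itertools import combinations
from math import sqrt

def subsets(frame):
    """Non-empty subsets of the frame, as frozensets."""
    items = sorted(frame)
    return [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]

def d_j(m1, m2, frame):
    """Jousselme's distance (assumed standard form, Jaccard weighting)."""
    xs = subsets(frame)
    diff = [m1.get(x, 0.0) - m2.get(x, 0.0) for x in xs]
    total = 0.0
    for x, dx in zip(xs, diff):
        for y, dy in zip(xs, diff):
            total += dx * dy * len(x & y) / len(x | y)
    return sqrt(max(0.5 * total, 0.0))  # clip tiny negative round-off

def decide_j(m, frame, tol=1e-9):
    """Minimum-distance decision(s) and quality indicator, using d_J."""
    dist = {x: d_j(m, {x: 1.0}, frame) for x in subsets(frame)}
    dmin = min(dist.values())
    ties = {x for x, d in dist.items() if d - dmin <= tol}
    return ties, dmin, 1.0 - dmin / sum(dist.values())

frame = {"A", "B"}
m1 = {frozenset("A"): 0.9, frozenset("B"): 0.1}
m2 = {frozenset("A"): 0.5, frozenset("B"): 0.5}

ties1, dmin1, q1 = decide_j(m1, frame)
ties2, dmin2, q2 = decide_j(m2, frame)
print(ties1, round(dmin1, 4), round(q1, 4))
print(ties2, round(dmin2, 4), round(q2, 4))
```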


Table II shows the decisions made for the same examples when
we force the decision to be a singleton, that is when the
constraint is $c(X) = $ “$|X| = 1$”. One sees that the decisions
restricted to the set of singletons using $d_{BI}(m, m_X)$ or
$d_J(m, m_X)$ are the same but the quality indicators are a bit
better when using $d_{BI}(m, m_X)$ with respect to $d_J(m, m_X)$.
The values of the quality indicators in Table II are different from
those of Table I, which is normal because we use the constraint
$c(X)$ in the denominator of the formula (7).


B. Examples with $\Theta = \{A, B, C\}$


Table III shows the decisions made when there is no
constraint on the cardinality of the decision $\hat{X}$, whereas Table
IV shows the results for the same examples when the decisions
made are restricted to singletons. As shown in the tables,
all minimal distance values obtained with $d_{BI}$ are less than (or
equal to) the minimal values obtained with $d_J$, and the quality
indicator of the decisions is better when computed with $d_{BI}$ (except
in case $m_1$ of Table III). The decision results obtained with
$d_J$ are mostly consistent with those obtained with $d_{BI}$, except
in cases $m_2$ and $m_3$ of Table III where a larger set of decisions
(tie) is obtained using $d_J$.


If the decisions are restricted to singletons (see Table IV),
then the decision-making based on $d_{BI}$ and on $d_J$ provides
the same results with a better quality of decisions using $d_{BI}$.
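Table I
EXAMPLES OF SEVERAL BBA’S AND DECISIONS MADE (NO CONSTRAINT CASE).

Focal elem.               | $m_1(\cdot)$ | $m_2(\cdot)$ | $m_3(\cdot)$ | $m_4(\cdot)$ | $m_5(\cdot)$ | $m_6(\cdot)$ | $m_7(\cdot)$
$A$                       | 0.9    | 0.5         | 0.8 | 0.1    | 0.4    | 0.9    | 0.1
$B$                       | 0.1    | 0.5         | 0.1 | 0.1    | 0.4    | 0      | 0
$A \cup B$                | 0      | 0           | 0.1 | 0.8    | 0.2    | 0.1    | 0.9

$d_{BI}^{\min}(m_i, m_X)$ | 0.1000 | …           | …   | 0.0577 | 0.2309 | 0.0577 | …
$q(\hat{X}^{d_{BI}})$     | 0.9330 | 0.776       | …   | 0.9502 | 0.8134 | 0.9622 | …
$\hat{X}^{d_{BI}}$        | A      | A∪B         | …   | A∪B    | A∪B    | A      | …

$d_J^{\min}(m_i, m_X)$    | 0.1000 | 0.5000      | …   | 0.1000 | 0.4000 | 0.0707 | 0.0707
$q(\hat{X}^{d_J})$        | 0.9390 | 0.6667      | …   | 0.9276 | 0.6409 | 0.9574 | 0.9501
$\hat{X}^{d_J}$           | A      | A, B, A∪B   | …   | A∪B    | A∪B    | A      | A∪B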


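Table II
EXAMPLES OF SEVERAL BBA’S AND DECISIONS MADE (RESTRICTED TO SINGLETONS).

Focal elem.               | $m_1(\cdot)$ | $m_2(\cdot)$ | $m_3(\cdot)$ | $m_4(\cdot)$ | $m_5(\cdot)$ | $m_6(\cdot)$ | $m_7(\cdot)$
$A$                       | 0.9    | 0.5    | 0.8    | 0.1    | 0.4    | 0.9    | 0.1
$B$                       | 0.1    | 0.5    | 0.1    | 0.1    | 0.4    | 0      | 0
$A \cup B$                | 0      | 0      | 0.1    | 0.8    | 0.2    | 0.1    | 0.9

$d_{BI}^{\min}(m_i, m_X)$ | 0.1000 | 0.5000 | 0.1528 | 0.5508 | 0.5033 | 0.0577 | 0.5196
$q(\hat{X}^{d_{BI}})$     | 0.9000 | 0.5000 | 0.8477 | 0.5000 | 0.5000 | 0.9427 | 0.5393
$\hat{X}^{d_{BI}}$        | A      | A, B   | A      | A, B   | A, B   | A      | A

$d_J^{\min}(m_i, m_X)$    | 0.1000 | 0.5000 | …      | 0.6403 | 0.5099 | 0.0707 | 0.6364
$q(\hat{X}^{d_J})$        | 0.9000 | 0.5000 | …      | 0.5000 | 0.5000 | 0.9308 | 0.5276
$\hat{X}^{d_J}$           | A      | A, B   | …      | A, B   | A, B   | A      | A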


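Table III
EXAMPLES OF SEVERAL BBA’S AND DECISIONS MADE (NO CONSTRAINT CASE).

$d_{BI}^{\min}(m_i, m_X)$ | 0.1000 | 0.2887    | 0.4082                                      | 0.2887
$q(\hat{X}^{d_{BI}})$     | 0.9776 | 0.9242    | 0.8787                                      | 0.9120
$\hat{X}^{d_{BI}}$        | A      | A∪B       | $2^\Theta \setminus \{\emptyset, A, B, C\}$ | A∪B, B∪C, Θ

$d_J^{\min}(m_i, m_X)$    | 0.1000 | 0.5000    | 0.5774                                      | 0.4082
$q(\hat{X}^{d_J})$        | 0.9798 | 0.8870    | 0.8571                                      | 0.8989
$\hat{X}^{d_J}$           | A      | A, B, A∪B | $2^\Theta \setminus \{\emptyset\}$          | A∪B, B∪C, Θ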


Table IV 
EXAMPLES OF SEVERAL BBA’S AND DECISIONS MADE (RESTRICTED TO SINGLETONS). 


A B 0.5 0 0 0 0.2 


$d_{BI}^{\min}(m_i, m_X)$ | 0.1000 | 0.5000 | 0.5774  | 0.2887 | 0.5000 | 0.5092 | 0.6236  | 0.5770
$q(\hat{X}^{d_{BI}})$     | 0.9488 | 0.7321 | 0.6667  | 0.8531 | 0.7388 | 0.7364 | 0.6667  | 0.6855
$\hat{X}^{d_{BI}}$        | A      | A, B   | A, B, C | A      | B      | B      | A, B, C | A
$d_J^{\min}(m_i, m_X)$    | 0.1000 | 0.5000 | 0.5774  | 0.3536 | 0.5774 | 0.5932 | 0.6667  | 0.6117
$q(\hat{X}^{d_J})$        | 0.9488 | 0.7321 | 0.6667  | 0.8300 | 0.7257 | 0.7229 | 0.6667  | 0.6836
$\hat{X}^{d_J}$           | A      | A, B   | A, B, C | A      | B      | B      | A, B, C | A
V. CONCLUSIONS


We have presented a new method for decision-making with 
belief functions which truly exploits the belief interval value 
of each focal element of a BBA. It is easy to implement and 
can be applied with any strict distance metric between two 
BBAs. We have considered and compared the well-known 
Jousselme’s distance and the recent belief interval distance. 
This method is general because the decision can be made not 
only on singletons, but also on any other compound focal 
elements (if needed and allowed). It also provides a quality 
indicator of the decision made. 
Abstract—In this paper we show how the Belief-Function
based Technique for Order Preference by Similarity to Ideal 
Solution (BF-TOPSIS) approach can be used for solving non- 
classical multi-criteria decision-making (MCDM) problems. We 
give simple examples to illustrate our presentation. 


Keywords: multi-criteria decision-making, belief functions, 
TOPSIS, BF-TOPSIS. 


I. INTRODUCTION 


Classical Multi-Criteria Decision-Making (MCDM) consists 
in choosing an alternative among a known set of alternatives 
based on their quantitative evaluations (numerical scores) 
obtained with respect to different criteria. A typical example 
could be the selection of a car to buy among a given set of 
cars based on different criteria (cost, engine robustness, fuel 
economy, CO2 emission, etc). The classical MCDM problem,
although easily formulated, has no solution at all in general
due to the fact that no alternative exists that optimizes all cri- 
teria jointly. Thus MCDM problems are generally not solved, 
but a decision is found by means of ranking, compromises 
etc. The difficulty of MCDM problems is also because the 
scores are usually expressed in different (physical) units with 
different scales which generally necessitates an ad-hoc choice 
of a normalization step that may lead to rank reversal. 

Many MCDM methods have been developed, like AHP¹
[1] and its extensions [2]-[6], ELECTRE² [7], TOPSIS³ [8],
[9] which are widely used in applications. They have already
been extended in the belief function framework in our previous
works [2], [10], [11] to take into account epistemic uncertainty,
missing scores’ values as well as conflicting information
between sources⁴. In this work, we show how the BF-TOPSIS
methods proposed recently in [11] (with application in [12]), 
can be directly used for solving also non-classical multicriteria 
decision-making problems where not only alternatives are 
scored (with possibly missing values), but also any element 
of the power set of alternatives. 


¹Analytic hierarchy process.

²Elimination and choice translating reality.

³Technique for order preference by similarity to ideal solution.

⁴In the MCDM context, a source of information consists in the list of scores
values of alternatives related to a given criterion.


In the sequel, we assume the reader to be familiar with the 
theory of belief functions [13] and its definitions and notations, 
mainly the basic belief assignment (BBA) $m(\cdot)$, the belief
function $Bel(\cdot)$ and the plausibility function $Pl(\cdot)$ defined with
respect to a discrete finite frame of discernment (FoD).


II. NON-CLASSICAL MCDM PROBLEM FORMULATION 


We consider a given set of alternatives denoted by
$\mathbf{A} \triangleq \{A_1, A_2, \ldots, A_M\}$ ($M > 2$) representing the FoD of our
problem under consideration, and we denote by $2^{\mathbf{A}}$ the power
set⁵ of $\mathbf{A}$. In our approach, we work with Shafer’s classical
model of FoD and we do not allow the empty set to be a
focal element⁶ because in our opinion it does not make sense
to compare an alternative with respect to the empty set from
the decision-making standpoint. The power set contains
$2^M - 1$ non-empty elements, whose cardinalities vary from 1 to $M$.


We also consider a given set of criteria $\mathbf{C} \triangleq \{C_1, C_2, \ldots, C_N\}$
($N > 1$), where each criterion $C_j$ is characterized by a relative
importance weighting factor $w_j \in [0,1]$, $j = 1, \ldots, N$, such
that $\sum_{j=1}^{N} w_j = 1$. The set of normalized weighting factors
is denoted by $\mathbf{w} = \{w_1, w_2, \ldots, w_N\}$. The score⁷ value is
a number $S_{ij} = S_j(X_i)$ related to the evaluation of an
element $X_i \in 2^{\mathbf{A}} \setminus \{\emptyset\}$ from a given criterion $C_j$. If the
score value $S_j(X_i)$ is not available (or missing), we denote
it by the “varnothing” symbol $\varnothing$. The non-classical MCDM
problem can be formulated as follows in the worst case
(i.e. when scores apply to all elements of $2^{\mathbf{A}}$): given the
$(2^M - 1) \times N$ score matrix $\mathbf{S} = [S_j(X_i)]$ whose elements
take either a numerical value or a $\varnothing$ value (if the value is not
available) and knowing the set $\mathbf{w}$ of the relative importance
weights of criteria, how to rank the elements of $2^{\mathbf{A}} \setminus \{\emptyset\}$ to
make the final decision?


⁵The power set $2^{\mathbf{A}}$ is the set of all subsets of $\mathbf{A}$, empty set $\emptyset$ and $\mathbf{A}$
included.

⁶As proposed in Smets’ Transferable Belief Model for instance.

⁷Depending on the context, the score can be interpreted either as a
cost/expense or as a reward/benefit. In the sequel, by convention and without
loss of generality, we will interpret the score as a reward having monotonically
increasing preference. Thus, the best alternative with respect to a given
criterion will be the one providing the highest reward/benefit.
Example: Let us consider the ranking of five students $A_1$,
$A_2$, $A_3$, $A_4$, and $A_5$ based on two criteria $C_1$ and $C_2$. The
criterion $C_1$ is their long jump performance (in meters), and
the criterion $C_2$ is a realization of a small project to collect funds
(in euros) to help a bigger nature protection project. Higher
score values mean better results in this particular context. Let
us assume that students were allowed to realize their project
in joint collaboration (no more than three students are allowed
in a group), or alone. At the end of the project, suppose
that one has the two following evaluations (scoring)

$$\mathbf{S}_{C_1} = \begin{array}{c|c} A_1 & 3.7\,\text{m} \\ A_3 & 3.6\,\text{m} \\ A_4 & 3.8\,\text{m} \\ A_5 & 3.7\,\text{m} \end{array} \quad \text{and} \quad \mathbf{S}_{C_2} = \begin{array}{c|c} A_5 & 640\,\text{€} \\ A_1 \cup A_2 & 600\,\text{€} \\ A_3 \cup A_4 & 650\,\text{€} \end{array} \tag{1}$$
The scores’ values listed in $\mathbf{S}_{C_1}$ indicate in fact that the
student $A_2$ has not been able to pass the long jump test for
some reason (medical, familial or whatever), so his score is
missing. The scores’ values listed in $\mathbf{S}_{C_2}$ indicate that $A_5$
did choose to realize his project alone with a pretty good
performance, and the project realized by the collaboration
of students $A_3$ with $A_4$ has obtained the best performance
(the highest amount of collected funds). In this very simple
example, one sees that the score evaluation can be done not
only on single alternatives (as for criterion $C_1$) but also on a
subset of elements of $2^{\mathbf{A}}$ (as for criterion $C_2$). All the elements
having a score are called scoring focal elements. In general,
these focal elements can differ from one criterion $C_j$ to another
criterion $C_k$, for $k \neq j$, and the score matrix cannot be built
by a simple (horizontal) stacking of scoring lists. In general,
one must identify all focal elements of each scoring list to
determine the minimum number of rows necessary to define
the scoring matrix. As mentioned, we use the symbol $\varnothing$ to
identify all values that are missing in the scoring matrix. Note
that we do not set missing values to the number zero (or any
other chosen number) to make an explicit distinction between the
known precise numerical value zero and a missing value. In
this example, the scoring matrix will be defined as


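$$\mathbf{S} = \begin{array}{c|cc} & C_1 & C_2 \\ \hline A_1 & 3.7\,\text{m} & \varnothing \\ A_3 & 3.6\,\text{m} & \varnothing \\ A_4 & 3.8\,\text{m} & \varnothing \\ A_5 & 3.7\,\text{m} & 640\,\text{€} \\ A_1 \cup A_2 & \varnothing & 600\,\text{€} \\ A_3 \cup A_4 & \varnothing & 650\,\text{€} \end{array} \tag{2}$$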
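
The construction of the scoring matrix from the per-criterion scoring lists can be sketched as below. The function name `build_score_matrix`, the dictionary encoding of a scoring list and the use of `None` for the missing-value symbol are assumptions of this illustration; the data are those of the student example. Keeping `None` distinct from `0.0` mirrors the distinction between a missing value and a known zero.

```python
def build_score_matrix(scoring_lists):
    """Union of the scoring focal elements (rows) and one column per criterion.

    Each scoring list maps a focal element (frozenset of alternatives) to a
    score. A focal element absent from a list gets None, never 0.
    """
    rows = []
    for scores in scoring_lists:
        for element in scores:
            if element not in rows:
                rows.append(element)
    # Order rows by cardinality, then by name, for a readable layout.
    rows.sort(key=lambda e: (len(e), sorted(e)))
    matrix = [[scores.get(e) for scores in scoring_lists] for e in rows]
    return rows, matrix

s_c1 = {frozenset({"A1"}): 3.7, frozenset({"A3"}): 3.6,
        frozenset({"A4"}): 3.8, frozenset({"A5"}): 3.7}
s_c2 = {frozenset({"A5"}): 640.0, frozenset({"A1", "A2"}): 600.0,
        frozenset({"A3", "A4"}): 650.0}

rows, matrix = build_score_matrix([s_c1, s_c2])
for element, line in zip(rows, matrix):
    print(" u ".join(sorted(element)), line)
```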


The question we want to address is how to rank the 
students based on such a kind of scoring information including 
disjunctions of alternatives and missing values, taking into 
account the relative importance weight of each criterion. Is it 
possible to solve such type of non-classical MCDM problems, 
and how? 


III. THE BF-TOPSIS APPROACH


The BF-TOPSIS approach has been proposed recently in
[11] in a classical MCDM context where the focal elements
of the scoring function $S_j(\cdot)$ ($j = 1, \ldots, N$) are only the
singletons $A_i$ ($i = 1, \ldots, M$) of the frame of discernment
$\mathbf{A}$. BF-TOPSIS is initially based on belief functions for
MCDM support which exploits only the $M \times N$ score matrix
$\mathbf{S} = [S_j(A_i)]$ and the relative importance weighting factors of
criteria. The first main step of BF-TOPSIS is the construction
of an $M \times N$ BBA matrix $\mathbf{M} = [m_{ij}(\cdot)]$ from the score matrix
$\mathbf{S}$, and then the combination of components of $\mathbf{M}$ to make a
final decision thanks to the Euclidean belief interval distance,
denoted by $d_{BI}$, defined in [14], [15].

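In fact, the BF-TOPSIS approach can also be directly
applied to solve the non-classical MCDM problems because
the belief interval $[Bel_{ij}(X_i), Pl_{ij}(X_i)]$ of each proposition
(i.e. each focal element which is not necessarily a singleton)
$X_i$ based on a criterion $C_j$ can be established in a consistent
manner⁸ from the score matrix $\mathbf{S} = [S_j(X_i)]$ as follows

$$[Bel_{ij}(X_i); Pl_{ij}(X_i)] \triangleq \left[\frac{Sup_j(X_i)}{X^j_{\max}};\ 1 - \frac{Inf_j(X_i)}{X^j_{\min}}\right] \tag{3}$$

where the $Sup_j(X_i)$ and $Inf_j(X_i)$ are computed from the
score matrix $\mathbf{S}$ by

$$Sup_j(X_i) \triangleq \sum_{Y \in 2^{\mathbf{A}} \,|\, S_j(Y) \leq S_j(X_i)} |S_j(X_i) - S_j(Y)| \tag{4}$$

$$Inf_j(X_i) \triangleq - \sum_{Y \in 2^{\mathbf{A}} \,|\, S_j(Y) \geq S_j(X_i)} |S_j(X_i) - S_j(Y)| \tag{5}$$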

$Sup_j(X_i)$ is called the “positive support” of $X_i$ because
it measures how much $X_i$ is better than other propositions
according to criterion $C_j$, and $Inf_j(X_i)$ is called the “negative
support” of $X_i$ because it measures how much $X_i$ is worse
than other propositions according to criterion $C_j$. The length
of interval $[0, Sup_j(X_i)]$ measures the support in favor of $X_i$
as being the best proposition with respect to all other ones,
and the length of $[Inf_j(X_i), 0]$ measures the support against
$X_i$ based on the criterion $C_j$.

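The denominators involved in (3) are defined by
$X^j_{\max} \triangleq \max_i Sup_j(X_i)$ and $X^j_{\min} \triangleq \min_i Inf_j(X_i)$, and they
are supposed different from zero. From the belief interval
$[Bel_{ij}(X_i); Pl_{ij}(X_i)]$, we obtain the BBA $m_{ij}(\cdot)$ defined by

$$m_{ij}(X_i) \triangleq Bel_{ij}(X_i) \tag{6}$$
$$m_{ij}(\bar{X}_i) \triangleq Bel_{ij}(\bar{X}_i) = 1 - Pl_{ij}(X_i) \tag{7}$$
$$m_{ij}(X_i \cup \bar{X}_i) \triangleq Pl_{ij}(X_i) - Bel_{ij}(X_i) \tag{8}$$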


If a numerical value $S_j(X_i)$ is missing in the score matrix $\mathbf{S}$
(it is equal to $\varnothing$), one chooses $m_{ij}(\cdot)$ equal to $(0, 0, 1)$, i.e., one
takes a vacuous belief assignment. In [11], we have proposed
four methods (called BF-TOPSIS1, ..., BF-TOPSIS4) to make
a decision from the BBA matrix $\mathbf{M} = [m_{ij}(\cdot)]$. Due to space
restriction constraint, we just recall the principle of the
BF-TOPSIS1 method because it is the simplest one. Application
of the BF-TOPSIS2 to BF-TOPSIS4 methods to non-classical MCDM
problems is also possible without difficulty. The proposed
transformation of score values to BBAs and the basis of the
BF-TOPSIS method are theoretically justified in [11].

⁸Indeed, $Bel_{ij}(X_i)$ and $Bel_{ij}(\bar{X}_i)$ (where $\bar{X}_i$ is the complement of $X_i$
in the FoD $\mathbf{A}$) belong to $[0,1]$. They are consistent because the equality
$Pl_{ij}(X_i) = 1 - Bel_{ij}(\bar{X}_i)$ holds. The proof is similar to the one in [11].
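
The construction of one column of the BBA matrix from one column of scores, following (3)-(8), can be sketched as follows. The function name `bba_column`, the use of `None` for a missing score, and the fallback values taken when a denominator is zero (a case the text assumes away) are assumptions of this illustration. Each BBA is returned as the triple $(m(X_i), m(\bar{X}_i), m(X_i \cup \bar{X}_i))$.

```python
def bba_column(scores):
    """Map each proposition to (m(X), m(not X), m(X or not X)) for one criterion.

    `scores` maps a proposition label to a numerical score, or to None when
    the score is missing (the proposition then gets the vacuous BBA).
    """
    known = {x: s for x, s in scores.items() if s is not None}
    if not known:
        return {x: (0.0, 0.0, 1.0) for x in scores}
    # Positive and negative supports, Eqs. (4) and (5).
    sup = {x: sum(s - t for t in known.values() if t <= s)
           for x, s in known.items()}
    inf = {x: -sum(t - s for t in known.values() if t >= s)
           for x, s in known.items()}
    x_max = max(sup.values())
    x_min = min(inf.values())
    column = {}
    for x, s in scores.items():
        if s is None:
            column[x] = (0.0, 0.0, 1.0)
            continue
        # Zero denominators: Bel = 0 and Pl = 1 are assumed here.
        bel = sup[x] / x_max if x_max != 0 else 0.0
        pl = 1.0 - inf[x] / x_min if x_min != 0 else 1.0
        column[x] = (bel, 1.0 - pl, pl - bel)
    return column

# Columns C1 and C2 of the score matrix (2); labels are plain strings here.
c1 = bba_column({"A1": 3.7, "A3": 3.6, "A4": 3.8, "A5": 3.7,
                 "A1uA2": None, "A3uA4": None})
c2 = bba_column({"A1": None, "A3": None, "A4": None, "A5": 640.0,
                 "A1uA2": 600.0, "A3uA4": 650.0})
print([round(v, 4) for v in c1["A1"]], [round(v, 4) for v in c2["A5"]])
```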


Before presenting succinctly the BF-TOPSIS1 method, we
need to recall the definition of the Belief Interval-based Euclidean
distance $d_{BI}(m_1, m_2)$ introduced in [14] between two BBAs
$m_1(\cdot)$ and $m_2(\cdot)$ defined on a same FoD $\Theta$. Mathematically,
$d_{BI}(m_1, m_2)$ is a true distance defined by [14]

$$d_{BI}(m_1, m_2) \triangleq \sqrt{N_c \cdot \sum_{X \in 2^\Theta} d_W^2(BI_1(X), BI_2(X))} \tag{9}$$

where $N_c = 1/2^{|\Theta|-1}$ is a normalization factor to
have $d_{BI}(m_1, m_2) \in [0,1]$, and $d_W(BI_1(X), BI_2(X))$
is the Wasserstein distance [16] between belief intervals
$BI_1(X) \triangleq [Bel_1(X), Pl_1(X)] = [a_1, b_1]$ and $BI_2(X) \triangleq
[Bel_2(X), Pl_2(X)] = [a_2, b_2]$. More specifically,

$$d_W([a_1, b_1], [a_2, b_2]) = \sqrt{\left[\frac{a_1 + b_1}{2} - \frac{a_2 + b_2}{2}\right]^2 + \frac{1}{3}\left[\frac{b_1 - a_1}{2} - \frac{b_2 - a_2}{2}\right]^2} \tag{10}$$


Principle of BF-TOPSIS1: From the BBA matrix $\mathbf{M}$ and for
each proposition (focal element) $X_i$, one computes the Belief
Interval-based Euclidean distances $d_{BI}(m_{ij}, m_{ij}^{best})$ defined in
(9) between the BBA $m_{ij}(\cdot)$ and the ideal best BBA defined
by $m_{ij}^{best}(X_i) = 1$, and the distance $d_{BI}(m_{ij}, m_{ij}^{worst})$ between
$m_{ij}(\cdot)$ and the ideal worst BBA defined by $m_{ij}^{worst}(\bar{X}_i) = 1$.
Then, one computes the weighted average of
$d_{BI}(m_{ij}, m_{ij}^{best})$ values with relative importance weighting
factor $w_j$ of criteria $C_j$. Similarly, one computes the weighted
average of $d_{BI}(m_{ij}, m_{ij}^{worst})$ values. More specifically, one
computes

$$d^{best}(X_i) \triangleq \sum_{j=1}^{N} w_j \cdot d_{BI}(m_{ij}, m_{ij}^{best}) \tag{11}$$

$$d^{worst}(X_i) \triangleq \sum_{j=1}^{N} w_j \cdot d_{BI}(m_{ij}, m_{ij}^{worst}) \tag{12}$$

The relative closeness of the proposition $X_i$ with respect to the
ideal best solution $X^{best}$ defined by

$$C(X_i, X^{best}) \triangleq \frac{d^{worst}(X_i)}{d^{worst}(X_i) + d^{best}(X_i)} \tag{13}$$

is used to make the preference ordering according to the
descending order of $C(X_i, X^{best}) \in [0,1]$, where a larger
$C(X_i, X^{best})$ value means a better proposition $X_i$.

Note that once the BBA matrix is computed from Eqs.
(6)-(8), we can also apply (if we prefer) BF-TOPSIS2,
BF-TOPSIS3 or BF-TOPSIS4 methods to make the final decision.
Their presentation is out of the scope of this paper.
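
The BF-TOPSIS1 steps (11)-(13) can be sketched as below, using the BBA matrix of Example 1 given in Eq. (14) and the weights $w_1 = 1/3$, $w_2 = 2/3$. The function names and the encoding of propositions as frozensets are assumptions of this illustration, not the authors’ implementation.

```python
from itertools import combinations
from math import sqrt

def subsets(frame):
    """Non-empty subsets of the frame (the empty set adds nothing to d_BI)."""
    items = sorted(frame)
    return [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]

def interval(m, x):
    """Belief interval [Bel(x), Pl(x)] of x for the BBA m."""
    return (sum(v for f, v in m.items() if f <= x),
            sum(v for f, v in m.items() if f & x))

def d_bi(m1, m2, frame):
    """Belief interval distance of Eqs. (9)-(10)."""
    total = 0.0
    for x in subsets(frame):
        a1, b1 = interval(m1, x)
        a2, b2 = interval(m2, x)
        total += ((a1 + b1) / 2 - (a2 + b2) / 2) ** 2 \
            + ((b1 - a1) / 2 - (b2 - a2) / 2) ** 2 / 3
    return sqrt(total / 2 ** (len(frame) - 1))

def closeness(x, triples, weights, frame):
    """Relative closeness C(X, X_best) of Eq. (13) for proposition x."""
    comp = frame - x
    best, worst = {x: 1.0}, {comp: 1.0}
    d_best = d_worst = 0.0
    for (a, b, c), w in zip(triples, weights):
        m = {k: v for k, v in ((x, a), (comp, b), (frame, c)) if v > 0}
        d_best += w * d_bi(m, best, frame)     # Eq. (11)
        d_worst += w * d_bi(m, worst, frame)   # Eq. (12)
    return d_worst / (d_worst + d_best)

frame = frozenset({"A1", "A2", "A3", "A4", "A5"})
weights = (1 / 3, 2 / 3)
bba_matrix = {  # rows of Eq. (14): (m(X), m(not X), m(X or not X)) per criterion
    frozenset({"A1"}): [(0.25, 0.25, 0.50), (0.0, 0.0, 1.0)],
    frozenset({"A3"}): [(0.0, 1.0, 0.0), (0.0, 0.0, 1.0)],
    frozenset({"A4"}): [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)],
    frozenset({"A5"}): [(0.25, 0.25, 0.50), (0.6667, 0.1111, 0.2222)],
    frozenset({"A1", "A2"}): [(0.0, 0.0, 1.0), (0.0, 1.0, 0.0)],
    frozenset({"A3", "A4"}): [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)],
}
scores = {x: closeness(x, rows, weights, frame)
          for x, rows in bba_matrix.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
for x in ranking:
    print(" u ".join(sorted(x)), round(scores[x], 4))
```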


IV. BF-TOPSIS APPLIED TO A NON-CLASSICAL MCDM 


We present the results of the BF-TOPSIS1 method for two 
simple non-classical MCDM problems. 


Example 1: This example is given by the score matrix of Eq.
(2). We consider the relative importance weights $w_1 = 1/3$
and $w_2 = 2/3$ of criteria $C_1$ and $C_2$ respectively. Applying
BBA construction formulas (6)–(8) for this example⁹, we get
the BBA matrix $\mathbf{M} = [(m_{ij}(X_i), m_{ij}(\bar{X}_i), m_{ij}(X_i \cup \bar{X}_i))]$
with

$$\mathbf{M} = \begin{array}{c|cc} & C_1 & C_2 \\ \hline A_1 & (0.25, 0.25, 0.50) & (0, 0, 1) \\ A_3 & (0, 1, 0) & (0, 0, 1) \\ A_4 & (1, 0, 0) & (0, 0, 1) \\ A_5 & (0.25, 0.25, 0.50) & (0.6667, 0.1111, 0.2222) \\ A_1 \cup A_2 & (0, 0, 1) & (0, 1, 0) \\ A_3 \cup A_4 & (0, 0, 1) & (1, 0, 0) \end{array} \tag{14}$$

From this matrix $\mathbf{M}$, we compute the distances $d_{BI}(\cdot,\cdot)$ with
respect to ideal best and worst solutions shown in Table I.
Table II provides $d^{best}(X_i)$, $d^{worst}(X_i)$ and $C(X_i, X^{best})$ values
computed from the formulas (11)-(13). Based on $C(X_i, X^{best})$
values sorted in descending order, we finally get the preference
order $(A_3 \cup A_4) \succ A_5 \succ A_4 \succ A_1 \succ (A_1 \cup A_2) \succ A_3$.
If we restrict the preference order to only singletons, we
will get $A_5 \succ A_4 \succ A_1 \succ A_3$ (i.e. student $A_5$ is the
best one). Note that student $A_2$ alone cannot be ranked with
respect to the other students, which is normal based on the
non-specific input (scoring) information one has for him. Of
course ad-hoc ranking solutions to rank all five students can
always be developed¹⁰, but without necessarily preserving the
compatibility with the rank obtained previously.


⁹When a score value is missing for some proposition $X_i$ (i.e. if
$S_j(X_i) = \varnothing$), then we take the vacuous BBA $m_{ij}(X_i \cup \bar{X}_i) = 1$.

¹⁰for instance by normalizing the $C(X_i, X^{best})$ values (the rightmost
column of Table II) and interpreting them as a BBA, and then applying a decision
method described in [15].

[Table I: columns Focal elem. $X_i$ | $d_{BI}(m_{i1}, m^{best})$ | $d_{BI}(m_{i1}, m^{worst})$ | $d_{BI}(m_{i2}, m^{best})$ | $d_{BI}(m_{i2}, m^{worst})$]

Table II
AVERAGE DISTANCES AND RELATIVE CLOSENESS INDICATORS.

Focal elem. $X_i$ | $C(X_i, X^{best})$
$A_4$             | 0.4415
$A_5$             | 0.5561
$A_1 \cup A_2$    | 0.1573
$A_3 \cup A_4$    | 0.7597


Example 2: In mountains, protecting housing areas against
torrential floods is based on many alternatives at the watershed
scale such as check dams’ series, sediment traps, dikes,
and individual protections [12]. Moreover, alternatives can be
the maintenance of existing structures or the construction of
new ones to increase the protection level. Final propositions
generally involve several of the previous individual alternatives.
We propose here a simplified case of application. Within a
given watershed, a check-dams’ series already exists. Being more
than one century old, its maintenance (alternative $A_1$) is
questioned. Some experts propose to abandon it and to build
a sediment trap upstream of the alluvial fan (alternative $A_2$) or
to limit damage on buildings through individual protections
(alternative $A_3$). The Decision-Maker (DM), here the local
municipality, must decide the best proposition taking into account
several criteria: the investment cost ($C_1$ in €, in negative
values), the risk reduction in 50 years between the current
situation and the expected situation after each proposition
implementation ($C_2$ in €), the impact on environment ($C_3$
is a grade from 1 to 10), and the land-use areas needed in
private properties ($C_4$ in m², in negative values). For each criterion, the
higher the score, the better the proposition. The DM gives
the same importance weight to $C_1$ and $C_2$ ($w_1 = w_2 = 0.33$),
but they are more important than $C_3$ ($w_3 = 0.20$) which is
more important than $C_4$ ($w_4 = 0.14$). The score matrix is
given in Eq. (15). In this case, the difficulty is not a lack of
knowledge on some scores, but the fact that scores are not cumulative
in the same way for each criterion. For $C_1$ and $C_4$, the score
of the disjunction of two alternatives is the sum of individual
scores whereas it is not the case for $C_2$ and $C_3$.


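$$\mathbf{S} = \begin{array}{c|cccc} & C_1 & C_2 & C_3 & C_4 \\ \hline A_1 & -150000 & 100000 & 10 & 0 \\ A_2 & -500000 & 200000 & 2 & -20000 \\ A_3 & -550000 & 250000 & 10 & -5000 \\ A_1 \cup A_2 & -650000 & 230000 & 2 & -20000 \\ A_1 \cup A_3 & -700000 & 250000 & 10 & -5000 \\ A_2 \cup A_3 & -1050000 & 250000 & 2 & -25000 \\ A_1 \cup A_2 \cup A_3 & -1200000 & 250000 & 2 & -25000 \end{array} \tag{15}$$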


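The BBA matrix based on $\mathbf{S}$ using (3)-(8) (rounded to 2
decimal places) is

$$\mathbf{M} = \begin{array}{c|cccc} & C_1 & C_2 & C_3 & C_4 \\ \hline A_1 & (1, 0, 0) & (0, 1, 0) & (1, 0, 0) & (1, 0, 0) \\ A_2 & (0.44, 0.10, 0.46) & (0.45, 0.28, 0.27) & (0, 1, 0) & (0.10, 0.67, 0.23) \\ A_3 & (0.37, 0.13, 0.50) & (1, 0, 0) & (1, 0, 0) & (0.70, 0.07, 0.23) \\ A_1 \cup A_2 & (0.27, 0.21, 0.52) & (0.73, 0.10, 0.17) & (0, 1, 0) & (0.10, 0.67, 0.23) \\ A_1 \cup A_3 & (0.23, 0.26, 0.51) & (1, 0, 0) & (1, 0, 0) & (0.70, 0.07, 0.23) \\ A_2 \cup A_3 & (0.04, 0.75, 0.21) & (1, 0, 0) & (0, 1, 0) & (0, 1, 0) \\ A_1 \cup A_2 \cup A_3 & (0, 1, 0) & (1, 0, 0) & (0, 1, 0) & (0, 1, 0) \end{array}$$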


The weighted distances to the ideal best and worst solutions
and the relative closeness indicator are listed in Table III.
Based on the relative closeness indicator sorted in descending
order, the final preference order is $(A_1 \cup A_3) \succ A_3 \succ A_1 \succ
(A_1 \cup A_2) \succ (A_2 \cup A_3) \succ A_2 \succ (A_1 \cup A_2 \cup A_3)$: maintaining
the existing check dams’ series and implementing individual
protections is the best option. If the preferences are restricted
to single alternatives, one will get as final preference order
$A_3 \succ A_1 \succ A_2$, i.e. option $A_3$ (only individual protections)
should be preferred by the DM.


V. CONCLUSIONS 


In this paper, we have shown how the BF-TOPSIS approach 
can be exploited to solve non-classical MCDM problems. 
This method is relatively easy to use. It does not require the 
normalization of data and offers a consistent construction of 
basic belief assignments from the available scoring values. It 
can also deal with missing scoring values and different criteria 
weights as well. In this paper only the BF-TOPSIS1 method
has been presented; other, more sophisticated BF-TOPSIS
methods could also be used to solve non-classical problems,
at the price of a higher complexity. The application of
this new BF-TOPSIS approach to solve non-classical MCDM 
problems for natural risk prevention is currently under evalu- 
ation, and it will be reported in a forthcoming publication. 
Table III (excerpt): weighted distances and relative closeness indicators for Example 2.

Focal elem. $X_i$       | $d^{worst}(X_i)$ | $C(X_i, X^{best})$ | Rank
$A_1$                   | 0.6116           | 0.6700             | 3
…                       | …                | 0.3938             | …
$A_1 \cup A_2 \cup A_3$ | …                | 0.2444             | …
Abstract—In this short note, we present two classes of examples
showing that Dempster’s rule of combination is insensitive to 
the conflict level between the sources of evidence. This behavior 
is intuitively not satisfying because the amount of dissonance 
between sources should have an impact in the fusion result 
when the basic belief assignments (BBA) to combine are truly 
informative (not vacuous). 


Keywords: Dempster’s rule, Information fusion, belief func- 
tions. 


I. INTRODUCTION 


In this short note, we discuss the behavior of Dempster’s 
rule of combination used in Dempster-Shafer Theory (DST) 
[1], [2] to combine basic belief assignments provided by 
distinct sources of evidence. After a brief introduction of
belief functions in section II and a recall of Dempster’s rule
of combination in section III, we provide in section IV two
classes of examples showing the counter-intuitive behavior of
Dempster’s rule. These new classes of examples generalize
examples presented in [3]. The conclusion is made in section
V.


II. BELIEF FUNCTIONS IN SHORT 


Belief functions have been introduced by Shafer in [1] to
model epistemic uncertainty. We assume that the answer¹ of
the problem under concern belongs to a known (or given) finite
discrete frame of discernment (FoD) $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$,
with $n > 1$, and where all elements of $\Theta$ are exclusive².
The set of all subsets of $\Theta$ (including the empty set $\emptyset$ and $\Theta$) is
the power-set of $\Theta$ denoted by $2^\Theta$. A basic belief assignment
(BBA) associated with a given source of evidence is defined
[1] as the mapping $m(\cdot) : 2^\Theta \to [0,1]$ satisfying $m(\emptyset) = 0$ and
$\sum_{A \in 2^\Theta} m(A) = 1$. The quantity $m(A)$ is called the mass of
$A$ committed by the source of evidence. Belief and plausibility
functions are respectively defined by

$$Bel(A) = \sum_{B \subseteq A,\; B \in 2^\Theta} m(B), \quad \text{and} \quad Pl(A) = 1 - Bel(\bar{A}). \qquad (1)$$

¹i.e. the solution, or the decision to take.
²This is so-called Shafer’s model of FoD [2].


If $m(A) > 0$, $A$ is called a focal element of $m(\cdot)$. The set of
focal elements of a BBA $m$ is denoted $\mathcal{F}(m)$. When all focal
elements are singletons then $m(\cdot)$ is called a Bayesian BBA
[1] and its corresponding $Bel(\cdot)$ function is homogeneous to a
(subjective) probability measure. The vacuous BBA, or VBBA
for short, representing a totally ignorant source is defined as³
$m(\Theta) = 1$.

Shafer [1] proposed to combine $s \ge 2$ distinct sources of
evidence represented by BBAs $m_1(\cdot), \ldots, m_s(\cdot)$ over the same


FoD with Dempster’s rule. The justification and behavior of 
Dempster’s rule have been disputed over the years from many 
counter-examples involving high or low conflicting sources 
(from both theoretical and practical standpoints) as reported in 
[4]-[7]. After a brief recall of Dempster’s rule of combination
in Section III, we present new interesting examples showing the
counter-intuitive behavior of this rule in Section IV.


III. DEMPSTER’S RULE OF COMBINATION 


Dempster’s rule of combination can be seen as a normalized
version of the conjunctive rule. So, let us first recall
the conjunctive rule (CR) of combination. Mathematically, the CR
of $s \ge 2$ BBAs $m_i(\cdot)$, $i = 1, \ldots, s$, defined with respect to the
same FoD $\Theta$ is defined for any $X \in 2^\Theta$ by

$$m^{CR}_{1,\ldots,s}(X) = \sum_{\substack{X_1, \ldots, X_s \in 2^\Theta \\ X_1 \cap X_2 \cap \ldots \cap X_s = X}} \; \prod_{i=1}^{s} m_i(X_i). \qquad (2)$$


The conjunction (intersection) of two (or more) sources of
evidence only keeps the items of information asserted by both
(all) sources. This rule has been justified by Dempster [8] in
statistical terms on the basis of the independence of the sources
which provide $m_i$ with $\mathcal{F}(m_i)$, $i = 1, \ldots, s$. The set of focal
elements of $m^{CR}_{1,\ldots,s}(\cdot)$ is given by

$$\mathcal{F}(m^{CR}_{1,\ldots,s}) = \{X_1 \cap \ldots \cap X_s \mid X_i \in \mathcal{F}(m_i),\; i = 1, \ldots, s\}.$$

The term $m^{CR}_{1,\ldots,s}(\emptyset)$ reflects the amount of dissonance
between the sources [9] (also called the level or degree of
conflict between the sources of evidence). Its management
gives rise to many debates on the choice of possible rules
to combine distinct and reliable sources of evidence. In DST,
Shafer proposed Dempster’s rule⁴ in which the positive value
$m^{CR}_{1,\ldots,s}(\emptyset)$ (if any) committed to the empty set (impossible
event) is removed through a simple normalization technique.
Mathematically, Dempster’s rule of combination of $s \ge 2$ basic
belief assignments is defined by $m^{DS}_{1,\ldots,s}(\emptyset) = 0$, and for any
$X \neq \emptyset \in 2^\Theta$

$$m^{DS}_{1,\ldots,s}(X) = [m_1 \oplus \ldots \oplus m_s](X) = \frac{m^{CR}_{1,\ldots,s}(X)}{1 - m^{CR}_{1,\ldots,s}(\emptyset)}. \qquad (3)$$

Dempster’s rule is commutative and associative and preserves
the neutrality of the vacuous BBA in the fusion process, which
makes Dempster’s rule an appealing method to fuse BBAs
from an implementation standpoint, even if the validity of its
result has been highly disputed over the last decades, since its
first criticism made by Zadeh in [10] in the case of highly conflicting
situations, and more recently in [4]-[7] for the case of low
conflicting situations.

³The complete ignorance is denoted $\Theta$ in Shafer’s book [1].


In the next section we present two classes of examples 
where Dempster’s rule is insensitive to the conflict level. 


IV. NEW CLASSES OF EXAMPLES 
A. First class of examples 
Let’s consider a finite frame of discernment $\Theta$ and two
BBAs $m_1(\cdot)$ and $m_2(\cdot)$ with focal elements in $2^\Theta$ given by

$$\mathcal{F}(m_1) = \{A_1, A_2, \ldots, A_n\},$$
$$\mathcal{F}(m_2) = \{A, B_1, \ldots, B_m\},$$

where $A_i \subseteq A$ for $1 \le i \le n$, and $A_i \cap B_j = \emptyset$ for $1 \le i \le n$
and $1 \le j \le m$.

The mass of each focal element is denoted by its corresponding
lowercase letter, that is $m_1(A_i) = a_i$ for $1 \le i \le n$,
and $m_2(A) = a$ and $m_2(B_j) = b_j$ for $1 \le j \le m$. Because
$m_1(\cdot)$ and $m_2(\cdot)$ are normalized BBAs, one has $\sum_{i=1}^{n} a_i = 1$
and $a + \sum_{j=1}^{m} b_j = 1$.

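In applying the conjunctive rule of combination of $m_1(\cdot)$
with $m_2(\cdot)$, we get

$$m^{CR}_{12}(A_i) = a \cdot a_i, \quad \text{for } 1 \le i \le n,$$

and

$$m^{CR}_{12}(\emptyset) = \sum_{i=1}^{n} \sum_{j=1}^{m} a_i b_j.$$

Obviously $m^{CR}_{12}(\emptyset) + \sum_{i=1}^{n} m^{CR}_{12}(A_i) = 1$, which means
that the following equality holds

$$m^{CR}_{12}(\emptyset) = 1 - \sum_{i=1}^{n} a \cdot a_i = 1 - a \sum_{i=1}^{n} a_i = 1 - a, \qquad (4)$$

because $\sum_{i=1}^{n} a_i = 1$.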


⁴This rule has been introduced by Dempster in [8]. It has been denoted
and popularized with the operator symbol $\oplus$ by Shafer in [1].


To get Dempster’s rule result, we need to normalize the
BBA $m^{CR}_{12}(\cdot)$ by dividing the masses $m^{CR}_{12}(A_i)$ by
$1 - m^{CR}_{12}(\emptyset)$, or equivalently just by dividing $m^{CR}_{12}(A_i)$ by the
value $a$, because from (4) one always has
$1 - m^{CR}_{12}(\emptyset) = 1 - (1 - a) = a$.

After the normalization by division of the masses $m^{CR}_{12}(A_i)$ by
$a$, one gets as Dempster-Shafer fusion result

$$m^{DS}_{12}(A_i) = [m_1 \oplus m_2](A_i) = m_1(A_i) = a_i. \qquad (5)$$

Therefore, it is clear in such a class of examples that the BBA
$m_2(\cdot)$ has absolutely no impact on the Dempster-Shafer fusion
result even if $m_2(\cdot)$ is truly informative (not vacuous) and
conflicting with the BBA $m_1(\cdot)$.

The conflict level $m^{CR}_{12}(\emptyset) = \sum_{i=1}^{n} \sum_{j=1}^{m} a_i b_j$ can be as
high (close to one) or as low (close to zero) as we want;
Dempster’s rule always provides in this class of examples
the same result $m^{DS}_{12}(\cdot) = m_1(\cdot)$, which is a counter-intuitive
behavior not very recommended for fusion applications.
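The insensitivity described above can be checked numerically. The following is a minimal sketch, not taken from the paper: the frame {x, y, z}, the masses 0.3 and 0.7, and the helper name `dempster` are illustrative assumptions. BBAs are stored as dictionaries mapping focal elements (frozensets) to masses.

```python
from itertools import product

def dempster(m1, m2):
    """Combine two BBAs (dict: frozenset -> mass) with Dempster's rule.

    Returns the normalized BBA and the conjunctive conflict mass m(empty set).
    """
    conj = {}
    for (x, mx), (y, my) in product(m1.items(), m2.items()):
        z = x & y  # conjunctive rule: intersect the focal elements
        conj[z] = conj.get(z, 0.0) + mx * my
    conflict = conj.pop(frozenset(), 0.0)
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {z: v / (1.0 - conflict) for z, v in conj.items()}, conflict

# Hypothetical instance of the first class: A1, A2 are subsets of A; B1 is disjoint from them.
A1, A2, B1 = frozenset("x"), frozenset("y"), frozenset("z")
A = A1 | A2
m1 = {A1: 0.3, A2: 0.7}

results = []
for a in (0.9, 0.5, 0.01):  # m2(A) = a, so the conflict is 1 - a
    m2 = {A: a, B1: 1.0 - a}
    fused, conflict = dempster(m1, m2)
    results.append((a, conflict, fused))
    print(a, round(conflict, 6), sorted((tuple(k), round(v, 6)) for k, v in fused.items()))
```

Whatever value of `a` is chosen, the fused masses stay at those of `m1`, while the conflict moves between low and high.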


B. Second class of examples 


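This second class of examples is a bit more general than the
previous one. We consider a finite frame of discernment $\Theta$
and two BBAs $m_1(\cdot)$ and $m_2(\cdot)$ with focal elements in $2^\Theta$
given by

$$\mathcal{F}(m_1) = \{A_1, A_2, \ldots, A_n, B\},$$
$$\mathcal{F}(m_2) = \{B, C_1, \ldots, C_m\},$$

such that

• $A_i \subseteq B$ for $1 \le i \le n$,
• $B \cap C_j = \emptyset$ for $1 \le j \le m$, and
• $A_i \cap C_j = \emptyset$ for $1 \le i \le n$ and $1 \le j \le m$.

The mass of each focal element is denoted by its corresponding
lowercase letter for all elements $A_i$, that is
$m_1(A_i) = a_i$ for $1 \le i \le n$, and by $m_1(B) = b_1$. Similarly
$m_2(C_j) = c_j$ for $1 \le j \le m$ and $m_2(B) = b_2$. Because $m_1(\cdot)$
and $m_2(\cdot)$ are normalized BBAs, one has $b_1 + \sum_{i=1}^{n} a_i = 1$
and $b_2 + \sum_{j=1}^{m} c_j = 1$.

In applying the conjunctive rule of combination of $m_1(\cdot)$
with $m_2(\cdot)$, we get

$$m^{CR}_{12}(A_i) = b_2 a_i, \quad \text{for } 1 \le i \le n,$$
$$m^{CR}_{12}(B) = b_1 b_2,$$

and

$$m^{CR}_{12}(\emptyset) = \sum_{i=1}^{n} \sum_{j=1}^{m} a_i c_j + \sum_{j=1}^{m} b_1 c_j.$$

Obviously $m^{CR}_{12}(\emptyset) + \sum_{i=1}^{n} m^{CR}_{12}(A_i) + m^{CR}_{12}(B) = 1$,
which means that the following equality holds

$$\begin{aligned}
m^{CR}_{12}(\emptyset) &= 1 - m^{CR}_{12}(B) - \sum_{i=1}^{n} m^{CR}_{12}(A_i) \\
&= 1 - b_1 b_2 - \sum_{i=1}^{n} b_2 a_i \\
&= 1 - b_2 \Big(b_1 + \sum_{i=1}^{n} a_i\Big) \\
&= 1 - b_2,
\end{aligned}$$

because $b_1 + \sum_{i=1}^{n} a_i = 1$.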

To get Dempster’s rule result, we need to normalize the
BBA $m^{CR}_{12}(\cdot)$ by dividing the masses $m^{CR}_{12}(A_i)$ and $m^{CR}_{12}(B)$
by $1 - m^{CR}_{12}(\emptyset) = 1 - (1 - b_2) = b_2$.

After the normalization by division of the masses $m^{CR}_{12}(A_i)$
and $m^{CR}_{12}(B)$ by $b_2 \neq 0$, one gets the Dempster-Shafer fusion
result for $i = 1, \ldots, n$

$$m^{DS}_{12}(A_i) = [m_1 \oplus m_2](A_i) = m_1(A_i) = a_i, \qquad (6)$$

and

$$m^{DS}_{12}(B) = [m_1 \oplus m_2](B) = m_1(B) = b_1. \qquad (7)$$

Therefore, it is clear in this second class of examples that
the BBA $m_2(\cdot)$ also has absolutely no impact on the Dempster-Shafer
fusion result even if $m_2(\cdot)$ is truly informative (not
vacuous) and conflicting with the BBA $m_1(\cdot)$.

The conflict level $m^{CR}_{12}(\emptyset) = \sum_{i=1}^{n} \sum_{j=1}^{m} a_i c_j + \sum_{j=1}^{m} b_1 c_j = 1 - b_2$
can be as high (close to one) or as
low (close to zero) as we want. Dempster’s rule always provides
in this second class of examples the same result
$m^{DS}_{12}(\cdot) = m_1(\cdot)$, which is a counter-intuitive behavior not
recommended for fusion applications.
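The second class can be checked in the same way. This is again only a sketch under assumed values: the five-element frame, the masses of `m1`, the 0.4/0.6 split over C1 and C2, and the helper name `dempster` are all hypothetical choices made for illustration.

```python
from itertools import product

def dempster(m1, m2):
    """Dempster's rule for two BBAs given as dict: frozenset -> mass."""
    conj = {}
    for (x, mx), (y, my) in product(m1.items(), m2.items()):
        z = x & y
        conj[z] = conj.get(z, 0.0) + mx * my
    conflict = conj.pop(frozenset(), 0.0)
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {z: v / (1.0 - conflict) for z, v in conj.items()}, conflict

# Hypothetical instance: A1, A2 are subsets of B; C1, C2 are disjoint from B.
A1, A2 = frozenset("x"), frozenset("y")
B = frozenset("xyw")
C1, C2 = frozenset("z"), frozenset("v")
m1 = {A1: 0.2, A2: 0.3, B: 0.5}

checks = []
for b2 in (0.95, 0.5, 0.02):  # m2(B) = b2, so the conflict is 1 - b2
    m2 = {B: b2, C1: 0.4 * (1.0 - b2), C2: 0.6 * (1.0 - b2)}
    fused, conflict = dempster(m1, m2)
    checks.append((b2, conflict, fused))
    print(b2, round(conflict, 6), sorted((tuple(sorted(k)), round(v, 6)) for k, v in fused.items()))
```

In each run the fused BBA coincides with `m1` up to floating-point rounding, although `m2` is neither vacuous nor equal to `m1`.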


V. CONCLUSIONS 


We have given two classes of counter-examples to Dempster’s
rule, where this rule is insensitive to the fusion, in the
sense that when combining two different conflicting sources of
information characterized by the basic belief assignments $m_1(\cdot)$
and $m_2(\cdot)$, the fusion result is equal to $m_1(\cdot)$. Therefore $m_2(\cdot)$
has no impact on the fusion, although $m_2(\cdot)$ is different from
the uninformative source characterized by the vacuous basic
belief assignment $m(\Theta) = 1$. Numerical counter-examples to
Dempster’s rule can also be found in [11].
Abstract—Dempster-Shafer evidence theory (DST) is a theo-
retical framework for uncertainty modeling and reasoning. The 
determination of basic belief assignment (BBA) is crucial in 
DST, however, there is no general theoretical method for BBA 
determination. In this paper, a method of generating BBA using 
fuzzy numbers is proposed. First, the training data are modeled 
as fuzzy numbers. Then, the dissimilarities between each test 
sample and the training data are measured by the distance 
between fuzzy numbers. Finally, the BBAs are generated from
the normalized dissimilarities. The effectiveness of this method
is demonstrated by an application to a classification problem.
Experimental results show that the proposed method is robust 
to outliers. 


Keywords—Evidence theory, basic belief assignment (BBA), 
fuzzy numbers, outliers. 


I. INTRODUCTION 


The theory of belief functions also called Dempster-Shafer 
evidence theory (DST) [1], [2], is a theoretical framework 
for uncertainty modeling and reasoning. The expression of 
uncertainty, i.e., the determination of basic belief assignment 
(BBA) is one of the most crucial problems to deal with. BBA 
is a kind of random set in nature and its determination is 
actually the problem of modeling the distribution of random 
set, which is still unsolved in mathematics [3]. Therefore, the 
determination of BBA is a challenging problem in DST and 
has aroused widespread concerns. 

One category for generating BBA is the application-based 
empirical approach. Shafer [1] generates BBA based on statis- 
tic evidence. Selzer [12] generates BBA according to the class 
number and the neighborhood of the target for automatic target 
classification. Bi [13] proposed focal element triplet for text 
categorization. Valente [4] proposed several BBA determina- 
tion methods for speech recognition based on the membership. 
Zhang [5] generates BBA based on evidential Markov random 
field for image segmentation. Salzenstein [14] proposed an 
iterative estimation method to generate BBA based on the 
Gaussian model for multisensor image segmentation. Dezert 
[6] generates BBA to describe the uncertainty of threshold 
choosing in edge detection. Han [7] generates BBA based on 
the intervals of the expected payoffs for different alternatives 
to deal with multi-criteria decision making problems. 
The other category for generating BBA is the application-
free approach. Boudraa [8] proposed a method based on fuzzy 
membership functions. Deng [9] generates BBA based on the 
similarity measure described by the radius of gyration. Han 
[10] proposed a method based on uncertain optimization. Kang 
[11] proposed a method based on interval numbers. 

In Kang’s method [11], the training data are modeled as 
interval numbers determined by their lower and upper bound 
values. Since the interval number is a special case of the fuzzy 
number and only keeps minimum and maximum values, other 
important information, such as mean value and median, are 
lost when modeling the data. To deal with this, other types 
of fuzzy numbers are used to model the training data in this 
paper, i.e., the mean value and median are also kept to describe 
the training data. Then the BBAs are generated from the 
dissimilarities between the test sample and the training data 
using the distance between fuzzy numbers. Compared with 
the distance between interval numbers in Kang’s method, the 
distance between fuzzy numbers is more robust when there 
exist outliers in training data. To verify the effectiveness of 
the proposed BBA determination method, we consider its ap- 
plication on the classification problem. The experiment results 
show that the proposed method can achieve high classification 
accuracies. 


II. BASIS OF EVIDENCE THEORY 


Dempster-Shafer evidence theory (DST) [1], [2] is a theo- 
retical framework for uncertainty modeling and reasoning. In 
DST, the frame of discernment (FOD) $\Theta$ contains $l$ mutually
exclusive and exhaustive elements: $\Theta = \{\theta_1, \theta_2, \ldots, \theta_l\}$. The
power set of $\Theta$ (the set of all subsets of $\Theta$) is denoted by
$2^\Theta$. The basic belief assignment (BBA, also called a mass
function) $m$ is defined from $2^\Theta$ to $[0,1]$ satisfying

$$\sum_{A \subseteq \Theta} m(A) = 1, \quad m(\emptyset) = 0 \qquad (1)$$

$m(A)$ represents the evidence support to the proposition $A$. If
$m(A) > 0$, $A$ is called a focal element.

The plausibility function (Pl) and belief function (Bel) are
defined respectively as:

$$Pl(A) = \sum_{A \cap B \neq \emptyset} m(B) \qquad (2)$$

$$Bel(A) = \sum_{B \subseteq A} m(B) \qquad (3)$$


Dempster’s rule of combination [1], used for combining two 
distinct sources of evidence in the DST framework, is defined 
as 


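$$m(A) = (m_1 \oplus m_2)(A) = \begin{cases} 0, & A = \emptyset \\ \dfrac{1}{1-K} \sum\limits_{B \cap C = A} m_1(B)\, m_2(C), & A \neq \emptyset \end{cases} \qquad (4)$$

where $K = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C)$ represents the total
conflict or contradictory mass assignments.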

For a probabilistic decision-making based on the BBA, 
Smets defined the pignistic probability transformation [15] to 
transform a BBA into a probability measure Bet P: 


$$BetP(\theta_i) = \sum_{\theta_i \in A} \frac{m(A)}{|A|}, \quad \forall \theta_i \in \Theta \qquad (5)$$


where |A| denotes the cardinality of A. The final decision is 
often made by choosing the element in FOD which has the 
highest BetP value. 


III. THE DETERMINATION OF BBA BASED ON INTERVAL 
NUMBERS 


In DST, the expression of uncertainty is the process of 
generating BBA. Therefore, the determination of BBA is the 
first step and crucial in the applications of DST. However, BBA 
is a kind of random set and its determination is actually the 
problem of modeling the distribution of random set, which is 
still unsolved in mathematics [3]. Kang [11] proposed a BBA 
determination method based on interval numbers (IN). The 
basis of interval numbers is briefly introduced first. 


A. Basis of interval numbers 


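An interval number $\tilde{a}$ in $\mathbb{R}$ is a set of real numbers that lie
between two real numbers, i.e., $\tilde{a} = [a_1, a_2] = \{x \mid a_1 \le x \le a_2\}$,
$a_1, a_2 \in \mathbb{R}$ and $a_1 \le a_2$.

The dissimilarity between two interval numbers $\tilde{a} = [a_1, a_2]$
and $\tilde{b} = [b_1, b_2]$ can be measured by the distance between them
[16]:

$$\begin{aligned}
D^2(\tilde{a}, \tilde{b}) &= \int_{-1/2}^{1/2} \int_{-1/2}^{1/2} \left\{ \left[ \frac{a_1 + a_2}{2} + x(a_2 - a_1) \right] - \left[ \frac{b_1 + b_2}{2} + y(b_2 - b_1) \right] \right\}^2 dx\, dy \\
&= \left( \frac{a_1 + a_2}{2} - \frac{b_1 + b_2}{2} \right)^2 + \frac{1}{3} \left[ \left( \frac{a_2 - a_1}{2} \right)^2 + \left( \frac{b_2 - b_1}{2} \right)^2 \right]
\end{aligned} \qquad (6)$$

The larger $D(\tilde{a}, \tilde{b})$ is, the larger the dissimilarity between $\tilde{a}$ and
$\tilde{b}$ is.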


B. IN-based method 


In the IN-based method, the training data belonging to the same
focal element $A \subseteq \Theta$ are modeled as an interval number
$\tilde{a} = [a_1, a_2]$, where $a_1$ and $a_2$ are the minimum and maximum
values of the training data respectively. A single test
sample is also modeled as a degenerate interval number
$\tilde{t} = [t, t]$, where $t$ is its value. If the test sample $\tilde{t}$ is similar
to the training data $\tilde{a}$, the corresponding proposition (the test
sample belongs to $A$) should be assigned a large belief.

The similarity between $\tilde{a}$ and $\tilde{t}$ is defined as:

$$S(\tilde{a}, \tilde{t}) = \frac{1}{1 + \alpha D(\tilde{a}, \tilde{t})} \qquad (7)$$

where $\alpha > 0$ is a parameter to control the degree of dispersion
of the normalized similarities and $D(\tilde{a}, \tilde{t})$ is the distance
between the interval numbers $\tilde{a}$ and $\tilde{t}$. Finally, the BBA can
be generated from the normalized similarities.

In IN-based method, when modeling the training data, only 
the minimum and maximum values are kept and used to 
calculate similarities. However, when the distribution of the 
data is not uniform, the extreme values are insufficient to well 
describe the data. Actually, any interval number is a special 
case of a fuzzy number. Other types of fuzzy numbers, such as
triangular fuzzy number (TFN) and trapezoidal fuzzy number
(TrFN), can keep more useful information of the data, such as
the mean value and median. Thus, TFN and TrFN are used to
model the data in this paper.


IV. BBA CONSTRUCTION FROM FUZZY NUMBERS 


A. Basis of fuzzy numbers 


The generalized left right fuzzy number (GLRFN) $\tilde{b} = [b_1, b_2, b_3, b_4]$
is a special case of a convex, normalized fuzzy
set of the real line when its membership function is defined
by [17]:

$$\mu(x) = \begin{cases}
L\!\left( \dfrac{b_2 - x}{b_2 - b_1} \right) & \text{for } b_1 \le x \le b_2 \\
1 & \text{for } b_2 \le x \le b_3 \\
R\!\left( \dfrac{x - b_3}{b_4 - b_3} \right) & \text{for } b_3 \le x \le b_4 \\
0 & \text{else}
\end{cases} \qquad (8)$$

where $L$ and $R$ are strictly decreasing functions defined on
$[0, 1]$ and satisfy the conditions:

$$L(x) = R(x) = 1 \ \text{if } x \le 0, \qquad L(x) = R(x) = 0 \ \text{if } x \ge 1. \qquad (9)$$

The interval number is a special case of GLRFN with
$b_1 = b_2$ and $b_3 = b_4$. The triangular fuzzy number (TFN)
and trapezoidal fuzzy number (TrFN) [16] are two of the most
common fuzzy numbers encountered in applications involving
fuzzy numbers.
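For TrFN, $L(x) = R(x) = 1 - x$. The distance between two
TrFNs $\tilde{a} = [a_1, a_2, a_3, a_4]$ and $\tilde{b} = [b_1, b_2, b_3, b_4]$ is defined as:

$$\begin{aligned}
D^2(\tilde{a}, \tilde{b}) ={}& \tfrac{1}{4} \left[ (a_2 + a_3) - (b_2 + b_3) \right]^2 \\
&+ \tfrac{1}{4} \left[ (a_2 + a_3) - (b_2 + b_3) \right] \times \left( a_4 - a_3 - a_2 + a_1 - b_4 + b_3 + b_2 - b_1 \right) \\
&+ \tfrac{1}{12} (a_3 - a_2)^2 + \tfrac{1}{12} (b_3 - b_2)^2 \\
&+ \tfrac{1}{12} (a_3 - a_2) \left[ a_4 - a_3 + a_2 - a_1 \right] \\
&+ \tfrac{1}{12} (b_3 - b_2) \left[ b_4 - b_3 + b_2 - b_1 \right] \\
&+ \tfrac{1}{9} \left[ (a_4 - a_3)^2 + (a_2 - a_1)^2 + (b_4 - b_3)^2 + (b_2 - b_1)^2 \right] \\
&- \tfrac{1}{9} \left[ (a_2 - a_1)(a_4 - a_3) + (b_2 - b_1)(b_4 - b_3) \right] \\
&+ \tfrac{1}{6} \left( (a_4 - a_3)(b_2 - b_1) + (a_2 - a_1)(b_4 - b_3) \right) \\
&- \tfrac{1}{6} \left( (a_4 - a_3)(b_4 - b_3) + (a_2 - a_1)(b_2 - b_1) \right)
\end{aligned} \qquad (10)$$

The larger $D(\tilde{a}, \tilde{b})$ is, the larger the dissimilarity between $\tilde{a}$ and
$\tilde{b}$ is.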

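For TFN, $L(x) = R(x) = 1 - x$ and $b_2 = b_3$. The distance
between two TFNs $\tilde{a} = [a_1, a_2, a_3]$ and $\tilde{b} = [b_1, b_2, b_3]$ is
defined as:

$$\begin{aligned}
D^2(\tilde{a}, \tilde{b}) ={}& (a_2 - b_2)^2 + \tfrac{1}{2} (a_2 - b_2) \left[ (a_3 + a_1) - (b_3 + b_1) \right] \\
&+ \tfrac{1}{9} \left[ (a_3 - a_2)^2 + (a_2 - a_1)^2 + (b_3 - b_2)^2 + (b_2 - b_1)^2 \right] \\
&- \tfrac{1}{9} \left[ (a_2 - a_1)(a_3 - a_2) + (b_2 - b_1)(b_3 - b_2) \right] \\
&+ \tfrac{1}{6} (2a_2 - a_1 - a_3)(2b_2 - b_1 - b_3)
\end{aligned} \qquad (11)$$

The larger $D(\tilde{a}, \tilde{b})$ is, the larger the dissimilarity between $\tilde{a}$ and
$\tilde{b}$ is.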


B. Fuzzy-number-based methods 


1) Data modeling: To generate BBAs, fuzzy numbers
are used to model the training data and test samples in this
paper. For the training data belonging to $A \subseteq \Theta$ and the test
sample $t$, we can use three different kinds of fuzzy numbers
to model them:

(1) TFNmean: the training data are modeled as a triangular
fuzzy number $\tilde{a} = [a_1, a_2, a_3]$, where $a_1$ and $a_3$ are
the minimum and maximum values of the training data
respectively and $a_2$ is the mean value. The test sample is
modeled as $\tilde{t} = [t, t, t]$.

(2) TFNmed: the training data are modeled as a triangular
fuzzy number $\tilde{b} = [b_1, b_2, b_3]$, where $b_1$ and $b_3$ are
the minimum and maximum values of the training data
respectively and $b_2$ is the median. The test sample is
modeled as $\tilde{t} = [t, t, t]$.

(3) TrFN: the training data are modeled as a trapezoidal
fuzzy number $\tilde{c} = [c_1, c_2, c_3, c_4]$, where $c_1$ and $c_4$ are
the minimum and maximum values of the training data
respectively, $c_2$ is either the mean value or median,
whichever is smaller, and $c_3$ is either the mean value or
median, whichever is larger. The test sample is modeled
as $\tilde{t} = [t, t, t, t]$.

In these ways, besides the maximum and minimum values,
the mean value and (or) median can also be kept to describe
the training data.
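The three modeling options, together with the interval number of Section III, can be sketched as follows. The function names and the five sample values (with 14.0 playing the role of an outlier) are hypothetical and serve only as an illustration.

```python
from statistics import mean, median

def model_training_data(values):
    """Model one feature of the training data of a focal element.

    Returns the interval number (IN) and the three fuzzy-number models
    as tuples of their defining points.
    """
    lo, hi = min(values), max(values)
    mu, med = mean(values), median(values)
    return {
        "IN": (lo, hi),
        "TFNmean": (lo, mu, hi),
        "TFNmed": (lo, med, hi),
        # the inner points are the mean and median in increasing order
        "TrFN": (lo, min(mu, med), max(mu, med), hi),
    }

def model_test_sample(t):
    """Model a single test value as degenerate numbers of matching length."""
    return {"IN": (t, t), "TFNmean": (t, t, t), "TFNmed": (t, t, t), "TrFN": (t, t, t, t)}

# Hypothetical training values for one feature of one class.
samples = [8.0, 9.0, 9.5, 10.0, 14.0]
models = model_training_data(samples)
print(models)
print(model_test_sample(11.8))
```

Note that the outlier moves the upper bound of every model, but only the fuzzy-number models keep the mean and median, which stay close to the bulk of the data.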

2) Calculate the similarities: Similar to the IN-based
method, the similarity between the training data and the test
sample is measured from the distance between them (Eq.
(11) for TFN or Eq. (10) for TrFN) using Eq. (7). Actually,
other normalization functions can be used here.

3) Generate the BBAs: The BBAs are generated from the
normalized similarities. If the test sample $\tilde{t}$ is similar to the
training data $\tilde{a}$, the corresponding proposition ($t$ belongs to the
same focal element as $\tilde{a}$) should be assigned a large
belief.

In the next section, we consider the classification problem 
to verify the effectiveness of our proposed BBA determination 
method. 


V. CLASSIFICATION EXAMPLE BASED ILLUSTRATION OF 
THE PROPOSED BBA DETERMINATION METHOD 


We give a classification example on a set of artificial data 
to illustrate the process of our BBA determination method and 
verify its effectiveness. 


A. Artificial training data 


Suppose there are three classes in a set of artificial data:
$\Theta = \{\theta_1, \theta_2, \theta_3\}$. Each sample has three features, $f_1$, $f_2$ and
$f_3$, and each feature follows a normal distribution.
The deviation parameters for each class are 0.25, 1 and 0.25
respectively and the mean parameters for each feature of each
class are given in Table I.


TABLE I
THE MEAN PARAMETERS FOR EACH FEATURE OF EACH CLASS

Class | f1 | f2 | f3
θ1    | 9  | 5  | 10
θ2    | 10 | 9  | 5
θ3    | 5  | 10 | 9


We generate 60 training data for each class. Among the 60
samples belonging to class $\theta_1$, there is an outlier whose value
of feature $f_1$ is much larger than the others belonging to class $\theta_1$.
The generated training data are shown in Fig. 1.

In this case, each class can be distinguished easily from
the other classes using one feature (the one whose mean parameter is
5), but is difficult to distinguish from the other classes using the other
features.


B. The process of classification 
For a given test sample, the process of labeling its class can
be outlined below:
Step 1 Generate three mass functions $m_1$, $m_2$ and $m_3$ according
to the corresponding features of the training data
respectively.

Step 2 Combine $m_1$, $m_2$ and $m_3$ using Eq. (4) to obtain the
combined mass function $m$.

Step 3 Transform $m$ into the probability measure BetP using
Eq. (5).

Step 4 The class of the test sample is labeled as the class $\theta_i \in \Theta$
which has the highest BetP value.


[Fig. 1. Values of the training data: (a) feature f1 (with the outlier marked), (b) feature f2, (c) feature f3.]


We take a test sample $t = (t_1, t_2, t_3) = (11.8, 9.8, 3.9)$
(whose class is $\theta_2$) as an example to explain how to generate
$m_1$ based on fuzzy numbers in Step 1 in detail. The result of the
interval-number-based method is also given for comparison.

1) Data modeling: For feature $f_1$, the training data belonging
to each focal element $A \subseteq \Theta$ can be modeled as an interval
number (IN) or a fuzzy number (TFNmean, TFNmed or
TrFN), as shown in Table II. The test sample can be modeled
as $\tilde{t}_1 = [11.8, 11.8]$ (IN), $\tilde{t}_1 = [11.8, 11.8, 11.8]$ (TFNmean or
TFNmed) or $\tilde{t}_1 = [11.8, 11.8, 11.8, 11.8]$ (TrFN).


TABLE II
MODELING THE TRAINING DATA ON FEATURE f1

Focal element | IN          | TFNmean           | TFNmed            | TrFN
{θ1}          | [8.1, 12.6] | [8.1, 9.1, 12.6]  | [8.1, 9.0, 12.6]  | [8.1, 9.0, 9.1, 12.6]
{θ2}          | [7.6, 13.1] | [7.6, 10.0, 13.1] | [7.6, 10.0, 13.1] | [7.6, 10.0, 10.0, 13.1]
{θ3}          | [4.1, 5.7]  | [4.1, 5.0, 5.7]   | [4.1, 4.8, 5.7]   | [4.1, 4.8, 5.0, 5.7]
{θ1, θ2}      | [8.1, 12.6] | [8.1, 9.6, 12.6]  | [8.1, 9.3, 12.6]  | [8.1, 9.3, 9.6, 12.6]


In this case, the training data from class $\theta_1$ have an overlapped
region with the data from $\theta_2$. For a test sample
belonging to this region, it is difficult to distinguish whether
it should be labeled as class $\theta_1$ or $\theta_2$, and the belief assigned to the
focal element $\{\theta_1, \theta_2\}$ ($m_1(\{\theta_1, \theta_2\})$) should also be considered.
Thus, the training data belonging to the overlapped region are
also modeled.

2) Calculate the distance between fuzzy numbers: The
distances between the test sample $\tilde{t}_1$ and the training data from
different focal elements are calculated using Eq. (6) (for IN),
Eq. (11) (for TFNmean and TFNmed) or Eq. (10) (for TrFN),
as given in Table III.


TABLE III
THE DISTANCE BETWEEN THE TEST SAMPLE AND TRAINING DATA

Focal element | IN    | TFNmean | TFNmed | TrFN
{θ1}          | 1.928 | 3.439   | 3.541  | 2.217
{θ2}          | 2.134 | 2.552   | 2.529  | 1.838
{θ3}          | 6.875 | 9.660   | 9.738  | 6.862
{θ1, θ2}      | 1.928 | 2.977   | 3.265  | 2.040


In Table III, according to the IN-based method, the test
sample is closer to $\{\theta_1\}$ than to $\{\theta_2\}$. However, without the
outlier, the actual range of the training data from $\{\theta_1\}$ is
[8.1, 10.4] and the test sample 11.8 should be assigned a
smaller distance to $\{\theta_2\}$, whose range is [7.6, 13.1]. By only
considering the minimum and maximum values, the IN-based
method can easily get counterintuitive distances, especially
when there are outliers. However, the mean value and median
are relatively insensitive to outliers, so that the fuzzy-number-based
methods can obtain more reasonable distances. In this
case, the fuzzy-number-based methods assign the test sample
a smaller distance to $\{\theta_2\}$ than to $\{\theta_1\}$.

3) Calculate the similarities: The similarities between the
test sample $\tilde{t}_1$ and the training data from different focal
elements are calculated from the above distances using Eq.
(7), where $\alpha$ is taken as 5, as shown in Table IV.


TABLE IV
THE SIMILARITIES BETWEEN THE TEST SAMPLE AND TRAINING DATA

Focal element | IN    | TFNmean | TFNmed | TrFN
{θ1}          | 0.094 | 0.055   | 0.054  | 0.083
{θ2}          | 0.086 | 0.073   | 0.073  | 0.098
{θ3}          | 0.028 | 0.020   | 0.020  | 0.028
{θ1, θ2}      | 0.094 | 0.063   | 0.058  | 0.089


4) Generate $m_1$: $m_1$ is generated from the normalized
similarities, as shown in Table V. Our fuzzy-number-based
methods assign the largest mass of belief to $\{\theta_2\}$ rather than
$\{\theta_1\}$ or $\{\theta_1, \theta_2\}$, which is more reasonable compared with the
IN-based method.


TABLE V
THE GENERATED m1

Focal element | IN    | TFNmean | TFNmed | TrFN
{θ1}          | 0.311 | 0.261   | 0.261  | 0.277
{θ2}          | 0.284 | 0.345   | 0.358  | 0.329
{θ3}          | 0.094 | 0.096   | 0.099  | 0.095
{θ1, θ2}      | 0.311 | 0.298   | 0.282  | 0.299
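The IN column of Tables III to V can be approximately reproduced from the rounded intervals of Table II. The sketch below uses the closed form of the interval distance of Eq. (6) and the similarity of Eq. (7) with α = 5; the function and key names are illustrative assumptions. Because the inputs are the rounded table entries rather than the raw training data, the distances differ from Table III in the second decimal, while the normalized masses agree with Table V to about three decimals.

```python
import math

def interval_distance(a, b):
    """Distance between interval numbers a = (a1, a2) and b = (b1, b2), closed form of Eq. (6)."""
    mid_diff = (a[0] + a[1]) / 2 - (b[0] + b[1]) / 2
    half_a = (a[1] - a[0]) / 2
    half_b = (b[1] - b[0]) / 2
    return math.sqrt(mid_diff ** 2 + (half_a ** 2 + half_b ** 2) / 3)

def similarity(distance, alpha=5.0):
    """Similarity of Eq. (7)."""
    return 1.0 / (1.0 + alpha * distance)

# Interval models of feature f1 (IN column of Table II); keys are illustrative labels.
in_models = {
    "theta1": (8.1, 12.6),
    "theta2": (7.6, 13.1),
    "theta3": (4.1, 5.7),
    "theta1|theta2": (8.1, 12.6),
}
test_interval = (11.8, 11.8)  # degenerate interval for t1 = 11.8

sims = {k: similarity(interval_distance(v, test_interval)) for k, v in in_models.items()}
total = sum(sims.values())
m1_in = {k: s / total for k, s in sims.items()}  # normalized similarities = BBA

for key in in_models:
    print(key, round(sims[key], 3), round(m1_in[key], 3))
```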


In the same way, $m_2$ and $m_3$ can be generated from features
$f_2$ and $f_3$ respectively, as shown in Table VI and Table VII.


TABLE VI
THE GENERATED m2

Focal element | IN    | TFNmean | TFNmed | TrFN
{θ1}          | 0.062 | 0.038   | 0.039  | 0.047
{θ2}          | 0.177 | 0.173   | 0.176  | 0.184
{θ3}          | 0.380 | 0.378   | 0.375  | 0.378
{θ2, θ3}      | 0.381 | 0.411   | 0.410  | 0.391


TABLE VII
THE GENERATED m3

Focal element | IN    | TFNmean | TFNmed | TrFN
{θ1}          | 0.160 | 0.138   | 0.141  | 0.147
{θ2}          | 0.484 | 0.547   | 0.537  | 0.522
{θ3}          | 0.183 | 0.162   | 0.167  | 0.171
{θ1, θ3}      | 0.173 | 0.153   | 0.155  | 0.160


After generating $m_1$, $m_2$ and $m_3$, the combined mass
function $m$ can be obtained by using Dempster’s rule of
combination (Eq. (4)) and then the probability measure BetP
can be obtained using Eq. (5), as given in Table VIII.


TABLE VIII
THE GENERATED BetP

Class | IN    | TFNmean | TFNmed | TrFN
θ1    | 0.065 | 0.026   | 0.027  | 0.038
θ2    | 0.808 | 0.873   | 0.866  | 0.853
θ3    | 0.127 | 0.101   | 0.107  | 0.109


Finally, the test sample $t = (11.8, 9.8, 3.9)$ is labeled as
class $\theta_2$ since it has the highest BetP value.
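Steps 2 to 4 can be illustrated on the IN column of Tables V to VII. The sketch below combines the three mass functions with Dempster's rule (Eq. (4)) and applies the pignistic transformation (Eq. (5)); classes are encoded as the integers 1, 2, 3 and the helper names are illustrative assumptions. Since the inputs are rounded to three decimals, the output matches the IN column of Table VIII only to within about 0.001.

```python
from itertools import product
from functools import reduce

def dempster(m1, m2):
    """Dempster's rule (Eq. (4)) for BBAs given as dict: frozenset -> mass."""
    conj = {}
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        a = b & c
        conj[a] = conj.get(a, 0.0) + mb * mc
    k = conj.pop(frozenset(), 0.0)  # total conflict K
    return {a: v / (1.0 - k) for a, v in conj.items()}

def betp(m):
    """Pignistic transformation (Eq. (5)): split each mass equally among its elements."""
    out = {}
    for focal, mass in m.items():
        for theta in focal:
            out[theta] = out.get(theta, 0.0) + mass / len(focal)
    return out

fs = frozenset
# IN columns of Tables V, VI and VII.
m1 = {fs({1}): 0.311, fs({2}): 0.284, fs({3}): 0.094, fs({1, 2}): 0.311}
m2 = {fs({1}): 0.062, fs({2}): 0.177, fs({3}): 0.380, fs({2, 3}): 0.381}
m3 = {fs({1}): 0.160, fs({2}): 0.484, fs({3}): 0.183, fs({1, 3}): 0.173}

combined = reduce(dempster, [m1, m2, m3])
probabilities = betp(combined)
label = max(probabilities, key=probabilities.get)
print({k: round(v, 3) for k, v in sorted(probabilities.items())}, "-> class", label)
```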


VI. EXPERIMENTS 


To further compare the effectiveness of the proposed BBA 
determination methods with the IN-based method, we did the 
classification experiments on three UCI data sets (Iris, Wine 
and Wdbc). 

In each experiment, the amounts of the samples from 
different classes are equal. Among the samples from the same 
class, 60% samples are used as the training data and the rest 


40% samples are used as the test samples. We generate BBAs 
from all the features (one BBA generated from one feature) 
and the final classification result is obtained from the combined 
mass function. The value of $\alpha$ in Eq. (7) is set as 5. The
accuracy of each classification is calculated from 100 runs of
the Monte-Carlo experiments. The classification accuracies¹
are given in Table IX.


TABLE IX
THE ACCURACIES OF THE CLASSIFICATIONS (%)

Data set | IN    | TFNmean | TFNmed | TrFN
Iris     | 92.67 | 93.83   | 93.85  | 93.92
Wine     | 91.48 | 93.29   | 94.23  | 92.79
Wdbc     | 67.71 | 86.91   | 88.32  | 81.27


From Table IX we can see that the proposed fuzzy-number-based
methods can achieve higher accuracies than the IN-based
method.

Furthermore, we compared the robustness of our proposed 
method with IN-based method. We add one outlier to the 
training data for each class, whose values on each feature are 
set as: 


$$O(f_i) = \max(f_i) + 0.2 \times (\max(f_i) - \min(f_i)) \qquad (12)$$

where $\max(f_i)$ and $\min(f_i)$ are the maximum and minimum
values of the training data respectively on feature $f_i$. The
accuracies are given in Table X.


TABLE X
THE ACCURACIES OF THE CLASSIFICATIONS WITH OUTLIERS (%)

Data set | IN    | TFNmean | TFNmed | TrFN
Iris     | 88.72 | 93.08   | 93.07  | 92.13
Wine     | 80.89 | 91.75   | 92.53  | 90.06
Wdbc     | 61.80 | 82.87   | 84.28  | 73.69


From Table IX and Table X we can see that the accuracies
of the IN-based method drop significantly when the outliers are
added, while the accuracies of our fuzzy-number-based methods
drop only slightly. Therefore, the proposed fuzzy-number-based
methods are more robust to outliers than the IN-based method.


VII. CONCLUSION 


In this paper we have proposed new methods for generating
BBA based on fuzzy numbers. The experiments on their
application to classification show that our proposed methods
are effective and robust to outliers and can achieve higher
accuracies than the IN-based method.

In future work, we will focus on the distance between fuzzy 
numbers. More types of distance will be used and compared to 
describe the dissimilarity between the test sample and training 
data. Other normalization functions to establish similarities 
will be evaluated, as well as other possible decision-making 
strategies. Also, other evidence combination rules will be 
tested to make comparisons. 


¹The accuracy is defined as the percentage of correct classifications.
Abstract—Dempster-Shafer theory (DST) is an important theory
for information fusion. However, in DST how to determine
the basic belief assignment (BBA) is still an open issue. The
interval number based BBA determination method is simple and
effective, where the features of different classes’ samples are
modeled using the interval numbers, i.e., an interval number
model is constructed for each focal element. Then, the distances
of interval numbers are used for measuring the similarity degrees
between the testing sample and each focal element, and the
similarity degrees are used for determining the BBA. The
definition of interval numbers’ distance is crucial for the effectiveness
of the interval number based BBA determination methods.
In this paper, we use different interval numbers’ distances for
determining BBAs. By using the artificial data set and the Iris
data set of the open UCI data base, respectively, we compare and
analyze the determination of BBAs with different distances.

Index Terms—Dempster-Shafer theory, basic belief assignment, 
distance of interval numbers, information fusion, classification. 


I. INTRODUCTION 


Dempster-Shafer theory (DST) [1] was proposed by Dempster
in the 1960s, and was developed by Shafer [2]. In DST, the
basic beliefs are assigned to the power set of the frame of
discernment (FOD), which is used to describe the uncertainty
of sources of evidence. The evidences (i.e., basic belief
assignments, BBAs) originated from different sources can be
fused using Dempster’s combination rule [1]. DST has
been widely used in the information fusion fields [3]-[5].

Using DST, the first step is to determine the BBAs, which
is still an open issue. The determination of BBAs can mainly be
categorized into two branches [6]: (1) The experts give the
BBAs directly according to their personal experiences; (2)
The BBAs are determined based on the samples using some
special determination rules. In the first branch, the determination
of BBAs relies on the experts’ subjective points of
view. In this paper, we focus on the second branch of approaches,
i.e., the BBAs are determined based on available samples.
Researchers have proposed many approaches in this branch.
Selzer et al. [3] determined the BBAs based on the number
of classes and the environmental weighting coefficient. Shafer
[2] proposed a BBA determination method based on statistical
evidence. Bi et al. [7] designed a kind of triple focal elements
BBA in dealing with the text classification problem. Salzenstein
et al. [8] used the Gaussian model to get the BBAs
through iterative estimation. Deng et al. [9] defined a similarity
measure based on radius of gravity, and then the similarity
measure is used for determining the BBAs. Boudraa et al.
[10] and Florea et al. [11] determine the BBAs based on
the membership functions. Han et al. [12] proposed a method
for the transformation of fuzzy membership function into
BBAs by solving a constrained maximization or minimization
optimization problem. Recently, Kang et al. [6] designed a
BBA determination method using the interval numbers.

Kang’s interval number based BBA determination method
is simple and effective. It first constructs the interval number
[14] models for each focal element (including the singleton focal
elements with a single class and the compound focal elements
with multiple classes) based on the set of training samples. In
Kang’s method, the Tran and Duckstein interval number distance
[14], [16] (TD-IND) is used to measure the similarity degree
between a testing sample and the interval number models of the
different focal elements. Finally, the similarities are normalized
to obtain the BBA values. The definition of the interval numbers’
distance (IND) is crucial for the performance of the interval
number based BBA determination method. There exist many
possible choices of IND, e.g., the Gowda and Ravi distance [15]
(GR-IND), the Tran and Duckstein distance [16] (TD-IND), the
Hausdorff distance [17] (H-IND) and De Carvalho’s norm-q
distance [18] (Nq-IND). In this paper, we implement Kang’s
interval number based method using different INDs. We analyze
the differences between the BBAs determined using different INDs
based on numerical examples. Furthermore, we use Monte-Carlo
experiments to compare the performances of the interval number
based methods with different INDs by classifying an artificial
data set and the Iris data set¹.


¹ http://archive.ics.uci.edu/ml/datasets/Iris


II. BASICS OF DEMPSTER-SHAFER THEORY


Dempster-Shafer theory (DST), also known as evidence theory,
is an appealing mathematical framework that can effectively
describe uncertain information about the state of nature. In DST,
the frame of discernment (FOD) is denoted by
$\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$. The elements
of $\Theta$ are mutually exclusive and exhaustive. The basic belief
assignment (BBA) function assigns basic beliefs to the elements of
the power set of $\Theta$, i.e., $2^\Theta$. The BBA is also called
the mass function, and it satisfies:

$$\sum_{A \subseteq \Theta} m(A) = 1, \quad m(\emptyset) = 0 \qquad (1)$$

If $m(A) > 0$ for $A \subseteq \Theta$, then $A$ is called a focal element.
The belief (Bel) and plausibility (Pl) of $A$ are defined as:

$$Bel(A) = \sum_{B \subseteq A} m(B) \qquad (2)$$

$$Pl(A) = \sum_{B \cap A \neq \emptyset} m(B) = 1 - Bel(\bar{A}) \qquad (3)$$


The interval $[Bel(A), Pl(A)]$ is called the belief interval; it
represents the uncertainty of the degree of support for $A$.
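
As a small illustration of Eqs. (2) and (3), the following sketch computes the belief interval of a subset. It assumes a BBA stored as a Python dict mapping frozensets of class labels to masses; the function names, the labels `t1`, `t2`, `t3` and the toy masses are illustrative assumptions, not values from the paper.

```python
def bel(m, A):
    """Belief of A: total mass of the focal elements included in A (Eq. (2))."""
    return sum(mass for B, mass in m.items() if B <= A)


def pl(m, A):
    """Plausibility of A: total mass of the focal elements intersecting A (Eq. (3))."""
    return sum(mass for B, mass in m.items() if B & A)


# Toy BBA over the frame {t1, t2, t3}; the masses are made up for illustration.
frame = frozenset({"t1", "t2", "t3"})
m = {
    frozenset({"t1"}): 0.5,
    frozenset({"t1", "t2"}): 0.3,
    frame: 0.2,
}

A = frozenset({"t1", "t2"})
belief_interval = (bel(m, A), pl(m, A))  # [Bel(A), Pl(A)]
```

Here the belief interval of {t1, t2} is [0.8, 1.0], and the plausibility also equals one minus the belief of the complement {t3}.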

Different information sources can provide different pieces of
evidence, i.e., different BBAs. In DST, two BBAs associated with
two distinct sources of evidence can be combined according
to Dempster’s rule, as in Eq. (4).

$$m(A) = \begin{cases} \dfrac{\sum_{B \cap C = A} m_1(B)\, m_2(C)}{1 - K}, & A \neq \emptyset \\ 0, & A = \emptyset \end{cases} \qquad (4)$$

where $K = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C)$ denotes the
conflicting coefficient. Dempster’s combination rule is both
commutative and associative.
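
A minimal sketch of Eq. (4) is given below, assuming BBAs are stored as dicts from frozensets to masses. The function name `dempster_combine` and the two toy BBAs are assumptions made for this illustration only.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two BBAs (dict: frozenset -> mass) with Dempster's rule, Eq. (4).

    Returns the fused BBA and the conflicting coefficient K.
    """
    combined = {}
    conflict = 0.0  # K: total mass of the pairs whose intersection is empty
    for (B, mass_b), (C, mass_c) in product(m1.items(), m2.items()):
        inter = B & C
        if inter:
            combined[inter] = combined.get(inter, 0.0) + mass_b * mass_c
        else:
            conflict += mass_b * mass_c
    if 1.0 - conflict < 1e-12:
        raise ValueError("total conflict: Dempster's rule is undefined")
    fused = {A: mass / (1.0 - conflict) for A, mass in combined.items()}
    return fused, conflict


# Toy sources over the frame {a, b}; the masses are illustrative only.
m1 = {frozenset({"a"}): 0.6, frozenset({"a", "b"}): 0.4}
m2 = {frozenset({"b"}): 0.5, frozenset({"a", "b"}): 0.5}
fused, K = dempster_combine(m1, m2)
```

In this toy case only the pair ({a}, {b}) conflicts, so K = 0.3 and the remaining masses are divided by 0.7.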

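To make a probabilistic decision, the fused BBA can be
transformed into a probability measure using the pignistic
probability transformation:

$$BetP(\theta_i) = \sum_{\theta_i \in A,\; A \subseteq \Theta} \frac{m(A)}{|A|}, \quad \forall \theta_i \in \Theta \qquad (5)$$

where $|A|$ denotes the cardinality of $A$.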
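
The transformation of Eq. (5) can be sketched as follows, again assuming a dict-of-frozensets representation; the function name `pignistic` and the toy BBA are illustrative assumptions.

```python
def pignistic(m):
    """Pignistic transformation, Eq. (5): each mass m(A) is shared equally
    among the singletons contained in the focal element A."""
    betp = {}
    for A, mass in m.items():
        for theta in A:
            betp[theta] = betp.get(theta, 0.0) + mass / len(A)
    return betp


# Toy BBA over {t1, t2, t3}; the masses are made up for illustration.
m = {
    frozenset({"t1"}): 0.5,
    frozenset({"t1", "t2"}): 0.3,
    frozenset({"t1", "t2", "t3"}): 0.2,
}
betp = pignistic(m)
decision = max(betp, key=betp.get)  # class with the largest pignistic probability
```

The resulting probabilities sum to one, and the decision is simply the class with the largest value.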


III. KANG’S BBA DETERMINATION METHOD BASED ON 
THE INTERVAL NUMBERS’ DISTANCES 


When using DST, determining the BBAs is the first step, and it
is still a challenging task. Interval numbers, which can describe
uncertain or insufficient information, are useful for determining
the BBAs. An interval number is defined as follows: an interval
number $\tilde{a}$ in $\mathbb{R}$ is the set of real numbers that
lie between two real numbers, i.e.,
$\tilde{a} = [a^-, a^+] = \{x \mid a^- \le x \le a^+\}$, with
$a^-, a^+ \in \mathbb{R}$ and $a^- \le a^+$. Kang et al. [6] proposed
a BBA determination method based on interval number models, where
the basic beliefs assigned to the different focal elements are
determined from the interval numbers’ distances between the testing
sample and the interval number models of the focal elements. Here,
we first recall Kang’s interval number based BBA determination
method.

Kang’s method determines a BBA on each single feature
separately. For a given feature, Kang’s method models the
different focal elements (including the focal elements with a
single class and the focal elements with multiple classes)
using interval numbers, and the testing sample is treated as
a degenerate interval (a precise number) of zero length.
Kang’s method measures the distances between the testing
sample and the interval number models of the focal elements.
The smaller the distance, the higher the similarity degree
between the testing sample and the focal element, and the higher
the basic belief assigned to that focal element. The steps of
Kang’s method are as follows:


1) The interval number models of the focal elements with a
single class are constructed by finding the minimum
and the maximum of the corresponding class’s training
samples. Then, the interval number models of the focal
elements with multiple classes are obtained by finding
the overlapping region of the corresponding single
classes’ interval number models. The interval number models
of the different focal elements are denoted by
$\tilde{b}_f$, $f \in 2^\Theta$.

2) Calculate the distances between the testing sample (denoted
by $\tilde{a}$) and the different focal elements’ interval number
models, i.e., $D(\tilde{a}, \tilde{b}_f)$, $\forall f \in 2^\Theta$.
Note that the length of $\tilde{a}$ is 0, i.e., $a^- = a^+$.

3) Calculate the similarity degrees from the distances
according to Eq. (6):

$$S(\tilde{a}, \tilde{b}_f) = \frac{1}{1 + \alpha D(\tilde{a}, \tilde{b}_f)} \qquad (6)$$

where $\alpha > 0$ is the support coefficient. Empirically, it
is proper to set $\alpha = 5$ [6].

4) The BBA is determined by normalizing the similarity
degrees of all the focal elements.
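
The four steps above can be sketched as follows for a single feature. This is only an illustration under stated assumptions: the function names, the toy training values, and `placeholder_distance` (a simple bound-based stand-in for an IND) are hypothetical, and treating a single common point as a valid overlap is an assumption as well.

```python
from itertools import combinations


def build_interval_models(train):
    """Step 1. `train` maps a class label to its training values of one feature.
    Returns a dict: frozenset of classes -> (lower, upper)."""
    models = {frozenset({c}): (min(v), max(v)) for c, v in train.items()}
    classes = sorted(train)
    for r in range(2, len(classes) + 1):
        for subset in combinations(classes, r):
            lower = max(models[frozenset({c})][0] for c in subset)
            upper = min(models[frozenset({c})][1] for c in subset)
            if lower <= upper:  # assumption: a shared point counts as overlap
                models[frozenset(subset)] = (lower, upper)
    return models


def bba_from_sample(x, models, distance, alpha=5.0):
    """Steps 2-4: distances -> similarity degrees (Eq. (6)) -> normalized BBA."""
    a = (x, x)  # the testing sample is a degenerate interval
    sims = {f: 1.0 / (1.0 + alpha * distance(a, b)) for f, b in models.items()}
    total = sum(sims.values())
    return {f: s / total for f, s in sims.items()}


def placeholder_distance(a, b):
    """Hypothetical stand-in for an IND: largest gap between matching bounds."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


# Toy training values for one feature (made up for illustration).
train = {"c1": [1.0, 2.5, 4.0], "c2": [3.0, 6.0, 7.0], "c3": [5.0, 6.5, 8.0]}
models = build_interval_models(train)
bba = bba_from_sample(2.0, models, placeholder_distance)
```

Passing the distance as a function argument keeps the sketch independent of the particular IND, which is the quantity the rest of the paper varies.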


Kang’s method defines the similarity degrees using an interval
numbers’ distance, and the BBAs are obtained by normalizing
the similarity degrees. Thus, the definition of the IND (i.e., of
$D(\tilde{a}, \tilde{b}_f)$) is crucial for this method. The BBAs
determined by Kang’s method using different INDs are compared
in the next section.


IV. COMPARISONS OF INTERVAL NUMBER BASED BBA 
DETERMINATION METHOD USING DIFFERENT INDS 


As mentioned above, the definition of the IND is crucial for
interval number based BBA determination methods. Many
INDs have been proposed. Here, we introduce four widely
used ones.


A. Introduction of the interval number’s distances 


Suppose $\tilde{a} = [a^-, a^+]$ and $\tilde{b} = [b^-, b^+]$ are two
interval numbers. Then [13], [14],
$\tilde{c} = \tilde{a} \oplus \tilde{b} = [c^-, c^+]$, where
$c^- = \min(a^-, b^-)$ and $c^+ = \max(a^+, b^+)$. The length (or
width) of the interval number $\tilde{a}$ is
$\mu(\tilde{a}) = a^+ - a^-$. $D_a$ is the length of the domain [14]
of the interval numbers. To measure the difference between two
interval numbers, many interval numbers’ distances (INDs) have been
proposed. Four widely used INDs are introduced as follows:

Gowda and Ravi (1995) [15]: In 1995, Gowda and Ravi
proposed a metric (denoted by GR-IND) combining a position
component and a size component, as follows:

$$D_{GR}(\tilde{a}, \tilde{b}) = D_p(\tilde{a}, \tilde{b}) + D_s(\tilde{a}, \tilde{b}) \qquad (7)$$

where the position component is defined as

$$D_p(\tilde{a}, \tilde{b}) = \cos\left[\left(1 - \frac{|a^- - b^-|}{D_a}\right)\frac{\pi}{2}\right] \qquad (8)$$

and the size component is defined as

$$D_s(\tilde{a}, \tilde{b}) = \cos\left[\frac{\mu(\tilde{a}) + \mu(\tilde{b})}{2\,\mu(\tilde{a} \oplus \tilde{b})} \cdot \frac{\pi}{2}\right] \qquad (9)$$

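Tran and Duckstein (2002) [16]: In the framework of
fuzzy data analysis, Tran and Duckstein proposed the following
interval numbers’ distance (TD-IND):

$$D_{TD}^2(\tilde{a}, \tilde{b}) = \int_{-1/2}^{1/2}\int_{-1/2}^{1/2} \left\{\left[\frac{a^- + a^+}{2} + x\,(a^+ - a^-)\right] - \left[\frac{b^- + b^+}{2} + y\,(b^+ - b^-)\right]\right\}^2 dx\,dy$$

$$= \left[\frac{a^- + a^+}{2} - \frac{b^- + b^+}{2}\right]^2 + \frac{1}{3}\left[\left(\frac{a^+ - a^-}{2}\right)^2 + \left(\frac{b^+ - b^-}{2}\right)^2\right] \qquad (10)$$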


Hausdorff distance [17]: Consider two sets $A$ and $B$
of points of $\mathbb{R}^n$, and a distance $d(x, y)$, where $x \in A$
and $y \in B$. The Hausdorff distance (H-IND) is defined as follows:

$$D_H(A, B) = \max\left(\sup_{x \in A}\inf_{y \in B} d(x, y),\; \sup_{y \in B}\inf_{x \in A} d(x, y)\right) \qquad (11)$$

If $d(x, y)$ is the Manhattan distance (also called the city block
distance), i.e., $d(x, y) = |x - y|$, then Chavent et al. (2002)
proved that

$$D_H(\tilde{a}, \tilde{b}) = \max\left(|a^- - b^-|,\; |a^+ - b^+|\right) \qquad (12)$$


De Carvalho et al. (2006) [18]: A family of distances
between interval numbers has been proposed by De Carvalho
et al. based on the bounds of the interval numbers. The norm-q
metric (Nq-IND) is defined as:

$$D_{N_q}(\tilde{a}, \tilde{b}) = \left(|a^- - b^-|^q + |a^+ - b^+|^q\right)^{1/q} \qquad (13)$$


B. Numerical example 


Different INDs can be used to implement the BBA
determination. Here, we use a numerical example to compare
the interval number based BBA determination methods
using different INDs. The methods are applied to a three-class
classification problem. In this numerical example, we give the
feature ranges of the different classes directly, as shown in
Figure 1, where the feature range of class 1 ($\theta_1$) is [1, 4],
that of class 2 ($\theta_2$) is [3, 7] and that of class 3
($\theta_3$) is [5, 8].

From Figure 1, the interval number models of the focal
elements can be constructed, as listed in Table I. Note
that in this example $\{\theta_1, \theta_3\}$ and
$\{\theta_1, \theta_2, \theta_3\}$ do not have interval number
models, because the interval number models of $\{\theta_1\}$ and
$\{\theta_3\}$ do not have an overlapping region.

Fig. 1. Feature values’ ranges of different classes


TABLE I
THE INTERVAL NUMBER MODELS OF FOCAL ELEMENTS.

Focal elements | Interval number model
$\{\theta_1\}$ | [1, 4]
$\{\theta_2\}$ | [3, 7]
$\{\theta_3\}$ | [5, 8]
$\{\theta_1, \theta_2\}$ | [3, 4]
$\{\theta_2, \theta_3\}$ | [5, 7]
$\{\theta_1, \theta_3\}$ | N/A
$\{\theta_1, \theta_2, \theta_3\}$ | N/A

Suppose we have a testing sample whose feature value is 2,
i.e., $\tilde{a} = [2, 2]$, shown as the purple dot on the X-axis of
Figure 1. We then use different INDs, i.e., the GR-IND as in
Eq. (7), the TD-IND as in Eq. (10), the H-IND as in Eq. (12), and
the Nq-IND as in Eq. (13) (with $q = 2$), to measure the
distance between $\tilde{a}$ and the interval number models of the
different focal elements. The distances are listed in Table II.


TABLE II
THE INDS BETWEEN $\tilde{a}$ AND THE FOCAL ELEMENTS’ INTERVAL NUMBER
MODELS.

Focal elements | GR-IND | TD-IND | H-IND | Nq-IND
$\{\theta_1\}$ | 0.9296 | 1.0000 | 2.0000 | 2.2361
$\{\theta_2\}$ | 1.0315 | 3.2146 | 5.0000 | 5.0990
$\{\theta_3\}$ | 1.5474 | 4.5826 | 6.0000 | 6.7082
$\{\theta_1, \theta_2\}$ | 1.1464 | 1.5275 | 2.0000 | 2.2361
$\{\theta_2, \theta_3\}$ | 1.5745 | 4.0415 | 5.0000 | 5.8310
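
The distances of Table II can be checked with a short script implementing Eqs. (7)-(10), (12) and (13). The sketch below is an illustration, not the authors' code: the function names are hypothetical, and the domain length $D_a = 7$ (the span of the feature axis from 1 to 8 in this example) is an assumption, chosen because it matches the GR-IND column to within rounding.

```python
import math


def width(iv):
    """Length mu of an interval given as (lower, upper)."""
    return iv[1] - iv[0]


def d_gr(a, b, domain_len):
    """Gowda-Ravi distance, Eqs. (7)-(9). Assumes a (+) b has nonzero length."""
    span = max(a[1], b[1]) - min(a[0], b[0])  # length of a (+) b
    d_pos = math.cos((1.0 - abs(a[0] - b[0]) / domain_len) * math.pi / 2)
    d_size = math.cos((width(a) + width(b)) / (2.0 * span) * math.pi / 2)
    return d_pos + d_size


def d_td(a, b):
    """Tran-Duckstein distance, closed form of Eq. (10) (square root taken)."""
    mid = (a[0] + a[1]) / 2 - (b[0] + b[1]) / 2
    spread = (width(a) / 2) ** 2 + (width(b) / 2) ** 2
    return math.sqrt(mid ** 2 + spread / 3)


def d_h(a, b):
    """Hausdorff distance between intervals, Eq. (12)."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def d_nq(a, b, q=2):
    """Norm-q distance, Eq. (13)."""
    return (abs(a[0] - b[0]) ** q + abs(a[1] - b[1]) ** q) ** (1.0 / q)


a = (2.0, 2.0)  # testing sample of the numerical example
models = {"t1": (1, 4), "t2": (3, 7), "t3": (5, 8), "t1,t2": (3, 4), "t2,t3": (5, 7)}
DOMAIN_LEN = 7.0  # assumed D_a, see the note above
table = {
    name: (d_gr(a, b, DOMAIN_LEN), d_td(a, b), d_h(a, b), d_nq(a, b))
    for name, b in models.items()
}
# table["t1"] is approximately (0.9296, 1.0000, 2.0000, 2.2361)
```

Each entry of `table` holds the GR-IND, TD-IND, H-IND and Nq-IND values for one focal element, in the column order of Table II.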


Then, using the distances, the similarity degrees are calculated
according to Eq. (6), where the support coefficient is set to
$\alpha = 5$. By normalizing the similarity degrees, the BBAs are
obtained, as listed in Table III.

According to the BBAs in Table III, the basic beliefs assigned to
the different focal elements differ less from one another when using
GR-IND than when using TD-IND, H-IND or Nq-IND. For example, using
GR-IND, the basic beliefs assigned to $\{\theta_1\}$ and
$\{\theta_2\}$ are 0.2552 and 0.2305, which differ only slightly.
Using TD-IND, the basic beliefs of $\{\theta_1\}$ and $\{\theta_2\}$
are 0.4086 and 0.1289, whose difference is larger. The BBAs
determined based on H-IND and Nq-IND are similar to each other.


TABLE III
THE BBAS DETERMINED BASED ON DIFFERENT INDS.

Focal elements | GR-IND | TD-IND | H-IND | Nq-IND
$\{\theta_1\}$ | 0.2552 | 0.4086 | 0.3184 | 0.3163
$\{\theta_2\}$ | 0.2305 | 0.1289 | 0.1281 | 0.1394
$\{\theta_3\}$ | 0.1546 | 0.0906 | 0.1069 | 0.1061
$\{\theta_1, \theta_2\}$ | 0.2078 | 0.2693 | 0.3184 | 0.3162
$\{\theta_2, \theta_3\}$ | 0.1319 | 0.1026 | 0.1282 | 0.1220

Here, we use the pignistic probability transformation (Eq. (5))
to transform the BBAs into probabilities for decision-making.
The probabilities of the testing sample belonging to the
different classes are listed in Table IV.


TABLE IV
THE PIGNISTIC PROBABILITIES OBTAINED BASED ON DIFFERENT INDS.

Classes | GR-IND | TD-IND | H-IND | Nq-IND
Class 1 ($\theta_1$) | 0.3591 | 0.5433 | 0.4777 | 0.4744
Class 2 ($\theta_2$) | 0.4103 | 0.3148 | 0.3514 | 0.3585
Class 3 ($\theta_3$) | 0.2306 | 0.1419 | 0.1709 | 0.1671


Intuitively, the testing sample most likely belongs to class
1, as shown in Fig. 1. According to Table IV, the methods
using the TD-IND, H-IND and Nq-IND all make the right
classification. According to the probabilities obtained with
the GR-IND, the testing sample would be classified into class
2. Revisiting the BBA determined based on GR-IND, the
basic beliefs assigned to the focal elements with a single class
follow the right trend, i.e.,
$m(\{\theta_1\}) > m(\{\theta_2\}) > m(\{\theta_3\})$.
However, the pignistic probabilities obtained with the GR-IND
are counter-intuitive once the beliefs assigned to the focal
elements with multiple classes are taken into account. From this
perspective, the BBA determined based on GR-IND is not
satisfactory. In this numerical example, the interval number based
methods using the TD-IND, H-IND and Nq-IND are more
appropriate for BBA determination than the one using the GR-IND
if the decision-making is based on the maximum of BetP.


V. EXPERIMENT 


To compare the interval number based BBA determination
methods using different INDs, we run Monte-Carlo experiments
on the classification of the artificial data set and the Iris data
set. The information fusion based classification is implemented as
follows. In each classification, the interval number based method
is used to determine a BBA on each single feature.
These BBAs are then combined using Dempster’s
combination rule as in Eq. (4). The combined BBA
is transformed into probabilities using the pignistic probability
transformation as in Eq. (5). The testing sample is assigned
to the class that has the largest pignistic probability.

In the experiments, the interval number based methods using
the different INDs are used in turn to determine the BBAs.
In the Nq-IND, we have taken $q = 2$. The parameter
$\alpha$ used to generate the similarity degrees in the interval
number based BBA determination method (as in Eq. (6)) is set
to 5. The Monte-Carlo classification experiments are repeated
100 times with random testing samples. The effectiveness of
the interval number based BBA determination methods using
different INDs is compared using the average accuracy over
the 100 runs.


A. Experiment on artificial set 


The generated artificial data set contains 3 classes. Each class
has 50 samples, and each sample has 3 features. The features of
the different classes are generated according to Gaussian
distributions, i.e., $G(\mu, \sigma^2)$. The standard deviations
($\sigma$) of all the features of all the classes are set to
$\sigma = 1$. The means ($\mu$) of the different features of the
different classes are listed in Table V.


TABLE V
THE MEAN ($\mu$) SETTINGS OF DIFFERENT CLASSES’ DIFFERENT FEATURES.

Classes | Feature 1 | Feature 2 | Feature 3
Class 1 ($\theta_1$) | 8 | 5 | 10
Class 2 ($\theta_2$) | 10 | 9 | 6
Class 3 ($\theta_3$) | 5 | 11 | 9


The features of the different classes in the generated artificial
data set are shown in Figures 2-4.


[Figure: feature 1 values of Classes 1 to 3; x-axis: feature values]

Fig. 2. Artificial samples’ feature 1 of different classes.


[Figure: feature 2 values of Classes 1 to 3; x-axis: feature values]

Fig. 3. Artificial samples’ feature 2 of different classes.


As shown in Figures 2-4, class 3 is linearly separable
from class 1 and class 2, while class 1 and class 2 are not linearly
separable from each other in feature 1. Similarly, class 2 and
class 3 are not linearly separable from each other in feature 2,
and class 1 and class 3 are not linearly separable from each
other in feature 3.


[Figure: feature 3 values of Classes 1 to 3; x-axis: feature values]

Fig. 4. Artificial samples’ feature 3 of different classes.


In each Monte-Carlo run, we randomly select 25 samples
from each class (75 samples in total) as the set of training
samples, and the remaining samples are used as the testing
samples. We first classify each testing sample according to the
BBA determined on each single feature. Then, we combine the
BBAs determined on the 3 features and use the combined BBA to
classify the testing sample. The results of the methods based
on the different INDs are listed in Table VI.
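
The data generation and the per-run split described above can be sketched with the Python standard library. This is a minimal illustration under assumptions: the function names, class labels and seeds are hypothetical, and the authors' actual random generator and seeding are not specified in the paper.

```python
import random

# Means from Table V: class -> (feature 1, feature 2, feature 3).
MEANS = {
    "class1": (8, 5, 10),
    "class2": (10, 9, 6),
    "class3": (5, 11, 9),
}


def make_artificial_set(n_per_class=50, sigma=1.0, seed=0):
    """Draw n_per_class Gaussian samples per class, one value per feature."""
    rng = random.Random(seed)
    return {
        label: [tuple(rng.gauss(mu, sigma) for mu in means)
                for _ in range(n_per_class)]
        for label, means in MEANS.items()
    }


def split_train_test(data, n_train=25, seed=0):
    """Randomly keep n_train samples per class for training, the rest for testing."""
    rng = random.Random(seed)
    train, test = {}, {}
    for label, samples in data.items():
        shuffled = samples[:]
        rng.shuffle(shuffled)
        train[label] = shuffled[:n_train]
        test[label] = shuffled[n_train:]
    return train, test


data = make_artificial_set()
train, test = split_train_test(data, seed=1)  # one Monte-Carlo run
```

Repeating `split_train_test` with 100 different seeds would give the 100 runs over which the accuracies are averaged.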


TABLE VI
THE RESULTS OF THE METHODS BASED ON DIFFERENT INDS.

Classification correct rate (%)

INDs | Feature 1 | Feature 2 | Feature 3 | Combined
GR-IND | 44.70 | 64.86 | 42.62 | 80.95
TD-IND | 67.71 | 84.13 | 61.66 | 94.84
H-IND | 64.66 | 80.24 | 56.01 | 89.66
Nq-IND | 65.86 | 81.68 | 55.84 | 91.97


In Table VI, the columns “Feature 1”, “Feature 2” and “Feature
3” give the results of the methods using the different INDs based
on each single feature. The column “Combined” gives the results
obtained by combining the BBAs determined on the different
features with Dempster’s rule of combination. According to
Table VI, the classifications based on a single feature do not
perform well, whatever the IND. However, the BBAs determined on
different features reflect different aspects of the samples. By
fusing the BBAs based on the different features, better
classification performances are obtained. Comparing the results of
the methods based on different INDs, the method based on GR-IND
performs the worst. The performances of the methods
based on TD-IND, H-IND and Nq-IND are similar, the one based
on TD-IND being the best. The GR-IND is not recommended for
BBA determination.


B. Experiment on iris set 


The Iris data set contains 3 classes. Each class has 50 samples,
and each sample has 4 features. In this experiment, we
randomly select different numbers of samples as the training
samples (the same number of samples is selected from each
class), and all the samples are used as the testing samples.
The results of the interval number based BBA determination
methods based on the different INDs are shown in Figure 5.


[Figure: classification accuracy versus number of training samples
(6 to 135) for GR-IND, TD-IND, H-IND and Nq-IND]

Fig. 5. Performances of the interval number based methods using different
INDs with different scales of training samples on iris data set.


According to Figure 5, the methods using TD-IND, H-IND
and Nq-IND perform well with both small and large numbers
of training samples. The method using TD-IND performs the best
among the four INDs. The results of the method
using GR-IND show a counter-intuitive behavior, since its
accuracy decreases as the number of training samples increases.
When the number of training samples is large, the generated
interval numbers can better model the features of the
corresponding classes, especially for the focal elements with
multiple classes (i.e., the overlapping range of the corresponding
classes’ interval number models). However, as discussed in the
numerical example of Section IV-B, the interval number based
method using GR-IND is not recommended for determining the BBA,
especially when the focal elements with multiple classes are taken
into account. That is why the method using GR-IND performs badly
when the number of training samples is large.


VI. CONCLUSION 


In this paper, we have tested different INDs for implementing
the interval number based BBA determination method. The
effectiveness of the resulting BBAs is compared on information
fusion based classification problems. The experiments show that
combining the BBAs determined by interval number based methods
performs well on the classification problems. The methods using
the TD-IND, H-IND and Nq-IND provide similar performances,
the one using TD-IND being the best. Using the GR-IND, the
construction of basic beliefs is not very effective.
With GR-IND, the differences between the basic beliefs assigned to
the different focal elements are small, which is not discriminant
enough for making decisions, especially when the focal elements
with multiple classes are taken into account. Therefore, the
method using the GR-IND is not recommended.

So far, the interval number based BBA determination
methods have been implemented on single features. In future work,
we will try to use interval numbers to determine
the BBAs on multiple-feature spaces, and compare the
effectiveness of the methods using different INDs. We will also
explore different decision-making strategies (e.g., DSmP, min of
d_BI, etc.) and test other rules of combination to see
whether the classification performances can be improved.


Abstract—One of the challenges of remote sensing image based
building change detection is distinguishing building changes from
other types of land cover alterations. Height information can be
of great assistance for this task, but its usefulness is limited by
the quality of the height data. Yet, standard automatic methods
for this task are still lacking. We propose a building change
detection approach based on very high resolution stereo series
data that focuses on the use of time series information. In the
first step, belief functions are used to fuse the change features
from the 2D and height maps to obtain an initial change detection
result. In the second step, the building probability maps (BPMs)
from the series data are adopted to refine the change detection
results based on Dempster-Shafer theory. The final step is to fuse
the series building change detection results in order to obtain a
final change map. The advantages of the proposed approach are
demonstrated by testing it on a set of time series data captured
in North Korea.


Keywords: Change detection, belief functions, DST, DSM.


I. INTRODUCTION 


Building change detection is one of the fundamental
remote sensing research topics. Although many approaches
are available, it is still very difficult to select one standard
approach that works for all situations. In particular, as image
resolution improves, many irrelevant changes become visible in
the remote sensing images besides building changes, which makes
2D building change detection more challenging. 3D building change
detection has gained great attention and is able to provide more
accurate results. Due to the unprecedented development of
sensors, platforms and algorithms for 3D data acquisition and
generation, 3D data have become more accessible than before.
Stereo time series data allow a better understanding of
the building change types and further increase the change
detection accuracy.


Many research works have demonstrated the advantages of
introducing Digital Surface Models (DSMs) into building change
detection [1]-[2]. However, the performance of 3D change
detection approaches relies heavily on the quality of the DSMs,
and the DSMs from satellite images do not always provide
reliable height information, due to occlusions and matching
errors. When large regions have incorrect height values,
it is very difficult to avoid false detections. In our previous
research, time-series information worked well to improve
the building detection results. In this research, we further
adopt this information to improve the change detection results.

In [3], the belief functions introduced in
Dempster-Shafer Theory (DS) [4]-[5] and extended in
Dezert-Smarandache Theory (DSmT) [6] were used to deal with
the uncertain information delivered by the DSMs. In
[3], the possibility of using Dempster’s fusion rule and the
Proportional Conflict Redistribution Rule #6 (PCR6) of DSmT
in our application was tested. Though improvements were
demonstrated in comparison with the method described in [7], the
results delivered under the DS and DSmT frameworks were rather
similar. Therefore, in this paper, only the DS fusion rule is
used to obtain an initial change detection result.


This paper is organized as follows: first, the belief functions
and the building change detection fusion models are briefly
reviewed. Then, the series image based fusion model and the
building extraction method are introduced. Finally, the
refined fusion models are tested on real satellite data.


II. DS BELIEF FUNCTION BASED BUILDING CHANGE 
DETECTION 


A. Basics of DST 


Dempster-Shafer theory (DST) is one of the fundamental
decision fusion theories. It allows the combination of evidence
from individual experts or any data sources. A general
introduction to DST can be found in [4], [6], and [8].


Let $\Theta$ be a frame of discernment of a problem under
consideration. $\Theta = \{\theta_1, \theta_2, \ldots, \theta_N\}$
consists of a list of $N$ exhaustive and mutually exclusive elements
$\theta_i$, $i = 1, 2, \ldots, N$. Each $\theta_i$ represents a
possible state related to the problem we want to solve. The
assumption of exhaustivity and mutual exclusivity of the elements
of $\Theta$ is classically referred to as Shafer’s model of the
frame $\Theta$. A basic belief assignment (BBA), also called a
belief mass function (or just a mass for short), is a mapping
$m(\cdot): 2^\Theta \to [0, 1]$ from the power set¹ of $\Theta$,
denoted $2^\Theta$, to $[0, 1]$, that verifies [4]:

$$m(\emptyset) = 0, \quad \text{and} \quad \sum_{X \in 2^\Theta} m(X) = 1. \qquad (1)$$

$m(X)$ represents the mass of belief exactly committed to
$X$. An element $X \in 2^\Theta$ is called a focal element (FE) if
and only if $m(X) > 0$. In DST, the combination (fusion)
of several independent sources of evidence is done with the
Dempster-Shafer² (DS) rule of combination, assuming that
the sources are not in total conflict³. The DS combination of two
independent BBAs $m_1(\cdot)$ and $m_2(\cdot)$, denoted symbolically
by $DS(m_1, m_2)$, is defined by $m^{DS}(\emptyset) = 0$, and for all
$X \in 2^\Theta \setminus \{\emptyset\}$ by:

$$m^{DS}(X) = \frac{1}{1 - K^{DS}} \sum_{\substack{X_1, X_2 \in 2^\Theta \\ X_1 \cap X_2 = X}} m_1(X_1)\, m_2(X_2), \qquad (2)$$

where the total degree of conflict $K^{DS}$ is given by

$$K^{DS} \triangleq \sum_{\substack{X_1, X_2 \in 2^\Theta \\ X_1 \cap X_2 = \emptyset}} m_1(X_1)\, m_2(X_2). \qquad (3)$$

¹ The power set is the set of all subsets of $\Theta$, including the empty set.
² Although the rule was originally proposed by Dempster, we call it the
Dempster-Shafer rule because it has been widely promoted by Shafer in DST.


B. Building change detection 


1) Choice of the frame of discernment: As noted above,
the accuracy of 2D change detection is limited by the
misdetections caused by irrelevant changes. These irrelevant
changes have a greater effect on very high resolution (VHR)
images, since more details of land-cover objects are visible. To
solve this problem, three classes are considered in the decision
fusion based 3D change detection framework:


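$$\Theta = \{\theta_1 = \text{Pixel} \in \text{BuildingChange},\; \theta_2 = \text{Pixel} \in \text{OtherChange},\; \theta_3 = \text{Pixel} \in \text{NoChange}\}, \qquad (4)$$

and

$$\theta_1 \cap \theta_2 \cap \theta_3 = \emptyset. \qquad (5)$$

Based on the three classes, the set of focal elements FE that
are of interest in our application is:

$$FE = \{\theta_1, \theta_2, \theta_3, \theta_1 \cup \theta_2, \theta_2 \cup \theta_3, \theta_1 \cup \theta_2 \cup \theta_3\}. \qquad (6)$$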


Two change indicators, one from the images and one from the
DSMs, are involved in the fusion model. Changes in the spectral
images are highlighted using the Iteratively Reweighted
Multivariate Alteration Detection (IRMAD) [9]. Height changes
from the DSMs are obtained by robust height differencing [7].


2) BBAs construction: In [3], sigmoidal models for
both the concordance and discordance indexes are constructed
by projecting the change values onto a sigmoid curve
$f_{\tau,T}$, where $T$ represents the symmetry point of the sigmoid
curve and $\tau$ controls its slope. The concordance index measures
the concordance between the change indicator and the BBA in the
assertion, while the discordance index measures the opposition of
the change indicator to the BBA in the assertion. The symmetry
points of the concordance and discordance sigmoid curves can be
calculated automatically with the multi-level thresholding method
proposed by Otsu [10].


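Thus, using the height change index as an example, the BBAs
for the concordance and discordance height change indexes are
functions of the values $a_{\Delta H}$ and $b_{\Delta H}$ defined by

$$a_{\Delta H} = f_{\tau,T_1}(\Delta H), \quad \text{and} \quad b_{\Delta H} = f_{-\tau,T_2}(\Delta H). \qquad (7)$$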


³ Otherwise the DS rule is mathematically undefined because of the 0/0
indeterminacy.




The factor $\tau$ can be calculated from a sample value
($\Delta H = 1$, $a_{\Delta H} = 0.1$), which means that a height
change of 1 meter indicates a 10% probability of a building change.
The BBAs for the concordance and discordance image change indexes
are built similarly. Differences appearing in the 2D images give a
concordance indication for all changes, which include the
building changes and other changes ($\theta_1 \cup \theta_2$). In
this paper, the changes from the images are denoted $\Delta Img$.


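In [3], the fusion models have been described in detail.
Here we only explain the fusion model of the height changes
as an example. Table I shows the construction of the BBA for the
height change indicator (i.e., the first source of evidence) by
combining the concordance and discordance BBAs with the DS rule of
combination. In Table I, the second and third columns give the
concordance and discordance BBAs from $\Delta H$, and the last
column gives their DS combination.


TABLE I. BBA CONSTRUCTION FOR HEIGHT CHANGE INDICATOR
[$K_{\Delta H} = a_{\Delta H} b_{\Delta H}$]

Focal Elem. | Concordance BBA | Discordance BBA | DS combination
$\theta_1$ | $a_{\Delta H}$ | 0 | $a_{\Delta H}(1 - b_{\Delta H}) / (1 - K_{\Delta H})$
$\theta_2$ | 0 | 0 | 0
$\theta_3$ | 0 | 0 | 0
$\theta_1 \cup \theta_2$ | 0 | 0 | 0
$\theta_2 \cup \theta_3$ | 0 | $b_{\Delta H}$ | $(1 - a_{\Delta H}) b_{\Delta H} / (1 - K_{\Delta H})$
$\theta_1 \cup \theta_2 \cup \theta_3$ | $1 - a_{\Delta H}$ | $1 - b_{\Delta H}$ | $(1 - a_{\Delta H})(1 - b_{\Delta H}) / (1 - K_{\Delta H})$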
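
The last column of Table I follows from applying the DS rule of Eq. (2) to the two simple BBAs, since the only conflicting pair is $\theta_1$ against $\theta_2 \cup \theta_3$. A minimal sketch of this closed form is given below; the function name, the string keys and the sample values of `a` and `b` are assumptions made for illustration.

```python
def fuse_height_indicator(a, b):
    """DS combination of the concordance BBA {theta1: a, Theta: 1 - a}
    with the discordance BBA {theta2 u theta3: b, Theta: 1 - b}."""
    conflict = a * b  # theta1 and (theta2 u theta3) have an empty intersection
    if 1.0 - conflict < 1e-12:
        raise ValueError("total conflict: the DS rule is undefined")
    norm = 1.0 - conflict
    return {
        "theta1": a * (1.0 - b) / norm,
        "theta2_or_theta3": (1.0 - a) * b / norm,
        "Theta": (1.0 - a) * (1.0 - b) / norm,
    }


# Illustrative values only: strong concordance, weak discordance.
fused = fuse_height_indicator(a=0.6, b=0.2)
```

The three returned masses sum to one, which is a quick consistency check on the normalization by $1 - K_{\Delta H}$.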


III. TIME SERIES FUSION MODEL 


To further improve the accuracy of the change detection
map, the pre- and post-event building probability maps are
introduced into the decision fusion model. The building probability
maps are prepared using the results of our previous research.


A. Time-series based building probability map extraction 


The building extraction method based on spatiotemporal
inferences is adopted to prepare the building probability
maps (BPMs) [11]. The approach is mainly composed of three
steps: (1) training sample selection; (2) feature extraction and
classification; (3) spatiotemporal based BPM refinement.


1) Training sample selection: Training sample selection is a
time-consuming and tedious process, which should be avoided
in an automatic image processing chain. In this step, we
try to produce the training data automatically from a
history database. More precisely, only one set of training data,
consisting of buildings, ground and roads, shadows and trees, was
manually annotated. Training data for the images captured on
other dates can be generated automatically by using a decision
based change detection approach. The normalized DSM is
used to separate the above-ground objects from ground and
roads. The height changes, shadow index changes and
normalized difference vegetation index (NDVI) changes
are used to generate a coarse change map and thus update the
training data. More details are given in [11].


2) Feature extraction and classification: Based on the training
data, the Random Forests (RF) [12] supervised classifier
was adopted. The features extracted for the classification task
include: 1) the principal component analysis [13] transformation
components of the multispectral channels; 2) the differential
morphological profile [14] of the panchromatic image; 3) the
normalized DSM. The RF classifier took the normalized
feature vectors as input for a pixel-based classification. Besides
the final classification results, the confidence values of each
class label were given for each pixel. Thus a BPM was
generated for each dataset among the time-series images.


3) Spatiotemporal based BPM refinement: As mentioned
in [11], the BPMs across all dates may not be consistent,
due to some potentially imprecise training samples or
unwanted objects in the optical images, such as cloud- or
snow-covered regions. In addition, as a general drawback of
pixel-based classification approaches, a salt-and-pepper effect
exists in the results. Thus a consistency check through the spatial
and temporal domains is helpful to improve the final result. The
basic idea of this approach is expressed by Eq. (8).


$$P_N(x,y,t) = \frac{1}{\sum w(m,n,k)} \sum_{m=x-l}^{x+l} \; \sum_{n=y-l}^{y+l} \; \sum_{k=1}^{h} w(m,n,k)\, P(m,n,t), \tag{8}$$


where $P(m,n,t)$ is the BPM at time $t$. The refined BPM is
denoted by $P_N(x,y,t)$. A window of size $(2l+1)^2$ is used for
the spatial consistency check; we used $l = 7$ in [11]. $h$ is
the number of temporal datasets. $w(m,n,k)$ is the 3D adaptive
kernel, which balances the similarity and distance of
the neighboring pixels in three dimensions.


B. DS based change map refinement 


One of the main advantages of DST is its flexible handling of
indicators from various sources. The previous steps deliver
a refined BPM for each date, and a building change
probability map can be calculated with the approach from
Section I. In this map, each pixel value represents the
probability that the pixel is classified as a building change.
Thus, when comparing two datasets, the available indicators are:


• pre-event BPM ($P_{pre}$)
• post-event BPM ($P_{after}$)
• initial building change probability map ($P_{BC}$)


To model this situation more precisely, we categorize
the change situations into four groups: buildings
to buildings (BB), non-building to buildings (NB), buildings
to non-building (BN) and non-building to non-building (NN).
Based on these four classes and the indicators, the set of focal
elements (FE) of interest in this fusion model is


$$FE = \{BB, NB, BN, NN, BB \cup BN, BB \cup NB, \Theta\}. \tag{9}$$


The probability masses $\{P_1, P_2, P_3\}$ obtained respectively
from these three indicators are assigned to FE as shown in
Table II. One of the basic principles is that all newly built
buildings should have a lower value in the pre-event BPM
and a higher value in the post-event BPM. The masses fused
with the DS rule are listed in the last column.

TABLE II. DS FUSION MODEL FOR RESULT REFINEMENT ($K = P_1 P_3$).
FE | $P_{pre}$ | $P_{after}$ | $P_{BC}$ | Fused mass
$BB$ | 0 | 0 | 0 | $P_1 P_2 (1-P_3)/(1-K)$
$NB$ | 0 | 0 | $P_3$ | $(1-P_1) P_3/(1-K)$
$BN$ | 0 | 0 | 0 | 0
$NN$ | 0 | 0 | 0 | 0
$BB \cup BN$ | $P_1$ | 0 | 0 | $P_1 (1-P_2)(1-P_3)/(1-K)$
$BB \cup NB$ | 0 | $P_2$ | 0 | $(1-P_1) P_2 (1-P_3)/(1-K)$
$\Theta$ | $1-P_1$ | $1-P_2$ | $1-P_3$ | $(1-P_1)(1-P_2)(1-P_3)/(1-K)$
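
The fused column of Table II can be checked numerically. The following is a minimal sketch, assuming that the three indicators are combined with Dempster's rule exactly as laid out in the table; the function name `fuse_pair` and the dictionary keys are illustrative choices and do not come from the paper.

```python
def fuse_pair(p1, p2, p3):
    """Fused masses for one pixel, following the layout of Table II.

    p1: pre-event building probability  (mass on BB u BN)
    p2: post-event building probability (mass on BB u NB)
    p3: initial change probability      (mass on NB)
    The remaining mass of each source goes to the full ignorance Theta.
    """
    k = p1 * p3  # conflict: NB and (BB u BN) have an empty intersection
    if k >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    norm = 1.0 - k
    return {
        "BB": p1 * p2 * (1 - p3) / norm,
        "NB": (1 - p1) * p3 / norm,
        "BN": 0.0,
        "NN": 0.0,
        "BBuBN": p1 * (1 - p2) * (1 - p3) / norm,
        "BBuNB": (1 - p1) * p2 * (1 - p3) / norm,
        "Theta": (1 - p1) * (1 - p2) * (1 - p3) / norm,
    }


# Hypothetical pixel: unlikely building before, likely building after,
# and a high initial change probability.
masses = fuse_pair(0.2, 0.9, 0.8)
print(masses)
print(sum(masses.values()))
```

For any valid input the fused masses sum to one, which is a quick sanity check on the normalization by $1-K$.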


C. Time-series fusion model 


The previous fusion steps are applied to each multi-temporal
data pair separately. The results can then be combined in
the time-series fusion model. In this section, we use three
datasets captured on three dates as an example to describe
our fusion model. The three datasets are denoted by $d_1$, $d_2$
and $d_3$, ordered by acquisition time, with $d_1$ the oldest.
The building change detection (BC) outcomes among these
datasets are then recorded as $BC_{12}$, $BC_{23}$ and $BC_{13}$,
respectively.


By referring to the fusion model in Table II, a global mass
$\{m_{BB}, m_{NB}, m_{BN}, m_{NN}, m_{BB \cup BN}, m_{BB \cup NB}, m_{\Theta}\}$ can be
obtained. We transform the global mass to a three-class
FE mass on $NB$, $\overline{NB}$ and $\Theta$ by coarsening the original set of
focal elements. For this, we apply the following transformation,
which can be seen as a partial pignistic transformation. In the
pignistic transformation [4], the global masses of joint elements
are redistributed equally to each class. Since the full ignorance
$\Theta$ is one of the focal elements in the three-class FE, we keep
its value as $m'(\Theta)$ and only redistribute the partial ignorance
when calculating $m'(\cdot)$, as shown in Eq. (10).


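$$\begin{aligned}
m'(NB) &= m_{NB} + \tfrac{1}{2}\, m_{BB \cup NB}, \\
m'(BB) &= m_{BB} + \tfrac{1}{2}\, m_{BB \cup NB} + \tfrac{1}{2}\, m_{BB \cup BN}, \\
m'(BN) &= m_{BN} + \tfrac{1}{2}\, m_{BB \cup BN}, \\
m'(NN) &= 0, \\
m'(\Theta) &= m(\Theta).
\end{aligned} \tag{10}$$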
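
As an illustration, Eq. (10) and the three-class masses $a$, $b$, $c$ of Eq. (11) can be sketched in a few lines. The function names and dictionary keys below are hypothetical, and the input is assumed to be a normalized global mass as produced by the fusion model of Table II.

```python
def coarsen(m):
    """Partial pignistic transformation of Eq. (10).

    Masses on the unions are split equally between their two classes;
    the mass on the full ignorance Theta is kept unchanged.
    """
    return {
        "NB": m["NB"] + 0.5 * m["BBuNB"],
        "BB": m["BB"] + 0.5 * m["BBuNB"] + 0.5 * m["BBuBN"],
        "BN": m["BN"] + 0.5 * m["BBuBN"],
        "NN": 0.0,
        "Theta": m["Theta"],
    }


def three_class(m):
    """Return (a, b, c): masses on NB, not-NB and Theta as in Eq. (11)."""
    mp = coarsen(m)
    a = mp["NB"]
    c = mp["Theta"]
    b = 1.0 - a - c
    return a, b, c


# Hypothetical global mass for one pixel (sums to one).
global_mass = {"BB": 0.1, "NB": 0.4, "BN": 0.0, "NN": 0.0,
               "BBuBN": 0.1, "BBuNB": 0.2, "Theta": 0.2}
print(three_class(global_mass))
```

With $m'(NN) = 0$, the value $b$ equals $m'(BB) + m'(BN)$, so the three masses still sum to one.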


To simplify the notation, we use $a$, $b$ and $c$ to represent
the mass values for $NB$, $\overline{NB}$ and $\Theta$, respectively. With
$m'(NN) = 0$, they can be calculated with Eq. (11). Thus
the building change detection results from $NB_{12}$, $NB_{23}$ and
$NB_{13}$, denoted by $P_{NB12}$, $P_{NB23}$ and $P_{NB13}$, can be fused
according to the fusion model shown in Table III.


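TABLE III. THE INDICATORS FOR TIME SERIES FUSION MODEL.
FE | $P_{NB12}$ | $P_{NB23}$ | $P_{NB13}$
$C_1$ | $P_{1a}$ | 0 | 0
$C_2$ | 0 | $P_{2a}$ | 0
$C_3$ | 0 | 0 | $P_{3b}$
$C_1 \cup C_2$ | 0 | 0 | $P_{3a}$
$C_2 \cup C_3$ | $P_{1b}$ | 0 | 0
$C_1 \cup C_3$ | 0 | $P_{2b}$ | 0
$\Theta$ | $P_{1c}$ | $P_{2c}$ | $P_{3c}$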


In this fusion model, the three focal change classes are:
changes that happened between $d_1$ and $d_2$, denoted by $C_1$;
changes that happened between $d_2$ and $d_3$, denoted by $C_2$;
and no building change, denoted by $C_3$.

$$\begin{aligned}
a &= m'(NB), \\
c &= m'(\Theta), \\
b &= m'(\overline{NB}) = 1 - m'(NB) - m'(\Theta).
\end{aligned} \tag{11}$$


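With the conjunctive rule, the following belief masses are
obtained:


$$\begin{aligned}
m(C_1) &= P_{1a}P_{2c}P_{3a} + P_{1a}P_{2c}P_{3c} + P_{1a}P_{2b}P_{3a} + P_{1a}P_{2b}P_{3c} + P_{1c}P_{2b}P_{3a}, \\
m(C_2) &= P_{1b}P_{2a}P_{3a} + P_{1b}P_{2a}P_{3c} + P_{1b}P_{2c}P_{3a} + P_{1c}P_{2a}P_{3a} + P_{1c}P_{2a}P_{3c}, \\
m(C_3) &= P_{1b}P_{2b}P_{3b} + P_{1b}P_{2c}P_{3b} + P_{1c}P_{2b}P_{3b} + P_{1c}P_{2c}P_{3b} + P_{1b}P_{2b}P_{3c}, \\
m(C_1 \cup C_2) &= P_{1c}P_{2c}P_{3a}, \\
m(C_2 \cup C_3) &= P_{1b}P_{2c}P_{3c}, \\
m(C_1 \cup C_3) &= P_{1c}P_{2b}P_{3c}, \\
m(\Theta) &= P_{1c}P_{2c}P_{3c}, \\
\kappa &= 1 - m(C_1) - m(C_2) - m(C_3) - m(C_1 \cup C_2) - m(C_2 \cup C_3) - m(C_1 \cup C_3) - m(\Theta),
\end{aligned} \tag{12}$$


in which $\kappa$ represents the mass of conflict.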


Based on the DS fusion rule, the final mass is
calculated by


$$m_{DS}(X) = \frac{m(X)}{1 - \kappa} \tag{13}$$


for $X \in \{m_{BB}, m_{NB}, m_{BN}, m_{NN}, m_{BB \cup BN}, m_{BB \cup NB}, m_{\Theta}\}$
and $X \neq \emptyset$; and $m_{DS}(\emptyset) = 0$.
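
The time-series fusion can also be sketched generically, without writing out each product of Eq. (12) by hand. The sketch below assumes the assignment of Table III (each source places a mass on one change hypothesis, one complementary hypothesis and $\Theta$) and applies the conjunctive rule followed by the normalization of Eq. (13). The function names are illustrative only.

```python
from itertools import product

THETA = frozenset({"C1", "C2", "C3"})


def source_bbas(p1, p2, p3):
    """Build the three BBAs of Table III.

    Each p_i is a triple (a, b, c): mass for change, no change, ignorance.
    """
    (a1, b1, c1), (a2, b2, c2), (a3, b3, c3) = p1, p2, p3
    s1 = {frozenset({"C1"}): a1, frozenset({"C2", "C3"}): b1, THETA: c1}
    s2 = {frozenset({"C2"}): a2, frozenset({"C1", "C3"}): b2, THETA: c2}
    s3 = {frozenset({"C1", "C2"}): a3, frozenset({"C3"}): b3, THETA: c3}
    return s1, s2, s3


def fuse_time_series(p1, p2, p3):
    """Conjunctive combination of the three sources, then normalization."""
    raw = {}
    sources = source_bbas(p1, p2, p3)
    for (x, mx), (y, my), (z, mz) in product(*(s.items() for s in sources)):
        inter = x & y & z
        raw[inter] = raw.get(inter, 0.0) + mx * my * mz
    kappa = raw.pop(frozenset(), 0.0)  # mass of the empty set = conflict
    if kappa >= 1.0:
        raise ValueError("total conflict: normalization is undefined")
    fused = {focal: v / (1.0 - kappa) for focal, v in raw.items()}
    return fused, kappa


# Hypothetical (a, b, c) triples for BC_12, BC_23 and BC_13.
fused, kappa = fuse_time_series((0.6, 0.3, 0.1), (0.1, 0.7, 0.2), (0.7, 0.2, 0.1))
print(kappa)
for focal, mass in fused.items():
    print(sorted(focal), round(mass, 4))
```

Enumerating all triples of focal elements reproduces the five-term sums for $m(C_1)$, $m(C_2)$ and $m(C_3)$ automatically, which makes the sketch easier to extend to more than three dates than the closed-form expressions.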


IV. EXPERIMENTS 


The improved building change detection fusion models 
have been tested on satellite images. The datasets and the 
experiments are described in this section. 


A. Datasets 


The experimental datasets consist of five pairs of IKONOS
and one pair of GeoEye-1 stereo imagery captured from 2006
to 2011. The detailed capture dates of these data are shown
in Table IV. The true color images of the earliest and latest
datasets are shown in Fig. 1 (a) and 1 (b), respectively. Within
these five years, many new buildings were constructed in this
test region. As a data preparation procedure, DSMs were
generated with the method explained in [15].


TABLE IV. TIME SERIES DATASETS DESCRIPTION.
No. | Satellite | Capture date | PAN resolution (m) | MS resolution (m)
1 | IKONOS | 23-02-2006 | 1 | 4
2 | GeoEye-1 | 20-12-2009 | 0.5 | 2
3 | IKONOS | 12-01-2010 | 1 | 4
4 | IKONOS | 13-05-2010 | 1 | 4
5 | IKONOS | 07-01-2011 | 1 | 4
6 | IKONOS | 02-05-2011 | 1 | 4


The sub-pixel co-registration among these data is performed
based on the camera model parameters correction [11].
The radiometric co-registration method is described in [7].


B. Results and evaluation


In the first change detection step, the data from 2006 are
used as the pre-event data. The remaining five datasets are the
post-event datasets. Thus five change detection case studies
are prepared and named $C_{06-09}$, $C_{06-1001}$, $C_{06-1005}$,
$C_{06-1101}$ and $C_{06-1105}$, correspondingly. The change indicators
from DSMs and images are detected by using
robust height differences and IRMAD, respectively. The change
maps are recorded as $H_{diff}$ and $Img_{diff}$. At the same time, BPMs
from all six datasets are calculated and refined with the approach
described in Section III-A. Then the change detection result
between each pair of datasets is refined by using the pre-event
and post-event BPMs.


To evaluate quantitatively the performance of the different
fusion approaches, the extracted BBAs from both approaches
(original and refined) are compared to the manually extracted
change reference masks. The results are analyzed in terms of
the Receiver Operating Characteristic (ROC) curve [16]. A larger
area under the ROC curve (AUC) indicates a better accuracy
of the building change map. The numerical evaluation results
are given in Table V. The obtained AUC values show
a clear accuracy improvement after the proposed fusion
model is applied. The mass $m'(NB)$ is used as the first-step
change detection result and is listed as Refined.
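
For readers who want to reproduce this kind of evaluation, the AUC can be computed directly from per-pixel change scores and a binary reference mask. The sketch below is not the evaluation code used for Table V; it is a small, assumption-laden illustration based on the rank (Mann-Whitney) formulation of the AUC, with hypothetical scores and labels.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    scores: change probabilities (higher means more likely changed)
    labels: 1 for changed reference pixels, 0 for unchanged ones
    Ties between a positive and a negative sample count as one half.
    This O(n_pos * n_neg) form is meant for small illustrative inputs.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores for four pixels and their reference labels.
print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

For full-size change maps a sort-based implementation would be preferable, but the pairwise form makes the interpretation explicit: the AUC is the probability that a randomly chosen changed pixel receives a higher score than a randomly chosen unchanged one.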


TABLE V. BUILDING CHANGE DETECTION ACCURACY COMPARISON.
Change maps | $C_{06-09}$ | $C_{06-1001}$ | $C_{06-1005}$ | $C_{06-1101}$ | $C_{06-1105}$
$H_{diff}$ | 0.9267 | 0.9233 | 0.9016 | 0.8289 | 0.8211
$Img_{diff}$ | 0.9049 | 0.5937 | 0.9004 | 0.8283 | 0.8610
Fusion | 0.9540 | 0.9271 | 0.9474 | 0.8885 | 0.8862
Refined | 0.9771 | 0.9744 | 0.9668 | 0.9241 | 0.9442


In the time-series fusion model, we tested the data
from 2006, 2009 and May 2010 as a test combination. A
further improved building change probability map ($C_{06-09}$)
with AUC = 0.9795 is delivered. The differences between the
original change detection result and the refined one can be
observed in Fig. 2.


V. CONCLUSIONS 


Detecting building changes is an important but difficult
topic. Many approaches have been proposed for specific
building types or for certain types of datasets. In addition, the
image quality and the existence of some unwanted objects may
also influence the effectiveness of some approaches. Our previous
research has demonstrated the performance of belief functions
in DSM-assisted change detection [3]. In this paper, we have
explored in more detail the use of belief functions for building
change detection. Time-series data were used for this purpose.
They were first used to provide the BPMs after checking
the temporal consistency for each date. Then, pre- and
post-event BPMs were used to improve the accuracy of the
DS-based building change detection result. In the last step, the
time-series change detection results were fused again according
to the DS fusion rule, which results in a further improved
building change map. However, only three sets of data are
involved in the time-series fusion model. As part of our future
work, this fusion model will be refined so that it can
accommodate more datasets as inputs.

Fig. 1. The true color images of the first (a) and sixth (b) experimental
datasets and (c) the change reference map (Blue: built before 2009;
Orange: built before January 2011; Red: built before May 2011).

Fig. 2. The building change detection results between 2006 and 2009 based
on (a) the initial fusion model and (b) the time-series fusion model.


Abstract—In belief functions related fields, the distance measure
is an important concept, which represents the degree
of dissimilarity between bodies of evidence. Various distance 
measures of evidence have been proposed and widely used in 
diverse belief function related applications, especially in perfor- 
mance evaluation. Existing definitions of strict and non-strict 
distance measures of evidence have their own pros and cons. 
In this paper, we propose two new strict distance measures of 
evidence (Euclidean and Chebyshev forms) between two basic 
belief assignments (BBAs) based on the Wasserstein distance 
between belief intervals of focal elements. Illustrative examples, 
simulations, applications and related analyses are provided to 
show the rationality and efficiency of our proposed measures for 
distance of evidence. 


Keywords: distance of evidence, belief functions, evidence 
theory, dissimilarity, belief interval. 


I. INTRODUCTION 


The theory of belief functions, also called Dempster-Shafer 
evidence theory (DST) [1], is an important mathematical 
framework for uncertainty modelling and reasoning. It has 
been applied to information fusion [2], pattern recognition [3] 
[4], multiple-attribute decision making [5], fault diagnosis [6], 
etc. DST has some limitations, see [7]-[9] for discussions. 
Generalized or refined theories were proposed including trans- 
ferable belief model (TBM) [10] and Dezert-Smarandache 
Theory (DSmT) [7], [11], etc. 

In DST, the basic belief assignment (BBA) is a common
way of modeling (epistemic) uncertainty. The distance of
evidence is a crucial metric for measuring the distance between
two BBAs. It indicates whether a BBA is “far” from or “close” to
another one. The distance of evidence is required in many
belief function related applications, which can be categorized
into two types. The first type is performance evaluation or
optimization [12]-[16]. For example, in the performance
evaluation of BBA approximation [16], which aims to simplify
the BBA to reduce the computational complexity, the distance
of evidence is needed to measure the accuracy of an
approximated BBA (the one closer to the original BBA is better).
Furthermore, some BBA approximation approaches are directly
based on distance minimization [17]; therefore, the distance of
evidence is indispensable. The second type of application is to
determine the agreement between sources of information. For
example, in clustering analysis [4], [18], [19] and the
determination of discounting factors [20], [21], the distance of
evidence is required.

Since the distance of evidence is a very crucial concept in 
many applications, it has attracted increasing research interest 
recently in the belief functions community. Many definitions 
of distance (or dissimilarity) measures have been proposed in 
the past two decades [22]. Some of them are non-strict distance 
metrics, although they are often called “distance”. In practice,
Jousselme’s (strict) distance of evidence [13] and Tessem’s
(non-strict) betting commitment distance [23] (also called the
pignistic probability distance) are the most frequently used ones.
A fuzzy set based distance of evidence was also proposed in 
our previous work [24]. Jousselme et al. provided an excellent 
survey [22] on available works on the distance of evidence, 
where many definitions are introduced and compared. 

Various types of distance of evidence have been proposed 
under the geometric interpretation [25] of the DST, where 
a basic belief assignment (BBA) is considered as a vec- 
tor of a Cartesian-alike space and each focal element is 
deemed as a base of the space [22]. However, all existing 
distances of evidence have their own limitations. First, a 
strict distance metric should satisfy the requirements including 
the non-negativity, non-degeneracy, symmetry, and triangular 
inequality. None of the existing distances of evidence except 
for Jousselme’s distance can satisfy all the requirements, 
i.e., they are not strict distance metrics. This is due to the 
switch between theoretical frameworks. For example, Tessem’s 
betting commitment distance [23] first transforms BBAs into 
pignistic probabilities, and fuzzy set based distance of ev- 
idence [24] first transforms BBAs into fuzzy membership 
functions. Such switches between different frameworks lead 
to the loss of information, thus the distance between BBAs 
cannot be described precisely using these measures. Therefore, 
their strictness cannot be assured and they may encounter 
counter-intuitive results when measuring the distance between 
different BBAs. Although Jousselme’s distance is a strict
metric and performs well in many cases, it still has some
unsatisfactory behaviors in our experiments, e.g., the
lack of discriminability in some cases and the maximum value
problem, as pointed out and analyzed in this paper. Due to the
limitations of existing distance measures, we are motivated
to propose better strict distance measures of evidence. We
propose to use the belief intervals [1] $[Bel(A), Pl(A)]$ of each
focal element $A$ to describe the closeness between BBAs,
where $Bel(A)$ and $Pl(A)$ are respectively the belief and
plausibility of a focal element $A$ computed from the given
BBA defined on a known frame of discernment. If we consider
that a BBA is used to model the uncertainty as a whole for all 
focal elements, then the belief interval of each focal element 
in a BBA represents the uncertainty of the corresponding 
proposition. If we use all belief intervals of a BBA as a whole 
as its “feature” vector, then the distance between the “feature” 
vectors of different BBAs describes the difference between 
them. Since a belief interval is an interval number, the distance 
between the same focal element’s two belief intervals in two 
BBAs can be calculated by Wasserstein’s distance of interval 
numbers [26]. Based on all the distance values between belief 
intervals, we design a Euclidean-family distance using the 
sum of squares of all belief intervals’ distance values, and 
a Chebyshev-family distance using the maximum of all belief 
intervals’ distance values, respectively, to measure the distance 
between two different “feature” vectors of belief intervals, 
and thus to measure the distance between two BBAs. Our 
new definitions directly use the belief intervals defined in 
the DST, i.e., there is no switch between different theoretical 
frameworks. It can be proved that our new proposed measures 
of distance of evidence are strict distance metrics satisfying 
the requirements of non-negativity, non-degeneracy, symmetry 
and triangle inequality. This paper extends our preliminary 
results in [27], where the basic idea of the belief interval based 
distance is briefly introduced and a few illustrative examples 
are provided. In this paper, the limitations of existing distances 
are summarized more specifically, and the causes of these lim- 
itations are analyzed. More detailed formulations, proofs, and 
theoretical analyses of the new proposed distance measures are 
provided. More examples, simulations, and related analyses 
are provided for comparison between our proposed distances 
and the existing ones. An application of the proposed distances
of evidence to BBA approximation and an application to
multiple criteria decision making (MCDM) are also provided.
These are the added values (contributions) of this paper.

The rest of this paper is organized as follows. Basics of the
theory of belief functions are briefly introduced in Section
II. The geometric interpretation and some commonly used
distance measures of evidence are reviewed in Section III,
where limitations of existing measures are also explained based
on illustrative examples. In Section IV, two new distance
metrics in DST are proposed based on the belief intervals
and the distance between interval numbers. The proof of our
proposed distance metrics’ strictness, and the comparisons
between our measures and distance bounds are also provided
in Section IV. In Section V, examples, simulations, applications
and related analyses are provided based on the comparison
between the new metrics and some existing ones from different
aspects to show the rationality and efficiency of our new
metrics. Section VI concludes this paper.


II. BASICS OF THEORY OF BELIEF FUNCTIONS 


The theory of belief functions was first proposed by Demp- 
ster and then further developed by Shafer, therefore, it is 
usually called Dempster-Shafer evidence theory (DST) [1]. 
It has become an important theory and tool for uncertainty 
modeling and reasoning. 

The basic concept of the theory of belief functions is the
frame of discernment (FOD), which represents the discourse
domain of the problem we are interested in. Under the
closed-world assumption, the FOD $\Theta = \{\theta_1, \ldots, \theta_n\}$ is defined as a
set of $n$ mutually exclusive and exhaustive elements. If a set
function $m : 2^\Theta \to [0,1]$, where $2^\Theta$ is the powerset¹ of $\Theta$,
satisfies


$$\sum_{A \subseteq \Theta} m(A) = 1, \quad m(\emptyset) = 0, \tag{1}$$


and if m(A) > 0 holds, then m is called a basic belief 
assignment (BBA, or mass function) over the FOD $\Theta$. All
the sets $A \in 2^\Theta$ satisfying $m(A) > 0$ are called the focal
elements. Each focal element represents a proposition in the
FOD. Given a BBA, a body of evidence (BOE) [1] can be
determined, which is defined as the set of focal elements and
their corresponding mass assignments.

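A belief function over the FOD $\Theta$, denoted by $Bel : 2^\Theta \to
[0,1]$, is defined as:


$$Bel(A) = \sum_{B \subseteq A} m(B), \quad \forall A \subseteq \Theta. \tag{2}$$


A plausibility function over the FOD $\Theta$, denoted by $Pl :
2^\Theta \to [0,1]$, is defined as:


$$Pl(A) = \sum_{A \cap B \neq \emptyset} m(B), \quad \forall A \subseteq \Theta. \tag{3}$$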


The plausibility function and the belief function satisfy [1]:


$$Pl(A) = 1 - Bel(\bar{A}), \tag{4}$$
where $\bar{A}$ is the complementary proposition of $A \in 2^\Theta$. The
plausibility $Pl(A)$ and the belief $Bel(A)$ constitute a belief
interval $[Bel(A), Pl(A)]$. The length of the belief interval
$[Bel(A), Pl(A)]$ represents the degree of imprecision for the
proposition or focal element $A$. The non-null mass value
assigned to $\Theta$ represents the degree of ignorance, i.e., the
“unknown” state. Furthermore, in DST, different uncertainty
measures have been proposed such as Non-specificity [28],
Ambiguity Measure (AM) [29], Aggregated Uncertainty (AU)
[30] and distance-based uncertainty measures [31].
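
To make Eqs. (2)-(4) concrete, the following minimal sketch computes the belief interval of a proposition. It assumes that a BBA is stored as a Python dictionary mapping `frozenset` focal elements to masses; this representation, the function names and the numerical values are illustrative choices, not notation from the paper.

```python
def bel(m, a):
    """Belief of a: total mass of the non-empty focal elements included in a."""
    return sum(v for b, v in m.items() if b and b <= a)


def pl(m, a):
    """Plausibility of a: total mass of the focal elements intersecting a."""
    return sum(v for b, v in m.items() if b & a)


# Illustrative BBA on a two-element frame {t1, t2}.
theta = frozenset({"t1", "t2"})
m = {frozenset({"t1"}): 0.3, frozenset({"t2"}): 0.2, theta: 0.5}

a = frozenset({"t1"})
print(bel(m, a), pl(m, a))         # belief interval [Bel(A), Pl(A)]
print(1.0 - bel(m, theta - a))     # Eq. (4): equals Pl(A)
```

The width of the interval, $Pl(A) - Bel(A)$, is here the mass committed to $\Theta$, i.e., the part of the evidence that neither supports nor contradicts the proposition.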

Evidence combination rules are used for uncertainty reasoning,
e.g., Dempster’s rule of combination is used to combine
distinct bodies of evidence (BOEs). Suppose that
there are two independent BBAs $m_1$ and $m_2$. The conflict
coefficient [1] is defined as


$$K = \sum_{A_i \cap B_j = \emptyset} m_1(A_i)\, m_2(B_j). \tag{5}$$


1The powerset is the set of all subsets of $\Theta$ including the empty set $\emptyset$.




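If $K < 1$, then the combined BBA $m$ can be obtained using
Dempster’s rule of combination:


$$m(A) = \begin{cases} 0, & A = \emptyset, \\[4pt] \dfrac{\sum_{A_i \cap B_j = A} m_1(A_i)\, m_2(B_j)}{1 - \sum_{A_i \cap B_j = \emptyset} m_1(A_i)\, m_2(B_j)}, & A \neq \emptyset, \end{cases} \tag{6}$$


where $A_1, \ldots, A_k$ and $B_1, \ldots, B_l$ are the focal elements of $m_1$
and $m_2$, respectively. Note that Dempster’s rule of combination is
both commutative and associative, i.e., symmetric.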

The obtained BBA is in fact the orthogonal sum of the 
original BBAs. Dempster’s rule of combination has been criti- 
cized for its counter-intuitive behaviors [9], [32], especially in 
high conflict cases. Accordingly, many alternative combination 
rules have emerged. See [7], [33] for details. 
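
As a worked illustration of Eqs. (5) and (6), the sketch below combines two BBAs with Dempster's rule. BBAs are assumed to be dictionaries mapping `frozenset` focal elements to masses; the function name and the example values are hypothetical.

```python
def dempster(m1, m2):
    """Combine two BBAs with Dempster's rule.

    Returns the normalized combined BBA and the conflict coefficient K.
    """
    combined, conflict = {}, 0.0
    for a, va in m1.items():
        for b, vb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + va * vb
            else:
                conflict += va * vb  # product mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("K = 1: the two BBAs are in total conflict")
    return {a: v / (1.0 - conflict) for a, v in combined.items()}, conflict


theta = frozenset({"a", "b"})
m1 = {frozenset({"a"}): 0.6, theta: 0.4}
m2 = {frozenset({"b"}): 0.5, theta: 0.5}
fused, k = dempster(m1, m2)
print(k)        # conflict between {a} and {b}
print(fused)
```

In this example the only conflicting pair is $\{a\}$ against $\{b\}$, so $K = 0.6 \times 0.5$, and the remaining products are rescaled by $1/(1-K)$.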


III. TRADITIONAL MEASURES OF DISTANCE OF EVIDENCE 


How should the closeness between two BBAs be measured? This
is crucial for performance evaluation, algorithm optimization
and other belief function based applications. The answer is
the distance of evidence. The conflict coefficient $K$ (defined
in Eq. (5)) in Dempster’s rule of combination was the only
means to quantify the interaction between BBAs for about
two decades (from 1967 to 1990). However, this coefficient $K$
(denoted by $d_C$ in the sequel) may be inappropriate to quantify
the closeness between two BBAs, as the conflict between two
identical BBAs might not be equal to 0.

Example 1. Suppose that the FOD is $\Theta = \{\theta_1, \ldots, \theta_n\}$. Two
BBAs defined on $\Theta$ are $m_1(\{\theta_1\}) = \cdots = m_1(\{\theta_n\}) = 1/n$
and $m_2(\{\theta_1\}) = \cdots = m_2(\{\theta_n\}) = 1/n$.

Obviously, they are two identical BBAs and $d_C = 1 - 1/n$.
When $n$ becomes large, $d_C$ approaches its upper bound
(i.e., 1). If one considered $d_C$ as a distance, such a result would
be counter-intuitive.

A strict distance metric defined on the set $\mathcal{E}$, $d(\cdot,\cdot): \mathcal{E} \times \mathcal{E} \to
\mathbb{R}$, $(x,y) \mapsto d(x,y)$, should satisfy

1) Non-negativity: $d(x,y) \geq 0$;

2) Non-degeneracy: $d(x,y) = 0 \Leftrightarrow x = y$;

3) Symmetry: $d(x,y) = d(y,x)$;

4) Triangle inequality: $d(x,y) + d(y,z) \geq d(x,z), \forall z \in \mathcal{E}$.

Obviously, $d_C$ violates the non-degeneracy condition. It is
not difficult to verify that $d_C$ only satisfies the non-negativity
and symmetry conditions. Therefore, it is not a strict distance
metric.
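
The behavior described in Example 1 is easy to reproduce. The sketch below, which assumes BBAs stored as dictionaries from `frozenset` focal elements to masses (an illustrative representation), computes the conflict coefficient of Eq. (5) between a uniform Bayesian BBA and itself.

```python
def conflict(m1, m2):
    """Conflict coefficient K: mass of all pairs with empty intersection."""
    return sum(va * vb
               for a, va in m1.items()
               for b, vb in m2.items()
               if not (a & b))


def uniform_bayesian(n):
    """BBA giving mass 1/n to each of the n singletons."""
    return {frozenset({i}): 1.0 / n for i in range(n)}


for n in (2, 10, 100):
    m = uniform_bayesian(n)
    print(n, conflict(m, m))   # close to 1 - 1/n although the BBAs are identical
```

Only the $n$ pairs with identical singletons have a non-empty intersection, each with product mass $1/n^2$, which gives $K = 1 - n \cdot (1/n^2) = 1 - 1/n$.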


Many other definitions² of distance of evidence were proposed
in the past two decades, as reported in Jousselme’s survey
[22]. Most of them can be considered as being established
under the framework of the geometrical interpretation of the
DST.


2To be rigorous, only those definitions satisfying the four requirements can
be called “distance”. The ones that do not satisfy these four requirements
can only be called “dissimilarity” or “closeness” measures. In the sequel, for
convenience, all dissimilarity definitions are called “distance” when no
ambiguity can occur.


A. Geometric interpretation of the theory of belief functions 


The geometrical interpretation of the DST [25] is as follows.

Suppose that the FOD is $\Theta$ with $|\Theta| = n$. Let $\mathcal{E}_\Theta$ be the
$2^n$-dimensional Cartesian space³ spanned by the set of column
vectors $\{e_A, A \subseteq \Theta\}$. Each vector $v$ of $\mathcal{E}_\Theta$ can be written
as $v = \sum_{A \subseteq \Theta} \alpha_A \cdot e_A$. Here $\alpha_A \in \mathbb{R}$ can be considered as
the coordinate of $v$ along the direction of $e_A$.

A BBA $m$ is a vector of $\mathcal{E}_\Theta$, which should satisfy
$\sum_{A \subseteq \Theta} \alpha_A = 1$, $\alpha_\emptyset = 0$, with $\alpha_A \geq 0$ and $\alpha_A = m(A)$ due
to the properties of unity and non-negativity for mass values,
as illustrated in Eq. (1).

For example, suppose that the FOD is $\Theta = \{\theta_1, \theta_2\}$. A BBA $m$
on $\Theta$ is $m(\{\theta_1\}) = 0.3$, $m(\{\theta_2\}) = 0.2$, $m(\{\theta_1, \theta_2\}) = 0.5$.
Under the closed-world assumption, $m$ is illustrated in Fig. 1.


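Figure 1. Geometrical interpretation of a BBA (axes $e_{\{\theta_1\}}$, $e_{\{\theta_2\}}$ and $e_{\Theta}$, with coordinates $\alpha_{\theta_1} = m(\{\theta_1\}) = 0.3$, $\alpha_{\theta_2} = m(\{\theta_2\}) = 0.2$ and $\alpha_{\Theta} = m(\{\theta_1, \theta_2\}) = 0.5$).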


According to the geometrical interpretation of DST, two BBAs
$m_1$ and $m_2$ are two vectors. That is, $m_1$ and $m_2$ are two
“points” in the evidential Cartesian space. In the past thirty
years, all kinds of distances for Cartesian spaces, such as
Euclidean, Chebyshev, Minkowski and Manhattan distances,
have been used to define the distance between BOEs according
to the geometrical interpretation [22].
Note that many available definitions are non-strict distance
metrics [22]. A few typical measures are reviewed in detail
in the following. Many other definitions can be found in
Jousselme’s survey [22].


B. Selected existing distance measures of evidence 


The earliest distance of evidence is Tessem’s distance of
betting commitment [23], which was proposed for the evaluation
of BBA approximations.

1) Tessem’s betting commitment distance: The pignistic
probability corresponding to a BBA $m$ is defined by [34]:


$$BetP(A) = \sum_{B \subseteq \Theta} \frac{|A \cap B|}{|B|}\, m(B), \tag{7}$$


which is a probabilistic transformation [35] from a BBA
for probabilistic decision making in DST. The betting
commitment distance (or Tessem’s distance) $d_T$ is computed
by [23]:


$$d_T(m_1, m_2) = \max_{A \subseteq \Theta} \{ |BetP_1(A) - BetP_2(A)| \}. \tag{8}$$


3Note that whether the geometric interpretation of the DST satisfies
the strict requirements or properties of the geometric space needs further
justification. Here we call it the evidential Cartesian space.


It can be reformulated according to the evidential Cartesian
space as


$$d_T(m_1, m_2) = \max \begin{Bmatrix} |BetP_1(A_1) - BetP_2(A_1)| \\ |BetP_1(A_2) - BetP_2(A_2)| \\ \vdots \\ |BetP_1(A_{2^n}) - BetP_2(A_{2^n})| \end{Bmatrix} = \max_{A \subseteq \Theta} \{ |BetP_1' \cdot e_A - BetP_2' \cdot e_A| \}, \tag{9}$$

where $BetP_i = [BetP_i(A_1), BetP_i(A_2), \ldots, BetP_i(A_{2^n})]'$,
$i = 1, 2$.

$d_T$ is a Chebyshev $L_\infty$-like distance. From the definition of
$d_T$, we can see that there is a switch from the DST framework
to the probability framework when calculating this distance.
The inconsistency between different theoretical frameworks
leads to a loss of information and some unexpected results;
therefore, it is not recommended.

Actually, due to the switch between different frameworks,
Tessem’s distance is not a strict distance metric [36]. It violates
the non-degeneracy condition as shown in Example 2.

Example 2. Suppose that the FOD is $\Theta = \{\theta_1, \ldots, \theta_n\}$. There
are two BBAs $m_1$ and $m_2$ defined on $\Theta$: $m_1(\{\theta_1\}) =
\cdots = m_1(\{\theta_n\}) = 1/n$ and $m_2(\Theta) = 1$. Their corresponding
pignistic probabilities are both $P(\theta_1) = \cdots = P(\theta_n) = 1/n$.
Therefore, $d_T(m_1, m_2) = 0$, although they are different BBAs.
Thus, $d_T$ does not satisfy the non-degeneracy condition. $d_T$
also does not satisfy the triangle inequality and has other
drawbacks. See details in [36].
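
Example 2 can be verified with a short sketch. It assumes BBAs stored as dictionaries from `frozenset` focal elements to masses and uses the fact that, for the pignistic transformation, $BetP(A)$ is the sum of the pignistic probabilities of the singletons in $A$. The subsets are enumerated exhaustively, which is only practical for small frames; all names are illustrative.

```python
from itertools import combinations


def betp_singletons(m, frame):
    """Pignistic probability of each element: mass shared equally inside B."""
    p = {x: 0.0 for x in frame}
    for b, v in m.items():
        for x in b:
            p[x] += v / len(b)
    return p


def tessem(m1, m2, frame):
    """Betting commitment distance: max over non-empty subsets A of the frame."""
    p1, p2 = betp_singletons(m1, frame), betp_singletons(m2, frame)
    elems = sorted(frame)
    best = 0.0
    for r in range(1, len(elems) + 1):
        for a in combinations(elems, r):
            best = max(best, abs(sum(p1[x] - p2[x] for x in a)))
    return best


# Example 2 with n = 4: uniform Bayesian BBA versus the vacuous BBA.
frame = frozenset({"t1", "t2", "t3", "t4"})
m1 = {frozenset({x}): 0.25 for x in frame}
m2 = {frame: 1.0}
print(tessem(m1, m2, frame))   # the two different BBAs are not separated
```

Both BBAs map to the same uniform pignistic probability, so the distance vanishes even though one BBA expresses a precise probabilistic opinion and the other total ignorance.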

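2) Fuzzy membership function (FMF) based dissimilarity:
First transform the BBAs $m_1(\cdot)$ and $m_2(\cdot)$ into FMFs⁴ $\mu^{(1)}$ and
$\mu^{(2)}$ as, for $i = 1, 2$,


$$\mu^{(i)} = \left[ \mu^{(i)}(\theta_1), \mu^{(i)}(\theta_2), \ldots, \mu^{(i)}(\theta_n) \right] = \left[ Pl^{(i)}(\theta_1), Pl^{(i)}(\theta_2), \ldots, Pl^{(i)}(\theta_n) \right]. \tag{10}$$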


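According to the dissimilarity definition between FMFs, $d_F$ is
defined as [24]:


$$d_F(m_1, m_2) = 1 - \frac{\sum_{i=1}^{n} \left( \mu^{(1)}(\theta_i) \wedge \mu^{(2)}(\theta_i) \right)}{\sum_{i=1}^{n} \left( \mu^{(1)}(\theta_i) \vee \mu^{(2)}(\theta_i) \right)}. \tag{11}$$


In (11), the operator $\wedge$ represents the conjunction (min) and
$\vee$ represents the disjunction (max).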


4The FMF quantifies the membership grade of the element to the fuzzy 
set. It is a generalization of the characteristic function in classical set and can 
take its values in the interval [0, 1]. 


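It can be reformulated according to the evidential Cartesian
space as


$$d_F(m_1, m_2) = 1 - \frac{\sum_{i=1}^{n} \min \left( (Int \cdot m_1)' \cdot e_{\theta_i}, (Int \cdot m_2)' \cdot e_{\theta_i} \right)}{\sum_{i=1}^{n} \max \left( (Int \cdot m_1)' \cdot e_{\theta_i}, (Int \cdot m_2)' \cdot e_{\theta_i} \right)},$$


where $Int$ is the intersection matrix, whose elements are
$Int(A, B) = 1$ if $A \cap B \neq \emptyset$, and $Int(A, B) = 0$ if $A \cap B = \emptyset$.
One has $Pl = Int \cdot m$, where $Pl$ is the plausibility
vector corresponding to $m$.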

$d_F$ in fact indirectly represents the distance between two
BBAs using the distance between their corresponding FMFs.
Note that $d_F$ is not a strict distance metric. First, $d_F$ does
not satisfy the non-degeneracy condition due to the switch
from the DST framework to the fuzzy set framework. Given
two different BBAs, their corresponding fuzzy membership
functions (FMFs), i.e., singleton plausibilities, might be the same
as shown in Example 3.

Example 3. Suppose that the FOD is $\Theta = \{\theta_1, \theta_2, \theta_3\}$. Two
BBAs $m_1$ and $m_2$ defined on $\Theta$ are


$m_1(\{\theta_1, \theta_3\}) = 0.3$, $m_1(\{\theta_1, \theta_2\}) = 0.7$;
$m_2(\{\theta_1\}) = 0.3$, $m_2(\{\theta_1, \theta_2\}) = 0.4$, $m_2(\Theta) = 0.3$.

Their corresponding singleton plausibilities are the same:

$$\begin{aligned}
\mu^{(1)}(\theta_1) &= Pl_1(\{\theta_1\}) = 1.0, & \mu^{(1)}(\theta_2) &= Pl_1(\{\theta_2\}) = 0.7, & \mu^{(1)}(\theta_3) &= Pl_1(\{\theta_3\}) = 0.3, \\
\mu^{(2)}(\theta_1) &= Pl_2(\{\theta_1\}) = 1.0, & \mu^{(2)}(\theta_2) &= Pl_2(\{\theta_2\}) = 0.7, & \mu^{(2)}(\theta_3) &= Pl_2(\{\theta_3\}) = 0.3.
\end{aligned} \tag{12}$$


Therefore, $d_F(m_1, m_2) = 0$, although $m_1$ and $m_2$ are different
BBAs.
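
Example 3 can be checked numerically with the sketch below. It assumes that the FMF-based dissimilarity has the form one minus the ratio of the summed minima to the summed maxima of the singleton plausibilities, which is an assumption about Eq. (11) rather than a quotation of it. BBAs are stored as dictionaries from `frozenset` focal elements to masses, and all names are illustrative.

```python
def singleton_pl(m, frame):
    """Singleton plausibilities, used here as fuzzy membership grades."""
    return {x: sum(v for b, v in m.items() if x in b) for x in frame}


def d_fmf(m1, m2, frame):
    """FMF-based dissimilarity (assumed form: 1 - sum(min) / sum(max))."""
    mu1, mu2 = singleton_pl(m1, frame), singleton_pl(m2, frame)
    num = sum(min(mu1[x], mu2[x]) for x in frame)
    den = sum(max(mu1[x], mu2[x]) for x in frame)
    return 1.0 - num / den


frame = frozenset({"t1", "t2", "t3"})
m1 = {frozenset({"t1", "t3"}): 0.3, frozenset({"t1", "t2"}): 0.7}
m2 = {frozenset({"t1"}): 0.3, frozenset({"t1", "t2"}): 0.4, frame: 0.3}

print(singleton_pl(m1, frame))
print(singleton_pl(m2, frame))
print(d_fmf(m1, m2, frame))    # numerically zero for two different BBAs
```

The two BBAs share the singleton plausibilities 1.0, 0.7 and 0.3, so any dissimilarity computed only from these values cannot tell them apart.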

3) Jousselme’s distance: By borrowing the $L_2$ Euclidean
distance with a weighting matrix in Cartesian space, Jousselme’s
distance [13] is defined as:


$$d_J(m_1, m_2) \triangleq \sqrt{0.5 \cdot (m_1 - m_2)^T \, Jac \, (m_1 - m_2)}, \tag{13}$$


where the elements $Jac(A, B)$ of Jaccard’s weighting matrix
$Jac$ are defined as


$$Jac(A, B) = \frac{|A \cap B|}{|A \cup B|}. \tag{14}$$


It has been proved to be a strict distance metric in [37] and
has become the most commonly used one so far; however, it
might cause some unsatisfactory results as shown in Example
4.

Example 4. Suppose that the FOD is $\Theta = \{\theta_1, \ldots, \theta_6\}$.
Three groups of BBAs are as follows.


$m_1(\{\theta_1\}) = 1$;

$\cdots$

$m_6(\{\theta_4, \theta_5, \theta_6\}) = 1$.

Using Jousselme’s distance, one gets $d_J(m_1, m_2) =
d_J(m_3, m_4) = d_J(m_5, m_6) = 1$, that is, they all reach the
maximum value 1. The six BBAs here are all categorical
BBAs⁵. $m_1$ and $m_2$ each have a unique singleton focal element.
The opinions of $m_1$ and $m_2$ are totally different, and both of
them are specific, i.e., with no ambiguity. The opinions of $m_3$
and $m_4$ are totally different; however, both of them are
non-specific and ambiguous. The BBAs $m_5$ and $m_6$ carry larger
ambiguity. Intuitively, the distance between
$m_1$ and $m_2$ should be larger than the distance between $m_3$ and
$m_4$; also, the distance between $m_3$ and $m_4$ should be larger
than the distance between $m_5$ and $m_6$. Jousselme’s distance
does not provide this expected behavior.

Furthermore, Jousselme’s distance is relatively insensitive 
to the change of BBA in some cases as shown in Example 5. 

Example 5. Suppose that the FOD is $\Theta = \{\theta_1, \ldots, \theta_3\}$.
Consider the following three BBAs $m_1$, $m_2$ and $m_3$:


$m_1(\{\theta_1\}) = m_1(\{\theta_2\}) = m_1(\{\theta_3\}) = 1/3$;
$m_2(\{\theta_1\}) = m_2(\{\theta_2\}) = m_2(\{\theta_3\}) = 0.1$, $m_2(\Theta) = 0.7$;
$m_3(\{\theta_1\}) = m_3(\{\theta_2\}) = 0.1$, $m_3(\{\theta_3\}) = 0.8$.


Since both $m_1$ and $m_2$ have no preference for any singleton
$\{\theta_i\}$ and $m_3$ commits more belief to $\{\theta_3\}$, it is intuitively
expected that the distance between $m_1$ and $m_2$ should be
smaller than that between $m_1$ and $m_3$. However, Jousselme’s
distance leads to $d_J(m_1, m_2) = d_J(m_1, m_3) = 0.4041$, which
shows that $d_J$ does not discriminate them well.
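
The values quoted in Example 5 can be reproduced with a direct implementation of Eqs. (13) and (14). The sketch assumes BBAs stored as dictionaries from `frozenset` focal elements to masses; since the difference vector is zero outside the union of the focal elements, the quadratic form is evaluated on that union only. Names are illustrative.

```python
from math import sqrt


def jousselme(m1, m2):
    """Jousselme's distance with the Jaccard weighting matrix."""
    focal = set(m1) | set(m2)
    diff = {a: m1.get(a, 0.0) - m2.get(a, 0.0) for a in focal}
    quad = sum(diff[a] * diff[b] * len(a & b) / len(a | b)
               for a in focal for b in focal)
    return sqrt(max(0.5 * quad, 0.0))  # clamp tiny negative rounding errors


t1, t2, t3 = frozenset({"t1"}), frozenset({"t2"}), frozenset({"t3"})
theta = t1 | t2 | t3

m1 = {t1: 1 / 3, t2: 1 / 3, t3: 1 / 3}
m2 = {t1: 0.1, t2: 0.1, t3: 0.1, theta: 0.7}
m3 = {t1: 0.1, t2: 0.1, t3: 0.8}

print(round(jousselme(m1, m2), 4))
print(round(jousselme(m1, m3), 4))
```

Both calls return the same value up to rounding, which illustrates the lack of discrimination discussed in the example: moving mass to $\Theta$ and moving it to a single competing singleton are penalized identically here.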

In summary, many existing distance measures of evidence
have evident limitations, including the strict Jousselme’s
distance metric. Tessem’s distance and the FMF-based
distance both switch between theoretical frameworks:
Tessem’s distance switches from
the framework of DST to the framework of probability
theory, and the FMF-based distance switches from the
framework of DST to the framework of fuzzy set theory.
These switches bring an undesired loss of information, which
should be avoided. Jousselme’s distance borrows the distance
metric from the traditional Cartesian space for the evidential
Cartesian space. The strictness of the evidential Cartesian
space, i.e., the geometrical interpretation of DST, needs further
verification. Therefore, it is not uncommon to obtain some
unsatisfactory results when using Jousselme’s distance.

Since traditional distances in DST have limitations (or 
unsatisfactory behaviors), we propose new strict distance mea- 
sures of evidence with better behaviors. 


IV. DISTANCE OF EVIDENCE USING BELIEF INTERVALS 


As aforementioned, the limitations and non-strictness of some existing
distances of evidence are caused by the switches between theoretical
frameworks; therefore, no such switch is allowed in the design of our new
distances. Jousselme’s distance involves no switch between theoretical
frameworks: only the focal elements and the corresponding mass values are
used. Given a BBA, the mass value for a proposition (or focal element) $A$
represents the basic belief assigned to $A$. Besides the mass value $m(A)$,
other values, like $Bel(A)$ and $Pl(A)$, are optional. Furthermore, the belief
interval $[Bel(A), Pl(A)]$ can be used to represent the degree of imprecision
of $A$. The belief interval $[Bel(A), Pl(A)]$ therefore carries more
information about a given proposition $A$ than the mass value $m(A)$, which is
a scalar, and we propose to use the belief interval in place of the mass
value to achieve better performance. In DST, besides the BBA ($m$), the
belief function ($Bel$) and the plausibility function ($Pl$), there also exist
the doubt function ($Dou$) and the commonality function ($Q$) [1]. Any one of
these five functions can be transformed into any other according to their
definitions and the Möbius transformations [1]. That is, the five functions
are in one-to-one correspondence, so one can also try to jointly use other
functions, such as the commonality and the doubt, to design new distance
measures. In this paper, we choose the belief interval $[Bel(A), Pl(A)]$,
$\forall A \subseteq \Theta$, since the belief and plausibility are more
familiar and more widely used in practice than the doubt and the
commonality. Furthermore, $[Bel(A), Pl(A)]$ has an intuitive physical
meaning, i.e., the degree of imprecision of the proposition $A$.


$^5$ A categorical BBA is a BBA that has only one focal element.

Suppose that two BBAs $m_1$ and $m_2$ are defined on
$\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$. For each focal element
$A_i \subseteq \Theta$ ($i = 1, \ldots, 2^n - 1$), we can calculate the belief
interval of $A_i$ for $m_1$ and $m_2$, respectively, denoted by
$[Bel_1(A_i), Pl_1(A_i)]$ and $[Bel_2(A_i), Pl_2(A_i)]$. That is, each BBA $m_j$
($j = 1, 2$) can also equivalently be modeled by a matrix of size
$(2^n - 1) \times 2$:

$$\begin{bmatrix} Bel_j(A_1) & Pl_j(A_1) \\ \vdots & \vdots \\ Bel_j(A_{2^n-1}) & Pl_j(A_{2^n-1}) \end{bmatrix}$$

A belief interval can be regarded as a classical interval number$^6$ included
in $[0, 1]$. Then the above matrix can be regarded as a vector of interval
numbers (belief intervals):

$$Fe_j = \begin{bmatrix} [Bel_j(A_1), Pl_j(A_1)] \\ \vdots \\ [Bel_j(A_{2^n-1}), Pl_j(A_{2^n-1})] \end{bmatrix} = \begin{bmatrix} BI_j(A_1) \\ \vdots \\ BI_j(A_{2^n-1}) \end{bmatrix}$$

Here $Fe_j$ can be considered as a generalized feature vector describing the
BBA $m_j$. If we can define the distance between $Fe_1$ and $Fe_2$, then the
distance between $m_1$ and $m_2$ is readily obtained. Here $Fe_1$ and $Fe_2$ are
two generalized vectors whose elements are intervals$^7$.

We can borrow the definition of the distance metric for the 
vectors in Cartesian space to define the distance of evidence 
here: 1) define the distance between two feature vectors in each 
dimension; 2) combine the distance value for each dimension 
into a scalar. 


$^6$ An interval number $[a, b]$ with $a \le b$ is an interval with lower bound
$a$ and upper bound $b$, where $a, b \in \mathbb{R}$. When $a = b$, the interval
number degenerates to a real number.

$^7$ $Fe_j$ can also be considered in an evidential Cartesian-like space where,
however, the coordinate in each direction is a generalized real number,
i.e., an interval number.

Therefore, in step 1, the distance in each dimension should be defined,
i.e., we must define the distance between two interval numbers. Irpino et
al. [26] proposed a Wasserstein distance for interval numbers, briefly
introduced below.

Suppose that $F$ and $G$ are the distribution functions of the random
variables $f$ and $g$, respectively. The Wasserstein $L_2$ metric is defined as
[26]

$$d_{Wass}(F, G) \triangleq \sqrt{\int_0^1 \left(F^{-1}(t) - G^{-1}(t)\right)^2 dt}. \tag{16}$$

For a uniform distribution of points, an interval of reals
$x_i = [a_i, b_i]$ can be expressed as a function of $t$ [26]:

$$x_i(t) = a_i + t\,(b_i - a_i), \quad \forall\, 0 \le t \le 1. \tag{17}$$

If one further considers a description of the interval using its midpoint
$(a_i + b_i)/2$ and radius $(b_i - a_i)/2$, $x_i$ can be rewritten as

$$x_i(t) = \frac{a_i + b_i}{2} + \frac{b_i - a_i}{2}\,(2t - 1), \quad \forall\, 0 \le t \le 1. \tag{18}$$

Then, the Euclidean distance between homologous points of two intervals
$x_1 = [a_1, b_1]$ and $x_2 = [a_2, b_2]$ is defined as [26]

$$\begin{aligned}
d_{BI}([a_1, b_1], [a_2, b_2]) &= d_{Wass}(x_1, x_2) = \sqrt{\int_0^1 \left[x_1(t) - x_2(t)\right]^2 dt} \\
&= \sqrt{\int_0^1 \left[\left(\frac{a_1 + b_1}{2} - \frac{a_2 + b_2}{2}\right) + \left(\frac{b_1 - a_1}{2} - \frac{b_2 - a_2}{2}\right)(2t - 1)\right]^2 dt} \\
&= \sqrt{\left[\frac{a_1 + b_1}{2} - \frac{a_2 + b_2}{2}\right]^2 + \frac{1}{3}\left[\frac{b_1 - a_1}{2} - \frac{b_2 - a_2}{2}\right]^2}.
\end{aligned} \tag{19}$$


Note that there are also other types of distance between 
interval numbers [26]. We choose the Wasserstein distance 
in Eq. (19) to calculate the distance between belief intervals, 
because it is a strict distance metric, which is very crucial for 
defining distance measures of evidence. Furthermore, it has a 
simple form, and is easy to compute. 
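
As a minimal sketch of how Eq. (19) can be evaluated, the code below implements the closed form (midpoint difference plus one third of the squared radius difference) and cross-checks it against a direct numerical integration of $[x_1(t) - x_2(t)]^2$ over $[0, 1]$. The function names `d_bi` and `d_bi_numeric` and the tuple encoding of intervals are assumptions made for illustration.

```python
from math import sqrt

def d_bi(x, y):
    """Closed-form interval distance of Eq. (19) for x = (a1, b1), y = (a2, b2)."""
    (a1, b1), (a2, b2) = x, y
    mid = (a1 + b1) / 2 - (a2 + b2) / 2   # difference of midpoints
    rad = (b1 - a1) / 2 - (b2 - a2) / 2   # difference of radii
    return sqrt(mid ** 2 + rad ** 2 / 3)

def d_bi_numeric(x, y, steps=20000):
    """Midpoint-rule approximation of the integral form, used only as a check."""
    (a1, b1), (a2, b2) = x, y
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) / steps
        total += ((a1 + t * (b1 - a1)) - (a2 + t * (b2 - a2))) ** 2
    return sqrt(total / steps)

closed = d_bi((0.10, 0.75), (0.20, 0.55))
numeric = d_bi_numeric((0.10, 0.75), (0.20, 0.55))
degenerate = d_bi((1.0, 1.0), (0.0, 0.0))  # two real numbers: plain |1 - 0|
print(round(closed, 4), round(numeric, 4), degenerate)
```

When both intervals degenerate to real numbers, the radius term vanishes and the distance reduces to the absolute difference of the two numbers.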

According to Eq. (19), the distance between the two feature vectors $Fe_1$
and $Fe_2$ in each dimension $i$ ($i = 1, \ldots, 2^n - 1$), i.e., the distance
between the two belief intervals $BI_1(A_i) : [Bel_1(A_i), Pl_1(A_i)]$ and
$BI_2(A_i) : [Bel_2(A_i), Pl_2(A_i)]$, can be obtained.
$d_{BI}(BI_1(A_i), BI_2(A_i))$ can be regarded as the distance between $m_1$ and
$m_2$ when considering the focal element $A_i$ only.

Therefore, we can obtain in total $2^n - 1$ belief interval distance values
for all $A_i \subseteq \Theta$.

In step 2, we combine all the $2^n - 1$ distance values into one scalar,
i.e., the total distance between $Fe_1$ and $Fe_2$.

In Cartesian space, to measure the distance between two points, one also
calculates the dissimilarity in each dimension and then combines the
dissimilarity values of the different dimensions into a scalar, i.e., the
distance value. The Euclidean family and the Chebyshev family are two
commonly used ways to generate such a scalar in Cartesian space. We borrow
this idea to generate a scalar from the dissimilarity values of the
$2^n - 1$ focal elements mentioned above. Two new distances of evidence are
presented next.


A. Euclidean-family Belief Interval-based Distance $d_{BI}^E$


Given two BBAs $m_1$ and $m_2$, our proposed Euclidean-family belief
interval-based distance is a combination of each focal element’s belief
interval distance value. To be specific, it is a normalized root of the
sum of squares of the distance values between belief intervals in each
dimension (focal element), as shown in Eq. (20):

$$d_{BI}^E(m_1, m_2) \triangleq \sqrt{N_c \cdot \sum_{i=1}^{2^n-1} \left[d_{BI}(BI_1(A_i), BI_2(A_i))\right]^2}. \tag{20}$$

Here $N_c$ denotes the normalization factor that makes $d_{BI}^E \in [0, 1]$.
Eq. (20) can be re-written as

$$d_{BI}^E(m_1, m_2) \triangleq \sqrt{N_c \cdot \mathbf{d}_{BI}^T \cdot I^{(2^n-1)} \cdot \mathbf{d}_{BI}} = \sqrt{N_c \cdot \mathbf{d}_{BI}^T \mathbf{d}_{BI}}, \tag{21}$$

where $I^{(2^n-1)}$ is an identity matrix with rank $2^n - 1$, and

$$\mathbf{d}_{BI} = \begin{bmatrix} d_{BI}(BI_1(A_1), BI_2(A_1)) \\ \vdots \\ d_{BI}(BI_1(A_{2^n-1}), BI_2(A_{2^n-1})) \end{bmatrix}.$$

The normalization factor for the Euclidean-family belief interval-based
distance $d_{BI}^E$ is $N_c = 1/2^{n-1}$.

Suppose that the FOD is $\{\theta_1, \theta_2, \ldots, \theta_n\}$ and that
$m_1$ and $m_2$ are two BOEs. $m_1(\{\theta_i\}) = 1$, $i \in \{1, \ldots, n\}$, is
a categorical BBA, which represents the most certain case, i.e., there is
no uncertainty when assigning the belief to the singleton proposition
$\{\theta_i\}$. The two BBAs

$$m_1(\{\theta_i\}) = 1, \quad m_2(\{\theta_j\}) = 1, \quad \forall i \neq j,\ i, j \in \{1, \ldots, n\} \tag{22}$$

are two different most certain cases. They have no common part, i.e., they
fully support different singletons; therefore, the dissimilarity (distance)
between them reaches the maximum.

Assume that $A$ is a focal element.

When $|A| = 1$, only two belief intervals have a distance value $d_{BI}$ of 1
(i.e., $d_{BI}(BI_1(\{\theta_i\}), BI_2(\{\theta_i\})) = 1$ and
$d_{BI}(BI_1(\{\theta_j\}), BI_2(\{\theta_j\})) = 1$). The other values are 0.

When $|A| > 1$, $d_{BI} = 1$ for those focal elements including $\theta_i$ or
$\theta_j$ (but not both $\theta_i$ and $\theta_j$), and $d_{BI} = 0$ for the rest.

To be specific,

when $|A| = 2$, $d_{BI} = 1$ only for $2 \times C_{n-2}^{1}$ focal elements$^8$;
$d_{BI} = 0$ for the rest;

when $|A| = 3$, $d_{BI} = 1$ only for $2 \times C_{n-2}^{2}$ focal elements;
$d_{BI} = 0$ for the rest;

...

when $|A| = n - 1$, $d_{BI} = 1$ only for $2 \times C_{n-2}^{n-2}$ focal elements;
$d_{BI} = 0$ for the rest;

when $|A| = n$, the $d_{BI}$ value of the unique focal element, i.e., the total
set ($\Theta$), is 0.

So, the summation $S_c$ of all $(d_{BI})^2$ is

$$S_c = 2 \times 1 + 2 \times C_{n-2}^{1} + 2 \times C_{n-2}^{2} + \cdots + 2 \times C_{n-2}^{n-2} + 0 = 2 \sum_{k=0}^{n-2} C_{n-2}^{k} = 2 \times 2^{n-2} = 2^{n-1}. \tag{23}$$

So, the normalization factor is $N_c = 1/S_c = 1/2^{n-1}$.


$^8$ Choose one element $\theta_k$ out of $\Theta' = \Theta - \{\theta_i, \theta_j\}$
($|\Theta'| = n - 2$). Together with $\theta_i$ and $\theta_j$, respectively, it
constitutes the focal elements $\{\theta_k, \theta_i\}$ and
$\{\theta_k, \theta_j\}$. So, the number of focal elements with $d_{BI}$ value
of 1 is $2 \times C_{n-2}^{1}$. Similarly, we can obtain the values in the other
cases for $|A| > 1$.
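
The counting argument behind $S_c = 2^{n-1}$ can be checked by brute force. The sketch below is illustrative only (the function name and the encoding of $\theta_1, \ldots, \theta_n$ as integers `0..n-1` are assumptions): it enumerates all non-empty subsets $A$ and counts those containing exactly one of the two singletons, which are exactly the subsets whose belief intervals are $[1, 1]$ for one BBA and $[0, 0]$ for the other.

```python
from itertools import combinations

def count_unit_distances(n, i=0, j=1):
    """Number of non-empty subsets A with d_BI = 1 when m1 is categorical
    on {theta_i} and m2 is categorical on {theta_j} (i != j)."""
    count = 0
    for r in range(1, n + 1):
        for subset in combinations(range(n), r):
            # Bel(A) = Pl(A) = 1 if the singleton lies in A, else 0, so the
            # interval distance is 1 exactly when A holds one singleton only.
            if (i in subset) != (j in subset):
                count += 1
    return count

results = {n: count_unit_distances(n) for n in range(2, 9)}
for n, s_c in results.items():
    print(n, s_c, 2 ** (n - 1))
```

Since every non-zero entry equals 1, the count equals the sum of squares $S_c$, and the two printed columns agree for each $n$.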


B. Chebyshev-family Belief Interval-based Distance $d_{BI}^C$


Given two BBAs $m_1$ and $m_2$, our proposed Chebyshev-family belief
interval-based distance is the maximum of all belief interval distance
values:

$$d_{BI}^C(m_1, m_2) \triangleq \max_{1 \le i \le 2^n-1} \left\{ d_{BI}(BI_1(A_i), BI_2(A_i)) \right\}. \tag{24}$$

Compared with traditional definitions, we use the distance of belief
intervals of focal elements, instead of their mass assignments, to define
the distances of evidence. The Euclidean-family belief interval-based
distance $d_{BI}^E$ and the Chebyshev-family belief interval-based distance
$d_{BI}^C$ are strict distance metrics. $d_{BI}^E$ and $d_{BI}^C$ are defined over
belief intervals. Given a BBA ($m(A_i)$, $i = 1, \ldots, 2^n - 1$), we can
generate a set of belief intervals ($[Bel(A_i), Pl(A_i)]$). On the other hand,
given a set of belief intervals ($[Bel(A_i), Pl(A_i)]$), according to the
Möbius transformation, we can generate a unique BBA
($m(A_i)$, $i = 1, \ldots, 2^n - 1$) from $Pl(A_i)$, $i = 1, \ldots, 2^n - 1$, or from
$Bel(A_i)$, $i = 1, \ldots, 2^n - 1$. As we know [1], there is a one-to-one
mapping between a set of belief intervals ($[Bel(A_i), Pl(A_i)]$) and a BBA
($m(A_i)$, $i = 1, \ldots, 2^n - 1$).

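According to Eqs. (20)-(21) and (24), it is easy to verify that $d_{BI}^E$ and
$d_{BI}^C$ satisfy the non-negativity, non-degeneracy and symmetry conditions.
We still need to prove the triangle inequality.

Suppose that there are three BBAs $m_1, m_2, m_3$ defined on the same FOD with
size $n$. Because $d_{BI}$ defined in Eq. (19) is a strict distance metric, for
each $A_i$ ($i = 1, \ldots, 2^n - 1$) there exists

$$d_{BI}(m_1(A_i), m_2(A_i)) + d_{BI}(m_2(A_i), m_3(A_i)) \ge d_{BI}(m_1(A_i), m_3(A_i)).$$

Suppose that

$$d_{BI}(m_1(A_i), m_2(A_i)) = a_i; \quad d_{BI}(m_2(A_i), m_3(A_i)) = b_i; \quad d_{BI}(m_1(A_i), m_3(A_i)) = c_i.$$

One has, with $s = 2^n - 1$,

$$a_i + b_i \ge c_i \;\Rightarrow\; (a_i + b_i)^2 \ge c_i^2 \;\Rightarrow\; a_i^2 + b_i^2 + 2 a_i b_i \ge c_i^2 \;\Rightarrow\; \sum_{i=1}^{s} a_i^2 + \sum_{i=1}^{s} b_i^2 + 2 \sum_{i=1}^{s} a_i b_i \ge \sum_{i=1}^{s} c_i^2. \tag{25}$$

According to the Cauchy-Schwarz inequality,

$$\sum_{i=1}^{s} a_i b_i \le \sqrt{\sum_{i=1}^{s} a_i^2 \sum_{i=1}^{s} b_i^2}. \tag{26}$$

So,

$$\sum_{i=1}^{s} a_i^2 + \sum_{i=1}^{s} b_i^2 + 2 \sqrt{\sum_{i=1}^{s} a_i^2 \sum_{i=1}^{s} b_i^2} \;\ge\; \sum_{i=1}^{s} a_i^2 + \sum_{i=1}^{s} b_i^2 + 2 \sum_{i=1}^{s} a_i b_i \;\ge\; \sum_{i=1}^{s} c_i^2. \tag{27}$$

Therefore, multiplying by $N_c$,

$$\begin{aligned}
N_c \left( \sum_{i=1}^{s} a_i^2 + \sum_{i=1}^{s} b_i^2 + 2 \sqrt{\sum_{i=1}^{s} a_i^2 \sum_{i=1}^{s} b_i^2} \right)
&= \left( \sqrt{N_c \sum_{i=1}^{s} a_i^2} + \sqrt{N_c \sum_{i=1}^{s} b_i^2} \right)^2 \\
&= \left( d_{BI}^E(m_1, m_2) + d_{BI}^E(m_2, m_3) \right)^2 \ge N_c \sum_{i=1}^{s} c_i^2 = \left( d_{BI}^E(m_1, m_3) \right)^2 \\
&\Rightarrow\; d_{BI}^E(m_1, m_2) + d_{BI}^E(m_2, m_3) \ge d_{BI}^E(m_1, m_3).
\end{aligned} \tag{28}$$

So, the triangle inequality for $d_{BI}^E$ is satisfied.

For $d_{BI}^C$, we have

$$d_{BI}^C(m_1, m_2) = \max_{i=1,\ldots,s} a_i, \quad d_{BI}^C(m_2, m_3) = \max_{i=1,\ldots,s} b_i, \quad d_{BI}^C(m_1, m_3) = \max_{i=1,\ldots,s} c_i = c_p \tag{29}$$

for some index $p$. There exists

$$c_p \le a_p + b_p \le \max_{i=1,\ldots,s} a_i + \max_{i=1,\ldots,s} b_i = d_{BI}^C(m_1, m_2) + d_{BI}^C(m_2, m_3), \tag{30}$$

i.e., $d_{BI}^C(m_1, m_2) + d_{BI}^C(m_2, m_3) \ge d_{BI}^C(m_1, m_3)$. Consequently,
$d_{BI}^C$ satisfies the triangle inequality.

In summary, $d_{BI}^E$ and $d_{BI}^C$ are strict distance metrics.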

In the traditional geometric interpretation of DST introduced 
in section III, the coordinates of different bases are represented 
by mass values (real numbers), while for our new proposed 
distances, the coordinates are represented by belief intervals 
(interval numbers). Therefore, our new distances are under a 
generalized geometric interpretation of evidence theory. 


C. An illustrative example 


Example 6. Suppose that the FOD is $\Theta = \{\theta_1, \theta_2, \theta_3\}$.
Two BBAs $m_1$, $m_2$ over the FOD are:


$m_1(\{\theta_1\}) = 0.1$, $m_1(\{\theta_2\}) = 0.1$, $m_1(\{\theta_3\}) = 0.05$,
$m_1(\{\theta_1, \theta_2\}) = 0.1$, $m_1(\{\theta_1, \theta_3\}) = 0.05$,
$m_1(\{\theta_2, \theta_3\}) = 0.1$, $m_1(\Theta) = 0.5$.


$m_2(\{\theta_1\}) = 0.2$, $m_2(\{\theta_2\}) = 0.3$, $m_2(\{\theta_3\}) = 0.1$,
$m_2(\{\theta_1, \theta_2\}) = 0.05$, $m_2(\{\theta_1, \theta_3\}) = 0.1$,
$m_2(\{\theta_2, \theta_3\}) = 0.05$, $m_2(\Theta) = 0.2$.

First, the belief intervals are calculated for each focal element of $m_1$
and $m_2$, respectively:


$BI_1(\{\theta_1\})$ : [0.10, 0.75],
$BI_1(\{\theta_2\})$ : [0.10, 0.80],
$BI_1(\{\theta_3\})$ : [0.05, 0.70],
$BI_1(\{\theta_1, \theta_2\})$ : [0.30, 0.95],
$BI_1(\{\theta_1, \theta_3\})$ : [0.20, 0.90],
$BI_1(\{\theta_2, \theta_3\})$ : [0.25, 0.90],
$BI_1(\Theta)$ : [1.00, 1.00].


$BI_2(\{\theta_1\})$ : [0.20, 0.55],
$BI_2(\{\theta_2\})$ : [0.30, 0.60],
$BI_2(\{\theta_3\})$ : [0.10, 0.45],
$BI_2(\{\theta_1, \theta_2\})$ : [0.55, 0.90],
$BI_2(\{\theta_1, \theta_3\})$ : [0.40, 0.70],
$BI_2(\{\theta_2, \theta_3\})$ : [0.45, 0.80],
$BI_2(\Theta)$ : [1.00, 1.00].


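Second, use Eq. (19) to compute the distance between the belief intervals
of each corresponding focal element in $m_1$ and $m_2$:

$$\mathbf{d}_{BI} = \begin{bmatrix}
d_{BI}(BI_1(\{\theta_1\}), BI_2(\{\theta_1\})) \\
d_{BI}(BI_1(\{\theta_2\}), BI_2(\{\theta_2\})) \\
d_{BI}(BI_1(\{\theta_3\}), BI_2(\{\theta_3\})) \\
d_{BI}(BI_1(\{\theta_1, \theta_2\}), BI_2(\{\theta_1, \theta_2\})) \\
d_{BI}(BI_1(\{\theta_1, \theta_3\}), BI_2(\{\theta_1, \theta_3\})) \\
d_{BI}(BI_1(\{\theta_2, \theta_3\}), BI_2(\{\theta_2, \theta_3\})) \\
d_{BI}(BI_1(\Theta), BI_2(\Theta))
\end{bmatrix} = \begin{bmatrix} 0.1000 \\ 0.1155 \\ 0.1323 \\ 0.1323 \\ 0.1155 \\ 0.1000 \\ 0.0000 \end{bmatrix}$$

Then, according to Eq. (20), $d_{BI}^E(m_1, m_2)$ is computed by

$$d_{BI}^E(m_1, m_2) = \sqrt{\frac{1}{2^{3-1}} \cdot \left( 0.1000^2 + 0.1155^2 + 0.1323^2 + 0.1323^2 + 0.1155^2 + 0.1000^2 + 0^2 \right)} = 0.1429.$$

According to Eq. (24), $d_{BI}^C(m_1, m_2)$ is computed by

$$d_{BI}^C(m_1, m_2) = \max \{ 0.1000, 0.1155, 0.1323, 0.1323, 0.1155, 0.1000, 0 \} = 0.1323.$$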
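
The whole computation of Example 6 can be sketched in code as follows. This is a minimal illustration, assuming a BBA is stored as a dictionary from `frozenset` focal elements to masses; the helper names (`belief_interval`, `d_bi`, `bi_distances`) are hypothetical and not taken from the original Matlab codes.

```python
from itertools import combinations
from math import sqrt

def nonempty_subsets(frame):
    """All non-empty subsets of the frame, as frozensets."""
    items = sorted(frame)
    return [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]

def belief_interval(m, a):
    """[Bel(A), Pl(A)] of subset a under the BBA m."""
    bel = sum(v for b, v in m.items() if b.issubset(a))
    pl = sum(v for b, v in m.items() if b & a)
    return bel, pl

def d_bi(x, y):
    """Interval distance of Eq. (19)."""
    (a1, b1), (a2, b2) = x, y
    mid = (a1 + b1) / 2 - (a2 + b2) / 2
    rad = (b1 - a1) / 2 - (b2 - a2) / 2
    return sqrt(mid ** 2 + rad ** 2 / 3)

def bi_distances(m1, m2, frame):
    """Return (d_BI^E, d_BI^C) following Eqs. (20) and (24)."""
    dists = [d_bi(belief_interval(m1, a), belief_interval(m2, a))
             for a in nonempty_subsets(frame)]
    n_c = 1 / 2 ** (len(frame) - 1)          # normalization factor N_c
    return sqrt(n_c * sum(d * d for d in dists)), max(dists)

F = frozenset
frame = {1, 2, 3}                            # theta_1, theta_2, theta_3
m1 = {F({1}): 0.1, F({2}): 0.1, F({3}): 0.05, F({1, 2}): 0.1,
      F({1, 3}): 0.05, F({2, 3}): 0.1, F(frame): 0.5}
m2 = {F({1}): 0.2, F({2}): 0.3, F({3}): 0.1, F({1, 2}): 0.05,
      F({1, 3}): 0.1, F({2, 3}): 0.05, F(frame): 0.2}

d_e, d_c = bi_distances(m1, m2, frame)
print(round(d_e, 4), round(d_c, 4))  # 0.1429 0.1323
```

The enumeration visits all $2^n - 1$ subsets, so this direct approach is only practical for small frames.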


D. On distance bounds 


Here, the distance bounds are analyzed. In Antonucci’s work [38], a lower
bound and an upper bound of a distance of evidence were proposed based on
the distance between consistent probabilities. For a BBA $m$, its consistent
set of probability mass functions (PMFs) is

$$K_m = \left\{ P \;\middle|\; \sum_{\theta \in \Theta} P(\theta) = 1,\ \sum_{\theta \in A} P(\theta) \ge Bel(A),\ \forall A \in 2^{\Theta} \right\}, \tag{31}$$

where $P$ is a consistent PMF. Given two PMFs $P_1$ and $P_2$, their Manhattan
distance is

$$\delta(P_1, P_2) = \sum_{\theta \in \Theta} |P_1(\theta) - P_2(\theta)|. \tag{32}$$

Given two BBAs $m_1$ and $m_2$, the lower bound $\underline{\delta}$ and the upper
bound $\overline{\delta}$ are defined as

$$\underline{\delta}(m_1, m_2) = \min_{P_1 \in K_{m_1},\, P_2 \in K_{m_2}} \delta(P_1, P_2), \qquad \overline{\delta}(m_1, m_2) = \max_{P_1 \in K_{m_1},\, P_2 \in K_{m_2}} \delta(P_1, P_2). \tag{33}$$

We calculate $d_{BI}^E$, $d_{BI}^C$ and the strict distance measure $d_J$ together
with the upper and lower bounds to check whether these measures stay
within the bounds. We set $|\Theta| = 3$ and randomly generate 1000 BBA pairs
according to the BBA generation algorithm [39] in Table I.


Table I
ALGORITHM 1: RANDOM BBA GENERATION - UNIFORM SAMPLING FROM
ALL FOCAL ELEMENTS.


Input:  $\Theta$: frame of discernment;
        $N_{max}$: maximum number of focal elements
Output: $m$: BBA

Generate $\mathcal{P}(\Theta)$, which is the power set of $\Theta$;
FOR each $1 \le i \le |\mathcal{P}(\Theta)|$ DO
    Generate a value according to the Gamma distribution $\mathcal{G}(1, 1) \rightarrow m_i$;
END
Normalize the vector $m = [m_1, \ldots, m_{|\mathcal{P}(\Theta)|}] \rightarrow m'$;
$m(A_i) = m'_i$;
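
A minimal sketch of Algorithm 1 is given below. Two points are assumptions rather than statements of the source: the empty set is excluded from the sampled subsets (so that $m(\emptyset) = 0$, as usual in DST), and the input $N_{max}$ is not used because the listed steps do not refer to it. The function name `random_bba` is illustrative.

```python
import random
from itertools import combinations

def random_bba(frame, rng=random):
    """Draw one Gamma(1, 1) value per non-empty subset and normalize.

    Assumption: the empty set is skipped, so the result is a normal BBA.
    """
    items = sorted(frame)
    subsets = [frozenset(c) for r in range(1, len(items) + 1)
               for c in combinations(items, r)]
    raw = [rng.gammavariate(1.0, 1.0) for _ in subsets]
    total = sum(raw)
    return {s: v / total for s, v in zip(subsets, raw)}

rng = random.Random(0)                 # seeded only for reproducibility
pairs = [(random_bba({1, 2, 3}, rng), random_bba({1, 2, 3}, rng))
         for _ in range(1000)]         # 1000 BBA pairs on |Theta| = 3
m_a, m_b = pairs[0]
print(len(m_a), round(sum(m_a.values()), 12))
```

Normalizing independent Gamma(1, 1) draws yields a mass vector that is uniformly distributed over the simplex, which matches the "uniform sampling" in the algorithm title.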


The results are shown in Figs. 2 and 3 (zoom in around the lower bound).
Results are sorted by increasing values of $d_J$. In this simulation, it is
experimentally shown that $d_J$ and our proposed $d_{BI}^E$ and $d_{BI}^C$ stay
within the lower and upper bounds, as shown in Figs. 2 and 3.


Figure 2. Comparisons between bounds, $d_J$, $d_{BI}^E$ and $d_{BI}^C$.

Figure 3. Comparisons between bounds, $d_J$, $d_{BI}^E$ and $d_{BI}^C$ (zoom in
around the lower bound).


In the next section, experiments and simulations are provided to show the
rationality of our proposed distance measures of evidence, based on
comparisons with available measures.


V. EXPERIMENTS, SIMULATIONS AND APPLICATIONS


To verify the rationality of the proposed distances, numerical examples,
simulations and applications related to BBA approximations and MCDM are
provided.

In each example below, $d_J$, $d_T$, $d_F$, $d_C$, $d_{BI}^E$ and $d_{BI}^C$ are
compared$^9$.


A. Example 7 


Suppose that the FOD is $\Theta = \{\theta_1, \theta_2, \theta_3\}$. $m_1$ has a
relatively large mass value for $\{\theta_2\}$, as shown in Table II.
Therefore, for the BBAs $m_i$, $i = 2, \ldots, 7$, listed in Table III, if the
mass assignment for $\{\theta_2\}$ is relatively large, the distance between
$m_1$ and $m_i$ should intuitively be relatively small. For $m_5$ and $m_6$, the
mass of the focal elements containing $\theta_2$ (i.e., $\{\theta_1, \theta_2\}$
and $\{\theta_2, \theta_3\}$) is 0.8. It makes more sense if the distance value
with respect to $m_5$ and $m_6$ decreases.


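Table II
BBA $m_1$


Focal element                           Mass assignment

$\{\theta_1\}$                          0.1
$\{\theta_2\}$                          0.8
$\{\theta_3\}$                          0.1
$\{\theta_1, \theta_2\}$                0
$\{\theta_2, \theta_3\}$                0
$\{\theta_1, \theta_3\}$                0
$\{\theta_1, \theta_2, \theta_3\}$      0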


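Table III
BBAS $m_i$, $i = 2, \ldots, 7$


Focal el. \ BBAs                           $m_2$   $m_3$   $m_4$   $m_5$   $m_6$   $m_7$
$\theta_1$                                 0.8     0       0       0       0       0
$\theta_2$                                 0       0.8     0       0       0       0
$\theta_3$                                 0       0       0.8     0       0       0
$\theta_1 \cup \theta_2$                   0       0       0       0.8     0       0
$\theta_2 \cup \theta_3$                   0       0       0       0       0.8     0
$\theta_1 \cup \theta_3$                   0       0       0       0       0       0.8
$\theta_1 \cup \theta_2 \cup \theta_3$     0.2     0.2     0.2     0.2     0.2     0.2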


The distances between $m_1$ and $m_i$, $i = 2, \ldots, 7$, are calculated using
the different distance definitions, as illustrated in Fig. 4. As we can see
in Fig. 4, all the distance measures perform similarly in all the cases and
agree with the expected behavior.


The following Examples 8 - 12 drawn from [13] are used 
for comparing our proposed measures and available ones. 
B. Example 8 


Suppose that three BBAs $m_1$, $m_2$, and $m_3$ are defined on the FOD
$\Theta = \{\theta_1, \ldots, \theta_n\}$ as follows:


$m_1(\{\theta_1\}) = m_1(\{\theta_2\}) = \cdots = m_1(\{\theta_n\}) = 1/n$;
$m_2(\Theta) = 1$;
$m_3(\{\theta_k\}) = 1$, for some $k \in \{1, \ldots, n\}$.


$^9$ $d_C$ corresponds to the conflict coefficient $K$ defined in Eq. (4).


Figure 4. Distance between $m_1$ and $m_i$, $i = 2, \ldots, 7$.


The changes of the distance values with the increase of the size $n$ of the
FOD are illustrated in Fig. 5.

$d_T$ provides an undesired result, i.e., with the increase of $n$, there
always exists $d_T(m_1, m_2) = 0$. $d_C$ cannot discriminate $m_1$ and $m_2$, nor
$m_2$ and $m_3$.

In this example, $m_1$ is a Bayesian BBA, which only has singleton focal
elements; $m_2$ is a vacuous BBA, which only has the total set $\Theta$ as its
unique focal element; $m_3$ is a categorical BBA with one singleton focal
element, which is absolutely confident in $\{\theta_k\}$.

$m_1$ represents the case with only discord and with zero non-specificity;
$m_3$ represents the crispest case; $m_2$ represents the most ambiguous case.
So, the distance between $m_2$ and $m_3$ represents the dissimilarity between
the most ambiguous case and the crispest case; the distance between $m_1$
and $m_3$ represents the dissimilarity between the case with zero
non-specificity and the crispest case; the distance between $m_1$ and $m_2$
represents the dissimilarity between the case with zero non-specificity
and the most ambiguous case.


Figure 5. Dissimilarities between $m_1$, $m_2$ and $m_3$ for Example 8.

Therefore, intuitively, the distance between $m_2$ and $m_3$ should be the
largest one. As we can see in Fig. 5, $d_{BI}^E(m_2, m_3)$ and $d_J(m_2, m_3)$
provide satisfactory results, i.e., for each of these two distances $d$,

$$d(m_2, m_3) = \max_{i, j \in \{1, 2, 3\},\ i \neq j} d(m_i, m_j).$$

From the decision standpoint, $m_1$ has no inclination to any choice
$\theta_i$; $m_2$ also has no inclination to any choice $\theta_i$; $m_3$ has a
clear inclination to the choice $\theta_k$. Therefore, intuitively, the
dissimilarity between $m_1$ and $m_2$ should be smaller than that between $m_1$
and $m_3$. As shown in Fig. 5, our proposed $d_{BI}^E$, $d_{BI}^C$, and $d_F$
provide satisfactory results according to this standpoint, i.e.,
$d_{BI}^E(m_1, m_2) < d_{BI}^E(m_1, m_3)$, $d_{BI}^C(m_1, m_2) < d_{BI}^C(m_1, m_3)$ and
$d_F(m_1, m_2) < d_F(m_1, m_3)$.

As we observed, $d_J$ cannot discriminate these two cases, since
$d_J(m_1, m_2) = d_J(m_1, m_3)$ whatever the value of $n$. This is because one
has

$$d_J(m_1, m_2) = d_J(m_1, m_3) = \sqrt{\frac{1}{2}\left(1 - \frac{1}{n}\right)}$$

according to Jousselme’s distance defined in Eq. (13).

Based on the analyses above, $d_{BI}^E$ provides rational behaviors in this
example.
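
The closed form for $d_J$ in this example can be checked by a short derivation. The sketch below assumes Jousselme’s distance in its usual quadratic form with the Jaccard matrix; the symbols $D$ and $x$ are introduced here only for illustration.

```latex
% Assumed form: d_J(m_a, m_b)^2 = (1/2) x^T D x, with x = m_a - m_b and
% D(A, B) = |A \cap B| / |A \cup B|.
%
% Case (m_1, m_2): x = 1/n on each singleton, x = -1 on \Theta.
% D = 0 between distinct singletons, D({\theta_i}, \Theta) = 1/n.
\begin{aligned}
x^{T} D x &= \underbrace{n \cdot \frac{1}{n^{2}}}_{\text{singletons}}
           + \underbrace{1}_{\Theta}
           + \underbrace{2 \cdot n \cdot \frac{1}{n} \cdot (-1) \cdot \frac{1}{n}}_{\text{cross terms}}
           = 1 - \frac{1}{n}, \\
% Case (m_1, m_3): x = 1/n on the n-1 singletons other than {\theta_k},
% x = 1/n - 1 on {\theta_k}; all cross terms vanish.
x^{T} D x &= \frac{n-1}{n^{2}} + \left(1 - \frac{1}{n}\right)^{2}
           = \frac{(n-1)\,(1 + (n-1))}{n^{2}} = 1 - \frac{1}{n}.
\end{aligned}
% Hence d_J(m_1, m_2) = d_J(m_1, m_3) = \sqrt{\tfrac{1}{2}(1 - \tfrac{1}{n})}.
```

Both quadratic forms coincide for every $n$, which is why $d_J$ cannot separate the two pairs.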


C. Example 9 (Example 5 Revisited) 


The values of the different distances between $m_1$ and $m_2$, and between
$m_1$ and $m_3$, are given in Table IV.


Table IV
EXAMPLE 9: RESULTS BASED ON DIFFERENT DISTANCES OF EVIDENCE.


Distance         $d_J$     $d_T$     $d_F$     $d_C$     $d_{BI}^E$   $d_{BI}^C$
$d(m_1, m_2)$    0.4041    0         0.5833    0.2000    0.2858       0.2333
$d(m_1, m_3)$    0.4041    0.4667    0.6364    0.6667    0.4041       0.4667


As aforementioned, both $m_1$ and $m_2$ have no preference for any singleton
$\{\theta_i\}$, while $m_3$ commits more belief to $\{\theta_3\}$; therefore, it
is intuitively expected that the distance between $m_1$ and $m_2$ should be
smaller than that between $m_1$ and $m_3$.

Using Jousselme’s distance, one obtains $d_J(m_1, m_2) = d_J(m_1, m_3) = 0.4041$,
which is unsatisfactory for such a case. Table IV shows that when using
$d_T$, $d_C$, $d_F$, $d_{BI}^E$ and $d_{BI}^C$ one obtains
$d(m_1, m_2) < d(m_1, m_3)$, which is more reasonable. However, Tessem’s
distance leads to $d_T(m_1, m_2) = 0$, which is counter-intuitive.
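
The $d_T$ column of Table IV can be reproduced with the sketch below. It assumes Tessem’s distance in its usual form, $d_T(m_1, m_2) = \max_{A \subseteq \Theta} |BetP_1(A) - BetP_2(A)|$, where $BetP$ is the pignistic probability; the function names and the dictionary encoding of BBAs are illustrative.

```python
from itertools import combinations

def betp(m, frame):
    """Pignistic probability: each focal mass is split evenly over its elements."""
    p = {x: 0.0 for x in frame}
    for focal, mass in m.items():
        for x in focal:
            p[x] += mass / len(focal)
    return p

def tessem(m1, m2, frame):
    """max over non-empty subsets A of |BetP1(A) - BetP2(A)|."""
    p1, p2 = betp(m1, frame), betp(m2, frame)
    items = sorted(frame)
    return max(abs(sum(p1[x] - p2[x] for x in subset))
               for r in range(1, len(items) + 1)
               for subset in combinations(items, r))

F = frozenset
frame = {1, 2, 3}
m1 = {F({i}): 1 / 3 for i in frame}
m2 = {F({1}): 0.1, F({2}): 0.1, F({3}): 0.1, F(frame): 0.7}
m3 = {F({1}): 0.1, F({2}): 0.1, F({3}): 0.8}

d12 = tessem(m1, m2, frame)
d13 = tessem(m1, m3, frame)
print(round(d12, 4), round(d13, 4))
```

$m_1$ and $m_2$ have the same uniform pignistic probability, so the switch to the probabilistic framework erases the difference between a Bayesian BBA and an almost vacuous one.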


D. Example 10 


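Suppose that the FOD is $\Theta = \{\theta_1, \ldots, \theta_{10}\}$. A BBA $m_t$
defined on $\Theta$ is


$m_t(\Theta) = 0.1$, $m_t(\{\theta_2, \theta_3, \theta_4\}) = 0.05$, $m_t(\{\theta_7\}) = 0.05$,
$m_t(A_t) = 0.8$,

where $A_t$ is a varying focal element from $\{\theta_1\}$ to $\Theta$. One
singleton $\{\theta_i\}$ is added at each step. All the $A_t$,
$\forall t = 1, \ldots, 10$, are as shown in Table V. The second BBA $m^*$ has
only one focal element, and it is defined as


$m^*(\{\theta_1, \theta_2, \theta_3, \theta_4, \theta_5\}) = 1$.


Table V
EXAMPLE 10: DISTANCE VALUE CHANGES WITH $A_t$.


Step    $A_t$
1       $\{\theta_1\}$
2       $\{\theta_1, \theta_2\}$
...     ...
5       $\{\theta_1, \theta_2, \theta_3, \theta_4, \theta_5\}$
6       $\{\theta_1, \theta_2, \theta_3, \theta_4, \theta_5, \theta_6\}$
...     ...
10      $\{\theta_1, \theta_2, \theta_3, \theta_4, \theta_5, \theta_6, \ldots, \theta_{10}\}$


We use different distance measures, including $d_J$, $d_T$, $d_C$, $d_F$,
$d_{BI}^E$ and $d_{BI}^C$, to calculate the distance between $m^*$ and $m_t$. Their
behaviors are shown in Fig. 6.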

Intuitively, when $A_t$ goes from the focal element $\{\theta_1\}$ to the focal
element $\{\theta_1, \theta_2, \theta_3, \theta_4, \theta_5\}$, the distance
between $m_t$ and $m^*$ should become smaller. When
$A_t = \{\theta_1, \theta_2, \theta_3, \theta_4, \theta_5\}$, the distance should
reach its minimum value. Then, when the size of $A_t$ becomes larger and
departs from $\{\theta_1, \theta_2, \theta_3, \theta_4, \theta_5\}$, the distance
value should become larger. As shown in Fig. 6, $d_J$, $d_T$, $d_F$ and our
proposed $d_{BI}^E$ provide the expected behaviors.


Figure 6. Distance between $m_t$ and $m^*$ for Example 10 (distance values
versus the size of $A_t$).


Since the conflict between $m_t$ and $m^*$ is fixed, i.e.,
$d_C(m_t, m^*) = m_t(\{\theta_7\}) \cdot m^*(\{\theta_1, \theta_2, \theta_3, \theta_4, \theta_5\}) = 0.05$,
the value of $d_C$ is fixed to 0.05. Therefore, $d_C$ is not a proper
distance. As shown in Fig. 6, our proposed $d_{BI}^E$ performs well; however,
$d_{BI}^C$ does not provide a satisfactory behavior. Although $d_{BI}^C$ reaches
its minimum value when
$A_t = \{\theta_1, \theta_2, \theta_3, \theta_4, \theta_5\}$, it cannot detect the
change of $A_t$ when the size of $A_t$ is smaller than 5 or larger than 5.
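
The constant conflict in this example is easy to verify numerically. The sketch below assumes that $d_C$ is the usual conflict coefficient $K = \sum_{A \cap B = \emptyset} m_1(A)\, m_2(B)$; the function name `conflict` and the integer encoding of $\theta_1, \ldots, \theta_{10}$ are illustrative.

```python
def conflict(m1, m2):
    """Total mass of pairs of focal elements with an empty intersection."""
    return sum(v1 * v2
               for a, v1 in m1.items()
               for b, v2 in m2.items()
               if not (a & b))

F = frozenset
theta = F(range(1, 11))                      # theta_1 .. theta_10
m_star = {F(range(1, 6)): 1.0}               # m*({theta_1..theta_5}) = 1

values = []
for t in range(1, 11):
    a_t = F(range(1, t + 1))                 # A_t = {theta_1, ..., theta_t}
    m_t = {}
    for focal, mass in [(theta, 0.1), (F({2, 3, 4}), 0.05),
                        (F({7}), 0.05), (a_t, 0.8)]:
        # A_t coincides with Theta at t = 10, so masses are accumulated
        m_t[focal] = m_t.get(focal, 0.0) + mass
    values.append(conflict(m_t, m_star))

print([round(v, 4) for v in values])
```

Only $\{\theta_7\}$ is disjoint from the focal element of $m^*$, so the conflict stays at 0.05 for every step regardless of $A_t$.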


E. Example 11 


Suppose that the FOD is
$\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4, \theta_5, \theta_6\}$. In each
case of this example, we set a fixed BBA $m_2$, where $m_2(B) = 1$. $B$ can be
considered as a “desired” focal element. Another BBA $m_1$ is also set, where
$m_1(A) = 1/63$, $\forall A \subseteq \Theta$. Let $m_1$ approach $m_2$ in some
way. To implement this, at each step, we increase the value of $m_1(B)$ by
$\Delta = 0.02$, and the mass value of each other focal element
($A \neq B$, $\forall A \subseteq \Theta$) is decreased by $\Delta/62$.

We also let $m_1$ go away from $m_2$. To implement this, at each step,
$m_1(C)$, $C \neq B$, $C \subseteq \Theta$, is increased by $\Delta = 0.02$, and
the mass value of each other focal element
($A \neq C$, $\forall A \subseteq \Theta$) is decreased by $\Delta/62$. Therefore,
$C$ can be considered as an “undesired” focal element.

We compute the different distances between $m_1$ and $m_2$ at each step.
Their behaviors with varying $m_1$ are analyzed.
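
One step of the update procedure described above can be sketched as follows. The helper name `shift_mass` is hypothetical, and the sketch makes no attempt to stop the iteration before the decreased masses would become negative; how many steps are actually run is not assumed here.

```python
from itertools import combinations

def nonempty_subsets(n):
    """All non-empty subsets of {1, ..., n} as frozensets (63 of them for n = 6)."""
    return [frozenset(c) for r in range(1, n + 1)
            for c in combinations(range(1, n + 1), r)]

def shift_mass(m, target, delta=0.02):
    """Move mass delta onto `target`, taking delta / (len(m) - 1) from every
    other subset, so that the total mass is unchanged."""
    others = len(m) - 1
    return {a: (v + delta if a == target else v - delta / others)
            for a, v in m.items()}

subsets = nonempty_subsets(6)
m1 = {a: 1 / 63 for a in subsets}            # initial uniform BBA
b = frozenset({3})                           # desired focal element B = {theta_3}

m1_next = shift_mass(m1, b)                  # one step towards m2(B) = 1
print(len(m1_next), round(m1_next[b], 6), round(sum(m1_next.values()), 12))
```

Passing an undesired focal element $C$ as `target` instead of $B$ gives the step that moves $m_1$ away from $m_2$.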

1) Case A: Here $B = \{\theta_3\}$, i.e., the desired focal element $B$ is a
singleton. With the change of $m_1(B)$, $m_1$ gradually gets closer to $m_2$.
Therefore, if a distance measure becomes smaller with the change of
$m_1(B)$, then it behaves as intuitively expected.


Figure 7. Distance between $m_1$ and $m_2$ for Example 11-Case A (distance
values versus step).


The changes of the different distance measures in the above 
procedure are shown in Fig. 7. All the distance measures used 
here provide expected behaviors. 

2) Case B: Here $|B| > 1$, e.g., $B = \{\theta_1, \theta_2\}$ or
$B = \{\theta_1, \theta_2, \theta_3\}$. That is, the desired focal element $B$ is
a compound focal element. With the change of $m_1(B)$, $m_1$ gradually gets
closer to $m_2$.


Given different $|B|$, the changes of the different distances in the above
procedure are shown in Fig. 8, where all the distances used here provide
the expected behaviors when $B$ is a compound focal element.

Figure 8. Distance between $m_1$ and $m_2$ for Example 11-Case B.


3) Case C: Here $B = \{\theta_4\}$ and $C = \{\theta_5\}$, where the undesired
focal element $C$ is a singleton.

With the change of $m_1(C)$, $m_1$ gradually moves away from $m_2$. If a
distance measure becomes larger with the change of $m_1(C)$, then it behaves
reasonably (i.e., as intuitively expected).

The changes of the different distance measures in the above procedure are
shown in Fig. 9, where all the distance measures tested here provide the
expected behaviors.


Figure 9. Distance between $m_1$ and $m_2$ for Example 11-Case C (distance
values versus step).


4) Case D: Here $B = \{\theta_6\}$ and the undesired focal element is
$C = \Theta$. With the change of $m_1(C)$, $m_1$ gradually moves away from $m_2$.
If a distance measure becomes larger with the change of $m_1(C)$, then it
behaves as intuitively expected. Fig. 10 shows the changes of the
different distance measures in the above procedure.


Figure 10. Distance between $m_1$ and $m_2$ for Example 11-Case D (distance
values versus step).


As seen in Fig. 10, only $d_{BI}^E$ and $d_J$ behave as expected. $d_T$ never
changes in the whole procedure, because the corresponding pignistic
probability never changes with the increase of $m_1(\Theta)$. $d_C$ diminishes
significantly with the increase of $m_1(\Theta)$, because as the mass
assigned to the total set increases, the conflict between $m_1$ and $m_2$
becomes smaller. Therefore, $d_C$ is only a degree of conflict and must not
be used as a proper distance measure.


F. Example 12 


Suppose that the FOD is $\Theta = \{\theta_1, \ldots, \theta_{10}\}$. A BBA $m_t$
defined on $\Theta$ is


$m_t(\Theta) = 0.1$, $m_t(\{\theta_2, \theta_3, \theta_4\}) = 0.05$, $m_t(\{\theta_7\}) = 0.05$,
$m_t(B_t) = 0.8$,


where $B_t$ is a varying focal element from $\{\theta_1\}$ to $\Theta$. One
singleton $\theta_i$ is added at each step (Steps 1-10). $B_t$,
$\forall t = 1, \ldots, 10$, equals $A_t$ as shown in Table V in Example 10.

From Step 11 to Step 19, $B_t$ is pruned from its first element until
attaining the singleton $\{\theta_{10}\}$ at Step 19. All the $B_t$ at the
different steps are shown in Table VI. The second BBA $m^*$ is
$m^*(\{\theta_{10}\}) = 1$. We test different distance measures, including
$d_J$, $d_T$, $d_C$, $d_F$, $d_{BI}^E$ and $d_{BI}^C$, to calculate the distance
between $m^*$ and $m_t$. Their behaviors are shown in Fig. 11.

From Step 1 to Step 9, $B_t$ does not include $\{\theta_{10}\}$. At Step 10,
$B_t = \{\theta_1, \ldots, \theta_{10}\}$, which first includes $\{\theta_{10}\}$.
After Step 10, all distance values diminish to reach their minimum values
when $B_t = \{\theta_{10}\}$. This is what we expect intuitively.

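Table VI
EXAMPLE 12: DISTANCE VALUE CHANGES WITH $B_t$.


Step    $B_t$
1-10    $A_t$
11      $\{\theta_2, \theta_3, \theta_4, \theta_5, \theta_6, \theta_7, \theta_8, \theta_9, \theta_{10}\}$
12      $\{\theta_3, \theta_4, \theta_5, \theta_6, \theta_7, \theta_8, \theta_9, \theta_{10}\}$
13      $\{\theta_4, \theta_5, \theta_6, \theta_7, \theta_8, \theta_9, \theta_{10}\}$
14      $\{\theta_5, \theta_6, \theta_7, \theta_8, \theta_9, \theta_{10}\}$
15      $\{\theta_6, \theta_7, \theta_8, \theta_9, \theta_{10}\}$
...     ...
19      $\{\theta_{10}\}$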


At the first stage (Step 1 - Step 9), $d_C$ does not change when $B_t$
changes, because the conflict between $m_t$ and $m^*$ never changes before
Step 10, where

$$d_C(m_t, m^*) = \left( m_t(\{\theta_2, \theta_3, \theta_4\}) + m_t(\{\theta_7\}) + m_t(B_t) \right) \cdot m^*(\{\theta_{10}\}) = 0.9.$$


Figure 11. Distance between $m_t$ and $m^*$ for Example 12 (distance values
versus step).


At the second stage (Step 10 - Step 19), $d_C$ does not change with the
change of $B_t$. Although $d_C$ diminishes with the emergence of
$\{\theta_{10}\}$, its value then stays fixed up to the final step, because
the degree of conflict never changes after the decrease at Step 10, where

$$d_C(m_t, m^*) = \left( m_t(\{\theta_2, \theta_3, \theta_4\}) + m_t(\{\theta_7\}) \right) \cdot m^*(\{\theta_{10}\}) = 0.1.$$

Therefore, $d_C$ must not be used as a proper distance measure. It is just a
degree of conflict between two BBAs.

At the first stage, $d_F$ provides an unsatisfactory behavior. It slightly
increases with the change of $B_t$, that is, it is insensitive to the change
of $B_t$ in the first stage. At the second stage, $d_F$ provides the expected
behavior, i.e., it decreases and reaches its minimum value at the final
Step 19.

$d_{BI}^C$ is insensitive to the change of $B_t$ in both the first and the
second stages. Its value never changes in the first stage and, after a
decrease at Step 10, it remains unchanged in the second stage.

The major difference between the behaviors of $d_J$ and $d_{BI}^E$ is in the
first stage, where $d_J$ increases while $d_{BI}^E$ decreases. We think that
the decrease makes more sense, for the following reason. In the first
stage, the size of $B_t$ becomes larger, and thus the degree of uncertainty,
i.e., the ambiguity of $m_t$, increases. For two focal elements
$\{\theta_1\}$ and $\{\theta_1, \theta_2\}$, although neither includes
$\{\theta_{10}\}$, the distance from $\{\theta_{10}\}$ to the more ambiguous
case, i.e., $\{\theta_1, \theta_2\}$, should intuitively be smaller than the
distance from $\{\theta_{10}\}$ to the more specific case. We can make an
analogy here. $\{\theta_{10}\}$ is our desired result, while $\{\theta_1\}$ and
$\{\theta_1, \theta_2\}$ are two undesired results. A more ambiguous undesired
result should be preferred to a clear undesired result, i.e., the distance
from the desired result to the more ambiguous undesired result should
intuitively be smaller.

With the increase of $|B_t|$, such a distance should intuitively decrease
further. Therefore, $d_{BI}^E$ provides the expected behavior in this example.


G. Example 13 


Suppose that the FOD is $\Theta = \{\theta_1, \theta_2, \ldots, \theta_{2n}\}$.
Two BBAs defined on $\Theta$ are


$m_1$ : $m_1(\{\theta_1\}) = m_1(\{\theta_2\}) = \cdots = m_1(\{\theta_n\}) = 1/n$;
$m_2$ : $m_2(\{\theta_{n+1}\}) = m_2(\{\theta_{n+2}\}) = \cdots = m_2(\{\theta_{2n}\}) = 1/n$.


In this example, we set $n$ from 1 to 7, i.e., the size of the FOD goes from
2 to 14. We use $d_J$, $d_T$, $d_C$, $d_F$, $d_{BI}^E$ and $d_{BI}^C$ to calculate
the distance between $m_1$ and $m_2$ for the different values of $n$. The
distance values are shown in Fig. 12. As we can see in Fig. 12, all the
distance measures except our proposed $d_{BI}^E$ remain unchanged with the
increase of $n$. Our proposed $d_{BI}^E$ decreases with the increase of $n$,
which intuitively makes sense, for the following reason. With the
increase of $n$ from $k - 1$ to $k$, the cardinality of the FOD also increases
from $|\Theta| = 2(k - 1)$ to $2k$. Then, the number of all possible “focal”
elements$^{10}$ increases from $2^{2(k-1)}$ to $2^{2k}$.


Figure 12. Distance between $m_1$ and $m_2$ for Example 13 (distance values
versus values of $n$).


$^{10}$ Here “focal” elements refer to all the subsets of the FOD $\Theta$. They
could have non-zero mass values or zero mass values.


Note that for each BBA, there are only n focal elements with 
non-zero mass assignment. When n = k — 1, for each BBA, 
there are 2(k — 1) focal elements in total with non-zero value; 
when n = 2k, for each BBA, there are 2(k —1) focal elements 
in total with non-zero value. So, the number of focal elements 
with non-zero mass assignment increases from 2(k — 1) to 
2k, i.e., only two more focal elements with non-zero mass 
assignment are added. 

On the other hand, when $n = k-1$, each BBA has $2^{2(k-1)} - (k-1) - 1$ focal elements
with zero mass assignment; when $n = k$, each BBA has $2^{2k} - k - 1$ focal elements
with zero mass assignment. That is, with the increase of $n$ from $k-1$ to $k$, there are
$2^{2k} - k - 1 - \left(2^{2(k-1)} - (k-1) - 1\right) = 3 \times 4^{k-1} - 1$ more focal
elements with zero mass values.
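
The count can be checked with a few lines of Python. This is only a sketch under the counting convention used above (all subsets of the FOD, minus the $n$ focal singletons of one BBA, minus the empty set); the function name `zero_mass_subsets` is illustrative and not taken from the paper.

```python
def zero_mass_subsets(n: int) -> int:
    """Non-empty subsets of a FOD of size 2n that carry zero mass in one BBA
    whose only focal elements are n singletons."""
    total_subsets = 2 ** (2 * n)   # every subset of the FOD, empty set included
    return total_subsets - n - 1   # drop the n focal singletons and the empty set


for k in range(2, 8):
    added = zero_mass_subsets(k) - zero_mass_subsets(k - 1)
    # The increment grows like 3 * 4**(k-1), while only two non-zero focal
    # elements are added over both BBAs.
    print(k, added, 3 * 4 ** (k - 1) - 1)
```

The zero-mass part shared by both BBAs therefore grows exponentially with $k$, whereas the non-zero part grows only linearly.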

The common part (“focal” elements with zero mass assignment) between $m_1$ and $m_2$
is significantly enlarged. At the same time, their different parts (those focal
elements with non-zero values) increase only slightly, by 2. Therefore, their
distance should decrease. So, our proposed $d_{BI}^E$ also behaves
as expected in this case.


H. Brief summary 


According to the above examples, our proposed $d_{BI}^E$ behaves as expected in all
the cases, contrary to the other measures compared. $d_J$ also behaves well in many
cases; however, in some special cases it provides counter-intuitive behaviors. Our
proposed $d_{BI}^C$ behaves as expected in many cases; however, it is insensitive to
the change of BBA due to the $L_\infty$ norm used in its definition. Other measures
like $d_C$, $d_T$, $d_F$ are not strict distance metrics. They generate
counter-intuitive behaviors in some cases, although they can be used to describe
the dissimilarity between BOEs in particular cases.

Note that the results of the above examples can only show 
that our proposed distance measures behave as expected in 
those cases in the examples. Whether the rationality of our 
proposed measures has more generalized meaning needs fur- 
ther theoretical analysis besides the testing based on examples. 

In the following part, simulation results based on random 
experiments are presented. 


I. Simulation 


In this section, different measures are compared based on 
random simulations. 

Relationship analyses are helpful for the joint use of mul- 
tiple distance measures. Almost all the available distance 
measures have their own pros and cons. If one does not 
trust any single distance measure, one can use two distance 
measures together to construct a 2-D measure to describe the 
dissimilarity between two BOEs, e.g., Liu’s 2-D measure [40]. 
How, then, can the complementarity between the members of a 2-D measure be described?
As noted in Jousselme’s survey [22], a low correlation (close to 0) between two
measures means that they quantify two distinct (and possibly complementary)
aspects of the distance between two belief functions, while a
high correlation means that they are redundant. Hence, weakly
correlated pairs of distances could be good candidates for 2-D
measures.

The relationships between different measures are described 
using scatter plots and the correlation coefficient. The basic 
procedure of the simulations is as follows. 

Let $D$ denote the set of distance measures used here, which
includes $d_J$, $d_T$, $d_C$, $d_F$, $d_{BI}^E$ and $d_{BI}^C$. We calculate the
correlation between different distance measures as follows.

1) Set the size of the FOD to $|\Theta|$ and generate $N_s$ BBAs $m^s$
($s = 1, \ldots, N_s$) according to Algorithm 1 [39] in Table I.

2) Generate a reference BBA $m^r$ according to Algorithm 1.

3) Pick a distance pair $d_x$ and $d_y$, where $d_x, d_y \in D$, and
calculate $(d_x(m^r, m^s), d_y(m^r, m^s))$ for all $s = 1, \ldots, N_s$.

4) Draw the scatter plot of $(d_x(m^r, m^s), d_y(m^r, m^s))$ ($s = 1, \ldots, N_s$)
to show the correlation between $d_x$ and $d_y$.

5) Compute the correlation coefficient [22] for $d_x$ and $d_y$:

$$CR(d_x, d_y) = \frac{\sum_{s=1}^{N_s} (d_x^s - \bar{d}_x)(d_y^s - \bar{d}_y)}{\sqrt{\sum_{s=1}^{N_s} (d_x^s - \bar{d}_x)^2}\,\sqrt{\sum_{s=1}^{N_s} (d_y^s - \bar{d}_y)^2}}, \tag{34}$$

where $d_x^s$ denotes $d_x(m^r, m^s)$, $d_y^s$ denotes $d_y(m^r, m^s)$, $\bar{d}_x$
denotes the mean of $d_x^s$, $s = 1, \ldots, N_s$, and $\bar{d}_y$ denotes the
mean of $d_y^s$, $s = 1, \ldots, N_s$. For each pair $d_x$ and $d_y$ in $D$, we
calculate their correlation coefficient to obtain a correlation
matrix $CR$.
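
Step 5 can be sketched as follows. This is a minimal illustration of the sample correlation coefficient in (34), applied to two lists of already computed distance values; the function name `correlation` and the toy inputs are assumptions made for the example, not data from the simulations.

```python
import math


def correlation(dx, dy):
    """Sample correlation coefficient between two equally long lists of
    distance values, following the form of Eq. (34)."""
    mean_x = sum(dx) / len(dx)
    mean_y = sum(dy) / len(dy)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(dx, dy))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in dx))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in dy))
    return numerator / (spread_x * spread_y)


# Toy values standing in for d_x(m^r, m^s) and d_y(m^r, m^s), s = 1..5.
dx = [0.10, 0.25, 0.40, 0.55, 0.70]
dy = [0.12, 0.20, 0.45, 0.50, 0.80]
print(round(correlation(dx, dy), 4))
```

Evaluating this function for every pair of measures in $D$ fills the correlation matrix $CR$ entry by entry.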

In simulations, we generate five types of BBAs:

• Complete BBA: A BBA with $2^{|\Theta|} - 1$ focal elements with
non-zero mass assignment.
• Fixed length BBA: A BBA with a fixed number of focal
elements.
• Simple support BBA: $m(A) = a$, $m(\Theta) = 1 - a$, where
$A \subset \Theta$ and $a \in (0, 1]$.
• Dichotomous BBA: $m(A) = a$, $m(\bar{A}) = b$, $m(\Theta) = 1 - a - b$,
where $A \subset \Theta$, $\bar{A}$ is the complementary set of
$A$ in $\Theta$, $a, b \in [0, 1]$ and $a + b \le 1$.
• Consonant support BBA: A BBA with nested focal elements,
e.g., $\{\theta_1\}$, $\{\theta_1, \theta_2\}$, $\{\theta_1, \theta_2, \theta_3\}$.

One can make minor modifications to Algorithm 1 to
randomly generate the above types of BBA.
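
As an illustration of three of these types, the samplers below draw a simple support, a dichotomous and a consonant BBA. They are hypothetical helpers written for this example and are not Algorithm 1 of [39]; a BBA is represented as a dictionary from `frozenset` to mass, and an entry with mass 0 simply means the set is not focal.

```python
import random


def random_proper_subset(theta, rng):
    """Draw a non-empty proper subset of theta (needs at least two elements)."""
    items = sorted(theta)
    while True:
        subset = frozenset(x for x in items if rng.random() < 0.5)
        if 0 < len(subset) < len(items):
            return subset


def simple_support_bba(theta, rng):
    """m(A) = a, m(Theta) = 1 - a, with a in (0, 1]."""
    a = 1.0 - rng.random()  # rng.random() lies in [0, 1)
    focal = random_proper_subset(theta, rng)
    return {focal: a, frozenset(theta): 1.0 - a}


def dichotomous_bba(theta, rng):
    """m(A) = a, m(complement of A) = b, m(Theta) = 1 - a - b, a + b <= 1."""
    a = rng.random()
    b = rng.random() * (1.0 - a)
    focal = random_proper_subset(theta, rng)
    full = frozenset(theta)
    return {focal: a, full - focal: b, full: 1.0 - a - b}


def consonant_bba(theta, rng):
    """Nested focal elements built along a random ordering of theta."""
    order = sorted(theta)
    rng.shuffle(order)
    chain = [frozenset(order[:i]) for i in range(1, len(order) + 1)]
    weights = [1.0 - rng.random() for _ in chain]  # strictly positive
    total = sum(weights)
    return {s: w / total for s, w in zip(chain, weights)}


rng = random.Random(0)
theta = frozenset(range(1, 9))  # |Theta| = 8, as in Case A
for sampler in (simple_support_bba, dichotomous_bba, consonant_bba):
    bba = sampler(theta, rng)
    print(sampler.__name__, len(bba), round(sum(bba.values()), 12))
```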


Case A: Here we set $|\Theta| = 8$ and randomly generate 4000
complete BBAs $m^s$, $s = 1, \ldots, 4000$. The reference BBA
(complete) $m^r$ is also randomly generated. According to the
above simulation steps, we can obtain the scatter plots between
each pair of distance measures in $D = \{d_J, d_T, d_C, d_F, d_{BI}^E, d_{BI}^C\}$
as shown in Fig. 13, where their corresponding
correlation coefficients are also provided for convenience.


As we can see in Fig. 13, our proposed $d_{BI}^E$ and $d_{BI}^C$ have
high correlation with Jousselme’s distance $d_J$, which is a
strict distance metric and performs well in many cases as
demonstrated in the previous subsection.

Figure 13. Scatter plots for $|\Theta| = 8$ using complete BBAs.


$d_C$ always has low correlation with the other measures, since
it is actually a degree of conflict, which is different from a
distance.

If a 2-D or 3-D measure that jointly uses multiple distance
measures is desired, we can refer to the scatter plots and
corresponding CR values in Fig. 13. As mentioned above, the
members of a 2-D measure should preferably have low correlation
(close to 0), so that they are possibly complementary.
As shown in Fig. 13, our proposed $d_{BI}^E$ and $d_{BI}^C$ have relatively
low correlation with $d_C$ and $d_F$; therefore, $d_C$ and $d_F$ are more
suitable for constructing 2-D measures. The focus of
this paper is not 2-D measures. We mention them only
to explain the motivation for the correlation analysis between
different 1-D measures. Readers interested in the construction
and applications of 2-D measures can refer to Liu’s work
[40], where $d_T$ and $d_C$ are used jointly as a 2-D measure.

As shown in the previous subsection, $d_{BI}^E$ and $d_J$ are two
very appealing measures when compared with the others, and they
seem highly correlated with each other. Therefore, in the sequel,
we discuss the relationship between $d_J$ and our proposed
$d_{BI}^E$ in detail.


Case B: Although the high correlation between $d_{BI}^E$ and $d_J$ has
already been verified in Case A, the degree of correlation can differ
with the FOD size $|\Theta|$. Here we use different FOD sizes to check
whether the correlation between $d_{BI}^E$ and $d_J$ is greatly affected
by $|\Theta|$, and to obtain the trend as $|\Theta|$ changes.

In this case, we set the size of the FOD to $|\Theta| = 2, 3, 4, 5, 6, 7, 8$,
respectively. First, we randomly generate 4000
complete BBAs, 4000 simple support BBAs, 4000 dichotomous
BBAs and 4000 consonant support BBAs. Their corresponding
reference BBAs $m^r$ (complete, simple support, dichotomous,
consonant support) are also randomly generated.
Following the above steps and under different sizes of FOD,
we can obtain the scatter plots between each pair of distance
measures in $D = \{d_J, d_{BI}^E\}$ for the 4000 complete BBAs,
4000 simple support BBAs, 4000 dichotomous BBAs and 4000
consonant support BBAs, respectively, as shown in Fig. 14.


Figure 14. Scatter plots for $|\Theta| = 2, \ldots, 8$ using different types of BBAs (recoverable panel titles: Complete BBA, Simple support BBA, Consonant support BBA).


The evolution of the correlation coefficient between $d_J$ and $d_{BI}^E$
with the increase of $|\Theta|$, for four different types of
BBAs (complete, simple support, dichotomous, and
consonant support), is shown in Fig. 15.

As seen in Figs. 14 and 15, the increase of $|\Theta|$ leads to a
decrease of the correlation coefficient for all types of BBAs.
Whichever type of BBA is used, $d_J$ and $d_{BI}^E$ are highly
correlated, although the correlation coefficient decreases with
the increase of $|\Theta|$. As mentioned above, this shows to some extent
the rationality of our proposed new measure $d_{BI}^E$.

Figure 15. Evolution of the correlation coefficient between $d_J$ and $d_{BI}^E$ using
different types of BBAs (legend entries include Simple Support BBA, Dichotomous BBA, Consonant Support BBA).


J. Application of distance in BBA approximation evaluation 


Here we provide an application of different distance mea- 
sures of evidence in BBA approximations. The BBA ap- 
proximation [23], [41], [42] aims to obtain a simpler BBA 
by removing some focal elements and thus to reduce the 
computational cost in the evidence combination and other 
operations in DST [1], [43]. A good BBA approximation 
should have little loss of information when simplifying the 
BBA. If the BBA obtained using an approximation is closer 
to the original BBA, such an approximation has less loss of 
information and thus, is more desired. Therefore, we can use 
the distance of evidence to evaluate BBA approximations. 

Here three types of BBA approximations are compared:
$k$-$l$-$x$ [23], D1 [41] and summarization (Sum)
[42]. Using $k$-$l$-$x$, the approximated BBA is obtained by

1) keeping no less than $k$ focal elements;

2) keeping no more than $l$ focal elements;

3) deleting the masses which are no larger than $x$.


The Sum method [42] also keeps the focal elements with the largest
mass values, as in $k$-$l$-$x$. The masses of the removed focal
elements are accumulated and assigned to their union set.

D1 method [41] is to keep some focal elements with the 
largest mass values in the original BBA and to re-assign the 
mass assignments of the other focal elements to those kept 
focal elements according to a well-designed criterion. See 
more details in related references [23], [41], [42]. 

$k$-$l$-$x$ re-normalizes in a coarse way, and the Sum
method re-assigns the masses of the removed focal elements to
their union set. D1 re-assigns the mass in a more subtle way;
therefore, D1 should be a better method. Here we provide a
simulation with the distance of evidence as the evaluation criterion
to check whether the evaluation results agree with this analysis.
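
The two simpler approximations can be sketched as follows. This is an illustration under stated assumptions rather than the reference implementations of [23] and [42]: `klx_fixed` assumes $k = l = r$ (so exactly the $r$ largest masses are kept and re-normalized), and `summarization` assumes that $r - 1$ focal elements are kept so that, together with the union set, $r$ focal elements remain. D1 is not sketched because its re-assignment criterion is only described in [41].

```python
def klx_fixed(bba, r):
    """k-l-x with k = l = r: keep the r largest masses and re-normalize."""
    kept = sorted(bba.items(), key=lambda item: item[1], reverse=True)[:r]
    total = sum(mass for _, mass in kept)
    return {focal: mass / total for focal, mass in kept}


def summarization(bba, r):
    """Keep the r - 1 largest masses; give the removed mass to the union
    of the removed focal elements (assumed reading of the Sum method)."""
    ranked = sorted(bba.items(), key=lambda item: item[1], reverse=True)
    kept, removed = ranked[: r - 1], ranked[r - 1:]
    approx = dict(kept)
    if removed:
        union = frozenset().union(*(focal for focal, _ in removed))
        approx[union] = approx.get(union, 0.0) + sum(mass for _, mass in removed)
    return approx


a, b, c = frozenset("a"), frozenset("b"), frozenset("c")
bba = {a: 0.5, b: 0.3, a | b: 0.15, c: 0.05}

print(klx_fixed(bba, 2))      # masses of {a} and {b}, re-normalized
print(summarization(bba, 3))  # {a}, {b}, and the union {a, b, c}
```

In the first case the discarded mass is spread proportionally over the kept focal elements, while in the second it is moved to a less specific set, which is the difference the distance-based evaluation is meant to capture.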

In our simulation, $|\Theta| = 4$. A complete BBA $m$ (i.e., with
$2^4 - 1 = 15$ non-empty focal elements) can be randomly
generated according to Algorithm 1 in Table I. We use the average
distance of evidence ($d_J$, $d_T$, $d_F$, $d_{BI}^E$ and $d_{BI}^C$, respectively)
between the approximated BBA and the original BBA $m$
as the performance evaluation criterion.

Our comparative analyses use 1000 Monte Carlo runs (i.e.,
1000 complete BBAs in total are randomly generated). The
number of remaining focal elements $r$ for the approaches used
here is set from 14 down to 2 (decreasing by 1). Then, for a given $r$,
the different approximation results in each run are obtained
using the different approximations. The
average (over 1000 runs) distance values between the original
BBA $m$ and the approximated BBA obtained using the different
approaches, for the different numbers of remaining focal elements,
are shown in Fig. 16 (a)-(e).


Figure 16. Evaluation of BBA approximations using different distances.

Here the parameters of $k$-$l$-$x$ are set as $k = l = r$ and $x = 0.5$.
As shown in Fig. 16 (a)-(e), the distance values differ from one
distance to another; however, the trends are the same, i.e., as the
number of remaining focal elements decreases, the distance value
increases. This represents more
loss of information. Based on all the distances of evidence used
here except for dr, the BBA obtained by D1 is usually closer 
to the original BBA. It is experimentally shown that when 
using the distances of evidence including our new proposed 
ones, D1 is a better BBA approximation when compared with 
others. This is accordant to the analyses above, therefore, 
our proposed distances of evidence can be well used in 
performance evaluation in belief function related applications. 


K. Application of $d_{BI}^E$ in multiple criteria decision making


Here we provide a multiple criteria decision making
(MCDM) application of our developed measure $d_{BI}^E$, which
usually performs well in the previous examples and simulations.

Let us consider a selection problem for a car purchase. Four
cars $\{A_1, A_2, A_3, A_4\}$ are considered:

• $A_1$ = TOYOTA YARIS 69 VVT-i Tendance;

• $A_2$ = SUZUKI SWIFT MY15 1.2 VVT So’ City;

• $A_3$ = VOLKSWAGEN POLO 1.0 60 Confortline;

• $A_4$ = OPEL CORSA 1.4 Turbo 100 ch Start/Stop Ed.

The following criteria are used for selecting the best car to purchase:

• $C_1$ is the price (in €);

• $C_2$ is the fuel consumption (in L/km);

• $C_3$ is the CO2 emission (in g/km);

• $C_4$ is the fuel tank volume (in L);

• $C_5$ is the trunk volume (in L).


From information extracted from the car-makers’ technical characteristics
available on the Internet¹¹, we can build the score
matrix $S = [S_{ij}]$ for the above four cars as

          C1      C2    C3    C4    C5
    A1   15000   4.3    99    42    737
    A2   15290   5.0   116    42    892
    A3   15350   5.0   114    45    952
    A4   15490   5.3   123    45   1120

¹¹ http://www.choisir-sa-voiture.com


For criteria $C_1$, $C_2$ and $C_3$, smaller is better. For
criteria $C_4$ and $C_5$, larger is better. We multiply the values of
columns $C_1$, $C_2$ and $C_3$ by -1 to generate a modified score
matrix $S'$, so that the MCDM problem has a
homogeneous preference order (“larger is better”) for each
column:

           C1       C2     C3    C4    C5
    A1   -15000   -4.3    -99    42    737
    A2   -15290   -5.0   -116    42    892
    A3   -15350   -5.0   -114    45    952
    A4   -15490   -5.3   -123    45   1120


For simplicity, the importance $imp(C_j)$ of each criterion $C_j$
takes a value in $\{1, 2, 3, 4, 5\}$, where 1 means the least important
and 5 means the most important. Here, $imp(C_1) = 5$,
$imp(C_2) = 4$, $imp(C_3) = 4$, $imp(C_4) = 1$ and $imp(C_5) = 3$
are adopted, which means that the price (criterion $C_1$) is the
most important one and the volume of the fuel tank (criterion $C_4$) is
the least important one. According to these importance values
and after normalization, we obtain the following vector of
relative weights of criteria: $\mathbf{w} = [\tfrac{5}{17} \;\; \tfrac{4}{17} \;\; \tfrac{4}{17} \;\; \tfrac{1}{17} \;\; \tfrac{3}{17}]$.

We use the BF-TOPSIS (Belief Function based Technique
for Order Preference by Similarity to Ideal Solution) approach
[44] with our $d_{BI}^E$ to solve the MCDM problem above.

First, from the score matrix $S'$, generate the BBAs¹² $m_{ij}(A_i)$,
$m_{ij}(\bar{A}_i)$, and $m_{ij}(A_i \cup \bar{A}_i)$ according to the BBA generation
approach proposed in [44] as:


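$m_{1,1}(A_1) = 0.9859$, $m_{1,1}(A_2 \cup A_3 \cup A_4) = 0.0047$, $m_{1,1}(\Theta) = 0.0094$;
$m_{2,1}(A_2) = 1.0$, $m_{2,1}(A_1 \cup A_3 \cup A_4) = 0$, $m_{2,1}(\Theta) = 0$;
$m_{3,1}(A_3) = 0.0022$, $m_{3,1}(A_1 \cup A_2 \cup A_4) = 0.9932$, $m_{3,1}(\Theta) = 0.0046$;
$m_{4,1}(A_4) = 1.0$, $m_{4,1}(A_1 \cup A_2 \cup A_3) = 0$, $m_{4,1}(\Theta) = 0$;
$m_{1,2}(A_1) = 1.0$, $m_{1,2}(A_2 \cup A_3 \cup A_4) = 0$, $m_{1,2}(\Theta) = 0$;
$m_{2,2}(A_2) = 0.1250$, $m_{2,2}(A_1 \cup A_3 \cup A_4) = 0.4375$, $m_{2,2}(\Theta) = 0.4375$;
$m_{3,2}(A_3) = 0.1250$, $m_{3,2}(A_1 \cup A_2 \cup A_4) = 0.4375$, $m_{3,2}(\Theta) = 0.4375$;
$m_{4,2}(A_4) = 1.0$, $m_{4,2}(A_1 \cup A_2 \cup A_3) = 0$, $m_{4,2}(\Theta) = 0$;
$m_{1,3}(A_1) = 1.0$, $m_{1,3}(A_2 \cup A_3 \cup A_4) = 0$, $m_{1,3}(\Theta) = 0$;
$m_{2,3}(A_2) = 0.1250$, $m_{2,3}(A_1 \cup A_3 \cup A_4) = 0.4375$, $m_{2,3}(\Theta) = 0.4375$;
$m_{3,3}(A_3) = 0.1964$, $m_{3,3}(A_1 \cup A_2 \cup A_4) = 0.3750$, $m_{3,3}(\Theta) = 0.4286$;
$m_{4,3}(A_4) = 1.0$, $m_{4,3}(A_1 \cup A_2 \cup A_3) = 0$, $m_{4,3}(\Theta) = 0$;
$m_{1,4}(A_1) = 0$, $m_{1,4}(A_2 \cup A_3 \cup A_4) = 1$, $m_{1,4}(\Theta) = 0$;
$m_{2,4}(A_2) = 0$, $m_{2,4}(A_1 \cup A_3 \cup A_4) = 1$, $m_{2,4}(\Theta) = 0$;
$m_{3,4}(A_3) = 1.0$, $m_{3,4}(A_1 \cup A_2 \cup A_4) = 0$, $m_{3,4}(\Theta) = 0$;
$m_{4,4}(A_4) = 1.0$, $m_{4,4}(A_1 \cup A_2 \cup A_3) = 0$, $m_{4,4}(\Theta) = 0$;
$m_{1,5}(A_1) = 0$, $m_{1,5}(A_2 \cup A_3 \cup A_4) = 1$, $m_{1,5}(\Theta) = 0$;
$m_{2,5}(A_2) = 0.1990$, $m_{2,5}(A_1 \cup A_3 \cup A_4) = 0.3825$, $m_{2,5}(\Theta) = 0.4185$;
$m_{3,5}(A_3) = 0.3530$, $m_{3,5}(A_1 \cup A_2 \cup A_4) = 0.2231$, $m_{3,5}(\Theta) = 0.4239$;
$m_{4,5}(A_4) = 1.0$, $m_{4,5}(A_1 \cup A_2 \cup A_3) = 0$, $m_{4,5}(\Theta) = 0$.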


Second, for each alternative $A_i$, compute the distance
$d_{BI}^E(m_{ij}, m_{ij}^{best})$ between $m_{ij}$ and the best ideal BBA defined by
$m_{ij}^{best}(A_i) \triangleq 1$, and the distance $d_{BI}^E(m_{ij}, m_{ij}^{worst})$
between $m_{ij}$ and the worst ideal BBA defined by
$m_{ij}^{worst}(\bar{A}_i) \triangleq 1$. Then, two distance matrices¹³ are obtained:

$$D_{BI}^{best} = \begin{bmatrix}
0 & 0 & 0 & 0.8660 & 0.8660 \\
0.6151 & 0.7032 & 0.7071 & 0.8660 & 0.6419 \\
0.7100 & 0.7032 & 0.6430 & 0 & 0.5102 \\
0.8660 & 0.8660 & 0.8660 & 0 & 0
\end{bmatrix},$$

and

$$D_{BI}^{worst} = \begin{bmatrix}
0.8660 & 0.8660 & 0.8660 & 0 & 0 \\
0.2804 & 0.2033 & 0.1938 & 0 & 0.2552 \\
0.1885 & 0.2033 & 0.2555 & 0.8660 & 0.3819 \\
0 & 0 & 0 & 0.8660 & 0.8660
\end{bmatrix}.$$

Here, the element $D_{BI}^{best}(i, j) = d_{BI}^E(m_{ij}, m_{ij}^{best})$ and
$D_{BI}^{worst}(i, j) = d_{BI}^E(m_{ij}, m_{ij}^{worst})$.

¹² $i = 1, \ldots, 4$ denotes the index of the alternative; $j = 1, \ldots, 5$ denotes the
index of the criterion.
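
Individual entries of these matrices can be reproduced with a short script. The sketch below assumes the definition of $d_{BI}^E$ given earlier in the paper in the following form (an assumption of this example, since the definition is not repeated here): for every non-empty subset, take the Wasserstein distance between the two belief intervals, $\left(\text{midpoint difference}\right)^2 + \tfrac{1}{3}\left(\text{half-width difference}\right)^2$, sum over all subsets, scale by $1/2^{n-1}$, and take the square root. The function names are illustrative.

```python
import math
from itertools import combinations


def nonempty_subsets(theta):
    items = sorted(theta)
    return [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]


def bel(m, subset):
    return sum(v for focal, v in m.items() if focal <= subset)


def pl(m, subset):
    return sum(v for focal, v in m.items() if focal & subset)


def d_bi_euclidean(m1, m2, theta):
    """Belief-interval Euclidean distance (assumed form, see lead-in)."""
    total = 0.0
    for subset in nonempty_subsets(theta):
        lo1, hi1 = bel(m1, subset), pl(m1, subset)
        lo2, hi2 = bel(m2, subset), pl(m2, subset)
        mid_diff = (lo1 + hi1) / 2 - (lo2 + hi2) / 2
        half_width_diff = (hi1 - lo1) / 2 - (hi2 - lo2) / 2
        total += mid_diff ** 2 + half_width_diff ** 2 / 3
    return math.sqrt(total / 2 ** (len(theta) - 1))


theta = frozenset({1, 2, 3, 4})                   # indices of A1..A4
a2, not_a2 = frozenset({2}), frozenset({1, 3, 4})
m_22 = {a2: 0.1250, not_a2: 0.4375, theta: 0.4375}
best, worst = {a2: 1.0}, {not_a2: 1.0}

print(round(d_bi_euclidean(m_22, best, theta), 4))   # entry (2, 2) of the best matrix
print(round(d_bi_euclidean(m_22, worst, theta), 4))  # entry (2, 2) of the worst matrix
print(round(d_bi_euclidean(best, worst, theta), 4))  # largest value appearing above
```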
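Third, compute the weighted average of the $d_{BI}^E(m_{ij}, m_{ij}^{best})$
values with the relative importance weighting factor $w_j$ of
criterion $C_j$. Similarly, compute the weighted average of the
$d_{BI}^E(m_{ij}, m_{ij}^{worst})$ values. More specifically, compute

$$d^{best}(A_i) \triangleq \sum_{j=1}^{5} w_j \cdot d_{BI}^E(m_{ij}, m_{ij}^{best}), \tag{35}$$

$$d^{worst}(A_i) \triangleq \sum_{j=1}^{5} w_j \cdot d_{BI}^E(m_{ij}, m_{ij}^{worst}). \tag{36}$$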

The relative closeness of the alternative $A_i$ with respect to the
ideal best solution $A^{best}$ is then defined by

$$C(A_i, A^{best}) \triangleq \frac{d^{worst}(A_i)}{d^{worst}(A_i) + d^{best}(A_i)}. \tag{37}$$

Since $d^{best}(A_i) \ge 0$ and $d^{worst}(A_i) \ge 0$, we have $C(A_i, A^{best}) \in [0, 1]$.
If $d^{best}(A_i) = 0$, the alternative $A_i$ coincides with the ideal best
solution and thus $C(A_i, A^{best}) = 1$
(the relative closeness of $A_i$ with respect to $A^{best}$ is maximal).
Conversely, if $d^{worst}(A_i) = 0$, the
alternative $A_i$ coincides with the ideal worst solution and thus
$C(A_i, A^{best}) = 0$ (the relative closeness of $A_i$ with respect to
$A^{best}$ is minimal).

Finally, the set of alternatives is ranked according to
the descending order of $C(A_i, A^{best}) \in [0, 1]$, where a larger
$C(A_i, A^{best})$ value means a better alternative (or a higher
preference).
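
The third step and the final ranking can be reproduced directly from the two distance matrices and the importance values given above. The snippet is a plain re-computation of (35)-(37); the variable and function names are chosen for this example, and the entry printed as 0.19388 in the extracted text is read as 0.1938.

```python
importance = [5, 4, 4, 1, 3]                      # imp(C1) ... imp(C5)
weights = [value / sum(importance) for value in importance]

D_best = [
    [0.0,    0.0,    0.0,    0.8660, 0.8660],
    [0.6151, 0.7032, 0.7071, 0.8660, 0.6419],
    [0.7100, 0.7032, 0.6430, 0.0,    0.5102],
    [0.8660, 0.8660, 0.8660, 0.0,    0.0],
]
D_worst = [
    [0.8660, 0.8660, 0.8660, 0.0,    0.0],
    [0.2804, 0.2033, 0.1938, 0.0,    0.2552],   # 0.1938: see note above
    [0.1885, 0.2033, 0.2555, 0.8660, 0.3819],
    [0.0,    0.0,    0.0,    0.8660, 0.8660],
]


def closeness(best_row, worst_row, w):
    """Relative closeness to the ideal best solution, Eqs. (35)-(37)."""
    d_best = sum(wj * d for wj, d in zip(w, best_row))
    d_worst = sum(wj * d for wj, d in zip(w, worst_row))
    return d_worst / (d_worst + d_best)


scores = {f"A{i + 1}": closeness(D_best[i], D_worst[i], weights) for i in range(4)}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['A1', 'A3', 'A2', 'A4']
```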

Based on the score matrix $S'$ and the importance of the criteria, $A_1$
tends to be the best car to buy, since the three most important
criteria clearly take their best values for car $A_1$. When using
the classical TOPSIS [45] method with the Euclidean distance,
we obtain the preference order $A_4 \succ A_1 \succ A_3 \succ A_2$, where
$A_4$ is the best choice and $A_2$ is the worst one. When we use
the BF-TOPSIS method based on our proposed $d_{BI}^E$, we obtain
a more satisfactory preference order $A_1 \succ A_3 \succ A_2 \succ A_4$.
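
For comparison, the classical TOPSIS ranking can be sketched as below. The text only states that classical TOPSIS [45] with the Euclidean distance is used; the vector (Euclidean-norm) normalization of each column is an assumption of this example, as are the function and variable names.

```python
import math

score = [
    [15000, 4.3,  99, 42,  737],   # A1
    [15290, 5.0, 116, 42,  892],   # A2
    [15350, 5.0, 114, 45,  952],   # A3
    [15490, 5.3, 123, 45, 1120],   # A4
]
larger_is_better = [False, False, False, True, True]   # C1..C5
weights = [5 / 17, 4 / 17, 4 / 17, 1 / 17, 3 / 17]


def topsis(score, weights, larger_is_better):
    """Classical TOPSIS closeness values (vector normalization assumed)."""
    n_crit = len(weights)
    norms = [math.sqrt(sum(row[j] ** 2 for row in score)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in score]
    columns = list(zip(*v))
    ideal = [max(c) if up else min(c) for c, up in zip(columns, larger_is_better)]
    anti = [min(c) if up else max(c) for c, up in zip(columns, larger_is_better)]
    result = []
    for row in v:
        d_best = math.dist(row, ideal)    # Euclidean distance to the ideal
        d_worst = math.dist(row, anti)    # Euclidean distance to the anti-ideal
        result.append(d_worst / (d_worst + d_best))
    return result


closeness_values = topsis(score, weights, larger_is_better)
order = sorted(range(4), key=lambda i: closeness_values[i], reverse=True)
print([f"A{i + 1}" for i in order])
```

Under this normalization the large relative spread of the trunk volume outweighs the small relative spread of the price, which is why the cheapest car does not come first in the classical ranking.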

As shown in this application example, our proposed distance
measure can be used effectively in multiple criteria decision
making. $d_{BI}^E$ has also been used successfully in other kinds
of applications related to risk management and to the protection of
housing areas against torrential floods in France [46], [47].




¹³ One can also try to use other distance measures for belief functions as
referred above. Here we only use $d_{BI}^E$ for illustration.


VI. CONCLUSIONS 


Two novel distance measures of evidence have been pro- 
posed based on the distance measures between belief intervals. 
According to the comparisons between our proposed measures 
and the existing ones based on examples and simulations, it 
is shown that our proposed distances well describe the degree 
of closeness between different BOEs. Our results demonstrate 
that Euclidean distance based on belief intervals works better 
than the Chebyshev distance based on belief intervals. 

Besides their good behaviors, the main interest of our pro- 
posed distances of evidence is that they have been established 
directly in the belief functions framework, contrary to most of 
other distance measures that switch from belief functions to 
probabilistic or fuzzy set framework, which leads to loss of 
information and bad behaviors in general. 

Note that in this paper, many justifications or verifications 
of our new proposed distance measures are based on numer- 
ical examples and simulations. Numerical examples in belief 
functions related fields are usually designed according to the 
subjective intuitions, which lack objective criteria and the 
standard testing data. Furthermore, the results and conclusions 
only based on examples are usually incomplete. Therefore, 
more thorough justifications including theoretical analysis and 
more examples in special cases are needed to further examine 
our new measures. However, the theoretical evaluation or 
justification in belief functions related fields is still premature. 

Therefore, our future work will focus on the theoretical and 
the objective evaluation and analysis of the belief functions re- 
lated fields. We will try to establish the standard testing BBAs 
for the distance measures in the theory of belief functions. 
Our proposed distance measures will also be tested based on 
more experiments and simulations to find the possible counter- 
intuitive examples and analyze the reasons for the possible 
counter-intuitive behaviors. Our new distance measures will 
be applied to more belief functions related applications, e.g., 
the performance evaluations, for the further verification. 

Furthermore, all the distance measures, including ours, are defined
under the closed-world assumption. That is, when the mass
assignment for the empty set is positive, they cannot be used to
measure the closeness between BOEs. Therefore, generalizing
our new distance metrics to the open-world assumption is one
of our future research directions.
Abstract: In this paper the notion of (probabilistic) independence
of two events, defined classically in the theory of
probability, is extended to the theory of belief functions as
the credibilistic independence of two propositions. This new
notion of independence, which is compatible with probabilistic
independence as soon as the belief function is Bayesian, is defined
from the Fagin-Halpern belief conditioning formulas drawn from the
Total Belief Theorem (TBT) when working in the framework of
belief functions to model epistemic uncertainties. We give some
illustrative examples of this notion at the end of the paper.

Keywords: credibilistic independence, belief functions, belief
conditioning, total belief theorem.


I. INTRODUCTION 


In this paper the notion of (probabilistic) independence of 
two events defined classically in the theory of probability [1] 
is extended in the theory of belief functions [2]. We call it 
the credibilistic independence of two propositions to make a 
clear distinction between the origin of uncertainty related to 
events (i.e. the random or stochastic uncertainty), and in a 
more general context the origin of uncertainty of propositions 
(i.e. the epistemic uncertainty due to lack of knowledge). The 
epithet credibilistic refers to a credal system chosen for the 
codification of belief. In this work our credal system is the 
mathematical framework of belief functions. 

Several works have been proposed in the past to define
different notions of independence in the imprecise probability
framework and in the theory of belief functions. For example,
Couso et al. [3] proposed several notions of independence
illustrated by different combined Ellsberg’s urns
experiments. In the 2000s, Ben Yaghlane, Smets and Mellouli
[4], [5] explored the notion of independence and
defined the doxastic independence. Their proposal is however
essentially based on Dempster’s rule of combination, which
is known to be problematic and incompatible with imprecise
conditional probabilistic calculus as shown in [6]-[8]. More recently,
Jirousek and Vejnarova [9] proposed a definition of
conditional independence based on some complicated
factorization principles of the joint basic belief assignment
(BBA) into separate marginal spaces of the variables. All the
aforementioned works share the same two basic principles for
attempting to define the notion(s) of independence: 1) work on a
joint (Cartesian) product space, and 2) work with BBAs. These
two fundamental principles lead to quite complicated definitions
of independence that are difficult to use by most engineers or
researchers for their own applications or developments.
In this research work we adopt a radically different standpoint.
We work with a BBA defined with respect to a single
frame of discernment¹ (FoD), and we work directly with the
belief intervals induced by the Fagin-Halpern conditioning rule
[6], [7], rather than with factorization principles of a joint
BBA or extension principles of marginal BBAs. Our approach
is constructive, easier than previous attempts to define
independence, and consistent with the notion of probabilistic (or
stochastic) independence of two events defined in the theory
of probability. Our notion of credibilistic independence can be
used easily to check whether two propositions are credibilistically
independent, or not, given a BBA. This new approach could
be helpful for practitioners of belief functions. We have not
yet investigated its usefulness
for applications, but we expect it will generate some interest
because this problem has already been explored by several
researchers in the past from different standpoints.

This paper is organized as follows. After a brief recall of the
basics of probability theory and belief functions in Sections
II and III, we characterize mathematically the notion of
credibilistic independence of two propositions in Section IV.
Some basic illustrative examples are shown in Section V, with
conclusions in Section VI.


II. BASICS OF PROBABILITY THEORY 


In probability theory [1], the elements $\theta_i$ of the space $\Theta$
are experimental outcomes. The subsets of $\Theta$ are called events
and the event $\{\theta_i\}$ consisting of the single element $\theta_i$ is an
elementary event. The space $\Theta$ is called the sure event and
the empty set $\emptyset$ is the impossible event. We assign to each
event $A$ a number $P(A)$ in $[0, 1]$, called the probability of $A$,
which satisfies the three Kolmogorov axioms: 1) $P(A) \ge 0$;
2) $P(\Theta) = 1$; and 3) if $A \cap B = \emptyset$, then $P(A \cup B) =
P(A) + P(B)$. The fundamental Total Probability Theorem
(TPT), also called the law of total probability (see [1]), states
that for any event $B$ and any partition $\{A_1, A_2, \ldots, A_k\}$ of
the space $\Theta$, the following equality holds

$$P(B) = P(B \cap A_1) + P(B \cap A_2) + \ldots + P(B \cap A_k). \tag{1}$$

¹ It can be any Cartesian product space in fact. The main point is that the
(joint) BBA we work with is defined with respect to this space.
Starting from TPT formula (1) and assuming $P(B) > 0$,
we get for any $i \in \{1, \ldots, k\}$ (after dividing each side of (1)
by $P(B)$ and rearranging terms) the equality

$$\frac{P(A_i \cap B)}{P(B)} = 1 - \sum_{j \ne i} \frac{P(A_j \cap B)}{P(B)}. \tag{2}$$

This equality allows us to define the conditional probability
$P(A_i|B)$ by²

$$P(A_i|B) \triangleq P(A_i \cap B)/P(B). \tag{3}$$


One can verify that the conditional probability (3) satisfies 
the three axioms of the Theory of Probability [1]. 


Similarly, by considering an event $A_i$ of $\Theta$ and the partition
$\{B, \bar{B}\}$ of $\Theta$, the formula $P(A_i) = P(A_i \cap B) + P(A_i \cap \bar{B})$
applies, and by dividing it by $P(A_i)$ (assuming $P(A_i) > 0$),
one gets

$$\frac{P(A_i \cap B)}{P(A_i)} = 1 - \frac{P(A_i \cap \bar{B})}{P(A_i)}. \tag{4}$$

This allows us to define the reverse conditional probability
$P(B|A_i)$ by

$$P(B|A_i) \triangleq P(A_i \cap B)/P(A_i). \tag{5}$$


Probabilistic Independence: Two events $A_i$ and $B$ are said
to be probabilistically independent (or P-independent for
short) if and only if $P(A_i|B) = P(A_i)$ and $P(B|A_i) = P(B)$.
From conditioning formulas (3) and (5), and because the
conditional probabilities $P(A_i|B)$ and $P(B|A_i)$ are
mathematically defined only if $P(B) > 0$ and $P(A_i) > 0$, one
determines the condition of P-independence (which is well
defined even if $P(A_i) = 0$, or $P(B) = 0$, or both) by the
formula

$$P(A_i \cap B) = P(A_i)P(B). \tag{6}$$
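
As a small worked illustration of condition (6), the snippet below checks P-independence on the outcomes of two fair dice. The events and helper names are chosen for this example only; exact fractions are used so that the equality test in (6) is not affected by rounding.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely outcomes


def prob(event):
    """Probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for w in outcomes if event(w)), len(outcomes))


def p_independent(e1, e2):
    """Condition (6): P(A and B) = P(A) P(B)."""
    return prob(lambda w: e1(w) and e2(w)) == prob(e1) * prob(e2)


def first_even(w):
    return w[0] % 2 == 0


def sum_is_7(w):
    return w[0] + w[1] == 7


def sum_is_8(w):
    return w[0] + w[1] == 8


def impossible(w):
    return False


print(p_independent(first_even, sum_is_7))    # True:  1/12 = 1/2 * 1/6
print(p_independent(first_even, sum_is_8))    # False: 1/12 != 1/2 * 5/36
print(p_independent(first_even, impossible))  # True, although P(B) = 0
```

The last line shows why the product form (6) is preferred: it remains meaningful when one of the conditional probabilities in (3) or (5) is undefined.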


III. BASICS OF BELIEF FUNCTIONS 


Based on Dempster’s works [10], [11], Shafer introduced
Belief Functions (BF) to model epistemic uncertainty and
to reason under uncertainty in his Mathematical Theory of
Evidence [2], also known as Dempster-Shafer Theory (DST).
We consider a finite discrete frame of discernment (FoD)
$\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$, with $n > 1$, where all the exhaustive
and exclusive elements of $\Theta$ represent the set of the potential
solutions of the problem under concern. The set of all subsets
of $\Theta$ is the power-set of $\Theta$, denoted by $2^\Theta$. The number of
elements (i.e. the cardinality) of $2^\Theta$ is $2^{|\Theta|}$. A basic belief
assignment (BBA) associated with a given source of evidence
is defined as the mapping $m(\cdot): 2^\Theta \to [0, 1]$ satisfying the
conditions $m(\emptyset) = 0$ and $\sum_{A \in 2^\Theta} m(A) = 1$. The quantity
$m(A)$ is the mass of belief of subset $A$ committed by the
source of evidence (SoE). A focal element $X$ of a BBA
$m(\cdot)$ is an element of $2^\Theta$ such that $m(X) > 0$. Note that
the empty set $\emptyset$ is not a focal element of a BBA because
$m(\emptyset) = 0$ (closed-world assumption of Shafer’s model for
the FoD). The set of all focal elements of $m(\cdot)$ is denoted³
$\mathcal{F}_\Theta(m) \triangleq \{X \subseteq \Theta \mid m(X) > 0\} = \{X \in 2^\Theta \mid m(X) > 0\}$.
Belief and plausibility functions are defined by⁴

$$Bel(A) = \sum_{\substack{X \in 2^\Theta \\ X \subseteq A}} m(X) = \sum_{X \in \mathcal{F}_A(m)} m(X), \tag{7}$$

$$Pl(A) = \sum_{\substack{X \in 2^\Theta \\ X \cap A \ne \emptyset}} m(X) = 1 - Bel(\bar{A}), \tag{8}$$

² The notation $\triangleq$ means equal by definition.

where $\bar{A} \triangleq \Theta - \{A\} = \{X \mid X \in \Theta \text{ and } X \notin A\}$ is the
complement of $A$ in $\Theta$ and the minus symbol denotes the set
difference operator. The width $U(A^*) \triangleq Pl(A) - Bel(A) =
\sum_{X \in \mathcal{F}_{A^*}(m)} m(X)$ of the belief interval $[Bel(A), Pl(A)]$ is
called the uncertainty on $A$ committed by the SoE. $\mathcal{F}_{A^*}(m)$
is the set of focal elements of $m(\cdot)$ not included in $A$ and not
included in $\bar{A}$, that is $\mathcal{F}_{A^*}(m) \triangleq \mathcal{F}_\Theta(m) - \mathcal{F}_A(m) - \mathcal{F}_{\bar{A}}(m)$,
and $U(A^*)$ represents the imprecision on the (subjective)
probability of $A$ granted by the SoE which provides the BBA
$m(\cdot)$. When all elements of $\mathcal{F}_\Theta(m)$ are only singletons, $m(\cdot)$
is called a Bayesian BBA [2] and its corresponding $Bel(\cdot)$
and $Pl(\cdot)$ functions are homogeneous to a same (subjective)
probability measure $P(\cdot)$, and in this case $\mathcal{F}_{A^*}(m) = \emptyset$.
According to Shafer’s Theorem 2.9 (see [2], page 39, with its
proof on page 51), the belief functions can be characterized
without referencing a BBA. The quantities $m(\cdot)$ and $Bel(\cdot)$
are in one-to-one correspondence, and for any $A \subseteq \Theta$ the BBA $m(\cdot)$ is obtained
from $Bel(\cdot)$ by the Möbius inverse formula (see [2], p. 39)

$$m(A) = \sum_{B \subseteq A \subseteq \Theta} (-1)^{|A - B|} Bel(B). \tag{9}$$
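
The definitions (7)-(9) can be illustrated with a small script. The BBA below is a toy example and the function names are assumptions of this sketch; masses are stored as exact fractions in a dictionary keyed by `frozenset`, so the Möbius inversion can be compared with the original BBA by plain equality.

```python
from fractions import Fraction
from itertools import combinations


def subsets(s):
    """All subsets of s, the empty set included."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def bel(m, A):
    """Eq. (7): total mass of the focal elements included in A."""
    return sum((v for X, v in m.items() if X <= A), Fraction(0))


def pl(m, A):
    """Eq. (8): total mass of the focal elements intersecting A."""
    return sum((v for X, v in m.items() if X & A), Fraction(0))


def moebius_inverse(m, theta):
    """Eq. (9): recover the BBA from the Bel values it induces."""
    recovered = {}
    for A in subsets(theta):
        value = sum(((-1) ** len(A - B) * bel(m, B) for B in subsets(A)),
                    Fraction(0))
        if value != 0:
            recovered[A] = value
    return recovered


theta = frozenset("abc")
m = {frozenset("a"): Fraction(1, 2),
     frozenset("ab"): Fraction(3, 10),
     theta: Fraction(1, 5)}

A = frozenset("b")
print(bel(m, A), pl(m, A))              # belief interval of {b}
print(pl(m, A) == 1 - bel(m, theta - A))
print(moebius_inverse(m, theta) == m)
```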


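Because for any partition $\{A_1, \ldots, A_k\}$ of the FoD $\Theta$, the
equality $\mathcal{F}_\Theta(m) = \mathcal{F}_{A_1}(m) \cup \ldots \cup \mathcal{F}_{A_k}(m) \cup \mathcal{F}_{A^*}(m)$ with
$\mathcal{F}_{A^*}(m) \triangleq \mathcal{F}_\Theta(m) - \mathcal{F}_{A_1}(m) - \ldots - \mathcal{F}_{A_k}(m)$ is valid, the
following Total Belief Theorem (TBT) holds (see the proof in
the companion paper [7]).


Total Belief Theorem (TBT): Let us consider a FoD $\Theta$ with
$|\Theta| \ge 2$ elements and a BBA $m(\cdot)$ defined on $2^\Theta$ with
the set of focal elements $\mathcal{F}_\Theta(m)$. For any chosen partition
$\{A_1, \ldots, A_k\}$ of $\Theta$ and for any $B \subseteq \Theta$, one has

$$Bel(B) = \sum_{i=1}^{k} Bel(A_i \cap B) + U(A^* \cap B), \tag{10}$$

where $U(A^* \cap B) \triangleq \sum_{X \in \mathcal{F}_{A^*}(m) \mid X \in \mathcal{F}_B(m)} m(X) \in [0, 1]$.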
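
A numerical check of (10) on a toy BBA may help to see how the two terms split the focal elements included in $B$. The BBA, the partition and the function names are assumptions made for this sketch.

```python
from fractions import Fraction


def bel(m, A):
    """Total mass of the focal elements included in A."""
    return sum((v for X, v in m.items() if X <= A), Fraction(0))


def tbt_terms(m, partition, B):
    """Return the two terms on the right-hand side of Eq. (10)."""
    within_blocks = sum((bel(m, Ai & B) for Ai in partition), Fraction(0))
    # Focal elements included in B but in no single block of the partition.
    straddling = sum((v for X, v in m.items()
                      if X <= B and not any(X <= Ai for Ai in partition)),
                     Fraction(0))
    return within_blocks, straddling


theta = frozenset({1, 2, 3, 4})
m = {frozenset({1}): Fraction(1, 10),
     frozenset({1, 2}): Fraction(2, 10),
     frozenset({2, 3}): Fraction(3, 10),
     frozenset({3, 4}): Fraction(1, 10),
     theta: Fraction(3, 10)}
partition = [frozenset({1, 2}), frozenset({3, 4})]
B = frozenset({1, 2, 3})

within_blocks, straddling = tbt_terms(m, partition, B)
print(bel(m, B), within_blocks, straddling)  # 3/5 3/10 3/10
```

Here the focal element $\{2, 3\}$ lies inside $B$ but straddles both blocks of the partition, so its mass is carried by the $U(A^* \cap B)$ term.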

By expressing $Bel(\bar{B})$ using the TBT and noting that $Pl(B) =
1 - Bel(\bar{B})$, one gets the Total Plausibility Theorem (TPlT) [7],
which states that for any partition $\{A_1, \ldots, A_k\}$ of $\Theta$ and any
$B \subseteq \Theta$, one has

$$Pl(B) = \sum_{i=1}^{k} Pl(\bar{A}_i \cup B) + 1 - k - U(A^* \cap \bar{B}), \tag{11}$$

where $U(A^* \cap \bar{B}) \triangleq \sum_{X \in \mathcal{F}_{A^*}(m) \mid X \in \mathcal{F}_{\bar{B}}(m)} m(X) \in [0, 1]$.

³ More generally, the set of all focal elements of $m(\cdot)$ included in a subset
$A \subseteq \Theta$ is denoted $\mathcal{F}_A(m)$.

⁴ By convention, a sum of non-existing terms (if it occurs in formulas
depending on the given BBA) is always set to zero.


In the DST framework, Shafer [2] proposed to combine
$S \ge 2$ distinct sources of evidence represented by the BBAs
$m_1(\cdot), \ldots, m_S(\cdot)$ over the same FoD with Dempster’s rule
of combination (i.e. the normalized conjunctive rule). The
justification and behavior of Dempster’s rule have however
been strongly disputed from both theoretical and practical
standpoints, as reported in [12]-[15]. Furthermore, Shafer
also used Dempster’s rule to establish formulas for conditional
belief and plausibility functions [2]. Unfortunately, Shafer’s
conditioning formulas are inconsistent with the lower and upper
bounds of imprecise conditional probability values, as
discussed in [6], [16], [18] (see also Ellsberg’s urn example in
[7]). That is why we do not recommend Shafer’s conditioning
and Dempster’s rule in applications involving belief functions.
This standpoint has already been shared by several authors
before us; see for example [6], [8], [16], [19]-[21].

Recently in [7], we proved that the Fagin-Halpern conditional
belief and plausibility formulas [6], [16], [17] can be
directly obtained from TBT by defining the conditional belief
as the lower envelope (i.e. the infimum) of a family of
conditional probability functions, which makes belief conditioning
consistent with imprecise conditional probability calculus. In
this paper we do not go into the details of the justification
of the Fagin-Halpern conditioning formulas; we just
recall their expressions because they will be used in the next
section to define the notion of credibilistic independence (or
C-independence for short). Assuming $Bel(B) > 0$, Fagin
and Halpern proposed the following conditional formulas (FH
formulas for short)


$$Bel(A|B) = Bel(A \cap B)/(Bel(A \cap B) + Pl(\bar{A} \cap B)), \qquad (12)$$

$$Pl(A|B) = Pl(A \cap B)/(Pl(A \cap B) + Bel(\bar{A} \cap B)). \qquad (13)$$

Fagin and Halpern proved in [6] that $Bel(A|B)$ given by
(12) is a true belief function⁵. Later, Sundberg and Wagner
in [20] (p. 268) also gave a clearer proof (though it is not very
easy to follow). By switching notations and assuming
$Bel(A) > 0$, the previous FH formulas yield

$$Bel(B|A) = Bel(A \cap B)/(Bel(A \cap B) + Pl(\bar{B} \cap A)), \qquad (14)$$

$$Pl(B|A) = Pl(A \cap B)/(Pl(A \cap B) + Bel(\bar{B} \cap A)). \qquad (15)$$
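The FH formulas (12)-(15) translate directly into a few lines of code. The following Python sketch is only an illustration (the function names `bel`, `pl` and `fh_conditioning` and the dictionary encoding of a BBA are assumptions, not the authors' implementation). It uses the BBA of Example 3 of Section V, with $\theta_i$ encoded as the integer $i$, and exact fractions.

```python
from fractions import Fraction

def bel(m, A):
    """Bel(A): total mass of the focal elements included in A."""
    return sum((v for X, v in m.items() if X <= A), Fraction(0))

def pl(m, A):
    """Pl(A): total mass of the focal elements intersecting A."""
    return sum((v for X, v in m.items() if X & A), Fraction(0))

def fh_conditioning(m, theta, A, B):
    """Return (Bel(A|B), Pl(A|B)) from (12)-(13); requires Bel(B) > 0."""
    if bel(m, B) == 0:
        raise ValueError("Fagin-Halpern conditioning needs Bel(B) > 0")
    not_A = theta - A
    bel_ab, pl_ab = bel(m, A & B), pl(m, A & B)
    bel_cond = bel_ab / (bel_ab + pl(m, not_A & B))
    pl_cond = pl_ab / (pl_ab + bel(m, not_A & B))
    return bel_cond, pl_cond

# BBA of Example 3: m(θ3) = 0.7 and m(θ1 ∪ θ3) = 0.3 on a four-element frame.
theta = frozenset({1, 2, 3, 4})
m = {frozenset({3}): Fraction(7, 10), frozenset({1, 3}): Fraction(3, 10)}
A = frozenset({1, 3})
B = frozenset({2, 3})

a_given_b = fh_conditioning(m, theta, A, B)   # formulas (12)-(13)
b_given_a = fh_conditioning(m, theta, B, A)   # formulas (14)-(15)
```

Swapping the roles of the two arguments gives (14)-(15), so a single function covers the four formulas. With this BBA the sketch returns $[Bel(A|B), Pl(A|B)] = [1, 1]$ and $[Bel(B|A), Pl(B|A)] = [0.7, 1]$.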
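In [7], we also generalized Bayes' Theorem for working in
the framework of belief functions as follows.

Generalized Bayes' Theorem (GBT): For any partition
$\{A_1,\ldots,A_k\}$ of a FoD $\Theta$, any belief function $Bel(\cdot) : 2^\Theta \mapsto
[0,1]$, and any subset $B$ of $\Theta$ with $Bel(B) > 0$, one has for
$i \in \{1,\ldots,k\}$

$$Bel(A_i|B) = \frac{Bel(B|A_i)\, q(A_i,B)}{\sum_{i=1}^{k} Bel(B|A_i)\, q(A_i,B) + U((\bar{A}_i \cap B)^*)}, \qquad (16)$$

where

$$q(A_i,B) \triangleq Bel(A_i) + U((\bar{B} \cap A_i)^*) - U(B^* \cap A_i),$$
$$U((\bar{B} \cap A_i)^*) \triangleq Pl(\bar{B} \cap A_i) - Bel(\bar{B} \cap A_i),$$
$$U((\bar{A}_i \cap B)^*) \triangleq Pl(\bar{A}_i \cap B) - Bel(\bar{A}_i \cap B),$$
$$U(B^* \cap A_i) \triangleq \sum_{X \in \mathcal{F}_{B^*}(m) \,|\, X \in \mathcal{F}_{A_i}(m)} m(X).$$

5 Satisfying the three conditions of Shafer's Theorem 2.9, see [2] page 39.
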
Note that FH formulas are consistent with Bayes formula
(i.e. conditional probability formula) when the underlying
BBA $m(\cdot)$ is Bayesian. Indeed, if $m(\cdot)$ is Bayesian, then
$Pl(\bar{A} \cap B) = Bel(\bar{A} \cap B) = P(\bar{A} \cap B)$, $Pl(A \cap B) = Bel(A \cap B) = P(A \cap B)$
and $Pl(\bar{B} \cap A) = Bel(\bar{B} \cap A) = P(\bar{B} \cap A)$,
and FH formulas become

$$Bel(A|B) = Pl(A|B) = P(A \cap B)/P(B) = P(A|B), \qquad (17)$$

$$Bel(B|A) = Pl(B|A) = P(A \cap B)/P(A) = P(B|A). \qquad (18)$$


The advantage of FH formulas is their complete compatibil- 
ity with the bounds of conditional probability calculus [20] and 
their theoretical constructive justification drawn from TBT. 


IV. NOTION OF CREDIBILISTIC INDEPENDENCE 


In this section we generalize to the belief functions framework
the notion of probabilistic independence of two events
$A$ and $B$ expressed by the condition $P(A \cap B) = P(A)P(B)$.


A. Definition of credibilistic independence 


To define the credibilistic independence of two propositions
$A$ and $B$, we start from the FH belief conditioning formulas
(12)-(15) and we impose the Credibilistic Independence
Constraints (CIC), by analogy with what is done in the
probabilistic framework. So, we require the
conditions

$$Bel(A|B) = Bel(A), \qquad (19)$$
$$Bel(B|A) = Bel(B), \qquad (20)$$
$$Pl(A|B) = Pl(A), \qquad (21)$$
$$Pl(B|A) = Pl(B), \qquad (22)$$

which reflect the notion of independence of propositions $A$
and $B$.


Working with conditional belief expressions, the formula (12)
and the condition (19) yield

$$Bel(A)[Bel(A \cap B) + Pl(\bar{A} \cap B)] = Bel(A \cap B),$$

or equivalently

$$Bel(A) Pl(\bar{A} \cap B) = Bel(A \cap B)[1 - Bel(A)].$$

By noting that $1 - Bel(A) = Pl(\bar{A})$ and dividing both sides
of the previous equality by $Pl(\bar{A})$ (assumed strictly positive),
we get

$$Bel(A \cap B) = \frac{Bel(A)}{Pl(\bar{A})} Pl(\bar{A} \cap B). \qquad (23)$$

Similarly, the formula (14) and the condition (20) yield

$$Bel(B)[Bel(A \cap B) + Pl(A \cap \bar{B})] = Bel(A \cap B),$$

or equivalently

$$Bel(B) Pl(A \cap \bar{B}) = Bel(A \cap B)[1 - Bel(B)].$$

By noting that $1 - Bel(B) = Pl(\bar{B})$ and dividing both sides
of the previous equality by $Pl(\bar{B})$ (assumed strictly positive),
we get

$$Bel(A \cap B) = \frac{Bel(B)}{Pl(\bar{B})} Pl(A \cap \bar{B}). \qquad (24)$$

If CIC (19) and (20) are satisfied, then because of (23) and
(24), one must also have the following equality satisfied

$$Bel(A \cap B) = \frac{Bel(A)}{Pl(\bar{A})} Pl(\bar{A} \cap B) = \frac{Bel(B)}{Pl(\bar{B})} Pl(A \cap \bar{B}).$$

This equality imposes the following condition to be satisfied

$$Bel(A) Pl(\bar{B}) Pl(\bar{A} \cap B) = Pl(\bar{A}) Bel(B) Pl(A \cap \bar{B}). \qquad (25)$$

One sees that this equality is always satisfied if one has

$$Pl(A \cap \bar{B}) = Bel(A) Pl(\bar{B}), \qquad (26)$$
$$Pl(\bar{A} \cap B) = Pl(\bar{A}) Bel(B). \qquad (27)$$



Working with conditional plausibility expressions, the formula
(13) and the condition (21) yield

$$Pl(A)[Pl(A \cap B) + Bel(\bar{A} \cap B)] = Pl(A \cap B),$$

or equivalently

$$Pl(A \cap B) = \frac{Pl(A)}{Bel(\bar{A})} Bel(\bar{A} \cap B). \qquad (28)$$

The formula (15) and the condition (22) yield

$$Pl(B)[Pl(A \cap B) + Bel(A \cap \bar{B})] = Pl(A \cap B),$$

or equivalently

$$Pl(A \cap B) = \frac{Pl(B)}{Bel(\bar{B})} Bel(A \cap \bar{B}). \qquad (29)$$

If CIC (21) and (22) are satisfied, then because of (28) and
(29), one must also have the following equality satisfied

$$Pl(A \cap B) = \frac{Pl(A)}{Bel(\bar{A})} Bel(\bar{A} \cap B) = \frac{Pl(B)}{Bel(\bar{B})} Bel(A \cap \bar{B}).$$

This equality imposes the following condition to be satisfied

$$Pl(A) Bel(\bar{B}) Bel(\bar{A} \cap B) = Bel(\bar{A}) Pl(B) Bel(A \cap \bar{B}). \qquad (30)$$

One sees that this equality is always satisfied if one has

$$Bel(A \cap \bar{B}) = Pl(A) Bel(\bar{B}), \qquad (31)$$
$$Bel(\bar{A} \cap B) = Bel(\bar{A}) Pl(B). \qquad (32)$$

In summary, the four CIC are satisfied whenever the two
following conditions are satisfied for the two belief intervals
$[Bel(A \cap \bar{B}), Pl(A \cap \bar{B})]$ and $[Bel(\bar{A} \cap B), Pl(\bar{A} \cap B)]$.

• Condition $C_1$:

$$[Bel(A \cap \bar{B}), Pl(A \cap \bar{B})] = [Pl(A) Bel(\bar{B}), Bel(A) Pl(\bar{B})]. \qquad (33)$$

• Condition $C_2$:

$$[Bel(\bar{A} \cap B), Pl(\bar{A} \cap B)] = [Bel(\bar{A}) Pl(B), Pl(\bar{A}) Bel(B)]. \qquad (34)$$


The conditions $C_1$ and $C_2$ are in fact just necessary conditions
but not sufficient conditions because one also needs to impose
the coherence conditions $C_3$ and $C_4$ stating that the right bound
of any belief interval must always be greater than (or equal to)
its left bound. Hence the following inequalities (35) and (37)
must also be satisfied.

• Condition $C_3$: The constraint $Bel(A \cap \bar{B}) \leq Pl(A \cap \bar{B})$
and (33) impose

$$Pl(A) Bel(\bar{B}) \leq Bel(A) Pl(\bar{B}), \qquad (35)$$

which is equivalent to the condition⁶

$$Pl(A) - Bel(A) \leq Pl(A) Pl(B) - Bel(A) Bel(B). \qquad (36)$$

• Condition $C_4$: The constraint $Bel(\bar{A} \cap B) \leq Pl(\bar{A} \cap B)$
and (34) impose

$$Bel(\bar{A}) Pl(B) \leq Pl(\bar{A}) Bel(B), \qquad (37)$$

which is equivalent to the condition⁷

$$Pl(B) - Bel(B) \leq Pl(A) Pl(B) - Bel(A) Bel(B). \qquad (38)$$


Thus, the conditions $C_1$, $C_2$, $C_3$ and $C_4$ characterize
mathematically the notion of credibilistic independence (C-Indep)
between two propositions $A$ and $B$ according to a given BBA.
This allows us to establish the following theorem.

C-Indep Theorem: Consider a FoD $\Theta$, a BBA $m(\cdot) :
2^\Theta \mapsto [0,1]$, and two subsets $A$ and $B$ of $\Theta$. The two
propositions $A$ and $B$ are said to be credibilistically independent
if and only if

$$[Bel(A \cap \bar{B}), Pl(A \cap \bar{B})] = [Pl(A) Bel(\bar{B}), Bel(A) Pl(\bar{B})],$$
$$[Bel(\bar{A} \cap B), Pl(\bar{A} \cap B)] = [Bel(\bar{A}) Pl(B), Pl(\bar{A}) Bel(B)],$$

and

$$Pl(A) - Bel(A) \leq Pl(A) Pl(B) - Bel(A) Bel(B),$$
$$Pl(B) - Bel(B) \leq Pl(A) Pl(B) - Bel(A) Bel(B),$$

where $Bel(\cdot)$ and $Pl(\cdot)$ are respectively the belief and
plausibility functions related to the BBA $m(\cdot)$.
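The four conditions of the C-Indep Theorem can be tested mechanically. The Python sketch below is one possible way to do so, under stated assumptions: the function name `c_independent`, the encoding of a BBA as a dictionary of frozensets and the dependent pair used for contrast are illustrative choices, not taken from the paper. The first BBA is the uniform Bayesian BBA on six elements (the setting of Example 1 of Section V).

```python
from fractions import Fraction

def bel(m, A):
    """Bel(A): total mass of the focal elements included in A."""
    return sum((v for X, v in m.items() if X <= A), Fraction(0))

def pl(m, A):
    """Pl(A): total mass of the focal elements intersecting A."""
    return sum((v for X, v in m.items() if X & A), Fraction(0))

def c_independent(m, theta, A, B):
    """Check the interval equalities (33)-(34) and inequalities (36), (38)."""
    nA, nB = theta - A, theta - B
    c1 = (bel(m, A & nB), pl(m, A & nB)) == (
        pl(m, A) * bel(m, nB), bel(m, A) * pl(m, nB))
    c2 = (bel(m, nA & B), pl(m, nA & B)) == (
        bel(m, nA) * pl(m, B), pl(m, nA) * bel(m, B))
    bound = pl(m, A) * pl(m, B) - bel(m, A) * bel(m, B)
    c3 = pl(m, A) - bel(m, A) <= bound
    c4 = pl(m, B) - bel(m, B) <= bound
    return c1 and c2 and c3 and c4

# Uniform Bayesian BBA on six elements, as in Example 1.
theta = frozenset(range(1, 7))
uniform = {frozenset({i}): Fraction(1, 6) for i in theta}

independent = c_independent(uniform, theta,
                            frozenset({1, 2}), frozenset({2, 4, 6}))
# Hypothetical dependent pair (A is included in B), for contrast.
dependent = c_independent(uniform, theta,
                          frozenset({1, 2}), frozenset({1, 2, 3}))
```

With these inputs `independent` is `True` and `dependent` is `False`. Exact fractions are used because the test relies on equalities between products of beliefs and plausibilities, which floating-point rounding could break.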


Remark: Fagin-Halpern formulas (12)-(15) are defined only
if $Bel(B) > 0$ and if $Bel(A) > 0$. This means that
$[Bel(A), Pl(A)] = ]a_1, a_2]$ is a left-open interval (excluding
$a_1 = 0$ and with $a_1 \leq a_2 \leq 1$) and $[Bel(B), Pl(B)] = ]b_1, b_2]$
is also a left-open interval (excluding $b_1 = 0$ and with
$b_1 \leq b_2 \leq 1$). The credibilistic independence conditions of the
C-Indep Theorem can however be satisfied even⁸ if $Bel(A) = 0$,
or $Bel(B) = 0$, but in this case the Fagin-Halpern formulas
yield the 0/0 indeterminate form, which is perfectly normal.

6 Substitute $Bel(\bar{B})$ by $1 - Pl(B)$, $Pl(\bar{B})$ by $1 - Bel(B)$ and rearrange
terms.

7 Substitute $Bel(\bar{A})$ by $1 - Pl(A)$, $Pl(\bar{A})$ by $1 - Bel(A)$ and rearrange
terms.

8 This is similar to the probabilistic independence condition, where the condition
$P(A \cap B) = P(A)P(B)$ is valid even if $P(A) = 0$, or $P(B) = 0$, or if
both equalities hold.
B. Discussion


The propositions $A$ and $B$ can be credibilistically independent
even if some of their lower or upper bounds are equal
to zero or one, respectively, as will be shown in the next
section. In this case, one can make a simple preliminary
(pre-filtering) test to check the necessary condition that the left
(lower) bound of a belief interval must always be less than (or equal
to) its right bound. To establish such a test, it is worth noting
that the following implications are true.


$$A \subseteq B \Rightarrow Bel(A) \leq Bel(B), \qquad (39)$$
$$A \subseteq B \Rightarrow Pl(A) \leq Pl(B). \qquad (40)$$

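Proof: Indeed, if $A \subseteq B$, then $B - A$ (the complement of $A$
in $B$) is also a subset of $B$. Since we have $B = A \cup (B - A)$
and $A \cap (B - A) = \emptyset$, from the definition of the $Bel(\cdot)$ function,
one can write

$$Bel(B) = \sum_{X \subseteq B} m(X) = \sum_{X \subseteq A \cup (B-A)} m(X) \geq \sum_{X \subseteq A} m(X) + \sum_{X \subseteq B-A} m(X),$$

where the inequality accounts for the possible focal elements of $B$ that
overlap both $A$ and $B - A$. The right-hand side is obviously greater than
(or equal to) $Bel(A) = \sum_{X \subseteq A} m(X)$. Therefore (39) is true.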


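Because⁹ $A \subseteq B \Rightarrow \bar{B} \subseteq \bar{A}$, where $\bar{A} \triangleq \Theta - A$ and $\bar{B} \triangleq
\Theta - B$ (the complements of $A$ and of $B$ in the FoD $\Theta$), one
always has $Bel(\bar{B}) \leq Bel(\bar{A})$. Hence, $-Bel(\bar{A}) \leq -Bel(\bar{B})$,
and thus $[Pl(A) = 1 - Bel(\bar{A})] \leq [Pl(B) = 1 - Bel(\bar{B})]$.
Therefore (40) is also true.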

Because $A \cap B$ is always included in $A$ and in $B$, one always
has $Bel(A \cap B) \leq Bel(A)$ and $Bel(A \cap B) \leq Bel(B)$. For the
same reason, $Pl(A \cap B) \leq Pl(A)$ and $Pl(A \cap B) \leq Pl(B)$.
Therefore the following inequalities always hold

$$Bel(A \cap B) \leq \min\{Bel(A), Bel(B)\}, \qquad (41)$$
$$Pl(A \cap B) \leq \min\{Pl(A), Pl(B)\}. \qquad (42)$$


Let us examine the bounds of the belief interval for the
condition $C_1$ given in (33), which is

$$[Bel(A \cap \bar{B}), Pl(A \cap \bar{B})] = [Pl(A) Bel(\bar{B}), Bel(A) Pl(\bar{B})].$$

• Lower bound of belief interval: Because $Bel(A \cap \bar{B}) \leq
\min\{Bel(A), Bel(\bar{B})\}$ and $Bel(A \cap \bar{B}) =
Pl(A) Bel(\bar{B})$, the following condition

$$Pl(A) Bel(\bar{B}) \leq \min\{Bel(A), Bel(\bar{B})\}$$

must be satisfied. In fact, because $Pl(A) Bel(\bar{B}) \leq
Bel(\bar{B})$ is always true since $Pl(A) \in [0,1]$, the
following coherence condition must hold

$$Pl(A) Bel(\bar{B}) \leq Bel(A), \qquad (43)$$

or equivalently (because $Bel(\bar{B}) = 1 - Pl(B)$)

$$Pl(A) - Bel(A) \leq Pl(A) Pl(B). \qquad (44)$$

Note that the constraint (43) is a bit less restrictive
than the inequality (35) of condition $C_3$. This coherence
constraint says that the uncertainty on $A$ must be less than
the product of plausibilities of $A$ and of $B$ if one wants
the equality $Bel(A \cap \bar{B}) = Pl(A) Bel(\bar{B})$ for the lower bound
of the belief interval to be possible.

9 Letting $x \in \bar{B}$ says that $x$ does not belong to $B$, but the hypothesis
$A \subseteq B$ tells us that $A$ is included in $B$ and hence $x$ does not belong to $A$
as well, or in other words $x \in \bar{A}$. Therefore we have proven $\bar{B} \subseteq \bar{A}$.


• Upper bound of belief interval: Because $Pl(A \cap \bar{B}) \leq
\min\{Pl(A), Pl(\bar{B})\}$ and $Pl(A \cap \bar{B}) = Bel(A) Pl(\bar{B})$,
the following condition

$$Bel(A) Pl(\bar{B}) \leq \min\{Pl(A), Pl(\bar{B})\}$$

must be satisfied. In fact, because $Bel(A) Pl(\bar{B}) \leq
Pl(\bar{B})$ is always true since $Bel(A) \in [0,1]$, the
following coherence condition must hold

$$Bel(A) Pl(\bar{B}) \leq Pl(A). \qquad (45)$$

Using the fact that $Pl(\bar{B}) = 1 - Bel(B)$ in (45), and
rearranging terms we get

$$Bel(A)(1 - Bel(B)) \leq Pl(A), \qquad (46)$$
$$Bel(A) - Bel(A) Bel(B) \leq Pl(A), \qquad (47)$$
$$-Bel(A) Bel(B) \leq Pl(A) - Bel(A). \qquad (48)$$

As we see, the inequality (48) is always satisfied because
$Bel(A)$, $Bel(B)$ and $Pl(A)$ belong to $[0,1]$ and because
$Pl(A) \geq Bel(A)$, so that $-Bel(A) Bel(B) \leq 0$ whereas
$Pl(A) - Bel(A) \geq 0$.

Thus, there is in fact no need for a coherence constraint
on the upper bound of the belief interval for the
equality $Pl(A \cap \bar{B}) = Bel(A) Pl(\bar{B})$ to be possible.


Let us examine the bounds of the belief interval for the
condition $C_2$ given in (34), which is

$$[Bel(\bar{A} \cap B), Pl(\bar{A} \cap B)] = [Bel(\bar{A}) Pl(B), Pl(\bar{A}) Bel(B)].$$

• Lower bound of belief interval: Because $Bel(\bar{A} \cap B) \leq
\min\{Bel(\bar{A}), Bel(B)\}$ and $Bel(\bar{A} \cap B) =
Bel(\bar{A}) Pl(B)$, the following condition

$$Bel(\bar{A}) Pl(B) \leq \min\{Bel(\bar{A}), Bel(B)\}$$

must be satisfied. In fact, because $Bel(\bar{A}) Pl(B) \leq
Bel(\bar{A})$ is always true since $Pl(B) \in [0,1]$, the
following coherence condition must hold

$$Bel(\bar{A}) Pl(B) \leq Bel(B), \qquad (49)$$

or equivalently (because $Bel(\bar{A}) = 1 - Pl(A)$)

$$Pl(B) - Bel(B) \leq Pl(A) Pl(B). \qquad (50)$$

Note that the constraint (49) is a bit less restrictive
than the inequality (37) of condition $C_4$. This coherence
constraint says that the uncertainty on $B$ must be less than
the product of plausibilities of $A$ and of $B$ if one wants
the equality $Bel(\bar{A} \cap B) = Bel(\bar{A}) Pl(B)$ for the lower bound
of the belief interval to be possible.

• Upper bound of belief interval: Because $Pl(\bar{A} \cap B) \leq
\min\{Pl(\bar{A}), Pl(B)\}$ and $Pl(\bar{A} \cap B) = Pl(\bar{A}) Bel(B)$,
the following condition

$$Pl(\bar{A}) Bel(B) \leq \min\{Pl(\bar{A}), Pl(B)\}$$

must be satisfied. In fact, because $Pl(\bar{A}) Bel(B) \leq
Pl(\bar{A})$ is always true since $Bel(B) \in [0,1]$, the
following coherence condition must hold

$$Pl(\bar{A}) Bel(B) \leq Pl(B). \qquad (51)$$

Using the fact that $Pl(\bar{A}) = 1 - Bel(A)$ in (51), and
rearranging terms we get

$$(1 - Bel(A)) Bel(B) \leq Pl(B), \qquad (52)$$
$$Bel(B) - Bel(A) Bel(B) \leq Pl(B), \qquad (53)$$
$$-Bel(A) Bel(B) \leq Pl(B) - Bel(B). \qquad (54)$$

As we see, the inequality (54) is always satisfied because
$Bel(A)$, $Bel(B)$ and $Pl(B)$ belong to $[0,1]$ and because
$Pl(B) \geq Bel(B)$, so that $-Bel(A) Bel(B) \leq 0$
whereas $Pl(B) - Bel(B) \geq 0$.

Thus, there is in fact no need for a coherence constraint
on the upper bound of the belief interval for the
equality $Pl(\bar{A} \cap B) = Pl(\bar{A}) Bel(B)$ to be possible.


In summary, the conditions

$$Pl(A) - Bel(A) \leq Pl(A) Pl(B), \qquad (55)$$
$$Pl(B) - Bel(B) \leq Pl(A) Pl(B), \qquad (56)$$

are necessary for the coherence of the belief interval bounds
defined in the conditions $C_1$ and $C_2$. They express the fact
that the width of the belief interval (i.e. the uncertainty) of each
proposition $A$ and $B$ must be less than the product of their
plausibilities. The conditions (55)-(56) are very convenient
to quickly test the credibilistic non-independence of $A$ and
$B$, because if at least one of the conditions (55) and (56)
is not satisfied, then we are sure that $A$ and $B$ cannot be
credibilistically independent. If the inequalities (55)-(56) are
satisfied, we need to check whether the conditions $C_1$, $C_2$, $C_3$ and
$C_4$ are also satisfied to declare the credibilistic independence
of $A$ and $B$.
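The pre-filtering test based on (55)-(56) only needs the two belief intervals. A minimal Python sketch is given below; the function name `passes_prefilter` and the second pair of intervals are hypothetical choices made for illustration, while the first pair is the one obtained in Example 3 of Section V.

```python
from fractions import Fraction

def passes_prefilter(bel_a, pl_a, bel_b, pl_b):
    """Necessary conditions (55)-(56).

    A False result rules out credibilistic independence; a True result
    only means that conditions C1 to C4 still have to be checked.
    """
    product = pl_a * pl_b
    return (pl_a - bel_a) <= product and (pl_b - bel_b) <= product

# Intervals of Example 3: [Bel(A), Pl(A)] = [1, 1], [Bel(B), Pl(B)] = [0.7, 1].
example3 = passes_prefilter(Fraction(1), Fraction(1),
                            Fraction(7, 10), Fraction(1))

# Hypothetical wide intervals [0.1, 0.9] and [0.2, 0.5] (assumption).
wide = passes_prefilter(Fraction(1, 10), Fraction(9, 10),
                        Fraction(2, 10), Fraction(5, 10))
```

For the hypothetical wide intervals the width 0.8 of the first interval exceeds the product of plausibilities 0.45, so `wide` is `False` and the two propositions cannot be credibilistically independent, whatever the underlying BBA.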


C. Special case: Bayesian belief functions 


The notion of credibilistic independence defined in the
previous section is a generalization of the notion of
probabilistic independence. This can be justified (and verified) by
examining what the conditions $C_1$, $C_2$, $C_3$ and $C_4$ become
in the limit case when the BBA $m(\cdot)$ is Bayesian. In this
case, the belief function $Bel(\cdot)$ and the plausibility function $Pl(\cdot)$
coincide with a probability measure $P(\cdot)$, which means that
the conditions $C_3$ and $C_4$, characterized by formulas (36)
and (38), are always satisfied because $Pl(A) = Bel(A)$
and $Pl(B) = Bel(B)$. Moreover, the conditions $C_1$ and $C_2$
become equalities between the following degenerate intervals

$$[P(A \cap \bar{B}), P(A \cap \bar{B})] = [P(A)P(\bar{B}), P(A)P(\bar{B})],$$

$$[P(\bar{A} \cap B), P(\bar{A} \cap B)] = [P(\bar{A})P(B), P(\bar{A})P(B)],$$

or equivalently

$$P(A \cap \bar{B}) = P(A)P(\bar{B}),$$
$$P(\bar{A} \cap B) = P(\bar{A})P(B).$$

These conditions are in fact equivalent to the probabilistic
independence condition $P(A \cap B) = P(A)P(B)$. This can be
shown from the Total Probability Theorem (TPT) formulas $P(A \cap B) + P(A \cap \bar{B}) = P(A)$
and $P(A \cap B) + P(\bar{A} \cap B) = P(B)$ as follows.

• If $P(A \cap B) = P(A)P(B)$, then $P(A \cap B) + P(A \cap \bar{B}) =
P(A)P(B) + P(A \cap \bar{B}) = P(A)$, and thus $P(A \cap \bar{B}) =
P(A)(1 - P(B)) = P(A)P(\bar{B})$.

• If $P(A \cap B) = P(A)P(B)$, then $P(A \cap B) + P(\bar{A} \cap B) =
P(A)P(B) + P(\bar{A} \cap B) = P(B)$, and thus $P(\bar{A} \cap B) =
(1 - P(A))P(B) = P(\bar{A})P(B)$.

Therefore, we have proved that our notion of credibilistic
independence derived from FH conditioning coincides with
the notion of probabilistic independence as soon as the belief
function under consideration is Bayesian.


V. ILLUSTRATIVE EXAMPLES 


For convenience (and not for significance), we give some
simple examples illustrating the credibilistic independence
between two propositions $A$ and $B$ with respect to some given
basic belief assignments, so that readers can check
the derivations by themselves.


A. Example I (Bayesian case) 


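Let us consider the FoD $\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4, \theta_5, \theta_6\}$ and the
(uniform) Bayesian BBA defined by $m(\theta_i) = 1/6$ for $i =
1, 2, \ldots, 6$. Consider the two propositions (subsets) $A$ and $B$
of $\Theta$ defined as $A \triangleq \theta_1 \cup \theta_2$ and $B \triangleq \theta_2 \cup \theta_4 \cup \theta_6$. In this
case, $\bar{A} = \theta_3 \cup \theta_4 \cup \theta_5 \cup \theta_6$ and $\bar{B} = \theta_1 \cup \theta_3 \cup \theta_5$. We have
also $A \cap \bar{B} = \theta_1$, and $\bar{A} \cap B = \theta_4 \cup \theta_6$. Because $m(\cdot)$ is a
Bayesian BBA, $Bel(X) = Pl(X) = P(X)$ for $X \in 2^\Theta$. Here
one has

$$Bel(A) = Bel(\theta_1 \cup \theta_2) = m(\theta_1) + m(\theta_2) = 1/3,$$
$$Pl(A) = Pl(\theta_1 \cup \theta_2) = m(\theta_1) + m(\theta_2) = 1/3,$$
$$Bel(\bar{A}) = Bel(\theta_3 \cup \theta_4 \cup \theta_5 \cup \theta_6) = 1 - Pl(A) = 2/3,$$
$$Pl(\bar{A}) = Pl(\theta_3 \cup \theta_4 \cup \theta_5 \cup \theta_6) = 1 - Bel(A) = 2/3,$$
$$Bel(B) = m(\theta_2) + m(\theta_4) + m(\theta_6) = 1/2,$$
$$Pl(B) = m(\theta_2) + m(\theta_4) + m(\theta_6) = 1/2,$$
$$Bel(\bar{B}) = Bel(\theta_1 \cup \theta_3 \cup \theta_5) = 1 - Pl(B) = 1/2,$$
$$Pl(\bar{B}) = Pl(\theta_1 \cup \theta_3 \cup \theta_5) = 1 - Bel(B) = 1/2,$$
$$Bel(A \cap \bar{B}) = m(\theta_1) = 1/6,$$
$$Pl(A \cap \bar{B}) = m(\theta_1) = 1/6,$$
$$Bel(\bar{A} \cap B) = Bel(\theta_4 \cup \theta_6) = m(\theta_4) + m(\theta_6) = 1/3,$$
$$Pl(\bar{A} \cap B) = Pl(\theta_4 \cup \theta_6) = m(\theta_4) + m(\theta_6) = 1/3.$$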
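Conditions $C_1$ and $C_2$ are satisfied because

$$C_1: \begin{cases} [Bel(A \cap \bar{B}), Pl(A \cap \bar{B})] = [\frac{1}{6}, \frac{1}{6}], \\ [Pl(A) Bel(\bar{B}), Bel(A) Pl(\bar{B})] = [\frac{1}{3} \cdot \frac{1}{2}, \frac{1}{3} \cdot \frac{1}{2}] = [\frac{1}{6}, \frac{1}{6}], \end{cases}$$

$$C_2: \begin{cases} [Bel(\bar{A} \cap B), Pl(\bar{A} \cap B)] = [\frac{1}{3}, \frac{1}{3}], \\ [Bel(\bar{A}) Pl(B), Pl(\bar{A}) Bel(B)] = [\frac{2}{3} \cdot \frac{1}{2}, \frac{2}{3} \cdot \frac{1}{2}] = [\frac{1}{3}, \frac{1}{3}]. \end{cases}$$

The condition $C_3 : Pl(A) Bel(\bar{B}) \leq Bel(A) Pl(\bar{B})$ is
satisfied because $Pl(A) Bel(\bar{B}) = \frac{1}{3} \cdot \frac{1}{2} = Bel(A) Pl(\bar{B})$. The
condition $C_4 : Bel(\bar{A}) Pl(B) \leq Pl(\bar{A}) Bel(B)$ is also satisfied
because $Bel(\bar{A}) Pl(B) = \frac{2}{3} \cdot \frac{1}{2} = Pl(\bar{A}) Bel(B)$.

Because the conditions $C_1$, $C_2$, $C_3$ and $C_4$ are satisfied,
the propositions $A$ and $B$ are credibilistically independent. In
fact, in this Bayesian case, $A$ and $B$ are also probabilistically
independent because $P(A \cap B) = P(\theta_2) = P(A)P(B)$. Note
that the coherence conditions (55) and (56) are of course
satisfied because

$$[Pl(A) - Bel(A) = 0] \leq [Pl(A) Pl(B) = (1/3) \cdot (1/2)],$$
$$[Pl(B) - Bel(B) = 0] \leq [Pl(A) Pl(B) = (1/3) \cdot (1/2)].$$
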

B. Example 2 (Non Bayesian case) 

Let us consider the FoD $\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4, \theta_5\}$ and the two
propositions (subsets) $A$ and $B$ of $\Theta$ defined as $A = \theta_1 \cup \theta_2 \cup \theta_3$
and $B = \theta_3 \cup \theta_4$. In this case, $\bar{A} = \theta_4 \cup \theta_5$ and $\bar{B} = \theta_1 \cup \theta_2 \cup \theta_5$.
We have also $A \cap \bar{B} = (\theta_1 \cup \theta_2 \cup \theta_3) \cap (\theta_1 \cup \theta_2 \cup \theta_5) = \theta_1 \cup \theta_2$
and $\bar{A} \cap B = (\theta_4 \cup \theta_5) \cap (\theta_3 \cup \theta_4) = \theta_4$. Suppose that the
BBA $m(\cdot)$ is simply defined as¹⁰

$$m(\theta_1) = 0.5, \quad m(\theta_3) = 0.1,$$

10 All other elements of $2^\Theta$ which are not focal elements of the BBA $m(\cdot)$
receive a zero value.

Based on the BBA $m(\cdot)$, the beliefs and plausibilities of the
propositions involved in the derivations are

$$[Bel(A), Pl(A)] = [1, 1], \quad [Bel(\bar{A}), Pl(\bar{A})] = [0, 0],$$
$$[Bel(B), Pl(B)] = [0.1, 0.5], \quad [Bel(\bar{B}), Pl(\bar{B})] = [0.5, 0.9],$$
$$[Bel(A \cap B), Pl(A \cap B)] = [0.1, 0.5],$$
$$[Bel(A \cap \bar{B}), Pl(A \cap \bar{B})] = [0.5, 0.9],$$
$$[Bel(\bar{A} \cap B), Pl(\bar{A} \cap B)] = [0, 0].$$

The condition $C_1$ is satisfied because

$$C_1: \begin{cases} [Bel(A \cap \bar{B}), Pl(A \cap \bar{B})] = [0.5, 0.9], \\ [Pl(A) Bel(\bar{B}), Bel(A) Pl(\bar{B})] = [1 \cdot 0.5, 1 \cdot 0.9]. \end{cases}$$

The condition $C_2$ is also satisfied because

$$C_2: \begin{cases} [Bel(\bar{A} \cap B), Pl(\bar{A} \cap B)] = [0, 0], \\ [Bel(\bar{A}) Pl(B), Pl(\bar{A}) Bel(B)] = [0 \cdot 0.5, 0 \cdot 0.1]. \end{cases}$$

The condition $C_3$ given by $Pl(A) Bel(\bar{B}) \leq Bel(A) Pl(\bar{B})$
is satisfied because $Pl(A) Bel(\bar{B}) = 1 \cdot 0.5 = 0.5$ and
$Bel(A) Pl(\bar{B}) = 1 \cdot 0.9 = 0.9$.

The condition $C_4$ given by $Bel(\bar{A}) Pl(B) \leq Pl(\bar{A}) Bel(B)$
is satisfied because $Bel(\bar{A}) Pl(B) = 0 \cdot 0.5 = 0$ and
$Pl(\bar{A}) Bel(B) = 0 \cdot 0.1 = 0$.

Therefore the propositions $A$ and $B$ are credibilistically
independent. One can easily verify using Fagin-Halpern
formulas that $[Bel(A|B), Pl(A|B)] = [Bel(A), Pl(A)]$ and
$[Bel(B|A), Pl(B|A)] = [Bel(B), Pl(B)]$. Indeed, in applying
(12) and (13) one gets

$$Bel(A|B) = \frac{Bel(A \cap B)}{Bel(A \cap B) + Pl(\bar{A} \cap B)} = \frac{0.1}{0.1 + 0} = 1 = Bel(A),$$

$$Pl(A|B) = \frac{Pl(A \cap B)}{Pl(A \cap B) + Bel(\bar{A} \cap B)} = \frac{0.5}{0.5 + 0} = 1 = Pl(A),$$

and in applying (14) and (15), one gets

$$Bel(B|A) = \frac{Bel(A \cap B)}{Bel(A \cap B) + Pl(A \cap \bar{B})} = \frac{0.1}{0.1 + 0.9} = 0.1 = Bel(B),$$

$$Pl(B|A) = \frac{Pl(A \cap B)}{Pl(A \cap B) + Bel(A \cap \bar{B})} = \frac{0.5}{0.5 + 0.5} = 0.5 = Pl(B).$$

Note that the coherence conditions (55) and (56) are of
course satisfied because

$$[Pl(A) - Bel(A) = 0] \leq [Pl(A) Pl(B) = 1 \cdot 0.5 = 0.5],$$
$$[Pl(B) - Bel(B) = 0.4] \leq [Pl(A) Pl(B) = 1 \cdot 0.5 = 0.5].$$

C. Example 3 (Non Bayesian case) 


Here we consider a more interesting example where the
widths of belief intervals are not all restricted to zero. Consider
the frame of discernment $\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4\}$ and the very
simple BBA $m(\cdot)$ defined by $m(\theta_3) = 0.7$ and $m(\theta_1 \cup \theta_3) =
0.3$. We consider the propositions $A = \theta_1 \cup \theta_3$ and $B = \theta_2 \cup \theta_3$.
In this case, we have $\bar{A} = \theta_2 \cup \theta_4$, $\bar{B} = \theta_1 \cup \theta_4$, $A \cap B = \theta_3$,
$A \cap \bar{B} = \theta_1$ and $\bar{A} \cap B = \theta_2$. Based on the BBA $m(\cdot)$,
the beliefs and plausibilities of the propositions involved in the
derivations are

$$[Bel(A), Pl(A)] = [1, 1], \quad [Bel(\bar{A}), Pl(\bar{A})] = [0, 0],$$
$$[Bel(B), Pl(B)] = [0.7, 1], \quad [Bel(\bar{B}), Pl(\bar{B})] = [0, 0.3],$$
$$[Bel(A \cap B), Pl(A \cap B)] = [0.7, 1],$$
$$[Bel(A \cap \bar{B}), Pl(A \cap \bar{B})] = [0, 0.3],$$
$$[Bel(\bar{A} \cap B), Pl(\bar{A} \cap B)] = [0, 0].$$

The condition $C_1$ is satisfied because

$$C_1: \begin{cases} [Bel(A \cap \bar{B}), Pl(A \cap \bar{B})] = [0, 0.3], \\ [Pl(A) Bel(\bar{B}), Bel(A) Pl(\bar{B})] = [1 \cdot 0, 1 \cdot 0.3]. \end{cases}$$

The condition $C_2$ is satisfied because

$$C_2: \begin{cases} [Bel(\bar{A} \cap B), Pl(\bar{A} \cap B)] = [0, 0], \\ [Bel(\bar{A}) Pl(B), Pl(\bar{A}) Bel(B)] = [0 \cdot 1, 0 \cdot 0.7]. \end{cases}$$

The condition $C_3$ given by $Pl(A) Bel(\bar{B}) \leq Bel(A) Pl(\bar{B})$
is satisfied because $Pl(A) Bel(\bar{B}) = 1 \cdot 0 = 0$ and
$Bel(A) Pl(\bar{B}) = 1 \cdot 0.3 = 0.3$.

The condition $C_4$ given by $Bel(\bar{A}) Pl(B) \leq Pl(\bar{A}) Bel(B)$
is satisfied because $Bel(\bar{A}) Pl(B) = 0 \cdot 1 = 0$ and
$Pl(\bar{A}) Bel(B) = 0 \cdot 0.7 = 0$.

Therefore the propositions $A$ and $B$ are credibilistically
independent. One can easily verify using Fagin-Halpern
formulas that $[Bel(A|B), Pl(A|B)] = [Bel(A), Pl(A)]$ and
$[Bel(B|A), Pl(B|A)] = [Bel(B), Pl(B)]$. Indeed, in applying
(12) and (13) one gets

$$Bel(A|B) = \frac{Bel(A \cap B)}{Bel(A \cap B) + Pl(\bar{A} \cap B)} = \frac{0.7}{0.7 + 0} = 1 = Bel(A),$$

$$Pl(A|B) = \frac{Pl(A \cap B)}{Pl(A \cap B) + Bel(\bar{A} \cap B)} = \frac{1}{1 + 0} = 1 = Pl(A),$$

and in applying (14) and (15), one gets

$$Bel(B|A) = \frac{Bel(A \cap B)}{Bel(A \cap B) + Pl(A \cap \bar{B})} = \frac{0.7}{0.7 + 0.3} = 0.7 = Bel(B),$$

$$Pl(B|A) = \frac{Pl(A \cap B)}{Pl(A \cap B) + Bel(A \cap \bar{B})} = \frac{1}{1 + 0} = 1 = Pl(B).$$

Note that the coherence conditions (55) and (56) are of course
satisfied because

$$[Pl(A) - Bel(A) = 0] \leq [Pl(A) Pl(B) = 1 \cdot 1 = 1],$$
$$[Pl(B) - Bel(B) = 0.3] \leq [Pl(A) Pl(B) = 1 \cdot 1 = 1].$$


VI. CONCLUSIONS 


In this paper the notion of credibilistic independence of two 
propositions has been proposed in the framework of belief 
functions. It is a generalization of the notion of (probabilistic) 
independence of two events defined classically in the theory 
of probability. Our definition is totally consistent with the 
probabilistic independence when the basic belief assignment 
is Bayesian because it is based on Fagin-Halpern belief condi- 
tioning formulas (derived from Total Belief Theorem) which 
are consistent with imprecise conditional probability calculus. 
Simple examples of the notion of credibilistic independence 
have also been given to illustrate how to test easily the 
credibilistic independence of two propositions in practice from 
a given basic belief assignment. 
Abstract: In his Mathematical Theory of Evidence published
in 1976, Shafer proposed belief and plausibility conditioning
formulas based on Dempster's rule of combination. It turns out
that the proof given by Shafer for belief conditioning is incorrect,
and in this paper we present the correct proof of Shafer's belief
conditioning formula.


Keywords: belief functions, Shafer’s conditioning. 


I. INTRODUCTION 


In his Mathematical Theory of Evidence published in 1976
[1], Glenn Shafer proposed belief and plausibility conditioning
formulas based on Dempster's rule of combination. It turns
out that the proof of Theorem 3.6 given by Shafer in [1] (p. 66)
for belief conditioning is incorrect, and we will explain why.
In this paper we present the correct proof of Shafer's belief
conditioning formulas. This paper must not be considered as
support for Shafer's belief conditioning approach because we
recommend the Fagin-Halpern conditioning approach [2] instead
(see our paper [3] for justification). It is only a clarification
of how Shafer's conditioning formulas are correctly obtained, no less
no more.


II. BASICS OF BELIEF FUNCTIONS 


Based on Dempster's works [4], [5], Shafer introduced
Belief Functions (BF) to model epistemic uncertainty and
to reason under uncertainty [1]. Shafer's theory of evidence is
often called Dempster-Shafer Theory (DST) in the literature.
We consider a finite discrete frame of discernment (FoD)
$\Theta = \{\theta_1, \ldots, \theta_n\}$, with $n > 1$, and where all exhaustive
and exclusive elements of $\Theta$ represent the set of the potential
solutions of the problem under concern. The set of all subsets
of $\Theta$ is the power-set of $\Theta$ denoted by $2^\Theta$. The number of
elements (i.e. the cardinality) of $2^\Theta$ is $2^{|\Theta|}$. A basic belief
assignment (BBA) associated with a given source of evidence
is defined as the mapping $m(\cdot) : 2^\Theta \to [0,1]$ satisfying the
conditions $m(\emptyset) = 0$ and $\sum_{A \in 2^\Theta} m(A) = 1$. The quantity
$m(A)$ is the mass of belief of subset $A$ committed by the
source of evidence (SoE). A focal element $X$ of a BBA $m(\cdot)$
is an element of $2^\Theta$ such that $m(X) > 0$. Note that the empty
set $\emptyset$ is not a focal element of a BBA because $m(\emptyset) = 0$
(closed-world assumption of Shafer's model for the FoD). The
set of all focal elements (i.e. the core) of $m(\cdot)$ is denoted
$\mathcal{F}_\Theta(m) \triangleq \{X \subseteq \Theta \,|\, m(X) > 0\} = \{X \in 2^\Theta \,|\, m(X) > 0\}$,
and the set of focal elements of $m(\cdot)$ included in $A \subseteq \Theta$ is
denoted $\mathcal{F}_A(m) \triangleq \{X \in \mathcal{F}_\Theta(m) \,|\, X \cap A = X\}$. Belief and
plausibility functions are defined by¹


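$$Bel(A) = \sum_{X \in 2^\Theta \,|\, X \subseteq A} m(X) = \sum_{X \in \mathcal{F}_\Theta(m) \,|\, X \subseteq A} m(X), \qquad (1)$$

$$Pl(A) = \sum_{X \in 2^\Theta \,|\, X \cap A \neq \emptyset} m(X) = \sum_{X \in \mathcal{F}_\Theta(m) \,|\, X \cap A \neq \emptyset} m(X) = 1 - Bel(\bar{A}). \qquad (2)$$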


When all elements of $\mathcal{F}_\Theta(m)$ are only singletons, $m(\cdot)$
is called a Bayesian BBA [1] and its corresponding $Bel(\cdot)$
and $Pl(\cdot)$ functions are homogeneous to a same (subjective)
probability measure $P(\cdot)$. The vacuous BBA representing a
totally non-informative source of evidence is characterized by
the BBA $m(\Theta) = 1$. According to Shafer's Theorem 1 (see [1]
page 39, with its proof on page 51), the belief functions can
be characterized without referencing a BBA. The quantities
$m(\cdot)$ and $Bel(\cdot)$ are one-to-one, and the BBA $m(\cdot)$ is obtained
from $Bel(\cdot)$ by the Möbius inverse formula (see [1], p. 39).

In DST, Shafer [1] proposed to combine $s \geq 2$ distinct
sources of evidence represented by BBAs $m_1(\cdot), \ldots, m_s(\cdot)$
over the same FoD $\Theta$ with Dempster's rule (i.e. the normalized
conjunctive rule). Mathematically, Dempster's rule of combination
of $s \geq 2$ BBAs is defined by $m^{DS}_{1,\ldots,s}(\emptyset) = 0$, and for
any $X \neq \emptyset \in 2^\Theta$

$$m^{DS}_{1,\ldots,s}(X) = [m_1 \oplus \ldots \oplus m_s](X) \triangleq m^{CR}_{1,\ldots,s}(X) / (1 - m^{CR}_{1,\ldots,s}(\emptyset)), \qquad (3)$$

where $m^{CR}_{1,\ldots,s}(X) \triangleq \sum_{X_1,\ldots,X_s \in 2^\Theta \,|\, X_1 \cap \ldots \cap X_s = X} \prod_{i=1}^{s} m_i(X_i)$ is the
conjunctive rule (CR) of combination. The term $m^{CR}_{1,\ldots,s}(\emptyset)$
reflects the amount of dissonance between the sources [6].
Dempster's rule is commutative and associative and preserves
the neutrality of the vacuous BBA in the fusion process. This
rule has been disputed from both theoretical and practical
standpoints, see [7]-[13] for discussions. In this paper we
do not focus on Dempster's rule, but only on Shafer's belief
conditioning formulas based on Dempster's rule.

1 By convention, a sum of non-existing terms (if it occurs in formulas
depending on the given BBA) is always set to zero.
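To make definition (3) concrete, here is a minimal Python sketch of Dempster's rule for $s \geq 2$ BBAs. It is an illustration only: the function name `dempster`, the encoding of a BBA as a dictionary from frozensets to exact fractions, and the two small BBAs are assumptions and do not come from the paper.

```python
from fractions import Fraction
from itertools import product

def dempster(*bbas):
    """Normalized conjunctive combination (3) of s >= 2 BBAs."""
    conj = {}
    # Conjunctive rule: multiply masses and intersect focal elements.
    for combo in product(*(m.items() for m in bbas)):
        inter = frozenset.intersection(*(X for X, _ in combo))
        mass = Fraction(1)
        for _, v in combo:
            mass *= v
        conj[inter] = conj.get(inter, Fraction(0)) + mass
    # Mass committed to the empty set by the conjunctive rule.
    conflict = conj.pop(frozenset(), Fraction(0))
    if conflict == 1:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {X: v / (1 - conflict) for X, v in conj.items()}

# Hypothetical BBAs on the frame {1, 2, 3} (assumption for illustration).
theta = frozenset({1, 2, 3})
m1 = {frozenset({1}): Fraction(3, 5), theta: Fraction(2, 5)}
m2 = {frozenset({2}): Fraction(1, 2), theta: Fraction(1, 2)}
vacuous = {theta: Fraction(1)}

fused = dempster(m1, m2)
unchanged = dempster(m1, vacuous)
```

In this example the conjunctive rule commits a mass of 3/10 to the empty set, and after normalization `fused` assigns 3/7 to $\{1\}$, 2/7 to $\{2\}$ and 2/7 to $\Theta$. Combining with the vacuous BBA leaves `m1` unchanged, which illustrates the neutrality property mentioned above.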


A. Shafer’s conditioning formulas 


In this section we briefly present Shafer's belief conditioning
approach as proposed by Shafer in [1]. Suppose that the
effect of a new evidence on the frame of discernment $\Theta$ is
to establish a particular subset $B \subseteq \Theta$ with certainty. Then
$Bel_2$ defined by $Bel_2(A) = 1$ if $B \subseteq A$ and $Bel_2(A) = 0$
if $B \not\subseteq A$ will give a degree of belief one to the proposition
corresponding to $B$ and to every proposition implied by it [1],
p. 66. Shafer established the following important theorem² for
conditional belief and plausibility.


Theorem 3.6 [1], p. 67: Suppose $Bel_2$ is defined by above two
equations, and $Bel_1$ is another belief function over $\Theta$. Then
$Bel_1$ and $Bel_2$ are combinable if and only if $Bel_1(\bar{B}) < 1$. If
$Bel_1$ and $Bel_2$ are combinable, let $Bel_1(\cdot|B)$ denote $Bel_1 \oplus
Bel_2$, and let $Pl_1$ and $Pl_1(\cdot|B)$ denote the upper probability
functions for $Bel_1$ and $Bel_1 \oplus Bel_2$, respectively. Then for all
$A \subseteq \Theta$,

$$Bel_1(A|B) = \frac{Bel_1(A \cup \bar{B}) - Bel_1(\bar{B})}{1 - Bel_1(\bar{B})}, \qquad (4)$$

$$Pl_1(A|B) = \frac{Pl_1(A \cap B)}{Pl_1(B)}. \qquad (5)$$
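Formulas (4)-(5) can be compared numerically with a direct application of Dempster's rule to $m_1$ and the categorical BBA $m_2(B) = 1$. The following Python sketch does this on a small example; the function names `shafer_conditioning` and `condition_by_dempster`, the frame and the masses are hypothetical and only serve as an illustration.

```python
from fractions import Fraction

def bel(m, A):
    """Bel(A): total mass of the focal elements included in A."""
    return sum((v for X, v in m.items() if X <= A), Fraction(0))

def pl(m, A):
    """Pl(A): total mass of the focal elements intersecting A."""
    return sum((v for X, v in m.items() if X & A), Fraction(0))

def shafer_conditioning(m1, theta, A, B):
    """(Bel_1(A|B), Pl_1(A|B)) from (4)-(5); needs Bel_1(not B) < 1."""
    not_B = theta - B
    if bel(m1, not_B) == 1:
        raise ValueError("Bel_1 and Bel_2 are not combinable")
    bel_cond = (bel(m1, A | not_B) - bel(m1, not_B)) / (1 - bel(m1, not_B))
    pl_cond = pl(m1, A & B) / pl(m1, B)
    return bel_cond, pl_cond

def condition_by_dempster(m1, B):
    """Dempster's rule with m2(B) = 1 (assumes some focal element meets B)."""
    out = {}
    for X, v in m1.items():
        if X & B:                      # empty intersections are discarded
            out[X & B] = out.get(X & B, Fraction(0)) + v
    kept = sum(out.values(), Fraction(0))   # equals 1 - Bel_1(not B)
    return {X: v / kept for X, v in out.items()}

# Hypothetical BBA on the frame {1, 2, 3, 4} (assumption for illustration).
theta = frozenset({1, 2, 3, 4})
m1 = {
    frozenset({1}): Fraction(1, 5),
    frozenset({4}): Fraction(1, 5),
    frozenset({2, 3}): Fraction(3, 10),
    frozenset({3, 4}): Fraction(1, 10),
    theta: Fraction(1, 5),
}
A, B = frozenset({2, 3}), frozenset({1, 2, 3})

by_formula = shafer_conditioning(m1, theta, A, B)
m_cond = condition_by_dempster(m1, B)
by_combination = (bel(m_cond, A), pl(m_cond, A))
```

On this example both routes give $[Bel_1(A|B), Pl_1(A|B)] = [1/2, 3/4]$, in agreement with the statement of the theorem.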


Shafer's proof of this theorem is in [1] (see pages 71-72), but
we reproduce it here for convenience and for a better identification
of the mistake in this proof.

Shafer's Proof of Theorem 3.6 (as given in [1]):
$Bel_1(\bar{B}) < 1$ if and only if $B$ overlaps the core of $Bel_1$, and
since $B$ is the core of $Bel_2$, this is indeed equivalent to $Bel_1$
being combinable with $Bel_2$. Denote the basic probability
assignments of $Bel_1$, $Bel_2$ and $Bel_1 \oplus Bel_2$ by $m_1$, $m_2$ and
$m$. Since $B$ is the only focal element of $Bel_2$, and $m_2(B) = 1$,
Dempster's rule yields

$$m(A) = \frac{\sum_{i \,|\, A_i \cap B = A} m_1(A_i)}{1 - \sum_{i \,|\, A_i \cap B = \emptyset} m_1(A_i)} = \frac{\sum_{C \,|\, B \cap C = A} m_1(C)}{1 - Bel_1(\bar{B})}, \qquad (6)$$

and

$$Bel_1(A|B) = \sum_{D \subseteq A} m(D) = \frac{\sum_{\emptyset \neq D \subseteq A} \sum_{C \,|\, B \cap C = D} m_1(C)}{1 - Bel_1(\bar{B})} \qquad (7)$$

$$= \frac{\sum_{C \,|\, \emptyset \neq B \cap C \subseteq A} m_1(C)}{1 - Bel_1(\bar{B})} \qquad (8)$$

$$= \frac{\sum_{C \subseteq A \cup \bar{B},\, C \not\subseteq B} m_1(C)}{1 - Bel_1(\bar{B})} \qquad (9)$$

$$= \frac{Bel_1(A \cup \bar{B}) - Bel_1(\bar{B})}{1 - Bel_1(\bar{B})}. \qquad (10)$$

Hence

$$Pl_1(A|B) = 1 - Bel_1(\bar{A}|B) \qquad (11)$$

$$= \frac{1 - Bel_1(\bar{B}) - Bel_1(\bar{A} \cup \bar{B}) + Bel_1(\bar{B})}{1 - Bel_1(\bar{B})} \qquad (12)$$

$$= \frac{1 - Bel_1(\bar{A} \cup \bar{B})}{1 - Bel_1(\bar{B})} = \frac{Pl_1(A \cap B)}{Pl_1(B)}. \qquad (13)$$

III. WHY SHAFER'S PROOF IS INCORRECT


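Although Shafer's formulas (4)-(5) are correct³, we show
why Shafer's proof is incorrect. To obtain the final expression
of $Bel_1(A|B)$ given by (10), Shafer goes from (8) to (9) in
the proof of Theorem 3.6. So, Shafer implicitly assumes that
the following equality is valid

$$\sum_{C \,|\, \emptyset \neq B \cap C \subseteq A} m_1(C) = \sum_{C \subseteq A \cup \bar{B},\, C \not\subseteq B} m_1(C). \qquad (14)$$

In fact, (14) is wrong as shown in the next simple counter-example.
Hence, Shafer's proof for $Bel_1(A|B)$ is incorrect.
This mistake casts doubts on the correctness of the formulas in
Theorem 3.6. However, we show in the next section that
the formulas given in Theorem 3.6 are in fact correct and we give
in this paper their correct proofs. It is quite easy to verify that
(9) is not equal to (10) because⁴

$$Bel_1(A \cup \bar{B}) = \sum_{C \subseteq A \cup \bar{B}} m_1(C)$$
$$= \sum_{C \subseteq A \cup \bar{B},\, C \not\subseteq B} m_1(C) + \sum_{C \subseteq A \cup \bar{B},\, C \subseteq B} m_1(C)$$
$$= \sum_{C \subseteq A \cup \bar{B},\, C \not\subseteq B} m_1(C) + \sum_{C \subseteq (A \cup \bar{B}) \cap B} m_1(C)$$
$$= \sum_{C \subseteq A \cup \bar{B},\, C \not\subseteq B} m_1(C) + \sum_{C \subseteq (A \cap B) \cup (\bar{B} \cap B)} m_1(C)$$
$$= \sum_{C \subseteq A \cup \bar{B},\, C \not\subseteq B} m_1(C) + \sum_{C \subseteq (A \cap B) \cup \emptyset} m_1(C)$$
$$= \sum_{C \subseteq A \cup \bar{B},\, C \not\subseteq B} m_1(C) + \sum_{C \subseteq A \cap B} m_1(C)$$
$$= \sum_{C \subseteq A \cup \bar{B},\, C \not\subseteq B} m_1(C) + Bel_1(A \cap B).$$

Therefore, the numerators of (9) and (10) are different in
general because

$$\sum_{C \subseteq A \cup \bar{B},\, C \not\subseteq B} m_1(C) = Bel_1(A \cup \bar{B}) - Bel_1(A \cap B) \neq Bel_1(A \cup \bar{B}) - Bel_1(\bar{B}). \qquad (15)$$

3 If one accepts Shafer's standpoint for belief conditioning based on
Dempster's rule.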


Remark: One may argue that there is just a small typographical
error in Shafer's book, and that in fact the incorrect
expression $\sum_{C \subseteq A \cup \bar{B},\, C \not\subseteq B} m_1(C)$ in (14) must be replaced by
$\sum_{C \subseteq A \cup \bar{B},\, C \not\subseteq \bar{B}} m_1(C)$. Even if one admits this possibility of a typo
in Shafer's proof, it is not trivial to prove the
(modified/corrected) equality

$$\sum_{C \,|\, \emptyset \neq B \cap C \subseteq A} m_1(C) = \sum_{C \subseteq A \cup \bar{B},\, C \not\subseteq \bar{B}} m_1(C), \qquad (16)$$

to get the final Shafer's belief conditioning formula. That
is why we provide a complete, exact and detailed proof of
Shafer's belief conditioning formula in Section IV.


A simple counter-example of Shafer’s proof 


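Consider the following FoD $\Theta = \{\theta_1, \ldots, \theta_7\}$ satisfying
Shafer's model. We consider and denote the focal elements of
$m_1(\cdot)$ as follows: $A \triangleq \{\theta_2, \theta_3, \theta_4, \theta_5, \theta_7\} = \theta_2 \cup \theta_3 \cup \theta_4 \cup \theta_5 \cup \theta_7$,
$B \triangleq \{\theta_1, \theta_2, \theta_3, \theta_4\} = \theta_1 \cup \theta_2 \cup \theta_3 \cup \theta_4$, $C_1 \triangleq \{\theta_3, \theta_5, \theta_6\} =
\theta_3 \cup \theta_5 \cup \theta_6$, $C_2 \triangleq \{\theta_4, \theta_7\} = \theta_4 \cup \theta_7$, $C_3 \triangleq \theta_2$, and the
BBA $m_1(\cdot)$ defined on the FoD $\Theta$ given by $m_1(A) = 0.1$,
$m_1(B) = 0.1$, $m_1(C_1) = 0.2$, $m_1(C_2) = 0.3$ and $m_1(C_3) =
0.3$. We consider the subset $B = \theta_1 \cup \theta_2 \cup \theta_3 \cup \theta_4$ as
the conditioning term, characterized by the BBA $m_2(B) = 1$,
hence $Bel_2(B) = 1$. Note that $\bar{B} = \Theta \setminus B = \{\theta_5, \theta_6, \theta_7\}$
and $Bel_1(\bar{B}) = 0$ because there is no focal element of $m_1(\cdot)$
included in $\bar{B} = \theta_5 \cup \theta_6 \cup \theta_7$.

4 The denominators of (9) and (10) being equal, we just need to verify if
the numerators of (9) and (10) are equal, or not.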

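• Let us first calculate the sum $S_1 = \sum_{\emptyset \neq B \cap C \subseteq A} m_1(C)$ involved in (8). All focal elements $C$ of $m_1(\cdot)$ such that $\emptyset \neq B \cap C \subseteq A$ are the focal elements $A$, $C_1$, $C_2$ and $C_3$ because $B \cap A = \theta_2 \cup \theta_3 \cup \theta_4 \neq \emptyset$ and $\theta_2 \cup \theta_3 \cup \theta_4 \subseteq A$, $B \cap C_1 = \theta_3 \neq \emptyset$ and $\theta_3 \subseteq A$, $B \cap C_2 = \theta_4 \neq \emptyset$ and $\theta_4 \subseteq A$, $B \cap C_3 = \theta_2 \neq \emptyset$ and $\theta_2 \subseteq A$. The focal element $C = B$ of $m_1(\cdot)$ is not involved in the sum $S_1$ because if $C = B$, then $B \cap C = B \cap B = B \not\subseteq A$. Therefore, one gets

$$
S_1 = m_1(A) + m_1(C_1) + m_1(C_2) + m_1(C_3) = 0.1 + 0.2 + 0.3 + 0.3 = 0.9.
$$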


Hence, based on (8), which is the correct expression obtained
from (7), one gets the correct value of Shafer’s belief conditioning

$$
Bel_1(A|B) = S_1/(1 - Bel_1(\bar{B})) = 0.9/(1 - 0) = 0.9.
$$
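• Let us calculate the sum $S_2 = \sum_{C \subseteq A \cup \bar{B},\, C \not\subseteq B} m_1(C)$ involved in (9). First note that $A \cup \bar{B} = \theta_2 \cup \theta_3 \cup \theta_4 \cup \theta_5 \cup \theta_6 \cup \theta_7$ and the focal elements $C$ of $m_1(\cdot)$ such that $C \subseteq (A \cup \bar{B})$ and $C \not\subseteq B$ are the three focal elements $A$, $C_1$ and $C_2$ because $A \subseteq A \cup \bar{B}$ and $A \not\subseteq B$, $C_1 = \theta_3 \cup \theta_5 \cup \theta_6 \subseteq A \cup \bar{B}$ and $C_1 \not\subseteq B$, $C_2 = \theta_4 \cup \theta_7 \subseteq A \cup \bar{B}$ and $C_2 \not\subseteq B$. The focal element $B = \theta_1 \cup \theta_2 \cup \theta_3 \cup \theta_4$ of $m_1(\cdot)$ is not included in $A \cup \bar{B}$ because, in this example, $B$ is not included in $A$, and of course because $B \cap \bar{B} = \emptyset$. The focal element $C_3 = \theta_2$ of $m_1(\cdot)$ is included in $A \cup \bar{B}$ but $C_3 = \theta_2$ is also included in $B = \theta_1 \cup \theta_2 \cup \theta_3 \cup \theta_4$, so that the condition $C_3 \not\subseteq B$ is not satisfied. Based on these remarks, one gets for $S_2$

$$
S_2 = m_1(A) + m_1(C_1) + m_1(C_2) = 0.1 + 0.2 + 0.3 = 0.6.
$$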


We can verify that the value of $S_2$ corresponds to the value obtained with the correct formula (15), because $Bel_1(A \cup \bar{B}) = m_1(A) + m_1(C_1) + m_1(C_2) + m_1(C_3) = 0.9$ and $Bel_1(A \cap B) = m_1(C_3) = 0.3$, so that $S_2 = Bel_1(A \cup \bar{B}) - Bel_1(A \cap B) = 0.9 - 0.3 = 0.6$. Hence, based on (9), one would get an incorrect value of Shafer’s belief conditioning

$$
Bel_1(A|B) = S_2/(1 - Bel_1(\bar{B})) = 0.6/(1 - 0) = 0.6.
$$


Clearly, this counter-example shows that $S_1 \neq S_2$ and proves
that the equality (14) is incorrect. This simple counter-example
illustrates that the proof of Theorem 3.6 given by Shafer
is incorrect.
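The numbers of this counter-example can be checked mechanically. The following is a minimal Python sketch, not part of the original paper: the integer `i` stands for $\theta_i$, and the helper names `bel` and `dempster_condition` are illustrative assumptions. It computes $S_1$, $S_2$, the closed form (10), and the belief obtained by conditioning $m_1(\cdot)$ directly with Dempster's rule.

```python
from fractions import Fraction as Fr

# Frame of discernment: the integer i stands for theta_i.
THETA = frozenset(range(1, 8))

# Focal elements and masses of m1 as given in the counter-example.
A = frozenset({2, 3, 4, 5, 7})
B = frozenset({1, 2, 3, 4})
C1 = frozenset({3, 5, 6})
C2 = frozenset({4, 7})
C3 = frozenset({2})
m1 = {A: Fr(1, 10), B: Fr(1, 10), C1: Fr(2, 10), C2: Fr(3, 10), C3: Fr(3, 10)}


def bel(m, subset):
    """Belief of `subset`: total mass of the focal elements included in it."""
    return sum((mass for X, mass in m.items() if X <= subset), Fr(0))


def dempster_condition(m, cond):
    """Dempster's rule with a BBA fully focused on `cond` (hypothetical helper)."""
    out = {}
    for X, mass in m.items():
        Y = X & cond
        if Y:  # mass falling on the empty set is dropped, then renormalized
            out[Y] = out.get(Y, Fr(0)) + mass
    total = sum(out.values(), Fr(0))
    return {Y: mass / total for Y, mass in out.items()}


not_B = THETA - B

# S1: focal elements C whose non-empty intersection with B is included in A.
S1 = sum((v for C, v in m1.items() if (B & C) and (B & C) <= A), Fr(0))
# S2: focal elements C included in A ∪ not_B but not included in B.
S2 = sum((v for C, v in m1.items() if C <= (A | not_B) and not C <= B), Fr(0))
# Closed form: (Bel(A ∪ not_B) - Bel(not_B)) / (1 - Bel(not_B)).
closed_form = (bel(m1, A | not_B) - bel(m1, not_B)) / (1 - bel(m1, not_B))
# Direct conditioning with Dempster's rule, then belief of A.
direct = bel(dempster_condition(m1, B), A)

print(S1, S2, closed_form, direct)
```

Under these assumptions the sketch agrees with the text: $S_1$, the closed form and the direct conditioning all give 9/10, while $S_2$ gives 3/5.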


IV. CORRECT PROOF OF FORMULAS OF THEOREM 3.6 


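Starting from Dempster’s rule we have $m(\emptyset) = 0$ and for
all $A \neq \emptyset \in 2^\Theta$,

$$
m(A) = \frac{\sum_{\substack{X_1, X_2 \in 2^\Theta \\ X_1 \cap X_2 = A}} m_1(X_1) m_2(X_2)}{1 - \sum_{\substack{X_1, X_2 \in 2^\Theta \\ X_1 \cap X_2 = \emptyset}} m_1(X_1) m_2(X_2)}.
\tag{17}
$$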
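Because in conditioning by $B \neq \emptyset$, $m_2(\cdot)$ is defined by
$m_2(X_2) = 1$ if $X_2 = B$, and $m_2(X_2) = 0$ otherwise, the
previous expression reduces for $A \neq \emptyset$ to

$$
m(A) = \frac{\sum_{\substack{X_1 \in 2^\Theta \\ \emptyset \neq X_1 \cap B = A}} m_1(X_1)}{1 - \sum_{\substack{X_1 \in 2^\Theta \\ X_1 \cap B = \emptyset}} m_1(X_1)} = \frac{\sum_{\substack{X_1 \in 2^\Theta \\ \emptyset \neq X_1 \cap B = A}} m_1(X_1)}{1 - Bel_1(\bar{B})},
\tag{18}
$$

because $Bel_1(\bar{B}) = \sum_{X_1 \in 2^\Theta,\, X_1 \subseteq \bar{B}} m_1(X_1) = \sum_{X_1 \in 2^\Theta,\, X_1 \cap B = \emptyset} m_1(X_1)$.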


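Using the definition of the belief function, $Bel_1(A|B)$ for $B \neq \emptyset$ is given by

$$
\begin{aligned}
Bel_1(A|B) &= \sum_{\substack{Y \in 2^\Theta \\ Y \subseteq A}} m(Y) \\
&= \sum_{\substack{Y \in 2^\Theta \\ Y \subseteq A}} \frac{\sum_{\substack{X_1 \in 2^\Theta \\ \emptyset \neq X_1 \cap B = Y}} m_1(X_1)}{1 - Bel_1(\bar{B})} \\
&= \frac{\sum_{\substack{Y \in 2^\Theta \\ Y \subseteq A}} \sum_{\substack{X_1 \in 2^\Theta \\ \emptyset \neq X_1 \cap B = Y}} m_1(X_1)}{1 - Bel_1(\bar{B})} \\
&= \frac{\sum_{\substack{X_1 \in 2^\Theta \\ \emptyset \neq X_1 \cap B \subseteq A}} m_1(X_1)}{1 - Bel_1(\bar{B})}.
\end{aligned}
\tag{19}
$$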

Note that equation (19) is the same as Shafer’s equation (8),
using slightly modified notation$^5$ for better presentation in the
sequel.


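Because $m_1(\cdot)$ is a normalized BBA, one has for all $B \in 2^\Theta$

$$
\sum_{\substack{X_1 \in 2^\Theta \\ X_1 \cap B = \emptyset}} m_1(X_1) + \sum_{\substack{X_1 \in 2^\Theta \\ X_1 \cap B \neq \emptyset}} m_1(X_1) = 1.
\tag{20}
$$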


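Also, for any $A \in 2^\Theta$ and in partitioning $2^\Theta$ in the subsets
$\{Y \in 2^\Theta \mid Y \subseteq A\}$ and $\{Y \in 2^\Theta \mid Y \not\subseteq A\}$, the following
equality also always holds

$$
\sum_{\substack{Y \in 2^\Theta \\ Y \subseteq A}} \Big[ \sum_{\substack{X_1 \in 2^\Theta \\ X_1 \cap B \cap Y = \emptyset}} m_1(X_1) + \sum_{\substack{X_1 \in 2^\Theta \\ X_1 \cap B \cap Y \neq \emptyset}} m_1(X_1) \Big]
+ \sum_{\substack{Y \in 2^\Theta \\ Y \not\subseteq A}} \Big[ \sum_{\substack{X_1 \in 2^\Theta \\ X_1 \cap B \cap Y = \emptyset}} m_1(X_1) + \sum_{\substack{X_1 \in 2^\Theta \\ X_1 \cap B \cap Y \neq \emptyset}} m_1(X_1) \Big] = 1.
\tag{21}
$$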


$^5$ We have also replaced the symbol $\subset$ by $\subseteq$ for clarity.


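This equality can be rewritten equivalently as

$$
\sum_{\substack{X_1 \in 2^\Theta \\ (X_1 \cap B = \emptyset) \subseteq A}} m_1(X_1) + \sum_{\substack{X_1 \in 2^\Theta \\ (X_1 \cap B \neq \emptyset) \subseteq A}} m_1(X_1)
+ \sum_{\substack{X_1 \in 2^\Theta \\ (X_1 \cap B = \emptyset) \not\subseteq A}} m_1(X_1) + \sum_{\substack{X_1 \in 2^\Theta \\ (X_1 \cap B \neq \emptyset) \not\subseteq A}} m_1(X_1) = 1.
\tag{22}
$$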


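The second term of the left hand side of (22) corresponds to
the numerator of $Bel_1(A|B)$ given in (19). We can express it
as

$$
\sum_{\substack{X_1 \in 2^\Theta \\ (X_1 \cap B \neq \emptyset) \subseteq A}} m_1(X_1) = 1 - \sum_{\substack{X_1 \in 2^\Theta \\ (X_1 \cap B = \emptyset) \subseteq A}} m_1(X_1) - \sum_{\substack{X_1 \in 2^\Theta \\ (X_1 \cap B = \emptyset) \not\subseteq A}} m_1(X_1) - \sum_{\substack{X_1 \in 2^\Theta \\ (X_1 \cap B \neq \emptyset) \not\subseteq A}} m_1(X_1).
$$

Because

$$
\sum_{\substack{X_1 \in 2^\Theta \\ (X_1 \cap B = \emptyset) \subseteq A}} m_1(X_1) + \sum_{\substack{X_1 \in 2^\Theta \\ (X_1 \cap B = \emptyset) \not\subseteq A}} m_1(X_1) = \sum_{\substack{X_1 \in 2^\Theta \\ X_1 \cap B = \emptyset}} m_1(X_1),
$$

one gets

$$
\sum_{\substack{X_1 \in 2^\Theta \\ (X_1 \cap B \neq \emptyset) \subseteq A}} m_1(X_1) = 1 - \sum_{\substack{X_1 \in 2^\Theta \\ X_1 \cap B = \emptyset}} m_1(X_1) - \sum_{\substack{X_1 \in 2^\Theta \\ (X_1 \cap B \neq \emptyset) \not\subseteq A}} m_1(X_1).
$$

This last equality can be rewritten using the facts that

$$
Bel_1(\bar{B}) = \sum_{\substack{X_1 \in 2^\Theta \\ X_1 \subseteq \bar{B}}} m_1(X_1) = \sum_{\substack{X_1 \in 2^\Theta \\ X_1 \cap B = \emptyset}} m_1(X_1),
$$

$$
Pl_1(\bar{A} \cap B) = \sum_{\substack{X_1 \in 2^\Theta \\ X_1 \cap B \cap \bar{A} \neq \emptyset}} m_1(X_1) = \sum_{\substack{X_1 \in 2^\Theta \\ (X_1 \cap B \neq \emptyset) \not\subseteq A}} m_1(X_1).
$$

Therefore, the numerator of $Bel_1(A|B)$ given in (19) equals
$1 - Pl_1(\bar{A} \cap B) - Bel_1(\bar{B})$. Because $Pl_1(\bar{A} \cap B) = 1 - Bel_1(\overline{\bar{A} \cap B}) = 1 - Bel_1(A \cup \bar{B})$, one finally gets for the
numerator of $Bel_1(A|B)$

$$
\sum_{\substack{X_1 \in 2^\Theta \\ (X_1 \cap B \neq \emptyset) \subseteq A}} m_1(X_1) = Bel_1(A \cup \bar{B}) - Bel_1(\bar{B}),
\tag{23}
$$

and the final expression of $Bel_1(A|B)$ is given by

$$
Bel_1(A|B) = (Bel_1(A \cup \bar{B}) - Bel_1(\bar{B}))/(1 - Bel_1(\bar{B})).
\tag{24}
$$

This expression coincides with the final expression (10) given
by Shafer in his flawed proof. The derivation of $Pl_1(A|B)$
given in Shafer’s proof is correct since we have proved that
the expression of $Bel_1(A|B)$ is correct.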
V. CONCLUSION


In this paper we have shown why the proof of belief 
conditioning formulas given by Shafer is wrong and we have 
illustrated this incorrectness with a simple counter-example. 
After the identification of the mistake in Shafer’s proof, we 
have provided the correct proof of final expressions of Shafer’s 
belief conditioning formulas. For readers interested in belief 
conditioning, we provide a solid justification against the belief 
conditioning method proposed by Shafer in our companion 
paper [3]. Our criticism of Shafer’s conditioning approach is 
based on the Total Belief Theorem and Generalized Bayes’ 
Theorem. 
Abstract—This paper presents two new theoretical contributions
for reasoning under uncertainty: 1) the Total Belief
Theorem (TBT) which is a direct generalization of the Total 
Probability Theorem, and 2) the Generalized Bayes’ Theorem 
drawn from TBT. A constructive justification of Fagin-Halpern 
belief conditioning formulas proposed in the nineties is also given. 
We also show how our new approach and formulas work through 
simple illustrative examples. 


Keywords: Total Belief Theorem (TBT), Generalized Bayes’ 
Theorem (GBT), belief functions. 


I. INTRODUCTION 


This paper presents new theoretical results for reasoning 
under uncertainty with belief functions (BF) introduced by 
Shafer in [1] in Dempster-Shafer Theory (DST). The first 
important result is the Total Belief Theorem (TBT) which is 
a generalization of the Total Probability Theorem (TPT) for 
the belief functions framework. From TBT, one can provide 
a solid justification of Fagin-Halpern (FH) belief conditioning 
formulas [3]—[5] which are generalizations of the classical con- 
ditional probability formulas. These theoretical results allow 
us to establish rigorously the Generalized Bayes’ Theorem 
(GBT). The belief conditioning problem is challenging, not 
new, and one of the two main methods usually adopted by 
users working with BF is : 1) Shafer’s belief conditioning 
method based on Dempster’s rule of combination [1], or 
2) the belief conditioning method consistent with imprecise 
probability calculus bounds [2], [6], [7] based on the lower and 
upper probability interpretation of belief functions popularized 
by Fagin and Halpern [3]. In this paper we focus on the second 
approach of belief conditioning because Dempster’s rule of 
combination presents serious problems as reported in [8]-[16]. 
Smets did also attempt to generalize Bayes’ Theorem (BT) and 
did propose his own GBT [17] on the basis of conditional 
embedding, conjunctive merging and Shafer’s conditioning. 
Unfortunately, Smets’ approach remains doubtful as reported 
in [18]. Our new GBT establishment is obtained by a direct 
constructive manner from TBT. It does not need extra as- 
sumptions nor some underlying ad-hoc construction principles. 
Also, we prove that our TBT and GBT presented in this work 
are fully consistent with classical TPT and BT as soon as the 
belief functions are Bayesian. 


This paper starts with a brief review of very basics of 
Probability Theory, including the Total Probability Theorem 
(TPT) and Bayes’ Theorem (BT) in Section II because this 
helps to have a better understanding of the generalizations we 
propose. A brief review of belief functions is given in Section 
III, followed by classical Shafer’s and Fagin-Halpern’s belief
conditioning methods respectively in Sections IV and V. In 
Section VI, we present the decomposition of the set of focal 
elements of any basic belief assignment (BBA) that allows 
us to establish formally the TBT and its generalization on 
Cartesian product space. The Section VII presents and justifies 
the new belief conditioning formulas drawn from TBT which 
are fully consistent with Fagin-Halpern conditioning formulas. 
This section also presents the generalization of Bayes’ theorem 
in the framework of belief functions. We illustrate our new 
theoretical results with a quite simple GBT example in Section 
VIII to show how to make derivations of GBT and to prove 
that Shafer’s conditioning results are inconsistent with GBT. 
Section IX concludes this paper. 


II. TOTAL PROBABILITY THEOREM & BAYES’ FORMULA 
A. Total Probability Theorem 


In probability theory, the elements $\theta_i$ of the space $\Theta$ are
experimental outcomes. The subsets of $\Theta$ are called events
and the event $\{\theta_i\}$ consisting of the single element $\theta_i$ is an
elementary event. The space $\Theta$ is called the sure event and
the empty set $\emptyset$ is the impossible event. We assign to each
event $A$ a number $P(A)$ in $[0, 1]$, called the probability of $A$,
which satisfies Kolmogorov’s three conditions: 1) $P(\emptyset) = 0$;
2) $P(\Theta) = 1$; and 3) if $A \cap B = \emptyset$, then $P(A \cup B) = P(A) + P(B)$.
These conditions are the axioms of the theory of
probability [20]. The fundamental theorem of probability
theory is the Total Probability Theorem (TPT), also called
the law of total probability, see [20], which can be stated as
follows.


Total Probability Theorem (TPT): Consider an event $B$ and
any partition$^1$ $\{A_1, A_2, \ldots, A_k\}$ of the space $\Theta$. Then

$$
P(B) = P(B \cap A_1) + P(B \cap A_2) + \ldots + P(B \cap A_k).
\tag{1}
$$


$^1$ A partition of $\Theta$ is a collection of exclusive subsets of $\Theta$ whose union
B. Conditional probability and Bayes’ formula

Starting from TPT formula (1) and assuming $P(B) > 0$, we get
for any $i \in \{1, \ldots, k\}$, after dividing each side of (1) by $P(B)$ and
rearranging terms, the equality

$$
\frac{P(A_i \cap B)}{P(B)} = 1 - \sum_{\substack{j = 1 \\ j \neq i}}^{k} \frac{P(A_j \cap B)}{P(B)},
\tag{2}
$$

which allows us to define the conditional probability $P(A_i|B)$ by$^2$

$$
P(A_i|B) \triangleq P(A_i \cap B)/P(B).
\tag{3}
$$


Similarly, by considering an event $A_i$ of $\Theta$ and the partition $\{B, \bar{B}\}$
of $\Theta$, the TPT formula $P(A_i) = P(A_i \cap B) + P(A_i \cap \bar{B})$ applies,
and by dividing it by $P(A_i)$ (assuming $P(A_i) > 0$), one gets

$$
\frac{P(A_i \cap B)}{P(A_i)} = 1 - \frac{P(A_i \cap \bar{B})}{P(A_i)},
\tag{4}
$$

which allows us to define the conditional probability $P(B|A_i)$ by

$$
P(B|A_i) \triangleq P(A_i \cap B)/P(A_i).
\tag{5}
$$


From (3) and (5), one deduces the equality

$$
P(A_i \cap B) = P(A_i|B) P(B) = P(B|A_i) P(A_i).
\tag{6}
$$

From equality (6) and assuming $P(B) > 0$ and $P(A_i) > 0$, we get

$$
P(A_i|B) = P(B|A_i) P(A_i)/P(B),
\tag{7}
$$

$$
P(B|A_i) = P(A_i|B) P(B)/P(A_i).
\tag{8}
$$

Using (1) and noting that $P(A_i \cap B) = P(B|A_i) P(A_i)$, we get

$$
P(B) = \sum_{i=1}^{k} P(B|A_i) P(A_i).
\tag{9}
$$

Substituting (9) in (7), we obtain Bayes’ Theorem (BT) formula stated
mathematically as the following equation

$$
P(A_i|B) = \frac{P(B|A_i) P(A_i)}{\sum_{j=1}^{k} P(B|A_j) P(A_j)}.
\tag{10}
$$

One can verify that the conditional probability defined by (3)
satisfies the three axioms of the Theory of Probability [20].


Previously, $A_i$ and $B$ were events (subsets) of the same space $\Theta$.
If $A_i \subseteq \Theta_1$ and $B \subseteq \Theta_2$ with $\Theta_1 \neq \Theta_2$, which corresponds to a
so-called combined experiment [20], similar conditioning formulas
can also be established by working in the Cartesian product space
$\Theta \triangleq \Theta_1 \times \Theta_2$ whose elementary elements are all the ordered pairs
$(x_p, y_q)$ with $x_p \in \Theta_1$ and $y_q \in \Theta_2$. The two experiments are viewed
as a single combined one whose outcomes are pairs $(x_p, y_q)$. In this
space $\Theta = \Theta_1 \times \Theta_2$, $x_p$ is not an elementary element but a subset
of $n$ elements of $\Theta$, i.e. $\{x_p\} = \{(x_p, y_1), \ldots, (x_p, y_n)\}$. Similarly,
$y_q$ is not an elementary element but a subset of $m$ elements of $\Theta$,
i.e. $\{y_q\} = \{(x_1, y_q), \ldots, (x_m, y_q)\}$. If $A_i \subseteq \Theta_1$ and $B \subseteq \Theta_2$, then
$A_i \times B = \{(x_p, y_q) \mid x_p \in A_i, y_q \in B\} \subseteq \Theta$. If one forms $A_i \times \Theta_2$
and $\Theta_1 \times B$ one sees that $A_i \times B = (A_i \times \Theta_2) \cap (\Theta_1 \times B) = (\Theta_1 \times B) \cap (A_i \times \Theta_2)$.
Because the event $A_i \times \Theta_2$ occurs in the combined
experiment if the event $A_i$ of experiment 1 occurs no matter what
the outcome of experiment 2 is, one has $P(A_i \times \Theta_2) = P_1(A_i)$ where
$P_1(A_i)$ is the probability of event $A_i$ in experiment 1. Similarly,
the event $\Theta_1 \times B$ occurs if $B$ occurs in experiment 2 no matter what
the outcome of experiment 1 is, so that $P(\Theta_1 \times B) = P_2(B)$ where
$P_2(B)$ is the probability of event $B$ in experiment 2. Considering
a partition $\{A_1, A_2, \ldots, A_k\}$ of $\Theta_1$ and a subset (event) $B \subseteq \Theta_2$,


$^2$ The notation $\triangleq$ means equal by definition.


and based on set theory and the properties of the Cartesian product, one can
also establish the TPT formula

$$
P(\Theta_1 \times B) = \sum_{i=1}^{k} P((\Theta_1 \times B) \cap (A_i \times \Theta_2)),
$$

and Bayes’ formula

$$
P(A_i \times \Theta_2 \mid \Theta_1 \times B) = \frac{P(\Theta_1 \times B \mid A_i \times \Theta_2) P(A_i \times \Theta_2)}{\sum_{j=1}^{k} P(\Theta_1 \times B \mid A_j \times \Theta_2) P(A_j \times \Theta_2)}.
$$

That is why, for notation convenience (and notation abuse), we can
just use classical formulas even when working with different sets of
experimental outcomes $\Theta_1$ and $\Theta_2$. One just has to keep in mind that
in this case $A_i$ must be understood as $A_i \times \Theta_2$ and $B$ as $\Theta_1 \times B$.


III. BASICS OF BELIEF FUNCTIONS 


Based on Dempster’s works [2], [19], Shafer introduced Belief
Functions (BF) to model epistemic uncertainty$^3$ and to reason
under uncertainty [1]. We consider a finite discrete frame of
discernment (FoD) $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$, with $n > 1$, and where
all exhaustive and exclusive elements of $\Theta$ represent the set of the
potential solutions of the problem under concern. The set of all
subsets of $\Theta$ is the power-set of $\Theta$ denoted by $2^\Theta$. The number of
elements (i.e. the cardinality) of $2^\Theta$ is $2^{|\Theta|}$. A basic belief assignment
(BBA) associated with a given source of evidence is defined as the
mapping $m(\cdot) : 2^\Theta \to [0, 1]$ satisfying the conditions $m(\emptyset) = 0$
and $\sum_{A \in 2^\Theta} m(A) = 1$. The quantity $m(A)$ is the mass of belief of
subset $A$ committed by the source of evidence (SoE). A focal element
$X$ of a BBA $m(\cdot)$ is an element of $2^\Theta$ such that $m(X) > 0$. Note that
the empty set $\emptyset$ is not a focal element of a BBA because $m(\emptyset) = 0$
(closed-world assumption of Shafer’s model for the FoD). The set of
all focal elements of $m(\cdot)$ is denoted


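$$
\mathcal{F}_\Theta(m) \triangleq \{X \subseteq \Theta \mid m(X) > 0\} = \{X \in 2^\Theta \mid m(X) > 0\}.
\tag{11}
$$

The set of focal elements of $m(\cdot)$ included in $A \subseteq \Theta$ is denoted

$$
\mathcal{F}_A(m) \triangleq \{X \in \mathcal{F}_\Theta(m) \mid X \cap A = X\}.
\tag{12}
$$

Note that if $A \subseteq B \subseteq \Theta$, then $\mathcal{F}_A(m) \subseteq \mathcal{F}_B(m)$. Also,
$\forall A, B \subseteq \Theta$ one has $\mathcal{F}_{A \cap B}(m) = \mathcal{F}_A(m) \cap \mathcal{F}_B(m)$, but
$\mathcal{F}_{A \cup B}(m) \neq \mathcal{F}_A(m) \cup \mathcal{F}_B(m)$ in general. The set $\mathcal{F}_\Theta(m)$ can
always be partitioned as $\{\mathcal{F}_A(m), \mathcal{F}_{\bar{A}}(m), \mathcal{F}_{A^*}(m)\}$ where$^4$

$$
\mathcal{F}_{A^*}(m) \triangleq \mathcal{F}_\Theta(m) - \mathcal{F}_A(m) - \mathcal{F}_{\bar{A}}(m)
\tag{13}
$$

$$
= \{X \in \mathcal{F}_\Theta(m) \mid X \cap A \neq \emptyset \text{ and } X \cap \bar{A} \neq \emptyset\},
\tag{14}
$$

represents the set of focal elements of $m(\cdot)$ which are not subsets of
$A$ and not subsets of $\bar{A} = \Theta - \{A\} = \{X \mid X \in \Theta \text{ and } X \notin A\}$,
where $\bar{A}$ is the complement of $A$ in $\Theta$ and the minus symbol denotes
the set difference operator.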


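Belief and plausibility functions are defined by$^5$

$$
Bel(A) = \sum_{\substack{X \in 2^\Theta \\ X \subseteq A}} m(X) = \sum_{\substack{X \in \mathcal{F}_\Theta(m) \\ X \subseteq A}} m(X) = \sum_{X \in \mathcal{F}_A(m)} m(X),
\tag{15}
$$

$$
Pl(A) = \sum_{\substack{X \in 2^\Theta \\ X \cap A \neq \emptyset}} m(X) = \sum_{\substack{X \in \mathcal{F}_\Theta(m) \\ X \cap A \neq \emptyset}} m(X) = 1 - Bel(\bar{A}).
\tag{16}
$$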


The width $U(A^*) \triangleq Pl(A) - Bel(A)$ of the belief interval
$[Bel(A), Pl(A)]$ is called the uncertainty on $A$ committed by the


$^3$ Also sometimes called the cognitive uncertainty by some authors.

$^4$ For notation convenience, we use $A^*$ to denote focal elements of $m(\cdot)$
which are not in $A$, nor in $\bar{A}$.

$^5$ By convention, a sum of non-existing terms (if it occurs in formulas
depending on the given BBA) is always set to zero.
SoE. It represents the imprecision on the (subjective) probability of
$A$ granted by the SoE which provides the BBA $m(\cdot)$. The uncertainty
$U(A^*)$ can also be expressed directly as

$$
U(A^*) = \sum_{X \in \mathcal{F}_{A^*}(m)} m(X).
\tag{17}
$$

It is worth noting that $U(A^*) = Pl(A) - Bel(A) = (1 - Bel(\bar{A})) - (1 - Pl(\bar{A})) = Pl(\bar{A}) - Bel(\bar{A}) = U(\bar{A}^*)$, or equivalently

$$
U(\bar{A}^*) = \sum_{X \in \mathcal{F}_{\bar{A}^*}(m)} m(X),
\tag{18}
$$

where $\mathcal{F}_{\bar{A}^*}(m) \triangleq \mathcal{F}_\Theta(m) - \mathcal{F}_{\bar{A}}(m) - \mathcal{F}_A(m) = \mathcal{F}_{A^*}(m)$.
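The definitions (11)-(18) translate directly into a few set operations. The sketch below is only an illustration under stated assumptions: the three-element frame, the BBA `m`, and the helper names `bel`, `pl` and `focal_partition` are hypothetical and do not come from the paper. It splits the focal elements into those inside $A$, those inside $\bar{A}$, and the remaining ones (the set written $\mathcal{F}_{A^*}(m)$ in the text), then compares the mass of the last group with $Pl(A) - Bel(A)$.

```python
from fractions import Fraction as Fr

THETA = frozenset({"a", "b", "c"})

# Hypothetical BBA used only for illustration (masses sum to 1).
m = {
    frozenset({"a"}): Fr(1, 5),
    frozenset({"b"}): Fr(1, 10),
    frozenset({"a", "b"}): Fr(3, 10),
    frozenset({"b", "c"}): Fr(1, 5),
    THETA: Fr(1, 5),
}


def bel(m, subset):
    """Bel: total mass of the focal elements included in `subset`."""
    return sum((mass for X, mass in m.items() if X <= subset), Fr(0))


def pl(m, subset):
    """Pl: total mass of the focal elements intersecting `subset`."""
    return sum((mass for X, mass in m.items() if X & subset), Fr(0))


def focal_partition(m, subset, theta):
    """Split focal elements into: inside subset, inside its complement, neither."""
    complement = theta - subset
    inside = {X for X in m if X <= subset}
    outside = {X for X in m if X <= complement}
    straddling = set(m) - inside - outside
    return inside, outside, straddling


A = frozenset({"a"})
inside, outside, straddling = focal_partition(m, A, THETA)
uncertainty = sum((m[X] for X in straddling), Fr(0))

print(bel(m, A), pl(m, A), uncertainty)
```

For this hypothetical BBA the width of the belief interval of `A` and the mass carried by the straddling focal elements coincide, as (17) states, and the same value is obtained when `A` is replaced by its complement.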


When all elements of $\mathcal{F}_\Theta(m)$ are only singletons, $m(\cdot)$ is called a
Bayesian BBA [1] and its corresponding $Bel(\cdot)$ and $Pl(\cdot)$ functions
are homogeneous to a same (subjective) probability measure $P(\cdot)$. In
this case $\mathcal{F}_{A^*}(m) = \mathcal{F}_{\bar{A}^*}(m) = \emptyset$. According to Shafer’s Theorem 1
below, see [1] page 39 with its proof on page 51, the belief functions
can be characterized without referring to a BBA.


Theorem 1: If $\Theta$ is a FoD, then a function $Bel : 2^\Theta \to [0, 1]$ is a
belief function if and only if it satisfies the following conditions:

• B1) Belief in the impossible event is zero, that is $Bel(\emptyset) = 0$.

• B2) Belief in the certain event is one, that is $Bel(\Theta) = 1$.

• B3) For every positive integer $n$, and for every collection $A_1, \ldots, A_n$ of subsets of $\Theta$

$$
Bel(A_1 \cup \ldots \cup A_n) \geq \sum_{\substack{I \subseteq \{1, \ldots, n\} \\ I \neq \emptyset}} (-1)^{|I|+1} Bel\Big(\bigcap_{i \in I} A_i\Big).
\tag{19}
$$


Quantities $m(\cdot)$ and $Bel(\cdot)$ are one-to-one, and for any $A \subseteq \Theta$ the
BBA $m(\cdot)$ is obtained from $Bel(\cdot)$ by the Möbius inverse formula (see
[1], p. 39)

$$
m(A) = \sum_{B \subseteq A \subseteq \Theta} (-1)^{|A - B|} Bel(B).
\tag{20}
$$
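The Möbius inverse formula (20) can be exercised on a small frame. The following sketch is an illustration only: the frame $\{R, B, Y\}$ and the BBA with $m(R) = 1/3$, $m(B \cup Y) = 2/3$ anticipate the urn example used later in the paper, and the helper names `powerset`, `bel` and `mobius_inverse` are assumptions. It tabulates $Bel(\cdot)$ on the whole power-set and recovers the BBA from it.

```python
from fractions import Fraction as Fr
from itertools import combinations

THETA = frozenset({"R", "B", "Y"})
m = {frozenset({"R"}): Fr(1, 3), frozenset({"B", "Y"}): Fr(2, 3)}


def powerset(s):
    """All subsets of `s`, from the empty set up to `s` itself."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def bel(m, subset):
    """Bel: total mass of the focal elements included in `subset`."""
    return sum((mass for X, mass in m.items() if X <= subset), Fr(0))


def mobius_inverse(bel_table, theta):
    """m(A) = sum over B included in A of (-1)^{|A - B|} Bel(B)."""
    bba = {}
    for a in powerset(theta):
        value = sum(((-1) ** len(a - b) * bel_table[b] for b in powerset(a)), Fr(0))
        if value != 0:  # keep focal elements only
            bba[a] = value
    return bba


bel_table = {a: bel(m, a) for a in powerset(THETA)}
recovered = mobius_inverse(bel_table, THETA)
print(recovered == m)
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the alternating sum cancels exactly and non-focal subsets receive a mass of zero rather than a rounding residue.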


Shafer [1] proposed to combine $s \geq 2$ distinct sources of
evidence represented by BBAs $m_1(\cdot), \ldots, m_s(\cdot)$ over the same FoD
with Dempster’s rule (i.e. the normalized conjunctive rule). However,
Dempster’s rule has been strongly disputed from both theoretical and
practical standpoints as reported in [16], [21], [22]. In particular,
the high (or even very low) conflict level between the sources
can be totally ignored by Dempster’s rule, which is a very serious
problem [15]. Also, Shafer’s conditioning (based on Dempster’s rule)
is inconsistent with the probabilistic conditioning (see next section).


IV. SHAFER’S CONDITIONING


A. Shafer’s conditioning formulas 


Shafer’s conditioning formulas are established in Theorem 3.6 p.
66 of [1] from Dempster’s rule of combination of the original BBA
$m(\cdot)$ with the BBA $m_B(B) = 1$ focused on $B$. We review them
for convenience. For $A, B \subseteq \Theta$ with $Pl(B) > 0$, $Bel(A|B)$ and
$Pl(A|B)$ are given by

$$
Bel(A|B) = (Bel(A \cup \bar{B}) - Bel(\bar{B}))/(1 - Bel(\bar{B})),
\tag{21}
$$

$$
Pl(A|B) = Pl(A \cap B)/Pl(B).
\tag{22}
$$

The expression (21) of $Bel(A|B)$ is equivalent to

$$
Bel(A|B) = (Pl(B) - Pl(B \cap \bar{A}))/Pl(B),
\tag{23}
$$

because one always has (from the definition of belief functions)
$Pl(B) = 1 - Bel(\bar{B})$, and the numerator of (21) can be written as
$Bel(A \cup \bar{B}) - Bel(\bar{B}) = Pl(B) - Pl(B \cap \bar{A})$.

If $A = \emptyset$, $Bel(\emptyset|B) = Pl(\emptyset|B) = 0$, and if $A = \Theta$,
$Bel(\Theta|B) = Pl(\Theta|B) = 1$. Also, if $B = \Theta$, $Bel(A|\Theta) = Bel(A)$
and $Pl(A|\Theta) = Pl(A)$. Note that if $B = A$ in (22)-(23), we get
$Bel(A|A) = Pl(A|A) = 1$, which fits with common sense.


In reversing the roles played by $A$ and $B$ and switching the
notations in previous expressions, the following formulas also hold
(assuming $Pl(A) > 0$)

$$
Bel(B|A) = (Pl(A) - Pl(A \cap \bar{B}))/Pl(A),
\tag{24}
$$

$$
Pl(B|A) = Pl(B \cap A)/Pl(A).
\tag{25}
$$

From (22) and (25), one deduces that

$$
Pl(A \cap B) = Pl(A|B) Pl(B) = Pl(B|A) Pl(A).
$$

Hence, the following formula applies for conditional plausibilities
when $Pl(B) > 0$

$$
Pl(A|B) = Pl(B|A) Pl(A)/Pl(B).
\tag{26}
$$


Shafer’s formula (25) is similar to conditional probabilities (3) 
when replacing plausibility by probability. So, at first glance it seems 
appealing. In the sequel we show why this is not the case. 


B. Drawback of Shafer’s conditioning 


The main drawback of Shafer’s conditioning is that the bounds 
of belief interval [Bel(A|B), Pl(A|B)] obtained by (21)-(22) are in 
general incompatible with lower and upper bounds of the conditional 
probability $P(A|B)$. This problem makes Shafer’s conditioning based
on Dempster’s rule very disputable and casts doubt on the pertinence
(validity) of Shafer’s conditioning results when used in applications.
This serious problem has already been reported and addressed by 
several authors [3], [6], [7], [11] with some examples. To easily show 
this incompatibility of Shafer’s conditioning with probability calculus 
we present briefly the famous Ellsberg urn example [23]. 


Example 1 (Ellsberg urn): We consider an urn with red (R),
black (B) and yellow (Y) balls. One knows that 1/3 of the balls are
red and 2/3 of the balls are black or yellow. So the a
priori information about the chance to pick a ball in the urn can
be represented by a (parametric) probability mass function $P(\cdot)$

$$
P(R) = 1/3, \quad P(B) = 2/3 - x, \quad P(Y) = x,
$$

where $x$ is an unknown number/parameter in $[0, 2/3]$. Therefore,
$P(B)$ and $P(Y)$ are unknown but their bounds are known. In fact,
this problem can be seen as a problem of imprecise probabilities
where $P(R) + P(B) + P(Y) = 1$ with

$$
P(R) \in [1/3, 1/3], \quad P(B) \in [0, 2/3], \quad P(Y) \in [0, 2/3].
$$

Now let us suppose that someone picks a ball at random in the
urn and tells us that the color of the ball is not black, i.e. the event
$\bar{B} = R \cup Y$ has occurred. How must we revise (update) our
prior probabilities with this new information? The correct answer to
this question is obtained by computing the conditional probabilities
$P(R|\bar{B})$, $P(B|\bar{B})$ and $P(Y|\bar{B})$ and by analyzing their bounds. This
is done using the fact that $P(\bar{B}) = P(R \cup Y) = P(R) + P(Y) - P(R \cap Y) = P(R) + P(Y) = (1/3) + x$. Indeed, $P(R \cap Y) = 0$
because the events R and Y are mutually exclusive. So, we get

$$
\begin{aligned}
P(R|\bar{B}) &= \frac{P(R \cap (R \cup Y))}{P(R \cup Y)} = \frac{P(R)}{(1/3) + x} = \frac{1/3}{(1/3) + x}, \\
P(B|\bar{B}) &= \frac{P(B \cap (R \cup Y))}{P(R \cup Y)} = \frac{P(\emptyset)}{(1/3) + x} = \frac{0}{(1/3) + x}, \\
P(Y|\bar{B}) &= \frac{P(Y \cap (R \cup Y))}{P(R \cup Y)} = \frac{P(Y)}{(1/3) + x} = \frac{x}{(1/3) + x}.
\end{aligned}
$$

If $x = 0$, then $P(R|\bar{B}) = 1$ and $P(Y|\bar{B}) = 0$. If $x = 2/3$, then
$P(R|\bar{B}) = 1/3$ and $P(Y|\bar{B}) = 2/3$. Therefore after conditioning
we get

$$
P(R|\bar{B}) \in [1/3, 1], \quad P(B|\bar{B}) \in [0, 0], \quad P(Y|\bar{B}) \in [0, 2/3].
$$
Let us examine what we get with Shafer’s conditioning. The problem
is modeled using the a priori BBA $m(\cdot)$ defined on the FoD $\Theta = \{R, B, Y\}$
with $m(R) = 1/3$ and $m(B \cup Y) = 2/3$, which gives the
belief intervals $[Bel(R), Pl(R)] = [1/3, 1/3]$, $[Bel(B), Pl(B)] = [0, 2/3]$
and $[Bel(Y), Pl(Y)] = [0, 2/3]$. With Shafer’s conditioning
formulas and noting that $Pl(R) = 1/3$, $Pl(B) = 2/3$, $Pl(Y) = 2/3$,
and $Pl(R \cup Y) = 1$, we get

$$
\begin{aligned}
Bel(R|\bar{B}) &= \frac{Pl(R \cup Y) - Pl((R \cup Y) \cap (B \cup Y))}{Pl(R \cup Y)} = \frac{1 - Pl(Y)}{1} = 1/3, \\
Bel(B|\bar{B}) &= \frac{Pl(R \cup Y) - Pl((R \cup Y) \cap (R \cup Y))}{Pl(R \cup Y)} = \frac{1 - Pl(R \cup Y)}{1} = 0, \\
Bel(Y|\bar{B}) &= \frac{Pl(R \cup Y) - Pl((R \cup Y) \cap (R \cup B))}{Pl(R \cup Y)} = \frac{1 - Pl(R)}{1} = 2/3, \\
Pl(R|\bar{B}) &= \frac{Pl(R \cap (R \cup Y))}{Pl(R \cup Y)} = \frac{Pl(R)}{Pl(R \cup Y)} = 1/3, \\
Pl(B|\bar{B}) &= \frac{Pl(B \cap (R \cup Y))}{Pl(R \cup Y)} = \frac{Pl(\emptyset)}{1} = 0, \\
Pl(Y|\bar{B}) &= \frac{Pl(Y \cap (R \cup Y))}{Pl(R \cup Y)} = \frac{Pl(Y)}{Pl(R \cup Y)} = 2/3.
\end{aligned}
$$

Hence with Shafer’s conditioning we get results incompatible with
the real bounds of conditional probabilities because

$$
\begin{aligned}
[Bel(R|\bar{B}), Pl(R|\bar{B})] &= [1/3, 1/3] \neq [1/3, 1], \\
[Bel(B|\bar{B}), Pl(B|\bar{B})] &= [0, 0], \\
[Bel(Y|\bar{B}), Pl(Y|\bar{B})] &= [2/3, 2/3] \neq [0, 2/3].
\end{aligned}
$$
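The comparison above can be reproduced numerically. The following sketch rests on stated assumptions: the helper names `bel`, `pl`, `shafer` and `probability_bounds` are illustrative, and the probability bounds are approximated by sweeping the unknown parameter $x$ over a finite grid of $[0, 2/3]$ (which includes both endpoints) rather than derived analytically. It applies formulas (21)-(22) with the conditioning event "not black" and compares the result with the swept bounds.

```python
from fractions import Fraction as Fr

R, B, Y = "R", "B", "Y"
THETA = frozenset({R, B, Y})
NOT_BLACK = frozenset({R, Y})  # conditioning event: the ball is not black

# A priori BBA of the urn example.
m = {frozenset({R}): Fr(1, 3), frozenset({B, Y}): Fr(2, 3)}


def bel(m, subset):
    return sum((mass for X, mass in m.items() if X <= subset), Fr(0))


def pl(m, subset):
    return sum((mass for X, mass in m.items() if X & subset), Fr(0))


def shafer(m, a, cond, theta):
    """Shafer's conditional (Bel, Pl) of `a` given `cond`, formulas (21)-(22)."""
    not_cond = theta - cond
    belief = (bel(m, a | not_cond) - bel(m, not_cond)) / (1 - bel(m, not_cond))
    plausibility = pl(m, a & cond) / pl(m, cond)
    return belief, plausibility


def probability_bounds(color, steps=20):
    """Min and max of P(color | not black) over a grid of x in [0, 2/3]."""
    values = []
    for k in range(steps + 1):
        x = Fr(2, 3) * Fr(k, steps)
        p = {R: Fr(1, 3), B: Fr(2, 3) - x, Y: x}
        numerator = p[color] if color in NOT_BLACK else Fr(0)
        values.append(numerator / (p[R] + p[Y]))
    return min(values), max(values)


for color in (R, B, Y):
    print(color, shafer(m, frozenset({color}), NOT_BLACK, THETA), probability_bounds(color))
```

Under these assumptions, the Shafer intervals for R and Y collapse to single points while the swept probability bounds keep their full width, which is the incompatibility described in the text.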


V. FAGIN-HALPERN CONDITIONING 


Fagin and Halpern (FH) proposed in [3], [4] to define the condi- 
tional belief as the lower envelope (i.e. the infimum) of a family 
of conditional probability functions to make belief conditioning 
consistent with imprecise conditional probability calculus. 


A. Fagin-Halpern conditioning formulas 


Assuming $Bel(B) > 0$, Fagin and Halpern proposed the following
conditional formulas (FH formulas for short)

$$
Bel(A|B) = Bel(A \cap B)/(Bel(A \cap B) + Pl(\bar{A} \cap B)),
\tag{27}
$$

$$
Pl(A|B) = Pl(A \cap B)/(Pl(A \cap B) + Bel(\bar{A} \cap B)).
\tag{28}
$$


They prove in [3] that Bel(A|B) given by (27) satisfies the 
three conditions of Theorem 1 and so FH belief conditioning is 
an appealing solution for BF conditioning. However, it is quite 
obscure how Fagin and Halpern did obtain (construct) FH formulas. 
A justification has been given by Sundberg and Wagner in [7] (p. 
268) but it is not very easy to follow. In this paper, we justify clearly 
and directly the establishment of FH formulas from the simple and 
direct consequence of the Total Belief Theorem (TBT). 

Similarly, by switching notations and assuming $Bel(A) > 0$, the
previous FH formulas can be rewritten as

$$
Bel(B|A) = Bel(A \cap B)/(Bel(A \cap B) + Pl(\bar{B} \cap A)),
\tag{29}
$$

$$
Pl(B|A) = Pl(A \cap B)/(Pl(A \cap B) + Bel(\bar{B} \cap A)).
\tag{30}
$$

As we see, FH formulas are also consistent with Bayes’ formula
when the underlying BBA $m(\cdot)$ is Bayesian. Indeed if $m(\cdot)$ is
Bayesian, then $Pl(A \cap B) = Bel(A \cap B) = P(A \cap B)$, $Pl(\bar{A} \cap B) = Bel(\bar{A} \cap B) = P(\bar{A} \cap B)$ and $Pl(\bar{B} \cap A) = Bel(\bar{B} \cap A) = P(\bar{B} \cap A)$,
and FH formulas become equivalent to

$$
Bel(A|B) = Pl(A|B) = P(A \cap B)/(P(A \cap B) + P(\bar{A} \cap B)).
\tag{31}
$$

Thanks to TPT formula (1), the denominator involved in these
formulas is $P(A \cap B) + P(\bar{A} \cap B) = P(B)$, therefore

$$
Bel(A|B) = Pl(A|B) = P(A \cap B)/P(B) = P(A|B).
\tag{32}
$$

Similarly, one can also easily verify that

$$
Bel(B|A) = Pl(B|A) = P(A \cap B)/P(A) = P(B|A).
\tag{33}
$$

B. Advantage of Fagin-Halpern conditioning 


The advantage of FH conditioning is its complete compatibility
with the conditional probability calculus [7], [25]. We show what
FH conditioning provides in the previous Ellsberg urn example.


Ellsberg urn example revisited: Applying FH conditioning formulas
with the conditioning event $\bar{B} = R \cup Y$ we obtain

$$
\begin{aligned}
Bel(R|\bar{B}) &= \frac{Bel(R \cap (R \cup Y))}{Bel(R \cap (R \cup Y)) + Pl((B \cup Y) \cap (R \cup Y))} = \frac{Bel(R)}{Bel(R) + Pl(Y)} = \frac{1/3}{(1/3) + (2/3)} = 1/3, \\
Pl(R|\bar{B}) &= \frac{Pl(R \cap (R \cup Y))}{Bel((B \cup Y) \cap (R \cup Y)) + Pl(R \cap (R \cup Y))} = \frac{Pl(R)}{Bel(Y) + Pl(R)} = \frac{1/3}{0 + (1/3)} = 1.
\end{aligned}
$$

Similarly, we can verify that $Bel(B|\bar{B}) = 0$, $Pl(B|\bar{B}) = 0$,
$Bel(Y|\bar{B}) = 0$ and $Pl(Y|\bar{B}) = 2/3$. Therefore with these conditioning
formulas, we get the correct bounds of the imprecise conditional
probabilities

$$
\begin{aligned}
[Bel(R|\bar{B}), Pl(R|\bar{B})] &= [1/3, 1], \\
[Bel(B|\bar{B}), Pl(B|\bar{B})] &= [0, 0], \\
[Bel(Y|\bar{B}), Pl(Y|\bar{B})] &= [0, 2/3].
\end{aligned}
$$

One can also verify that $Bel(\emptyset|\bar{B}) = 0$, $Bel(R \cup B|\bar{B}) = 1/3$,
$Bel(R \cup Y|\bar{B}) = 1$, $Bel(B \cup Y|\bar{B}) = 0$ and $Bel(R \cup B \cup Y|\bar{B}) = 1$.
Applying the Möbius inverse formula (20) with $Bel(\cdot|\bar{B})$, one gets the
conditional BBA $m(R|\bar{B}) = 1/3$ and $m(R \cup Y|\bar{B}) = 2/3$, whereas
with Shafer’s conditioning one gets $m(R|\bar{B}) = 1/3$ and $m(Y|\bar{B}) = 2/3$.
One sees that with Shafer’s conditioning, because $(B \cup Y) \cap (R \cup Y) \neq \emptyset$,
the mass $m(B \cup Y) = 2/3$ is entirely transferred
(optimistically) to the most specific focal element $Y$ included in $\bar{B} = R \cup Y$.
With FH conditioning, the mass $m(B \cup Y) = 2/3$ is entirely
transferred (pessimistically, or cautiously) to the least specific focal
element $R \cup Y$ included in $\bar{B} = R \cup Y$.
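The revisited example can be checked with a short script. This is a sketch under stated assumptions, not the authors' code: the helper names `bel`, `pl`, `fagin_halpern` and `powerset` are illustrative. It evaluates the FH formulas (27)-(28) for the three colors, then tabulates the conditional belief of every subset and applies the Möbius inverse formula (20) to obtain the conditional BBA.

```python
from fractions import Fraction as Fr
from itertools import combinations

R, B, Y = "R", "B", "Y"
THETA = frozenset({R, B, Y})
NOT_BLACK = frozenset({R, Y})  # conditioning event
m = {frozenset({R}): Fr(1, 3), frozenset({B, Y}): Fr(2, 3)}


def bel(m, subset):
    return sum((mass for X, mass in m.items() if X <= subset), Fr(0))


def pl(m, subset):
    return sum((mass for X, mass in m.items() if X & subset), Fr(0))


def fagin_halpern(m, a, cond, theta):
    """FH conditional (Bel, Pl) of `a` given `cond`; assumes Bel(cond) > 0."""
    not_a = theta - a
    bel_in, pl_in = bel(m, a & cond), pl(m, a & cond)
    bel_out, pl_out = bel(m, not_a & cond), pl(m, not_a & cond)
    return bel_in / (bel_in + pl_out), pl_in / (pl_in + bel_out)


def powerset(s):
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


intervals = {c: fagin_halpern(m, frozenset({c}), NOT_BLACK, THETA) for c in (R, B, Y)}

# Conditional belief of every subset, then Moebius inversion to get the conditional BBA.
cond_bel = {a: fagin_halpern(m, a, NOT_BLACK, THETA)[0] for a in powerset(THETA)}
cond_bba = {}
for a in powerset(THETA):
    value = sum(((-1) ** len(a - b) * cond_bel[b] for b in powerset(a)), Fr(0))
    if value != 0:
        cond_bba[a] = value

print(intervals)
print(cond_bba)
```

With these inputs the intervals match the probability bounds of Example 1, and the conditional BBA places 1/3 on R and 2/3 on the union of R and Y, in line with the cautious transfer described in the text.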


VI. TOTAL BELIEF THEOREM (TBT) 


In this section, we extend the TPT to BF and we establish the
Total Belief Theorem (TBT) based on a decomposition of $\mathcal{F}_\Theta(m)$.


A. Decomposition of $\mathcal{F}_\Theta(m)$

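Let us consider a FoD $\Theta = \{\theta_1, \ldots, \theta_{|\Theta|}\}$ with $|\Theta| > 1$ elements,
and a BBA $m(\cdot)$ defined on $2^\Theta$ with a given set of focal elements
$\mathcal{F}_\Theta(m)$. Considering any partition $\{A_1, A_2, \ldots, A_k\}$ of the FoD $\Theta$,
then $\mathcal{F}_\Theta(m)$ can be obtained by the union of the following subsets

$$
\mathcal{F}_\Theta(m) = \mathcal{F}_{A_1}(m) \cup \ldots \cup \mathcal{F}_{A_k}(m) \cup \mathcal{F}_{A^*}(m),
\tag{34}
$$

where $\mathcal{F}_{A_i}(m)$ ($i = 1, \ldots, k$) is the set of focal elements of $m(\cdot)$
included in $A_i$, and $\mathcal{F}_{A^*}(m)$ is the set of focal elements of $m(\cdot)$
which are not included in $A_i$, $i = 1, \ldots, k$. We use the notation
$A^*$ for representing the entity characterized by the focal set $\mathcal{F}_{A^*}(m)$
mathematically defined by

$$
\mathcal{F}_{A^*}(m) \triangleq \mathcal{F}_\Theta(m) - \mathcal{F}_{A_1}(m) - \ldots - \mathcal{F}_{A_k}(m).
\tag{35}
$$

The entity $A^*$ has in general no explicit form and it is used only for
notation convenience and conciseness. Because $A_i$ for $i = 1, \ldots, k$
are mutually exclusive (disjoint), the sets $\mathcal{F}_{A_i}(m)$ are also mutually
exclusive and therefore $\bigcap_{i=1,\ldots,k} (\mathcal{F}_\Theta(m) - \mathcal{F}_{A_i}(m)) = \mathcal{F}_\Theta(m) - \mathcal{F}_{A_1}(m) - \ldots - \mathcal{F}_{A_k}(m)$
because all possible intersections of focal
sets including $\mathcal{F}_{A_i}(m) \cap \mathcal{F}_{A_j}(m)$ for $i \neq j$ equal the empty set.
Hence $\mathcal{F}_{A^*}(m)$ can also be expressed as

$$
\mathcal{F}_{A^*}(m) = \bar{\mathcal{F}}_{A_1}(m) \cap \bar{\mathcal{F}}_{A_2}(m) \cap \ldots \cap \bar{\mathcal{F}}_{A_k}(m),
\tag{36}
$$

where $\bar{\mathcal{F}}_{A_i}(m) \triangleq \mathcal{F}_\Theta(m) - \mathcal{F}_{A_i}(m) = \mathcal{F}_{\bar{A}_i}(m) \cup \mathcal{F}_{A_i^*}(m)$ because
when partitioning $\Theta$ as $\{A_i, \bar{A}_i\}$ one has $\mathcal{F}_{A_i^*}(m) \triangleq \mathcal{F}_\Theta(m) - \mathcal{F}_{A_i}(m) - \mathcal{F}_{\bar{A}_i}(m)$.

Example 2: Consider $\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4, \theta_5\}$ and a BBA $m(\cdot)$ defined
on $2^\Theta$, with set of focal elements $\mathcal{F}_\Theta(m) = \{X_1, X_2, \ldots, X_8\}$
chosen as follows: $X_1 = \theta_1$, $X_2 = \theta_1 \cup \theta_2$, $X_3 = \theta_2 \cup \theta_3$,
$X_4 = \theta_3 \cup \theta_4$, $X_5 = \theta_4$, $X_6 = \theta_4 \cup \theta_5$, $X_7 = \theta_1 \cup \theta_3 \cup \theta_5$,
and $X_8 = \theta_5$. Consider also the partition $\{A_1, A_2, A_3\}$ of $\Theta$ with
$A_1 = \{\theta_1, \theta_2\}$, $A_2 = \{\theta_3, \theta_4\}$ and $A_3 = \{\theta_5\}$. Therefore,

$$
\begin{aligned}
\mathcal{F}_{A_1}(m) &= \{X_1, X_2\} = \{\theta_1, \theta_1 \cup \theta_2\}, \\
\mathcal{F}_{A_2}(m) &= \{X_4, X_5\} = \{\theta_3 \cup \theta_4, \theta_4\}, \\
\mathcal{F}_{A_3}(m) &= \{X_8\} = \{\theta_5\}, \\
\mathcal{F}_{A^*}(m) &= \{X_1, \ldots, X_8\} - \{X_1, X_2\} - \{X_4, X_5\} - \{X_8\} \\
&= \{X_3, X_6, X_7\} = \{\theta_2 \cup \theta_3, \theta_4 \cup \theta_5, \theta_1 \cup \theta_3 \cup \theta_5\},
\end{aligned}
$$

$$
\begin{aligned}
\bar{\mathcal{F}}_{A_1}(m) &= \mathcal{F}_\Theta(m) - \{X_1, X_2\} = \{X_3, X_4, X_5, X_6, X_7, X_8\}, \\
\bar{\mathcal{F}}_{A_2}(m) &= \mathcal{F}_\Theta(m) - \{X_4, X_5\} = \{X_1, X_2, X_3, X_6, X_7, X_8\}, \\
\bar{\mathcal{F}}_{A_3}(m) &= \mathcal{F}_\Theta(m) - \{X_8\} = \{X_1, X_2, X_3, X_4, X_5, X_6, X_7\}.
\end{aligned}
$$

Applying (36), one gets
$\bar{\mathcal{F}}_{A_1}(m) \cap \bar{\mathcal{F}}_{A_2}(m) \cap \bar{\mathcal{F}}_{A_3}(m) = \{X_3, X_6, X_7\} = \mathcal{F}_{A^*}(m)$.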
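The decomposition of Example 2 depends only on set inclusion, so it can be reproduced without any mass values. The sketch below is illustrative: focal elements are indexed by the integer `k` for $X_k$, the integer `i` stands for $\theta_i$, and the helper name `focal_in` is an assumption. It computes the focal sets of the three blocks, the remaining set by successive differences as in (35), and the same set by intersecting the complements as in (36).

```python
# Focal elements X_1..X_8 of Example 2 (integer i stands for theta_i).
focal = {
    1: frozenset({1}),
    2: frozenset({1, 2}),
    3: frozenset({2, 3}),
    4: frozenset({3, 4}),
    5: frozenset({4}),
    6: frozenset({4, 5}),
    7: frozenset({1, 3, 5}),
    8: frozenset({5}),
}
partition = [frozenset({1, 2}), frozenset({3, 4}), frozenset({5})]


def focal_in(focal, block):
    """Indices of the focal elements included in `block`."""
    return {k for k, X in focal.items() if X <= block}


blocks = [focal_in(focal, a) for a in partition]

# Successive set differences: focal elements included in no block of the partition.
star_by_difference = set(focal)
for block_indices in blocks:
    star_by_difference -= block_indices

# Intersection of the complements of each block's focal set.
complements = [set(focal) - block_indices for block_indices in blocks]
star_by_intersection = set.intersection(*complements)

print(blocks, star_by_difference, star_by_intersection)
```

Both constructions return the indices 3, 6 and 7, matching the focal elements listed for $\mathcal{F}_{A^*}(m)$ in Example 2.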


B. Total Belief Theorem (TBT) 


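Based on the previous decomposition of $\mathcal{F}_\Theta(m)$ according to any
partition $\{A_1, \ldots, A_k\}$ of the FoD $\Theta$, the following TBT holds.


Total Belief Theorem (TBT): Let us consider a FoD $\Theta$ with $|\Theta| \geq 2$
elements and a BBA $m(\cdot)$ defined on $2^\Theta$ with the set of focal
elements $\mathcal{F}_\Theta(m)$. For any chosen partition $\{A_1, \ldots, A_k\}$ of $\Theta$ and
for any $B \subseteq \Theta$, one has

$$
Bel(B) = \sum_{i=1,\ldots,k} Bel(A_i \cap B) + U(A^* \cap B),
\tag{37}
$$

where $\mathcal{F}_{A^*}(m) \triangleq \mathcal{F}_\Theta(m) - \mathcal{F}_{A_1}(m) - \ldots - \mathcal{F}_{A_k}(m)$ and

$$
U(A^* \cap B) \triangleq \sum_{X \in \mathcal{F}_{A^*}(m) \mid X \in \mathcal{F}_B(m)} m(X).
\tag{38}
$$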
Proof of TBT: See appendix. 


$A^*$ is a shorthand notation for the entity associated to the set of
focal elements $\mathcal{F}_{A^*}(m)$ of the BBA $m(\cdot)$ involved in the summation
(38) of $U(A^* \cap B)$. From (38), one sees that $U(A^* \cap B) \in [0, 1]$.
If one applies TBT with $B = \Theta$, we get for any chosen partition
$\{A_1, \ldots, A_k\}$ of $\Theta$, $\sum_{i=1}^{k} Bel(A_i) + U(A^*) = 1$ where
$U(A^*) \triangleq \sum_{X \in \mathcal{F}_{A^*}(m)} m(X)$. This equality corresponds to TPT if
$U(A^*) = 0$ (i.e. there is no uncertainty on the value of the probabilities
of $A_i$, $i = 1, \ldots, k$). Note that if $B = \Theta$ and if the FoD $\Theta$ is simply
partitioned as $\{A_1 \triangleq A, A_2 \triangleq \bar{A}\}$, then $U(A^* \cap B) = U(A^* \cap \Theta) = U(A^*) = Pl(A) - Bel(A) = Pl(\bar{A}) - Bel(\bar{A})$.

Corollary 1 of TBT: If $m(\cdot)$ is Bayesian, then TBT is consistent
with the Total Probability Theorem (TPT) because $U(A^* \cap B) = 0$
and $Bel(\cdot)$ is homogeneous to a probability measure.


In expressing $Bel(\bar{B})$ with TBT and noting that $Pl(B) = 1 - Bel(\bar{B})$,
one can also easily establish the following (not so elegant)
Total Plausibility Theorem (TPlT).


Total Plausibility Theorem (TPlT): For any partition $\{A_1, \ldots, A_k\}$
of $\Theta$ and any $B \subseteq \Theta$, one has

$$
Pl(B) = \sum_{i=1,\ldots,k} Pl(\bar{A}_i \cup B) + 1 - k - U(A^* \cap \bar{B}).
\tag{39}
$$


C. Example for TBT 

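Consider the FoD $\Theta = \{\theta_i, i = 1, \ldots, 7\}$ and $\mathcal{F}_\Theta(m) = \{X_1, X_2, \ldots, X_9\}$
of a BBA $m(\cdot)$ defined over $2^\Theta$ as in Table I.
Consider also the partition $\{A_1, A_2, A_3\}$ of $\Theta$ with $A_1 \triangleq \theta_1 \cup \theta_3 \cup \theta_4 \cup \theta_7$,
$A_2 \triangleq \theta_2 \cup \theta_5$ and $A_3 \triangleq \theta_6$ and the subset $B = \theta_4 \cup \theta_5 \cup \theta_6 \cup \theta_7$
of $\Theta$. Table II summarizes the belief values of different subsets
of $\Theta$ which are needed to apply TBT.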


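$X_1 = \theta_2 \cup \theta_3 \cup \theta_4 \cup \theta_5 \cup \theta_7$
$X_2 = \theta_1 \cup \theta_2 \cup \theta_3 \cup \theta_4$
$X_3 = \theta_3 \cup \theta_5 \cup \theta_6$
$X_4 = \theta_4 \cup \theta_7$
$X_5 = \theta_2$
$X_6 = \theta_6 \cup \theta_7$
$X_7 = \theta_2 \cup \theta_3 \cup \theta_7$
$X_8 = \theta_1 \cup \theta_4 \cup \theta_6$
$X_9 = \theta_6$

Table I
FOCAL ELEMENTS AND THEIR MASSES.


$B = \theta_4 \cup \theta_5 \cup \theta_6 \cup \theta_7$              $Bel(B) = 0.39$
$A_1 = \theta_1 \cup \theta_3 \cup \theta_4 \cup \theta_7$            $Bel(A_1) = 0.04$
$A_2 = \theta_2 \cup \theta_5$                                        $Bel(A_2) =$
$A_3 = \theta_6$                                                      $Bel(A_3) =$
$A_1 \cap B = \theta_4 \cup \theta_7$                                 $Bel(A_1 \cap B) = 0.04$
$A_2 \cap B = \theta_5$                                               $Bel(A_2 \cap B) = 0$
$A_3 \cap B = \theta_6$                                               $Bel(A_3 \cap B) = 0.05$

Table II
BELIEF VALUES USED FOR THE DERIVATIONS.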

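In this example, one has

$$
\begin{aligned}
&\mathcal{F}_B(m) = \{X_4, X_6, X_9\} \text{ and } \mathcal{F}_{\bar{B}}(m) = \{X_5\}, \\
&\mathcal{F}_{A_1}(m) = \{X_4\} \text{ and } \mathcal{F}_{\bar{A}_1}(m) = \{X_5, X_9\}, \\
&\mathcal{F}_{A_2}(m) = \{X_5\} \text{ and } \mathcal{F}_{\bar{A}_2}(m) = \{X_4, X_6, X_8, X_9\}, \\
&\mathcal{F}_{A_3}(m) = \{X_9\} \text{ and } \mathcal{F}_{\bar{A}_3}(m) = \{X_1, X_2, X_4, X_5, X_7\}, \\
&\mathcal{F}_{A^*}(m) = \mathcal{F}_\Theta(m) - \mathcal{F}_{A_1}(m) - \mathcal{F}_{A_2}(m) - \mathcal{F}_{A_3}(m) = \{X_1, X_2, X_3, X_6, X_7, X_8\}.
\end{aligned}
$$

Therefore, one has

$$
U(A^* \cap B) = \sum_{X \in \mathcal{F}_{A^*}(m) \mid X \in \mathcal{F}_B(m)} m(X) = m(X_6) = 0.30.
$$

In applying TBT formula (37), one can easily verify

$$
Bel(B) = \sum_{i=1,\ldots,3} Bel(B \cap A_i) + U(A^* \cap B) = 0.04 + 0 + 0.05 + 0.30 = 0.39.
$$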
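The TBT identity can also be checked by brute force. The following sketch rests on loudly stated assumptions: only the masses $m(X_4) = 0.04$, $m(X_6) = 0.30$ and $m(X_9) = 0.05$ are taken from this example; the six other masses are hypothetical placeholders chosen so that the total is 1, and the focal element assigned to `X5` is inferred from the focal sets listed above. The helper names are illustrative. The script verifies (37) for the subset $B$ of the example and for every other subset of the frame.

```python
from fractions import Fraction as Fr
from itertools import combinations

THETA = frozenset(range(1, 8))  # integer i stands for theta_i
focal = {
    "X1": frozenset({2, 3, 4, 5, 7}), "X2": frozenset({1, 2, 3, 4}),
    "X3": frozenset({3, 5, 6}), "X4": frozenset({4, 7}),
    "X5": frozenset({2}),  # inferred from the focal sets of the example
    "X6": frozenset({6, 7}), "X7": frozenset({2, 3, 7}),
    "X8": frozenset({1, 4, 6}), "X9": frozenset({6}),
}
# X4, X6, X9 follow the example; all other masses are hypothetical placeholders.
mass = {"X1": Fr(10, 100), "X2": Fr(10, 100), "X3": Fr(10, 100),
        "X4": Fr(4, 100), "X5": Fr(11, 100), "X6": Fr(30, 100),
        "X7": Fr(10, 100), "X8": Fr(10, 100), "X9": Fr(5, 100)}
m = {focal[name]: mass[name] for name in focal}

partition = [frozenset({1, 3, 4, 7}), frozenset({2, 5}), frozenset({6})]
B = frozenset({4, 5, 6, 7})


def bel(m, subset):
    return sum((v for X, v in m.items() if X <= subset), Fr(0))


def uncertainty_star(m, partition, b):
    """U(A* ∩ B): mass of focal elements included in B but in no block A_i."""
    return sum((v for X, v in m.items()
                if X <= b and not any(X <= a for a in partition)), Fr(0))


def tbt_rhs(m, partition, b):
    """Right-hand side of the TBT formula."""
    return sum((bel(m, a & b) for a in partition), Fr(0)) + uncertainty_star(m, partition, b)


all_subsets = [frozenset(c) for r in range(8) for c in combinations(sorted(THETA), r)]
holds_everywhere = all(bel(m, s) == tbt_rhs(m, partition, s) for s in all_subsets)
print(bel(m, B), uncertainty_star(m, partition, B), holds_everywhere)
```

For the subset $B$ of the example the result does not depend on the placeholder masses, because only $X_4$, $X_6$ and $X_9$ are included in $B$.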


D. Generalization of TBT 


As explained in Section II-B, we have to work in the Cartesian product
space $\Theta = \Theta_1 \times \Theta_2$ if the partition $\{A_1, \ldots, A_k\}$ is related to a
given FoD $\Theta_1$ and $B$ is a subset of another FoD $\Theta_2$. Because
$\{A_1, \ldots, A_k\}$ is a partition of $\Theta_1$, then $\{A_1 \times \Theta_2, \ldots, A_k \times \Theta_2\}$
defines a partition of $\Theta = \Theta_1 \times \Theta_2$ and because $\Theta_1 \times B = \bigcup_{i=1,\ldots,k} ((\Theta_1 \times B) \cap (A_i \times \Theta_2))$,
one can always apply TBT in the
Cartesian space $\Theta$. More precisely, one has

$$
Bel(\Theta_1 \times B) = \sum_{i=1,\ldots,k} Bel(A_i \times B) + U(A^* \times B),
\tag{40}
$$

where $U(A^* \times B) \triangleq U((A^* \times \Theta_2) \cap (\Theta_1 \times B))$.

This formula can be used if and only if one knows the joint BBA
$m(\cdot)$ (or equivalently the joint belief) defined over the powerset of
the Cartesian space $\Theta = \Theta_1 \times \Theta_2$.
VII. CONDITIONAL BELIEF FUNCTIONS AND GBT
Before justifying FH conditioning from TBT and presenting the 
Generalized Bayes’ Theorem for BF, we establish a useful lemma. 


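Lemma 1: Consider a FoD $\Theta$ with a given BBA $m(\cdot)$ defined over $2^\Theta$. For any partition $\{A_i, \bar{A}_i\}$ of $\Theta$ and any $B \subseteq \Theta$, one always has

$$0 \leq U((\bar{A}_i \cap B)^*) - U(A^* \cap B) \leq 1, \qquad (41)$$

where $U((\bar{A}_i \cap B)^*) = \sum_{X \in \mathcal{F}_{(\bar{A}_i \cap B)^*}(m)} m(X)$ and $U(A^* \cap B) \triangleq \sum_{X \in \mathcal{F}_{A^*}(m) | X \in \mathcal{F}_B(m)} m(X)$.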


Proof of Lemma 1: See appendix. 


A. Conditional belief and plausibility 


We consider a partition $\{A_i, \bar{A}_i\}$ of the FoD $\Theta$ and a subset $B$ of $\Theta$. Using TBT, one has

$$Bel(B) = Bel(A_i \cap B) + Bel(\bar{A}_i \cap B) + U(A^* \cap B). \qquad (42)$$

Hence

$$Bel(B) - U(A^* \cap B) = Bel(A_i \cap B) + Bel(\bar{A}_i \cap B). \qquad (43)$$

Moreover, since one has (by definition)

$$U((\bar{A}_i \cap B)^*) = Pl(\bar{A}_i \cap B) - Bel(\bar{A}_i \cap B), \qquad (44)$$

from the equality (44), one gets

$$Bel(\bar{A}_i \cap B) = Pl(\bar{A}_i \cap B) - U((\bar{A}_i \cap B)^*). \qquad (45)$$

Putting the expression of $Bel(\bar{A}_i \cap B)$ above into (43) and rearranging terms, one gets

$$Bel(B) + \Delta(U) = Bel(A_i \cap B) + Pl(\bar{A}_i \cap B), \qquad (46)$$

where $\Delta(U) \triangleq U((\bar{A}_i \cap B)^*) - U(A^* \cap B)$, and $\Delta(U) \in [0, 1]$ because of Lemma 1.

Assuming $Bel(B) > 0$, and dividing the left and right sides of the equality (46) by $Bel(B) + \Delta(U)$, one gets

$$1 = \frac{Bel(A_i \cap B)}{Bel(B) + \Delta(U)} + \frac{Pl(\bar{A}_i \cap B)}{Bel(B) + \Delta(U)}. \qquad (47)$$

Hence, the equality (47) suggests defining the conditional belief $Bel(A_i|B)$ and plausibility $Pl(\bar{A}_i|B)$ as follows

$$Bel(A_i|B) \triangleq Bel(A_i \cap B) / (Bel(B) + \Delta(U)), \qquad (48)$$
$$Pl(\bar{A}_i|B) \triangleq Pl(\bar{A}_i \cap B) / (Bel(B) + \Delta(U)). \qquad (49)$$

Using equality (46), the previous conditioning formulas can be rewritten more concisely as

$$Bel(A_i|B) = Bel(A_i \cap B) / (Bel(A_i \cap B) + Pl(\bar{A}_i \cap B)), \qquad (50)$$
$$Pl(\bar{A}_i|B) = Pl(\bar{A}_i \cap B) / (Bel(A_i \cap B) + Pl(\bar{A}_i \cap B)). \qquad (51)$$

Replacing $A_i$ by $\bar{A}_i$ in the notations of formulas (49)-(51), we get the following expressions for the conditional plausibility $Pl(A_i|B)$

$$Pl(A_i|B) = \frac{Pl(A_i \cap B)}{Bel(B) + U((A_i \cap B)^*) - U(A^* \cap B)}, \qquad (52)$$
$$Pl(A_i|B) = \frac{Pl(A_i \cap B)}{Bel(\bar{A}_i \cap B) + Pl(A_i \cap B)}. \qquad (53)$$


Formulas (50) and (53) coincide with the FH formulas [4], which were originally proposed from a very good intuition. In this work, we derive them only from TBT in a direct constructive manner. Note that $Bel(A_i|B)$ given in (48) satisfies the conditions $Bel(\emptyset|B) = 0$, $Bel(\Theta|B) = 1$, and $Bel(A_i|B) \in [0, 1]$. To prove that $Bel(A_i|B)$ defined by (50) is a belief function, one must also prove that it is an $n$-monotone ($n \geq 2$) Choquet's capacity [24] on the finite set $\Theta$, or equivalently that the condition B3 of Theorem 1 holds for $Bel(\cdot|B)$. The proof of B3 is difficult, but three different proofs have already been given by Fagin and Halpern [3], Jaffray [6], and Sundberg and Wagner [7], the last one being the clearest.

Footnote to formulas (52)-(53): It is worth noting that one always has $U(\bar{A}^* \cap B) \triangleq \sum_{X \in \mathcal{F}_{\bar{A}^*}(m) | X \in \mathcal{F}_B(m)} m(X) = U(A^* \cap B)$ because $\mathcal{F}_{\bar{A}^*}(m) \triangleq \mathcal{F}_\Theta(m) - \mathcal{F}_{\bar{A}_i}(m) - \mathcal{F}_{A_i}(m) = \mathcal{F}_\Theta(m) - \mathcal{F}_{A_i}(m) - \mathcal{F}_{\bar{A}_i}(m) = \mathcal{F}_{A^*}(m)$.
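The Fagin-Halpern formulas (50) and (53) can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions: the function names `fh_bel` and `fh_pl` are hypothetical, each integer `k` stands for $\theta_k$, and the BBA uses illustrative masses (not claimed to be those of the paper's tables) chosen so that the conditional values quoted in the illustrative example of this paper are reproduced.

```python
from fractions import Fraction

THETA = frozenset(range(1, 8))  # integer k stands for theta_k

# Hypothetical BBA; the masses are illustrative assumptions.
m = {
    frozenset({2, 3, 4, 5, 7}): Fraction("0.01"),  # X1
    frozenset({1, 2, 3, 4}): Fraction("0.12"),     # X2
    frozenset({3, 5, 6}): Fraction("0.03"),        # X3
    frozenset({4, 7}): Fraction("0.04"),           # X4
    frozenset({2}): Fraction("0.20"),              # X5
    frozenset({6, 7}): Fraction("0.30"),           # X6
    frozenset({2, 3, 7}): Fraction("0.10"),        # X7
    frozenset({1, 4, 6}): Fraction("0.15"),        # X8
    frozenset({6}): Fraction("0.05"),              # X9
}

def bel(m, A):
    """Bel(A): mass of focal elements included in A."""
    return sum((v for X, v in m.items() if X <= A), Fraction(0))

def pl(m, A):
    """Pl(A): mass of focal elements intersecting A."""
    return sum((v for X, v in m.items() if X & A), Fraction(0))

def fh_bel(m, A, B):
    """Formula (50): Bel(A|B) = Bel(A n B) / (Bel(A n B) + Pl(not-A n B))."""
    not_A = THETA - A
    return bel(m, A & B) / (bel(m, A & B) + pl(m, not_A & B))

def fh_pl(m, A, B):
    """Formula (53): Pl(A|B) = Pl(A n B) / (Pl(A n B) + Bel(not-A n B))."""
    not_A = THETA - A
    return pl(m, A & B) / (pl(m, A & B) + bel(m, not_A & B))

A1, A2, A3 = frozenset({1, 3, 4, 7}), frozenset({2, 5}), frozenset({6})
B = frozenset({4, 5, 6, 7})
for A in (A1, A2, A3):
    print(float(fh_bel(m, A, B)), float(fh_pl(m, A, B)))
```

Both denominators are at least $Bel(B)$, so the assumption $Bel(B) > 0$ made in the text is enough to avoid a division by zero.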


B. Generalization of Bayes’ Theorem 

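Starting from (48) with $\Delta(U) \triangleq U((\bar{A}_i \cap B)^*) - U(A^* \cap B)$ and replacing $Bel(B)$ by the expression (37) of TBT, we get

$$Bel(A_i|B) = \frac{Bel(A_i \cap B)}{\sum_{j=1}^{k} Bel(A_j \cap B) + U((\bar{A}_i \cap B)^*)}. \qquad (54)$$

Similarly, assuming $Bel(A_i) > 0$, the Fagin-Halpern expression of $Bel(B|A_i)$ given by

$$Bel(B|A_i) = \frac{Bel(B \cap A_i)}{Bel(B \cap A_i) + Pl(\bar{B} \cap A_i)} \qquad (55)$$

is equivalent to the formula

$$Bel(B|A_i) = \frac{Bel(B \cap A_i)}{Bel(A_i) + U((\bar{B} \cap A_i)^*) - U(B^* \cap A_i)}, \qquad (56)$$

where

$$U((\bar{B} \cap A_i)^*) \triangleq Pl(\bar{B} \cap A_i) - Bel(\bar{B} \cap A_i) \qquad (57)$$
$$= \sum_{X \in \mathcal{F}_{(\bar{B} \cap A_i)^*}(m)} m(X), \qquad (58)$$

with $\mathcal{F}_{(\bar{B} \cap A_i)^*}(m) = \mathcal{F}_\Theta(m) - \mathcal{F}_{\bar{B} \cap A_i}(m) - \mathcal{F}_{B \cup \bar{A}_i}(m)$, and where

$$U(B^* \cap A_i) \triangleq \sum_{X \in \mathcal{F}_{B^*}(m) | X \in \mathcal{F}_{A_i}(m)} m(X), \qquad (59)$$

with $\mathcal{F}_{B^*}(m) = \mathcal{F}_\Theta(m) - \mathcal{F}_B(m) - \mathcal{F}_{\bar{B}}(m)$.

From (56), one obtains

$$Bel(A_i \cap B) = Bel(B|A_i) [Bel(A_i) + U((\bar{B} \cap A_i)^*) - U(B^* \cap A_i)].$$

Replacing the above expression of $Bel(A_i \cap B)$ into the formula (54), we obtain the formula

$$Bel(A_i|B) = \frac{Bel(B|A_i) q(A_i, B)}{\sum_{j=1}^{k} Bel(B|A_j) q(A_j, B) + U((\bar{A}_i \cap B)^*)}, \qquad (60)$$

where the factor $q(A_i, B)$, introduced here for notation conciseness, is defined by

$$q(A_i, B) \triangleq Bel(A_i) + U((\bar{B} \cap A_i)^*) - U(B^* \cap A_i). \qquad (61)$$

This allows us to establish the Generalized Bayes' Theorem (GBT).

Generalized Bayes' Theorem (GBT): For any partition $\{A_1, \ldots, A_k\}$ of a FoD $\Theta$, any belief function $Bel(\cdot) : 2^\Theta \mapsto [0, 1]$, and any subset $B$ of $\Theta$ with $Bel(B) > 0$, one has for $i \in \{1, \ldots, k\}$

$$Bel(A_i|B) = \frac{Bel(B|A_i) q(A_i, B)}{\sum_{j=1}^{k} Bel(B|A_j) q(A_j, B) + U((\bar{A}_i \cap B)^*)}, \qquad (62)$$

where $U((\bar{A}_i \cap B)^*) \triangleq \sum_{X \in \mathcal{F}_{(\bar{A}_i \cap B)^*}(m)} m(X) = Pl(\bar{A}_i \cap B) - Bel(\bar{A}_i \cap B)$, and where the factor $q(A_i, B)$ is defined by (61).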

Lemma 2: GBT reduces to BT if Bel(-) is a Bayesian BF. 
Proof: See appendix.


Remark: When $A_i \subseteq \Theta_1$ and $B \subseteq \Theta_2$ with $\Theta_1 \neq \Theta_2$, we must work in the Cartesian product space $\Theta = \Theta_1 \times \Theta_2$, and the GBT formula is similar to (62) after replacing $A_i$ by $A_i \times \Theta_2$ and $B$ by $\Theta_1 \times B$. The application of the GBT formula is not easy in general because it requires the knowledge of the joint BBA $m(\cdot)$ defined over $2^{\Theta_1 \times \Theta_2}$, which is rarely known in practice. If the joint BBA $m(\cdot)$ can be expressed (or approximated) as a function of two marginal BBAs $m_1(\cdot)$ and $m_2(\cdot)$ (assumed to be known) defined respectively over $\Theta_1$ and $\Theta_2$, then the GBT formula should become tractable.


VIII. ILLUSTRATIVE EXAMPLE OF GBT
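Consider $\Theta = \{\theta_i, i = 1, \ldots, 7\}$, $\mathcal{F}_\Theta(m) = \{X_1, X_2, \ldots, X_9\}$ and $m(\cdot)$ given in Table III.

$X_1 = \theta_2 \cup \theta_3 \cup \theta_4 \cup \theta_5 \cup \theta_7$
$X_2 = \theta_1 \cup \theta_2 \cup \theta_3 \cup \theta_4$
$X_3 = \theta_3 \cup \theta_5 \cup \theta_6$
$X_4 = \theta_4 \cup \theta_7$
$X_5 = \theta_2$
$X_6 = \theta_6 \cup \theta_7$
$X_7 = \theta_2 \cup \theta_3 \cup \theta_7$
$X_8 = \theta_1 \cup \theta_4 \cup \theta_6$
$X_9 = \theta_6$

Table III
FOCAL ELEMENTS AND THEIR MASSES.

Consider the partition $\{A_1, A_2, A_3\}$ of $\Theta$ with $A_1 = \theta_1 \cup \theta_3 \cup \theta_4 \cup \theta_7$, $A_2 = \theta_2 \cup \theta_5$ and $A_3 = \theta_6$, and the subset $B = \theta_4 \cup \theta_5 \cup \theta_6 \cup \theta_7$ of $\Theta$ having belief $Bel(B) = m(X_4) + m(X_6) + m(X_9) = 0.39$. Table IV summarizes the BF values which are needed in the derivations.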


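$X = B = \theta_4 \cup \theta_5 \cup \theta_6 \cup \theta_7$
$X = A_1 = \theta_1 \cup \theta_3 \cup \theta_4 \cup \theta_7$: $Bel(A_1) = 0.04$
$X = A_2 = \theta_2 \cup \theta_5$: $Bel(A_2) = 0.20$
$X = A_3 = \theta_6$: $Bel(A_3) = 0.05$
$X = \bar{A}_2 \cap B = \theta_4 \cup \theta_6 \cup \theta_7$
$X = \bar{A}_3 \cap B = \theta_4 \cup \theta_5 \cup \theta_7$
$X = \bar{B} \cap A_1 = \theta_1 \cup \theta_3$

Table IV
BELIEF AND PLAUSIBILITY VALUES USED FOR THE DERIVATIONS.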


• Results with Fagin-Halpern conditioning formulas

Using (50) and (55) and the fact that $Pl(A_i|B) = 1 - Bel(\bar{A}_i|B)$ and $Pl(B|A_i) = 1 - Bel(\bar{B}|A_i)$, we get the values of Tables V-VI.

$Bel(A_1|B) \approx 0.0690$
$Bel(A_2|B) = 0$, $Pl(A_2|B) \approx 0.0930$
$Bel(A_3|B) = 0.0625$, $Pl(A_3|B) \approx 0.9298$

Table V
$Bel(A_i|B)$ AND $Pl(A_i|B)$ WITH FAGIN-HALPERN CONDITIONING.

$Bel(B|A_1) \approx 0.0889$
$Bel(B|A_2) = 0$
$Bel(B|A_3) = 1$

Table VI
$Bel(B|A_i)$ AND $Pl(B|A_i)$ WITH FAGIN-HALPERN CONDITIONING.


To verify GBT, one calculates $Bel(A_i)$, $U((\bar{B} \cap A_i)^*)$ and $U(B^* \cap A_i)$ for getting $q(A_i, B)$, and $U((\bar{A}_i \cap B)^*)$. These values are given in Table VII. For instance, $q(A_1, B) = Bel(A_1) + U((\bar{B} \cap A_1)^*) - U(B^* \cap A_1) = 0.45$ because $Bel(A_1) = 0.04$, $U((\bar{B} \cap A_1)^*) = Pl(\bar{B} \cap A_1) - Bel(\bar{B} \cap A_1) = 0.41$ and $U(B^* \cap A_1) = \sum_{X \in \mathcal{F}_{B^*}(m) | X \in \mathcal{F}_{A_1}(m)} m(X) = 0$. Likewise, $U((\bar{A}_1 \cap B)^*) = Pl(\bar{A}_1 \cap B) - Bel(\bar{A}_1 \cap B) = 0.54 - 0.05 = 0.49$, and the other values of Table VII are calculated similarly.

$q(A_1, B) = 0.45$, $U((\bar{A}_1 \cap B)^*) = 0.49$
$q(A_2, B) = 0.43$
$q(A_3, B) = 0.05$

Table VII
VALUES OF $q(A_i, B)$ AND $U((\bar{A}_i \cap B)^*)$ FOR GBT FORMULA.


One verifies that GBT formula (62) works because we retrieve the correct values obtained with the FH formula. Indeed, one has

$$Bel(A_1|B) = \frac{Bel(B|A_1) q(A_1, B)}{\sum_{j=1}^{3} Bel(B|A_j) q(A_j, B) + U((\bar{A}_1 \cap B)^*)} \approx \frac{0.0889 \cdot 0.45}{(0.0889 \cdot 0.45) + (0 \cdot 0.43) + (1 \cdot 0.05) + 0.49} \approx 0.0690.$$

Similarly, one can easily verify that one obtains $Bel(A_2|B) = 0$ and $Bel(A_3|B) \approx 0.0625$ with GBT.
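The verification above can be reproduced with a short Python sketch of the factor $q(A_i, B)$ of (61) and of the GBT ratio (62). It is an illustration under stated assumptions rather than the authors' implementation: the function names (`u`, `u_bstar_cap`, `q`, `gbt`) are hypothetical, each integer `k` stands for $\theta_k$, and the masses are illustrative values chosen to reproduce the numbers quoted in this section, not the entries of Table III.

```python
from fractions import Fraction

THETA = frozenset(range(1, 8))  # integer k stands for theta_k

# Hypothetical BBA; the masses are illustrative assumptions.
m = {
    frozenset({2, 3, 4, 5, 7}): Fraction("0.01"),  # X1
    frozenset({1, 2, 3, 4}): Fraction("0.12"),     # X2
    frozenset({3, 5, 6}): Fraction("0.03"),        # X3
    frozenset({4, 7}): Fraction("0.04"),           # X4
    frozenset({2}): Fraction("0.20"),              # X5
    frozenset({6, 7}): Fraction("0.30"),           # X6
    frozenset({2, 3, 7}): Fraction("0.10"),        # X7
    frozenset({1, 4, 6}): Fraction("0.15"),        # X8
    frozenset({6}): Fraction("0.05"),              # X9
}

def bel(m, A):
    return sum((v for X, v in m.items() if X <= A), Fraction(0))

def pl(m, A):
    return sum((v for X, v in m.items() if X & A), Fraction(0))

def u(m, A):
    """U(A*) = Pl(A) - Bel(A)."""
    return pl(m, A) - bel(m, A)

def u_bstar_cap(m, A, B):
    """U(B* n A): focal elements inside A that meet both B and its complement."""
    not_B = THETA - B
    return sum((v for X, v in m.items()
                if X <= A and X & B and X & not_B), Fraction(0))

def q(m, A, B):
    """Factor q(A, B) of formula (61)."""
    return bel(m, A) + u(m, (THETA - B) & A) - u_bstar_cap(m, A, B)

def fh_bel(m, A, B):
    """Fagin-Halpern conditional belief, formulas (50)/(55)."""
    return bel(m, A & B) / (bel(m, A & B) + pl(m, (THETA - A) & B))

def gbt(m, partition, i, B):
    """Right-hand side of formula (62) for the i-th element of the partition."""
    Ai = partition[i]
    num = fh_bel(m, B, Ai) * q(m, Ai, B)
    den = sum(fh_bel(m, B, A) * q(m, A, B) for A in partition)
    return num / (den + u(m, (THETA - Ai) & B))

partition = [frozenset({1, 3, 4, 7}), frozenset({2, 5}), frozenset({6})]
B = frozenset({4, 5, 6, 7})
print([float(gbt(m, partition, i, B)) for i in range(3)])
```

With these assumed masses the GBT values for $A_1$, $A_2$ and $A_3$ agree with the Fagin-Halpern values, as in the text.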


• Results with Shafer's conditioning formulas

With formulas (30)-(31), we get the values of Tables VIII-IX.

$Bel(A_1|B) = 0.3250$
$Bel(A_2|B) = 0$, $Pl(A_2|B) = 0.0500$
$Bel(A_3|B) = 0.0625$, $Pl(A_3|B) = 0.6625$

Table VIII
$Bel(A_i|B)$ AND $Pl(A_i|B)$ WITH SHAFER'S CONDITIONING.

$Bel(B|A_1) \approx 0.4533$
$Bel(B|A_2) \approx 0.0652$, $Pl(B|A_2) \approx 0.0870$
$Bel(B|A_3) = 1$, $Pl(B|A_3) = 1$

Table IX
$Bel(B|A_i)$ AND $Pl(B|A_i)$ WITH SHAFER'S CONDITIONING.


One sees that the conditional values are not coherent since they do not verify GBT. In this example we obtain

$$Bel(A_1|B) = 0.3250 \text{ (using (30))} \neq \frac{Bel(B|A_1) q(A_1, B)}{\sum_{j=1}^{3} Bel(B|A_j) q(A_j, B) + U((\bar{A}_1 \cap B)^*)} \approx \frac{0.4533 \cdot 0.45}{(0.4533 \cdot 0.45) + (0.0652 \cdot 0.43) + (1 \cdot 0.05) + 0.49} \approx 0.2642.$$

Similarly, one can show that $Bel(A_2|B) = 0$ (using (30)) $\neq 0.0405$ (using GBT) and $Bel(A_3|B) = 0.0625$ (using (30)) $\neq 0.0504$ (using GBT). Hence, the Ellsberg urn example and this example show clearly that Dempster's rule of combination, used by Shafer to establish his belief conditioning formulas, does not provide coherent and satisfactory results: these formulas are inconsistent with the lower and upper bounds of imprecise conditional probabilities, and they do not satisfy GBT, which is established directly in a constructive manner without any ad-hoc assumption.
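For comparison, Shafer's conditioning formulas (31)-(32) can be sketched next to the Fagin-Halpern formula. This is a hedged Python illustration: the function names `shafer_bel`, `shafer_pl` and `fh_bel` are hypothetical, each integer `k` stands for $\theta_k$, and the masses are illustrative assumptions selected to reproduce the values quoted in this section, not the entries of Table III.

```python
from fractions import Fraction

THETA = frozenset(range(1, 8))  # integer k stands for theta_k

# Hypothetical BBA; the masses are illustrative assumptions.
m = {
    frozenset({2, 3, 4, 5, 7}): Fraction("0.01"),  # X1
    frozenset({1, 2, 3, 4}): Fraction("0.12"),     # X2
    frozenset({3, 5, 6}): Fraction("0.03"),        # X3
    frozenset({4, 7}): Fraction("0.04"),           # X4
    frozenset({2}): Fraction("0.20"),              # X5
    frozenset({6, 7}): Fraction("0.30"),           # X6
    frozenset({2, 3, 7}): Fraction("0.10"),        # X7
    frozenset({1, 4, 6}): Fraction("0.15"),        # X8
    frozenset({6}): Fraction("0.05"),              # X9
}

def bel(m, A):
    return sum((v for X, v in m.items() if X <= A), Fraction(0))

def pl(m, A):
    return sum((v for X, v in m.items() if X & A), Fraction(0))

def shafer_bel(m, A, B):
    """Formula (32): Bel(A|B) = (Pl(B) - Pl(B n not-A)) / Pl(B), with Pl(B) > 0."""
    return (pl(m, B) - pl(m, B & (THETA - A))) / pl(m, B)

def shafer_pl(m, A, B):
    """Formula (31): Pl(A|B) = Pl(A n B) / Pl(B), with Pl(B) > 0."""
    return pl(m, A & B) / pl(m, B)

def fh_bel(m, A, B):
    """Fagin-Halpern formula (50), shown for contrast."""
    return bel(m, A & B) / (bel(m, A & B) + pl(m, (THETA - A) & B))

A1, A2, A3 = frozenset({1, 3, 4, 7}), frozenset({2, 5}), frozenset({6})
B = frozenset({4, 5, 6, 7})
print(float(shafer_bel(m, A1, B)), float(fh_bel(m, A1, B)))
```

On this assumed BBA, Shafer's conditional belief in $A_1$ given $B$ is 0.325 while the Fagin-Halpern value is about 0.069, which is the gap discussed in the text.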
IX. CONCLUSION


This paper has presented new important results: the Total Belief Theorem (TBT), the justification of Fagin-Halpern conditioning from TBT, and the Generalized Bayes' Theorem (GBT). Our theoretical results allowed us to establish the Generalized Bayes' Theorem rigorously, in a direct constructive manner from TBT. It needs neither extra assumptions nor underlying ad-hoc construction principles. We have also proved that our TBT and GBT are fully consistent with the classical TPT and Bayes' Theorem as soon as the belief functions are Bayesian. This achievement thus provides a sound basis for working in the belief function framework. From Ellsberg's urn example and an illustrative example, we have shown that Shafer's conditioning based on Dempster's rule provides results that are inconsistent with the lower and upper bounds of imprecise conditional probabilities, and inconsistent with GBT. These new results should help to reconcile practitioners of Bayesian reasoning with those of evidential reasoning.


APPENDIX 
A. Proof of TBT 


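$$\begin{aligned}
Bel(B) &= \sum_{X \in \mathcal{F}_\Theta(m) | X \subseteq B} m(X) \\
&= \sum_{X \in \mathcal{F}_{A_1}(m) | X \in \mathcal{F}_B(m)} m(X) + \ldots + \sum_{X \in \mathcal{F}_{A_k}(m) | X \in \mathcal{F}_B(m)} m(X) + \sum_{X \in \mathcal{F}_{A^*}(m) | X \in \mathcal{F}_B(m)} m(X) \\
&= Bel(A_1 \cap B) + \ldots + Bel(A_k \cap B) + \sum_{X \in \mathcal{F}_{A^*}(m) | X \in \mathcal{F}_B(m)} m(X) \\
&= \sum_{i=1,\ldots,k} Bel(A_i \cap B) + U(A^* \cap B),
\end{aligned}$$

where $U(A^* \cap B) \triangleq \sum_{X \in \mathcal{F}_{A^*}(m) | X \in \mathcal{F}_B(m)} m(X)$.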


B. Proof of Lemma 1 
For notation convenience, we denote $\Delta(U) \triangleq U((\bar{A}_i \cap B)^*) - U(A^* \cap B)$. By definition, $U((\bar{A}_i \cap B)^*) = Pl(\bar{A}_i \cap B) - Bel(\bar{A}_i \cap B)$, and using TBT with the partition $\{A_i, \bar{A}_i\}$ one has $U(A^* \cap B) = Bel(B) - Bel(A_i \cap B) - Bel(\bar{A}_i \cap B)$. Hence

$$\begin{aligned}
\Delta(U) &= [Pl(\bar{A}_i \cap B) - Bel(\bar{A}_i \cap B)] - [Bel(B) - Bel(A_i \cap B) - Bel(\bar{A}_i \cap B)] \\
&= Pl(\bar{A}_i \cap B) + Bel(A_i \cap B) - Bel(B).
\end{aligned}$$

To prove that $\Delta(U) \geq 0$, one needs to prove equivalently that

$$Bel(B) \leq Bel(A_i \cap B) + Pl(\bar{A}_i \cap B).$$

Any focal element $X \subseteq B$ is either included in $A_i \cap B$, in which case its mass is counted in $Bel(A_i \cap B)$, or it has a non-empty intersection with $\bar{A}_i \cap B$, in which case its mass is counted in $Pl(\bar{A}_i \cap B)$. Summing $m(X)$ over all focal elements $X \subseteq B$ therefore gives the inequality above, which proves that $\Delta(U) \geq 0$. Moreover, because $U(A^* \cap B) \in [0, 1]$, one has $-U(A^* \cap B) \in [-1, 0]$, and because $U((\bar{A}_i \cap B)^*) \in [0, 1]$ one deduces that $\Delta(U) = U((\bar{A}_i \cap B)^*) - U(A^* \cap B) \leq U((\bar{A}_i \cap B)^*) \leq 1$.


C. Proof of Lemma 2 


If $Bel(\cdot) : 2^\Theta \mapsto [0, 1]$ is a Bayesian belief function, then all focal elements of its corresponding BBA $m(\cdot)$ are singletons of $2^\Theta$. In this case the $Bel(\cdot)$ and $Pl(\cdot)$ functions coincide and therefore one has $U((\bar{A}_i \cap B)^*) = Pl(\bar{A}_i \cap B) - Bel(\bar{A}_i \cap B) = 0$ and $U((\bar{B} \cap A_i)^*) = Pl(\bar{B} \cap A_i) - Bel(\bar{B} \cap A_i) = 0$. Any focal element (singleton) of $m(\cdot)$ is either a subset of $B$ or a subset of the complement $\bar{B}$ of $B$ in the FoD $\Theta$. Therefore, $\mathcal{F}_{B^*}(m) = \emptyset$, which implies $U(B^* \cap A_i) = 0$, so that $q(A_i, B) = Bel(A_i)$. The GBT formula (62), with in this case $q(A_i, B) = Bel(A_i)$ and $U((\bar{A}_i \cap B)^*) = 0$, reduces to the formula

$$Bel(A_i|B) = \frac{Bel(B|A_i) Bel(A_i)}{\sum_{j=1}^{k} Bel(B|A_j) Bel(A_j)}.$$

This coincides with formula (21) since $Bel(\cdot)$ (being a Bayesian belief function) is homogeneous to a probability measure $P(\cdot)$. This completes the proof that the GBT formula is consistent with the Bayes' Theorem formula when the belief function is Bayesian.
Abstract: In this paper new theoretical results for reasoning with belief functions are obtained and discussed. After a judicious decomposition of the set of focal elements of a belief function, we establish the Total Belief Theorem (TBT) which is the direct generalization of the Total Probability Theorem when working in the framework of belief functions. The TBT is also generalized for dealing with different frames of discernment thanks to the Cartesian product space. From TBT, we can derive and define formally the expressions of conditional belief functions which are consistent with the bounds of imprecise conditional probability. This work provides a direct establishment and solid justification of Fagin-Halpern belief conditioning formulas. The well-known Bayes' Theorem of Probability Theory is then generalized in the framework of belief functions and we illustrate it with an example at the end of this paper.


Keywords: Total Belief Theorem (TBT), conditional belief 
functions. 


I. INTRODUCTION 


In this paper, we present new theoretical results for rea- 
soning with belief functions (BF) introduced by Shafer in [1], 
known as Dempster-Shafer Theory (DST) in the literature. The 
first result is the establishment of the Total Belief Theorem 
(TBT) which can be interpreted as a generalization of the Total 
Probability Theorem (TPT) for the belief functions framework. 
TBT is essential for formally establishing conditional belief 
functions in a constructive manner whose expressions are 
consistent with original Dempster’s idea (through eq. (4.8) in 
[2]), rediscovered independently and popularized by Fagin- 
Halpern in [3], [4]. TBT also allows us to present a new 
formulation of Generalized Bayes’ Theorem (GBT). 

Several methods have been proposed in the literature to 
address the belief conditioning problem. They essentially can 
be separated in two different approaches: 1) Shafer’s belief 
conditioning method based on Dempster’s rule of combination 
[1], and 2) the belief conditioning method consistent with 
imprecise probability calculus bounds [2], [3], [5]-[8] based 
on the lower and upper probability interpretation of belief 
functions. 

Although Shafer's belief functions offer an appealing mathematical framework for modeling epistemic uncertainty, their use and the validity of the results obtained in applications are very controversial, both for uncertain information fusion and for belief conditioning, mainly because of Shafer's choice of Dempster's rule of combination as a pillar for combining evidence represented by belief functions and for conditioning. These well-known problems of DST have been reported and discussed by many experts in the field over the last decades, see for example [9]-[23]. That is why in this paper we focus on the second approach of belief conditioning, based on the lower and upper probability interpretation of BF.

It is worth noting that Smets in the nineties [25] proposed a preliminary version of GBT to generalize Bayes' Theorem (BT) to belief functions, but Smets' GBT is based on conditional embedding, conjunctive merging and Shafer's conditioning, which make it quite complicated to apply, and its results have been cast in doubt in [26]. Here we propose a simpler and direct constructive manner to derive a new version of GBT without the need for extra assumptions or underlying ad-hoc principles as done by Smets. We also prove that our TBT and GBT presented in this work are fully consistent with classical TPT and BT as soon as the belief functions are restricted to Bayesian belief functions (i.e. classical probability measures).

This paper is organized as follows. After a brief recall of 
basics of belief functions in Section II and Total Probability 
Theorem in Section III, we present probability conditioning 
and Bayes’ theorem in Section IV followed by classical 
Shafer’s and Fagin-Halpern’s belief conditioning methods re- 
spectively in Sections V and VI. In Section VII, we present 
the decomposition of the set of focal elements of any basic 
belief assignment that allows us to establish formally the 
Total Belief Theorem and its generalization on Cartesian 
product space. The Section VIII presents and justifies the 
new belief conditioning formulas drawn from TBT which 
are fully consistent with Fagin-Halpern conditioning formulas. 
Section IX presents the generalization of Bayes’ theorem in 
the framework of belief functions obtained from TBT. We 
illustrate our new theoretical results with a quite simple GBT 
example in Section X to show how to make derivations of GBT 
and to prove that Shafer’s conditioning results are inconsistent 
with GBT. Section XI concludes this paper. 


II. BASICS OF BELIEF FUNCTIONS 


Belief functions (BF) were introduced by Shafer in [1] to model epistemic uncertainty, based on preliminary works done by Dempster [2], [27]. Shafer's Theory of Belief Functions is also referred to as Dempster-Shafer Theory (DST) in the literature. We assume that the answer$^1$ to the problem under concern belongs to a known (or given) finite discrete frame of discernment (FoD) $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$, with $n > 1$, and where all elements of $\Theta$ are exhaustive and exclusive$^2$. The set of all subsets of $\Theta$ (including the empty set $\emptyset$ and $\Theta$) is the power-set of $\Theta$ denoted by $2^\Theta$. The number of elements (i.e. the cardinality) of $2^\Theta$ is $2^{|\Theta|}$. A basic belief assignment (BBA) associated with a given source of evidence is defined as the mapping $m(\cdot) : 2^\Theta \to [0, 1]$ satisfying the conditions $m(\emptyset) = 0$ and $\sum_{A \in 2^\Theta} m(A) = 1$. The quantity $m(A)$ is called the mass of $A$ committed by the source of evidence. Belief and plausibility functions are respectively defined by

$$Bel(A) = \sum_{X \in 2^\Theta | X \subseteq A} m(X), \qquad (1)$$
$$Pl(A) = \sum_{X \in 2^\Theta | X \cap A \neq \emptyset} m(X) = 1 - Bel(\bar{A}), \qquad (2)$$

where$^3$ $\bar{A} \triangleq \Theta - \{A\} = \{X | X \in \Theta \text{ and } X \notin A\}$, i.e. $\bar{A}$ is the complement of $A$ in $\Theta$. The notation $\triangleq$ means equal by definition. The width $Pl(A) - Bel(A)$ of the belief interval $[Bel(A), Pl(A)]$ is usually called the uncertainty on $A$ committed by the source of evidence, and will be denoted$^4$ by $U(A^*)$. It represents in fact the imprecision on the probability of $A$ granted by the source of evidence, which provides the BBA $m(\cdot)$.

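A focal element $X$ of a BBA $m(\cdot)$ is an element of $2^\Theta$ such that $m(X) > 0$. Note that the empty set $\emptyset$ is not a focal element of a BBA because $m(\emptyset) = 0$ (closed-world assumption of Shafer's model for the FoD). The set of all focal elements of $m(\cdot)$ is denoted

$$\mathcal{F}_\Theta(m) \triangleq \{X \subseteq \Theta | m(X) > 0\} = \{X \in 2^\Theta | m(X) > 0\}. \qquad (3)$$

Because $m(\emptyset) = 0$, one always has $1 \leq |\mathcal{F}_\Theta(m)| \leq 2^{|\Theta|} - 1$. The set of focal elements of $m(\cdot)$ included in a subset $A$ of $\Theta$ is denoted

$$\mathcal{F}_A(m) \triangleq \{X \subseteq A \subseteq \Theta | m(X) > 0\}. \qquad (4)$$

Note that if $A \subseteq B \subseteq \Theta$, then $\mathcal{F}_A(m) \subseteq \mathcal{F}_B(m)$, and one always has$^5$ $\mathcal{F}_A(m) \cap \mathcal{F}_B(m) = \mathcal{F}_{A \cap B}(m)$ for any subsets $A$ and $B$ of $\Theta$, but $\mathcal{F}_{A \cup B}(m) \neq \mathcal{F}_A(m) \cup \mathcal{F}_B(m)$ in general$^6$.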


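$^1$ i.e. the solution, or the decision to take.

$^2$ This is the so-called Shafer's model of FoD [28].

$^3$ Here the minus symbol denotes the set difference operator [29], [30].

$^4$ In the literature it is usually denoted by $U(A)$. Here we use a new notation $U(A^*)$ which is not anecdotal. This new notation reveals its importance for the consistency of the notations used in the formulas we give in this paper.

$^5$ Proof: $\mathcal{F}_A(m) \cap \mathcal{F}_B(m) = \{X \in \mathcal{F}_\Theta(m) | (X \cap A = X) \wedge (X \cap B = X)\} = \{X \in \mathcal{F}_\Theta(m) | X \cap (A \cap B) = X\} = \mathcal{F}_{A \cap B}(m)$.

$^6$ For example, consider the focal elements given in the example of Section X. One has $A_1 \cup \bar{B} = \{\theta_1, \theta_3, \theta_4, \theta_7\} \cup \{\theta_1, \theta_2, \theta_3\} = \{\theta_1, \theta_2, \theta_3, \theta_4, \theta_7\}$ and therefore $\mathcal{F}_{A_1 \cup \bar{B}} = \{X_2, X_4, X_5, X_7\}$, but $\mathcal{F}_{A_1} = \{X_4\}$ and $\mathcal{F}_{\bar{B}} = \{X_5\}$, so that $\mathcal{F}_{A_1} \cup \mathcal{F}_{\bar{B}} = \{X_4, X_5\} \neq \mathcal{F}_{A_1 \cup \bar{B}}$.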


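By definition, all elements of $2^\Theta$ not in $\mathcal{F}_\Theta(m)$ have a zero mass value, and therefore the definitions of $Bel(A)$ and $Pl(A)$ given in (1)-(2) can also be expressed$^7$

$$Bel(A) = \sum_{X \in \mathcal{F}_\Theta(m) | X \subseteq A} m(X) = \sum_{X \in \mathcal{F}_A(m)} m(X), \qquad (6)$$
$$Pl(A) = \sum_{X \in \mathcal{F}_\Theta(m) | X \cap A \neq \emptyset} m(X) = 1 - Bel(\bar{A}). \qquad (7)$$

The set of focal elements $\mathcal{F}_\Theta(m)$ of the BBA $m(\cdot)$ can always be partitioned as $\{\mathcal{F}_A(m), \mathcal{F}_{\bar{A}}(m), \mathcal{F}_{A^*}(m)\}$ where

$$\mathcal{F}_{A^*}(m) \triangleq \mathcal{F}_\Theta(m) - \mathcal{F}_A(m) - \mathcal{F}_{\bar{A}}(m) \qquad (8)$$
$$= \{X \in \mathcal{F}_\Theta(m) | X \cap A \neq \emptyset \text{ and } X \cap \bar{A} \neq \emptyset\} \qquad (9)$$

represents the set of focal elements of $m(\cdot)$ which are not subsets of $A$ and not subsets of $\bar{A} = \Theta - \{A\}$.

The uncertainty $U(A^*)$ can also be expressed directly as

$$U(A^*) = \sum_{X \in \mathcal{F}_{A^*}(m)} m(X). \qquad (10)$$

It is worth noting that $U(A^*) = Pl(A) - Bel(A) = (1 - Bel(\bar{A})) - (1 - Pl(\bar{A})) = Pl(\bar{A}) - Bel(\bar{A}) = U(\bar{A}^*)$, or equivalently

$$U(\bar{A}^*) = \sum_{X \in \mathcal{F}_{\bar{A}^*}(m)} m(X), \qquad (11)$$

where $\mathcal{F}_{\bar{A}^*}(m) = \mathcal{F}_\Theta(m) - \mathcal{F}_{\bar{A}}(m) - \mathcal{F}_A(m) = \mathcal{F}_{A^*}(m)$.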
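The partition of the focal elements in (8)-(9) and the identity $U(A^*) = Pl(A) - Bel(A)$ can be illustrated with a small Python sketch. The three-element frame, the BBA and the helper name `split_focal` are assumptions made for illustration only. The dictionary is assumed to hold focal elements only (no zero-mass entries).

```python
from fractions import Fraction

theta = frozenset("abc")
# Hypothetical BBA: keys are focal elements, values are their masses.
m = {
    frozenset("a"): Fraction(1, 2),
    frozenset("ab"): Fraction(3, 10),
    frozenset("bc"): Fraction(1, 5),
}

def split_focal(m, A, theta):
    """Return (F_A, F_notA, F_A*) as in (8)-(9)."""
    not_A = theta - A
    inside = {X for X in m if X <= A}        # focal elements included in A
    outside = {X for X in m if X <= not_A}   # focal elements included in not-A
    straddling = set(m) - inside - outside   # meet both A and not-A
    return inside, outside, straddling

def bel(m, A):
    return sum((v for X, v in m.items() if X <= A), Fraction(0))

def pl(m, A):
    return sum((v for X, v in m.items() if X & A), Fraction(0))

A = frozenset("ab")
inside, outside, straddling = split_focal(m, A, theta)
u_star = sum((m[X] for X in straddling), Fraction(0))  # formula (10)
print(inside, outside, straddling, u_star)
```

Here $U(A^*)$ is the mass of the single focal element $\{b, c\}$, which lies neither inside $A = \{a, b\}$ nor inside its complement.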


When all elements of $\mathcal{F}_\Theta(m)$ are only singletons, $m(\cdot)$ is called a Bayesian BBA [1] and its corresponding $Bel(\cdot)$ and $Pl(\cdot)$ functions are homogeneous to a same (subjective) probability measure $P(\cdot)$.

The class of belief functions can be characterized without explicitly referring to a BBA, see Shafer's theorem in [1], page 39, with its proof on page 51. More precisely, a mapping $Bel(\cdot) : 2^\Theta \mapsto [0, 1]$ is a belief function if and only if $Bel(\emptyset) = 0$, $Bel(\Theta) = 1$ and, for every positive integer $n$ and every collection $A_1, \ldots, A_n$ of subsets of $\Theta$,

$$Bel(A_1 \cup \ldots \cup A_n) \geq \sum_{\emptyset \neq I \subseteq \{1, \ldots, n\}} (-1)^{|I|+1} Bel\Big(\bigcap_{i \in I} A_i\Big). \qquad (12)$$

There is a one-to-one relationship between a BBA $m(\cdot)$ and its corresponding belief function $Bel(\cdot)$. The BBA $m(\cdot)$ that produces a given belief function is unique and is obtained for any $A \subseteq \Theta$ by the following Möbius inverse formula (see [1], p. 39)

$$m(A) = \sum_{B \subseteq A \subseteq \Theta} (-1)^{|A - B|} Bel(B). \qquad (13)$$

$^7$ More precisely, we should write $Bel(A) = 0 + \sum_{X \in \mathcal{F}_A(m)} m(X)$ to get a well-defined value even if there is no $X \in \mathcal{F}_\Theta(m)$ such that $X \subseteq A$. For notation convenience, this zero additional term (as well as other zero terms in formulas (10)-(11), (42), etc.) will be omitted in the sequel, it being understood that a sum of non-existing terms is always equal to zero.
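The one-to-one relationship between $m(\cdot)$ and $Bel(\cdot)$ stated by (13) can be checked by a round trip on a small frame. The following Python sketch is illustrative only: the frame, the masses and the helper names `powerset` and `mobius_inverse` are assumptions, and the enumeration of all subsets is practical only for small frames.

```python
from fractions import Fraction
from itertools import combinations

def powerset(s):
    """All subsets of s as frozensets, from the empty set up to s itself."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def bel(m, A):
    """Bel(A) as in (1): mass of the focal elements included in A."""
    return sum((v for X, v in m.items() if X <= A), Fraction(0))

def mobius_inverse(bel_of, theta):
    """Recover m from Bel with formula (13); zero masses are dropped."""
    out = {}
    for A in powerset(theta):
        total = sum((-1) ** len(A - B) * bel_of[B] for B in powerset(A))
        if total != 0:
            out[A] = total
    return out

theta = frozenset("abc")
# Hypothetical BBA used for the round trip.
m = {
    frozenset("a"): Fraction(1, 2),
    frozenset("ab"): Fraction(3, 10),
    frozenset("abc"): Fraction(1, 5),
}
bel_table = {A: bel(m, A) for A in powerset(theta)}
recovered = mobius_inverse(bel_table, theta)
print(recovered == m)
```

The recovered assignment contains exactly the focal elements of the assumed BBA, since every non-focal subset receives a zero mass from (13).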
In the DST framework, Shafer [1] proposed to combine $s \geq 2$ distinct sources of evidence represented by BBAs $m_1(\cdot), \ldots, m_s(\cdot)$ over the same FoD with Dempster's rule (i.e. the normalized conjunctive rule). Discussions on the justification of Dempster's rule with examples can be found in [21]-[23].


III. TOTAL PROBABILITY THEOREM (TPT) 


We briefly recall the Total Probability Theorem because we will present its extension in the belief function framework. In probability theory, the elements $\theta_i$ of the space $\Theta$ are experimental outcomes. The subsets of $\Theta$ are called events and the event $\{\theta_i\}$ consisting of the single element $\theta_i$ is an elementary event. The space $\Theta$ is called the sure event and the empty set $\emptyset$ is the impossible event. We assign to each event $A$ a number $P(A)$ in $[0, 1]$, called the probability of $A$, which satisfies the three Kolmogorov conditions: 1) $P(\emptyset) = 0$; 2) $P(\Theta) = 1$; and 3) if $A \cap B = \emptyset$, then $P(A \cup B) = P(A) + P(B)$. These conditions are the axioms of the theory of probability [30], [31]. The fundamental theorem of probability theory is the following Total Probability Theorem (TPT), also called the law of total probability, see [31] and Theorem 1B of [32].

Total Probability Theorem (TPT): Consider an event $B$ and any partition$^8$ $\{A_i, i = 1, \ldots, k\}$ of the space $\Theta$, then

$$P(B) = P(B \cap A_1) + P(B \cap A_2) + \ldots + P(B \cap A_k). \qquad (14)$$


IV. CONDITIONAL PROBABILITY AND BAYES’ FORMULA 
Starting from TPT formula (14) and assuming $P(B) > 0$, we get for any $i \in \{1, \ldots, k\}$, after dividing each side of (14) by $P(B)$ and rearranging terms, the equality

$$\frac{P(A_i \cap B)}{P(B)} = 1 - \sum_{j=1,\ldots,k;\, j \neq i} \frac{P(A_j \cap B)}{P(B)}, \qquad (15)$$

which allows us to define the conditional probability $P(A_i|B)$ by$^9$

$$P(A_i|B) \triangleq P(A_i \cap B) / P(B). \qquad (16)$$

Similarly, by considering an event $A_i$ of $\Theta$ and the partition $\{B, \bar{B}\}$ of $\Theta$, the TPT formula $P(A_i) = P(A_i \cap B) + P(A_i \cap \bar{B})$ applies, and by dividing it by $P(A_i)$ (assuming $P(A_i) > 0$), we get

$$P(A_i \cap B) / P(A_i) = 1 - P(A_i \cap \bar{B}) / P(A_i), \qquad (17)$$

which allows us to define also the conditional probability $P(B|A_i)$ by

$$P(B|A_i) \triangleq P(A_i \cap B) / P(A_i). \qquad (18)$$

From (16) and (18), one deduces the equality

$$P(A_i \cap B) = P(A_i|B) P(B) = P(B|A_i) P(A_i). \qquad (19)$$

From (19) and assuming $P(B) > 0$ we get $P(A_i|B) = P(B|A_i) P(A_i) / P(B)$, and assuming $P(A_i) > 0$ we get $P(B|A_i) = P(A_i|B) P(B) / P(A_i)$.


$^8$ A partition of a set $\Theta$ is a collection of mutually exclusive subsets of $\Theta$ whose union equals $\Theta$.

$^9$ In probability theory, the notation $P(A_i, B) = P(A_i \cap B)$ is also used to represent the probability of the joint occurrence (intersection) of events $A_i$ and $B$.


Using TPT formula (14) and noting that $P(A_i \cap B) = P(B|A_i) P(A_i)$, we get

$$P(B) = \sum_{i=1}^{k} P(B|A_i) P(A_i). \qquad (20)$$

Substituting (20) in $P(A_i|B) = P(B|A_i) P(A_i) / P(B)$, we get the well-known Bayes' Theorem formula (BTF)

$$P(A_i|B) = P(B|A_i) P(A_i) \Big/ \sum_{j=1}^{k} P(B|A_j) P(A_j). \qquad (21)$$

It can be easily verified that the conditional probability defined by (16) verifies the three axioms of the theory of probability [31]: 1) $P(\emptyset|B) = 0$, 2) $P(\Theta|B) = 1$ and 3) if $A_1 \cap A_2 = \emptyset$, then $P(A_1 \cup A_2|B) = P(A_1|B) + P(A_2|B)$.
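Formulas (20) and (21) amount to normalizing the products $P(B|A_i) P(A_i)$ over the partition. A minimal Python sketch follows. The function name `bayes_posterior` and the numerical priors and likelihoods are hypothetical and serve only as an illustration.

```python
from fractions import Fraction

def bayes_posterior(priors, likelihoods):
    """Return [P(A_i|B)] from [P(A_i)] and [P(B|A_i)] using (20)-(21)."""
    joint = [l * p for l, p in zip(likelihoods, priors)]  # P(B|A_i) P(A_i)
    total = sum(joint)                                    # P(B), formula (20)
    if total == 0:
        raise ValueError("P(B) must be positive to condition on B")
    return [j / total for j in joint]                     # formula (21)

# Hypothetical three-element partition {A_1, A_2, A_3}.
priors = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
likelihoods = [Fraction(1, 10), Fraction(1, 2), Fraction(1)]
posterior = bayes_posterior(priors, likelihoods)
print(posterior)
```

The posterior values sum to one, which reflects the second and third axioms recalled above applied to the partition.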


In the previous presentation, $A_i$ ($i = 1, \ldots, k$) and $B$ are events (subsets) of the same space $\Theta$. How should one proceed to compute $P(A_i|B)$ if the events $A_i$ ($i = 1, \ldots, k$) and $B$ are subsets of different spaces, say if $A_i \subseteq \Theta_1 = \{x_1, \ldots, x_m\} = \{x_p, p = 1, 2, \ldots, m\}$ ($i = 1, \ldots, k$), and if $B \subseteq \Theta_2 = \{y_1, \ldots, y_n\} = \{y_q, q = 1, 2, \ldots, n\}$ with $\Theta_1 \neq \Theta_2$? Such a situation corresponds to a so-called combined experiment [31]. In fact, one can prove that similar conditioning formulas can also be established. For this, we need to work with the Cartesian product space $\Theta \triangleq \Theta_1 \times \Theta_2$ whose elementary elements are all the ordered pairs $(x_p, y_q)$ with $x_p \in \Theta_1$ and $y_q \in \Theta_2$. The two experiments are viewed as a single combined one whose outcomes are pairs $(x_p, y_q)$. In this space $\Theta = \Theta_1 \times \Theta_2$, $x_p$ is not an elementary element but a subset of $n$ elements of $\Theta$, i.e. $\{x_p\} = \{(x_p, y_1), \ldots, (x_p, y_n)\}$. Similarly, $y_q$ is not an elementary element but a subset of $m$ elements of $\Theta$, i.e. $\{y_q\} = \{(x_1, y_q), \ldots, (x_m, y_q)\}$. If $A_i \subseteq \Theta_1$ and $B \subseteq \Theta_2$, then $A_i \times B = \{(x_p, y_q) | x_p \in A_i, y_q \in B\} \subseteq \Theta$. If one forms $A_i \times \Theta_2$ and $\Theta_1 \times B$, one sees that $A_i \times B = (A_i \times \Theta_2) \cap (\Theta_1 \times B) = (\Theta_1 \times B) \cap (A_i \times \Theta_2)$. Because the event $A_i \times \Theta_2$ occurs in the combined experiment if the event $A_i$ of experiment 1 occurs no matter what the outcome of experiment 2, one has $P(A_i \times \Theta_2) = P_1(A_i)$ where $P_1(A_i)$ is the probability of event $A_i$ in experiment 1. Similarly, the event $\Theta_1 \times B$ occurs if $B$ occurs in experiment 2 no matter what the outcome of experiment 1, so that $P(\Theta_1 \times B) = P_2(B)$ where $P_2(B)$ is the probability of event $B$ in experiment 2. One considers a partition $\{A_1, A_2, \ldots, A_k\}$ of $\Theta_1$ and a subset (event) $B \subseteq \Theta_2$. Based on set theory and the properties of the Cartesian product, one has

$$\begin{aligned}
\Theta_1 \times B &= (\Theta_1 \times B) \cap (\Theta_1 \times \Theta_2) \\
&= (\Theta_1 \times B) \cap ((A_1 \cup A_2 \cup \ldots \cup A_k) \times \Theta_2) \\
&= (\Theta_1 \times B) \cap ((A_1 \times \Theta_2) \cup \ldots \cup (A_k \times \Theta_2)) \\
&= \bigcup_i ((\Theta_1 \times B) \cap (A_i \times \Theta_2)).
\end{aligned}$$

The elements $A_i \times \Theta_2$, $i = 1, \ldots, k$ being disjoint$^{10}$, one has the following TPT formula

$$P(\Theta_1 \times B) = P\Big(\bigcup_i ((\Theta_1 \times B) \cap (A_i \times \Theta_2))\Big) = P((\Theta_1 \times B) \cap (A_1 \times \Theta_2)) + \ldots + P((\Theta_1 \times B) \cap (A_k \times \Theta_2)). \qquad (22)$$

After dividing each side of formula (22) by $P(\Theta_1 \times B)$ (assumed positive) and rearranging terms, we get

$$\frac{P((A_i \times \Theta_2) \cap (\Theta_1 \times B))}{P(\Theta_1 \times B)} = 1 - \sum_{j=1,\ldots,k;\, j \neq i} \frac{P((A_j \times \Theta_2) \cap (\Theta_1 \times B))}{P(\Theta_1 \times B)}. \qquad (23)$$

$^{10}$ because the $A_i$ are disjoint since $\{A_1, \ldots, A_k\}$ is a partition of $\Theta_1$.
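Formula (23) naturally suggests defining the conditional probability $P(A_i \times \Theta_2 | \Theta_1 \times B)$ by

$$P(A_i \times \Theta_2 | \Theta_1 \times B) \triangleq P(A_i \times B) / P(\Theta_1 \times B). \qquad (24)$$

Using the same reasoning as before and working on the Cartesian product space $\Theta = \Theta_1 \times \Theta_2$, one can also prove$^{11}$ that if $P(A_i \times \Theta_2) > 0$ one can define

$$P(\Theta_1 \times B | A_i \times \Theta_2) \triangleq P(A_i \times B) / P(A_i \times \Theta_2). \qquad (25)$$

From (24) and (25), one deduces the equality

$$P(A_i \times \Theta_2 | \Theta_1 \times B) P(\Theta_1 \times B) = P(\Theta_1 \times B | A_i \times \Theta_2) P(A_i \times \Theta_2). \qquad (26)$$

From equality (26) and assuming $P(\Theta_1 \times B) > 0$, we get

$$P(A_i \times \Theta_2 | \Theta_1 \times B) = P(\Theta_1 \times B | A_i \times \Theta_2) P(A_i \times \Theta_2) / P(\Theta_1 \times B). \qquad (27)$$

From equality (26) and assuming $P(A_i \times \Theta_2) > 0$, we get

$$P(\Theta_1 \times B | A_i \times \Theta_2) = P(A_i \times \Theta_2 | \Theta_1 \times B) P(\Theta_1 \times B) / P(A_i \times \Theta_2). \qquad (28)$$

Using TPT formula (22) and formula (25), we get $P(\Theta_1 \times B) = \sum_{i=1}^{k} P(\Theta_1 \times B | A_i \times \Theta_2) P(A_i \times \Theta_2)$. Putting this expression in (27), we obtain the Bayes' Theorem formula (BTF) when $A_i \subseteq \Theta_1$ and $B \subseteq \Theta_2$ and $\Theta_1 \neq \Theta_2$, which is written as

$$P(A_i \times \Theta_2 | \Theta_1 \times B) = \frac{P(\Theta_1 \times B | A_i \times \Theta_2) P(A_i \times \Theta_2)}{\sum_{j=1}^{k} P(\Theta_1 \times B | A_j \times \Theta_2) P(A_j \times \Theta_2)}. \qquad (29)$$

For notation convenience, we can use the classical formulas when working with different sets of experimental outcomes $\Theta_1$ and $\Theta_2$, keeping in mind that in this case $A_i$ must be understood as $A_i \times \Theta_2$ and $B$ as $\Theta_1 \times B$.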


V. SHAFER'S CONDITIONING

In the belief functions framework, Shafer proposed formulas to calculate the conditional belief functions $Bel(A|B)$ and $Pl(A|B)$. Shafer's formulas are obtained from the conditional BBA $m(\cdot|B)$ resulting from Dempster's rule of combination of the original BBA $m(\cdot)$ with the BBA $m_B(B) = 1$ focused on $B$, under the condition that $Bel(\bar{B}) < 1$, or equivalently$^{12}$ under the condition that $Pl(B) > 0$. Shafer's conditioning formulas for belief and plausibility functions were established in Theorem 3.6, p. 66 of [1]. For $A, B \subseteq \Theta$ with $Pl(B) > 0$, $Bel(A|B)$ and $Pl(A|B)$ are given by

$$Bel(A|B) = (Bel(A \cup \bar{B}) - Bel(\bar{B})) / (1 - Bel(\bar{B})), \qquad (30)$$
$$Pl(A|B) = Pl(A \cap B) / Pl(B). \qquad (31)$$

The expression (30) of $Bel(A|B)$ is equivalent to

$$Bel(A|B) = (Pl(B) - Pl(B \cap \bar{A})) / Pl(B), \qquad (32)$$

because one always has (from the definition of belief functions) $Pl(B) = 1 - Bel(\bar{B})$, and the numerator of (30) can be written as

$$\begin{aligned}
Bel(A \cup \bar{B}) - Bel(\bar{B}) &= (1 - Bel(\bar{B})) - (1 - Bel(A \cup \bar{B})) \\
&= Pl(B) - Pl(\overline{A \cup \bar{B}}) \\
&= Pl(B) - Pl(B \cap \bar{A}).
\end{aligned}$$

$^{11}$ The proof is left to the reader due to space limitations.

$^{12}$ Indeed, if $Bel(\bar{B}) < 1$ then $Pl(B) = 1 - Bel(\bar{B})$ is greater than zero.


Using (30)-(31) and taking $A = \emptyset$, we get $Bel(\emptyset|B) = Pl(\emptyset|B) = 0$, and taking $A = \Theta$ we get $Bel(\Theta|B) = Pl(\Theta|B) = 1$. Also, taking $B = \Theta$ we get $Bel(A|\Theta) = Bel(A)$ and $Pl(A|\Theta) = Pl(A)$. Note that taking $B = A$ in (31)-(32), we obtain $Bel(A|A) = Pl(A|A) = 1$, which fits with common sense.

Reversing the roles played by $A$ and $B$ and switching the notations in the previous expressions, the following formulas also hold (assuming $Pl(A) > 0$)

$$Bel(B|A) = (Pl(A) - Pl(A \cap \bar{B})) / Pl(A), \qquad (33)$$
$$Pl(B|A) = Pl(B \cap A) / Pl(A). \qquad (34)$$

From (31) and (34), one deduces $Pl(A \cap B) = Pl(A|B) Pl(B) = Pl(B|A) Pl(A)$. Hence, the following formula applies for conditional plausibilities when $Pl(B) > 0$

$$Pl(A|B) = Pl(B|A) Pl(A) / Pl(B). \qquad (35)$$

Note that this formula for conditional plausibilities is similar to the expression for conditional probabilities given in (16) when replacing plausibilities by probabilities.


The main drawback of Shafer's conditioning is its incompatibility with probability calculus when working with imprecise probabilities. More precisely, the bounds of the belief interval $[Bel(A|B), Pl(A|B)]$ obtained by (30)-(31) are in general$^{13}$ incompatible with the lower and upper bounds of the conditional probability $P(A|B)$. This problem makes Shafer's conditioning very disputable and casts serious doubts on the pertinence (validity) of Shafer's conditioning results when used in applications, which is a direct consequence of the questionable validity of Dempster's rule reported in [3], [9]-[23], [33], [34]. Shafer's conditioning problem has already been reported and addressed by several authors [3], [6], [7], [14], [24] in the past with some examples. To show this incompatibility of Shafer's conditioning with probability calculus easily, we briefly present the famous Ellsberg's urn example [35].


¹³ Unless the BBA $m(\cdot)$ is Bayesian.

Example 1 (Ellsberg's urn): We consider an urn with red (R), black (B) and yellow (Y) balls. The a priori information one has on the distribution of the balls in the urn is the following: 1/3 of the balls are red and 2/3 of the balls are black or yellow. We do not know precisely the percentage of black balls, nor the percentage of yellow balls. So the a priori information about the chance to pick a ball in the urn can be represented by a (parametric) probability mass function $P(\cdot)$ with $P(R) = 1/3$, $P(B) = 2/3 - x$, $P(Y) = x$, where $x$ is an unknown parameter in $[0, 2/3]$, and where $P(R)$, $P(B)$ and $P(Y)$ are the probabilities to pick at random a red, a black and a yellow ball in the urn, respectively. Of course, because $x$ is unknown but bounded, $P(B)$ and $P(Y)$ are unknown but their bounds are known. In fact, this problem can be seen as a problem of imprecise probabilities where $P(R) \in [1/3, 1/3]$, $P(B) \in [0, 2/3]$, $P(Y) \in [0, 2/3]$, with the constraint $P(R) + P(B) + P(Y) = 1$. Now suppose that someone picks a ball at random in the urn and tells us that the color of the ball is not black, i.e. the event $\bar{B} = R \cup Y$ has occurred. How must we revise (update) our prior probabilities with this new information? The correct answer to this question is obtained by computing the conditional probabilities $P(R|\bar{B})$, $P(B|\bar{B})$ and $P(Y|\bar{B})$ and by analyzing their bounds. This is done as follows, using the fact that $P(\bar{B}) = P(R \cup Y) = P(R) + P(Y) - P(R \cap Y) = P(R) + P(Y) = (1/3) + x$. Indeed, $P(R \cap Y) = 0$ because the events $R$ and $Y$ are mutually exclusive. So, we get

$$\begin{aligned}
P(R|\bar{B}) &= P(R \cap (R \cup Y))/P(R \cup Y) = P(R)/((1/3) + x) = (1/3)/((1/3) + x), \\
P(B|\bar{B}) &= P(B \cap (R \cup Y))/P(R \cup Y) = P(\emptyset)/((1/3) + x) = 0/((1/3) + x), \\
P(Y|\bar{B}) &= P(Y \cap (R \cup Y))/P(R \cup Y) = P(Y)/((1/3) + x) = x/((1/3) + x).
\end{aligned}$$

If $x = 0$, then $P(R|\bar{B}) = 1$ and $P(Y|\bar{B}) = 0$. If $x = 2/3$, then $P(R|\bar{B}) = 1/3$ and $P(Y|\bar{B}) = 2/3$. Therefore, after conditioning by $\bar{B} = R \cup Y$, we get as bounds of the conditional probability values the following intervals: $P(R|\bar{B}) \in [1/3, 1]$, $P(B|\bar{B}) \in [0, 0]$, $P(Y|\bar{B}) \in [0, 2/3]$, with the constraint $P(R|\bar{B}) + P(B|\bar{B}) + P(Y|\bar{B}) = 1$.
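As a quick numerical cross-check of these bounds, the following sketch sweeps the unknown parameter $x$ over $[0, 2/3]$ and records the extreme values of the three conditional probabilities. The function name `conditional_given_not_black` and the 101-point grid are illustrative assumptions of this sketch, not part of the original example. Because the conditional probabilities are monotone in $x$, the end points of the grid already give the exact bounds.

```python
from fractions import Fraction


def conditional_given_not_black(x):
    """Return (P(R|not B), P(B|not B), P(Y|not B)) when P(Y) = x."""
    p_red, p_yellow = Fraction(1, 3), x
    p_not_black = p_red + p_yellow  # R and Y are mutually exclusive
    return p_red / p_not_black, Fraction(0), p_yellow / p_not_black


# Illustrative grid over the admissible range of x (an assumption of this sketch).
grid = [Fraction(2, 3) * Fraction(k, 100) for k in range(101)]
values = [conditional_given_not_black(x) for x in grid]

red_bounds = (min(v[0] for v in values), max(v[0] for v in values))
black_bounds = (min(v[1] for v in values), max(v[1] for v in values))
yellow_bounds = (min(v[2] for v in values), max(v[2] for v in values))

print("P(R|not B) in", red_bounds)
print("P(B|not B) in", black_bounds)
print("P(Y|not B) in", yellow_bounds)
```

Exact rational arithmetic is used here so that the end points 1/3, 2/3 and 1 are reproduced without rounding.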

Let's examine what we get using Shafer's conditioning approach. For this, the problem is modeled directly in the belief function framework using the a priori BBA $m(\cdot)$ defined on the FoD $\Theta = \{R, B, Y\}$ with $m(R) = 1/3$ and $m(B \cup Y) = 2/3$, which corresponds to the following a priori belief intervals: $[Bel(R), Pl(R)] = [1/3, 1/3]$, $[Bel(B), Pl(B)] = [0, 2/3]$, $[Bel(Y), Pl(Y)] = [0, 2/3]$.

With Shafer's conditioning formulas and noting that $Pl(R) = 1/3$, $Pl(B) = 2/3$, $Pl(Y) = 2/3$, and $Pl(R \cup Y) = 1$, we get results that are incompatible with the real bounds of the conditional probabilities because

$$\begin{aligned}
[Bel(R|\bar{B}), Pl(R|\bar{B})] &= [1/3, 1/3] \quad \text{(by Shafer)} \\
&\neq [1/3, 1] \quad \text{(correct bounds)}, \\
[Bel(B|\bar{B}), Pl(B|\bar{B})] &= [0, 0] \quad \text{(by Shafer)} \\
&= [0, 0] \quad \text{(correct bounds)}, \\
[Bel(Y|\bar{B}), Pl(Y|\bar{B})] &= [2/3, 2/3] \quad \text{(by Shafer)} \\
&\neq [0, 2/3] \quad \text{(correct bounds)}.
\end{aligned}$$


To overcome this problem, Fagin and Halpern proposed a more efficient conditioning approach which is, by construction, always consistent with conditional probability bounds. It is presented in the next section.


VI. FAGIN-HALPERN CONDITIONING 


Fagin and Halpern (FH) proposed in [3], [4] to define the conditional belief as the lower envelope (i.e. the infimum) of a family of conditional probability functions, to make belief conditioning consistent with imprecise conditional probability calculus. Assuming $Bel(B) > 0$, Fagin and Halpern proposed the following conditional formulas (FH formulas for short)

$$Bel(A|B) = Bel(A \cap B)/(Bel(A \cap B) + Pl(\bar{A} \cap B)), \tag{36}$$

$$Pl(A|B) = Pl(A \cap B)/(Pl(A \cap B) + Bel(\bar{A} \cap B)). \tag{37}$$


Fagin and Halpern proved in [3], with long derivations and great effort, that the conditional belief $Bel(A|B)$ given by (36) also satisfies the three conditions for defining a true belief function according to Shafer's theorem in [1], p. 39. Therefore, the formula (36) is also a good candidate and a serious alternative for conditioning belief functions. However, it is quite mysterious how Fagin and Halpern obtained (constructed) these closed-form expressions. According to the authors, these expressions were established from a very good intuition. A better justification has been given by Sundberg and Wagner in [7] (p. 268), but it is still not so clear in our opinion. In this paper, we justify clearly and directly the establishment of the FH formulas as a simple and direct consequence of the Total Belief Theorem (TBT), which is one of the main contributions of our work. From the FH conditioning formulas (36)-(37), we can verify that the common-sense results are also obtained, that is $Bel(\emptyset|B) = Pl(\emptyset|B) = 0$, $Bel(\Theta|B) = Pl(\Theta|B) = 1$, $Bel(A|\Theta) = Bel(A)$, $Pl(A|\Theta) = Pl(A)$, and $Bel(A|A) = Pl(A|A) = 1$.

FH conditioning formulas are consistent with Bayes conditioning formulas when the underlying BBA $m(\cdot)$ is Bayesian. Indeed, if $m(\cdot)$ is Bayesian, then $Pl(A \cap B) = Bel(A \cap B) = P(A \cap B)$, $Pl(\bar{A} \cap B) = Bel(\bar{A} \cap B) = P(\bar{A} \cap B)$ and $Pl(\bar{B} \cap A) = Bel(\bar{B} \cap A) = P(\bar{B} \cap A)$, so that the FH formulas become equivalent to $Bel(A|B) = P(A \cap B)/(P(A \cap B) + P(\bar{A} \cap B))$ and $Pl(A|B) = P(A \cap B)/(P(A \cap B) + P(\bar{A} \cap B))$. Thanks to the total probability theorem (TPT) formula (14), the denominator involved in these formulas is $P(A \cap B) + P(\bar{A} \cap B) = P(B)$, therefore $Bel(A|B) = Pl(A|B) = P(A \cap B)/P(B) = P(A|B)$.

Similarly, one can also easily verify that $Bel(B|A) = Pl(B|A) = P(A \cap B)/P(A) = P(B|A)$. The advantage of FH conditioning is its complete compatibility with the conditional probability calculus [7]. Let us show what FH conditioning provides in the previous Ellsberg's urn example.


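Ellsberg's urn example revisited: Let's see the result obtained by formulas (63) and (65) for Ellsberg's urn example. Applying formulas (63) and (65) with the conditioning event $\bar{B} = R \cup Y$ we obtain

$$\begin{aligned}
Bel(R|\bar{B}) &= \frac{Bel(R \cap (R \cup Y))}{Bel(R \cap (R \cup Y)) + Pl((B \cup Y) \cap (R \cup Y))} = \frac{1/3}{(1/3) + (2/3)} = \frac{1}{3}, \\
Pl(R|\bar{B}) &= \frac{Pl(R \cap (R \cup Y))}{Bel((B \cup Y) \cap (R \cup Y)) + Pl(R \cap (R \cup Y))} = \frac{1/3}{0 + (1/3)} = 1, \\
Bel(B|\bar{B}) &= \frac{Bel(B \cap (R \cup Y))}{Bel(B \cap (R \cup Y)) + Pl((R \cup Y) \cap (R \cup Y))} = \frac{0}{0 + 1} = 0, \\
Pl(B|\bar{B}) &= \frac{Pl(B \cap (R \cup Y))}{Bel((R \cup Y) \cap (R \cup Y)) + Pl(B \cap (R \cup Y))} = \frac{0}{(1/3) + 0} = 0, \\
Bel(Y|\bar{B}) &= \frac{Bel(Y \cap (R \cup Y))}{Bel(Y \cap (R \cup Y)) + Pl((R \cup B) \cap (R \cup Y))} = \frac{0}{0 + (1/3)} = 0, \\
Pl(Y|\bar{B}) &= \frac{Pl(Y \cap (R \cup Y))}{Bel((R \cup B) \cap (R \cup Y)) + Pl(Y \cap (R \cup Y))} = \frac{2/3}{(1/3) + (2/3)} = \frac{2}{3}.
\end{aligned}$$

Hence with the FH conditioning formulas, we get the correct conditional probability bounds

$$\begin{aligned}
[Bel(R|\bar{B}), Pl(R|\bar{B})] &= [1/3, 1] \quad \text{(by Fagin-Halpern)} \\
&= [1/3, 1] \quad \text{(correct bounds)}, \\
[Bel(B|\bar{B}), Pl(B|\bar{B})] &= [0, 0] \quad \text{(by Fagin-Halpern)} \\
&= [0, 0] \quad \text{(correct bounds)}, \\
[Bel(Y|\bar{B}), Pl(Y|\bar{B})] &= [0, 2/3] \quad \text{(by Fagin-Halpern)} \\
&= [0, 2/3] \quad \text{(correct bounds)}.
\end{aligned}$$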


We can also verify that $Bel(\emptyset|\bar{B}) = 0$, $Bel(R \cup B|\bar{B}) = 1/3$, $Bel(R \cup Y|\bar{B}) = 1$, $Bel(B \cup Y|\bar{B}) = 0$ and $Bel(R \cup B \cup Y|\bar{B}) = 1$. Applying the Möbius inverse formula (13) with this conditional belief function $Bel(\cdot|\bar{B})$, we get the conditional mass of belief given by $m(R|\bar{B}) = 1/3$ and $m(R \cup Y|\bar{B}) = 2/3$, all other mass values being equal to zero, whereas with Shafer's approach based on Dempster's rule of combination we get $m(R|\bar{B}) = 1/3$ and $m(Y|\bar{B}) = 2/3$. We see the difference between Shafer's and FH conditioning approaches. With Shafer's conditioning approach, because $(B \cup Y) \cap (R \cup Y) \neq \emptyset$, the mass $m(B \cup Y) = 2/3$ is entirely transferred (optimistically) to the most specific focal element $Y$ included in $\bar{B} = R \cup Y$. With the FH conditioning method, the mass $m(B \cup Y) = 2/3$ is entirely transferred (pessimistically, or cautiously) to the least specific focal element $R \cup Y$ included in $\bar{B} = R \cup Y$.
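The comparison between the two conditioning rules on Ellsberg's urn can be reproduced with a few lines of code. The sketch below is only an illustration: the helper names (`bel`, `pl`, `shafer`, `fagin_halpern`) and the encoding of subsets as Python `frozenset` objects are assumptions of this sketch. Shafer's rule is written in its plausibility form (the role-reversed versions of (33)-(34)), and the FH rule follows (36)-(37).

```python
from fractions import Fraction

THETA = frozenset("RBY")
# A priori BBA of Ellsberg's urn: m(R) = 1/3, m(B u Y) = 2/3.
M = {frozenset("R"): Fraction(1, 3), frozenset("BY"): Fraction(2, 3)}


def bel(A):
    """Belief: total mass of the focal elements included in A."""
    return sum((v for X, v in M.items() if X <= A), Fraction(0))


def pl(A):
    """Plausibility: total mass of the focal elements intersecting A."""
    return sum((v for X, v in M.items() if X & A), Fraction(0))


def shafer(A, C):
    """(Bel(A|C), Pl(A|C)) with Shafer's conditioning, plausibility form."""
    not_a = THETA - A
    return (pl(C) - pl(C & not_a)) / pl(C), pl(A & C) / pl(C)


def fagin_halpern(A, C):
    """(Bel(A|C), Pl(A|C)) with the FH formulas (36)-(37)."""
    not_a = THETA - A
    lower = bel(A & C) / (bel(A & C) + pl(not_a & C))
    upper = pl(A & C) / (pl(A & C) + bel(not_a & C))
    return lower, upper


NOT_BLACK = frozenset("RY")  # conditioning event: the ball is not black
results = {
    color: (shafer(frozenset(color), NOT_BLACK),
            fagin_halpern(frozenset(color), NOT_BLACK))
    for color in "RBY"
}
for color, (by_shafer, by_fh) in results.items():
    print(color, "Shafer:", by_shafer, "FH:", by_fh)
```

Only the FH intervals coincide with the probability bounds $[1/3, 1]$, $[0, 0]$ and $[0, 2/3]$ obtained in Example 1.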


VII. TOTAL BELIEF THEOREM (TBT) 


In this section, we extend the TPT to belief and plausibility functions and we establish the Total Belief Theorem (TBT). Before this, we need to explain how the set of focal elements of a given BBA $m(\cdot)$ must be decomposed, because this decomposition is the basis of the establishment of TBT.


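A. Decomposition of the set of focal elements $\mathcal{F}_\Theta(m)$

Let us consider a FoD $\Theta = \{\theta_1, \ldots, \theta_{|\Theta|}\}$ with $|\Theta| > 1$ elements, and a BBA $m(\cdot)$ defined on $2^\Theta$ with a given set of focal elements $\mathcal{F}_\Theta(m)$. Consider any partition $\{A_1, A_2, \ldots, A_k\}$ of the FoD $\Theta$, then one can always decompose $\mathcal{F}_\Theta(m)$ as the union of the following subsets

$$\mathcal{F}_\Theta(m) = \mathcal{F}_{A_1}(m) \cup \ldots \cup \mathcal{F}_{A_k}(m) \cup \mathcal{F}_{A^*}(m), \tag{38}$$

where $\mathcal{F}_{A_i}(m)$ ($i = 1, \ldots, k$) is the set of focal elements of $m(\cdot)$ included in $A_i$, and $\mathcal{F}_{A^*}(m)$ is the set of focal elements of $m(\cdot)$ which are not included in any $A_i$, $i = 1, \ldots, k$. We use the notation $A^*$ for representing the entity characterized by the focal set $\mathcal{F}_{A^*}(m)$ mathematically defined by

$$\mathcal{F}_{A^*}(m) = \mathcal{F}_\Theta(m) - \mathcal{F}_{A_1}(m) - \ldots - \mathcal{F}_{A_k}(m). \tag{39}$$

The entity $A^*$ has in general no explicit form and it is used only for notation convenience, to make the presentation of formulas more concise in the sequel. Because the $A_i$ for $i = 1, \ldots, k$ are mutually exclusive (disjoint), the sets $\mathcal{F}_{A_i}(m)$ are also mutually exclusive and therefore $\cap_{i=1,\ldots,k} (\mathcal{F}_\Theta(m) - \mathcal{F}_{A_i}(m)) = \mathcal{F}_\Theta(m) - \mathcal{F}_{A_1}(m) - \ldots - \mathcal{F}_{A_k}(m)$, because all possible intersections of focal sets including $\mathcal{F}_{A_i}(m) \cap \mathcal{F}_{A_j}(m)$ for $j \neq i$ equal the empty set. Hence $\mathcal{F}_{A^*}(m)$ can also be expressed as

$$\mathcal{F}_{A^*}(m) = \cap_{i=1,\ldots,k} \bar{\mathcal{F}}_{A_i}(m), \tag{40}$$

where $\bar{\mathcal{F}}_{A_i}(m) = \mathcal{F}_\Theta(m) - \mathcal{F}_{A_i}(m) = \mathcal{F}_{\bar{A}_i}(m) \cup \mathcal{F}_{A^*}(m)$ represents the set of focal elements of $m(\cdot)$ which are not subsets of $A_i$.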


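Example 2: Consider $\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4, \theta_5\}$ and a BBA $m(\cdot)$ defined on $2^\Theta$, with set of focal elements $\mathcal{F}_\Theta(m) = \{X_1, X_2, \ldots, X_8\}$ with $X_1 = \theta_1$, $X_2 = \theta_1 \cup \theta_2$, $X_3 = \theta_2 \cup \theta_3$, $X_4 = \theta_3 \cup \theta_4$, $X_5 = \theta_4$, $X_6 = \theta_4 \cup \theta_5$, $X_7 = \theta_1 \cup \theta_3 \cup \theta_5$ and $X_8 = \theta_5$. Consider the partition $\{A_1, A_2, A_3\}$ of $\Theta$ with $A_1 = \{\theta_1, \theta_2\}$, $A_2 = \{\theta_3, \theta_4\}$ and $A_3 = \{\theta_5\}$. In this example, one has

$$\begin{aligned}
\mathcal{F}_{A_1}(m) &= \{X_1, X_2\} = \{\theta_1, \theta_1 \cup \theta_2\}, \\
\mathcal{F}_{A_2}(m) &= \{X_4, X_5\} = \{\theta_3 \cup \theta_4, \theta_4\}, \\
\mathcal{F}_{A_3}(m) &= \{X_8\} = \{\theta_5\}, \\
\mathcal{F}_{A^*}(m) &= \mathcal{F}_\Theta(m) - \mathcal{F}_{A_1}(m) - \mathcal{F}_{A_2}(m) - \mathcal{F}_{A_3}(m) \\
&= \{X_3, X_6, X_7\} \\
&= \{\theta_2 \cup \theta_3, \theta_4 \cup \theta_5, \theta_1 \cup \theta_3 \cup \theta_5\}.
\end{aligned}$$

One sees that

$$\begin{aligned}
\bar{\mathcal{F}}_{A_1}(m) &= \mathcal{F}_\Theta(m) - \{X_1, X_2\} = \{X_3, X_4, X_5, X_6, X_7, X_8\}, \\
\bar{\mathcal{F}}_{A_2}(m) &= \mathcal{F}_\Theta(m) - \{X_4, X_5\} = \{X_1, X_2, X_3, X_6, X_7, X_8\}, \\
\bar{\mathcal{F}}_{A_3}(m) &= \mathcal{F}_\Theta(m) - \{X_8\} = \{X_1, X_2, X_3, X_4, X_5, X_6, X_7\},
\end{aligned}$$

and applying (40), we get $\bar{\mathcal{F}}_{A_1}(m) \cap \bar{\mathcal{F}}_{A_2}(m) \cap \bar{\mathcal{F}}_{A_3}(m) = \{X_3, X_6, X_7\} = \mathcal{F}_{A^*}(m)$.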
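The decomposition of Example 2 is easy to automate. In the sketch below, each $\theta_i$ is encoded as the integer $i$ and each focal element as a `frozenset`; the function names `included_in` and `decompose` are hypothetical helpers introduced for illustration only. The last lines recompute $\mathcal{F}_{A^*}(m)$ through the intersection formula (40).

```python
# Focal elements of Example 2 (theta_i is encoded as the integer i).
FOCAL = {
    "X1": frozenset({1}),
    "X2": frozenset({1, 2}),
    "X3": frozenset({2, 3}),
    "X4": frozenset({3, 4}),
    "X5": frozenset({4}),
    "X6": frozenset({4, 5}),
    "X7": frozenset({1, 3, 5}),
    "X8": frozenset({5}),
}
PARTITION = [frozenset({1, 2}), frozenset({3, 4}), frozenset({5})]


def included_in(focal, block):
    """Names of the focal elements that are subsets of the given block."""
    return {name for name, X in focal.items() if X <= block}


def decompose(focal, partition):
    """Return ([F_A1, ..., F_Ak], F_A*) following (38)-(39)."""
    parts = [included_in(focal, block) for block in partition]
    star = set(focal) - set().union(*parts)
    return parts, star


parts, star = decompose(FOCAL, PARTITION)

# Formula (40): F_A* is the intersection of the sets of focal elements
# that are not subsets of A_i.
not_subsets = [set(FOCAL) - part for part in parts]
star_via_40 = set.intersection(*not_subsets)

print([sorted(p) for p in parts], sorted(star), sorted(star_via_40))
```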


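Example 3: Consider $\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4, \theta_5\}$ and a BBA $m(\cdot)$ defined on $2^\Theta$, with the degenerate set of focal elements reduced to a single focal element, $\mathcal{F}_\Theta(m) = \{X_1 = \Theta\}$, corresponding to the vacuous BBA. Consider the partition $\{A_1, A_2, A_3\}$ of $\Theta$ where $A_1 \triangleq \{\theta_3, \theta_5\}$, $A_2 \triangleq \{\theta_2\}$ and $A_3 \triangleq \{\theta_1, \theta_4\}$. Then, we get $\mathcal{F}_{A_1}(m) = \emptyset$, $\mathcal{F}_{A_2}(m) = \emptyset$, $\mathcal{F}_{A_3}(m) = \emptyset$ and $\mathcal{F}_{A^*}(m) = \{X_1\} - \emptyset - \emptyset - \emptyset = \{\Theta\}$. Note that $\bar{\mathcal{F}}_{A_1}(m) = \bar{\mathcal{F}}_{A_2}(m) = \bar{\mathcal{F}}_{A_3}(m) = \{\Theta\}$, and therefore $\bar{\mathcal{F}}_{A_1}(m) \cap \bar{\mathcal{F}}_{A_2}(m) \cap \bar{\mathcal{F}}_{A_3}(m) = \{\Theta\} = \mathcal{F}_{A^*}(m)$, and of course $\mathcal{F}_\Theta(m) = \mathcal{F}_{A_1}(m) \cup \mathcal{F}_{A_2}(m) \cup \mathcal{F}_{A_3}(m) \cup \mathcal{F}_{A^*}(m) = \emptyset \cup \emptyset \cup \emptyset \cup \{\Theta\} = \{\Theta\}$.


B. Total Belief Theorem (TBT)


Based on the previous decomposition of the set of focal elements $\mathcal{F}_\Theta(m)$ according to any given partition $\{A_1, \ldots, A_k\}$ of the FoD $\Theta$, the following Total Belief Theorem (TBT) is established.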


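Total Belief Theorem (TBT): Let's consider a frame of discernment $\Theta$ with $|\Theta| \geq 2$ elements and a BBA $m(\cdot)$ defined on $2^\Theta$ with the set of focal elements $\mathcal{F}_\Theta(m)$. For any chosen partition $\{A_1, \ldots, A_k\}$ of $\Theta$ and for any $B \subseteq \Theta$, one has

$$Bel(B) = \sum_{i=1,\ldots,k} Bel(A_i \cap B) + U(A^* \cap B), \tag{41}$$

where $\mathcal{F}_{A^*}(m) = \mathcal{F}_\Theta(m) - \mathcal{F}_{A_1}(m) - \ldots - \mathcal{F}_{A_k}(m)$ and

$$U(A^* \cap B) \triangleq \sum_{X \in \mathcal{F}_{A^*}(m) | X \in \mathcal{F}_B(m)} m(X). \tag{42}$$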


Proof of TBT: See appendix. 


$A^*$ is a shorthand notation for the entity associated with the set of focal elements $\mathcal{F}_{A^*}(m)$ of the BBA $m(\cdot)$ involved in the summation (42) of $U(A^* \cap B)$. From formula (42), one sees that $U(A^* \cap B) \in [0, 1]$. Note that if $B = \Theta$ and if the FoD $\Theta$ is simply partitioned as $\{A \triangleq A_1, \bar{A} \triangleq A_2\}$, then $U(A^* \cap B) = U(A^* \cap \Theta) = U(A^*) = Pl(A) - Bel(A) = Pl(\bar{A}) - Bel(\bar{A})$.

If one applies TBT with $B = \Theta$, we get for any chosen partition $\{A_1, \ldots, A_k\}$ of $\Theta$

$$\sum_{i=1,\ldots,k} Bel(A_i) + U(A^*) = 1, \tag{43}$$

where $U(A^*) \triangleq \sum_{X \in \mathcal{F}_{A^*}(m)} m(X)$. This equality corresponds to TPT if $U(A^*) = 0$ (i.e. there is no uncertainty on the value of the probabilities of $A_i$, $i = 1, \ldots, k$).


Corollary of TBT: If $m(\cdot)$ is Bayesian, then TBT is consistent with the Total Probability Theorem (TPT).

Proof: See appendix.

From TBT one can establish the following (not so elegant) Total Plausibility Theorem (TPlT).

Total Plausibility Theorem (TPlT): For any BBA $m(\cdot) : 2^\Theta \mapsto [0, 1]$, and for any partition $\{A_1, \ldots, A_k\}$ of $\Theta$, one has for any $B \subseteq \Theta$

$$Pl(\bar{B}) = \sum_{i=1,\ldots,k} Pl(\bar{A}_i \cup \bar{B}) + 1 - k - U(A^* \cap B). \tag{44}$$

Proof: See appendix.

Example 4: Consider the FoD $\Theta = \{\theta_1, \ldots, \theta_7\}$ and the set of focal elements $\mathcal{F}_\Theta(m) = \{X_1, X_2, \ldots, X_9\}$ of a BBA $m(\cdot)$ defined over $2^\Theta$ given in Table I.


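Table I
FOCAL ELEMENTS AND THEIR MASSES.

Focal element $X$                                                          BBA $m(X)$
$X_1 = \theta_2 \cup \theta_3 \cup \theta_4 \cup \theta_5 \cup \theta_7$   $m(X_1) = 0.01$
$X_2 = \theta_1 \cup \theta_2 \cup \theta_3 \cup \theta_4$                 $m(X_2) = 0.02$
$X_3 = \theta_3 \cup \theta_5 \cup \theta_6$                               $m(X_3) = 0.03$
$X_4 = \theta_4 \cup \theta_7$                                             $m(X_4) = 0.04$
$X_5 = \theta_2$                                                           $m(X_5) = 0.20$
$X_6 = \theta_6 \cup \theta_7$                                             $m(X_6) = 0.30$
$X_7 = \theta_2 \cup \theta_3 \cup \theta_7$                               $m(X_7) = 0.20$
$X_8 = \theta_1 \cup \theta_4 \cup \theta_6$                               $m(X_8) = 0.15$
$X_9 = \theta_6$                                                           $m(X_9) = 0.05$


Let's consider the partition $\{A_1, A_2, A_3\}$ of $\Theta$ with $A_1 \triangleq \theta_1 \cup \theta_3 \cup \theta_4 \cup \theta_7$, $A_2 \triangleq \theta_2 \cup \theta_5$ and $A_3 \triangleq \theta_6$, and the subset $B = \theta_4 \cup \theta_5 \cup \theta_6 \cup \theta_7$ of $\Theta$ having positive belief $Bel(B) = m(X_4) + m(X_6) + m(X_9) = 0.39$. Table II summarizes the belief values of the different subsets of $\Theta$ which are needed in the derivations to apply TBT.


Table II
BELIEF VALUES USED FOR THE DERIVATIONS.

Subsets of $\Theta$                                              $Bel(\cdot)$
$B = \theta_4 \cup \theta_5 \cup \theta_6 \cup \theta_7$         $Bel(B) = 0.39$
$A_1 = \theta_1 \cup \theta_3 \cup \theta_4 \cup \theta_7$       $Bel(A_1) = 0.04$
$A_2 = \theta_2 \cup \theta_5$                                   $Bel(A_2) = 0.20$
$A_3 = \theta_6$                                                 $Bel(A_3) = 0.05$
$A_1 \cap B = \theta_4 \cup \theta_7$                            $Bel(A_1 \cap B) = 0.04$
$A_2 \cap B = \theta_5$                                          $Bel(A_2 \cap B) = 0$
$A_3 \cap B = \theta_6$                                          $Bel(A_3 \cap B) = 0.05$


In this example, one has

$$\begin{aligned}
\mathcal{F}_B(m) &= \{X_4, X_6, X_9\} \text{ and } \mathcal{F}_{\bar{B}}(m) = \{X_5\}, \\
\mathcal{F}_{A_1}(m) &= \{X_4\} \text{ and } \mathcal{F}_{\bar{A}_1}(m) = \{X_5, X_9\}, \\
\mathcal{F}_{A_2}(m) &= \{X_5\} \text{ and } \mathcal{F}_{\bar{A}_2}(m) = \{X_4, X_6, X_8, X_9\}, \\
\mathcal{F}_{A_3}(m) &= \{X_9\} \text{ and } \mathcal{F}_{\bar{A}_3}(m) = \{X_1, X_2, X_4, X_5, X_7\}, \\
\mathcal{F}_{A^*}(m) &= \mathcal{F}_\Theta(m) - \mathcal{F}_{A_1}(m) - \mathcal{F}_{A_2}(m) - \mathcal{F}_{A_3}(m) \\
&= \{X_1, X_2, X_3, X_6, X_7, X_8\}.
\end{aligned}$$

Therefore,

$$U(A^* \cap B) = \sum_{X \in \mathcal{F}_{A^*}(m) | X \in \mathcal{F}_B(m)} m(X) = m(X_6) = 0.30.$$

In applying TBT formula (41), one can easily verify that

$$\begin{aligned}
Bel(B) &= Bel(B \cap A_1) + Bel(B \cap A_2) + Bel(B \cap A_3) + U(A^* \cap B) \\
&= 0.04 + 0 + 0.05 + 0.30 = 0.39.
\end{aligned}$$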
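The verification of Example 4 can be scripted as follows. This is a minimal sketch under stated assumptions: $\theta_i$ is encoded as the integer $i$, masses are stored as exact fractions, and the helper names `bel` and `u_star` are illustrative. `u_star` implements (42) by summing the masses of the focal elements that lie inside $B$ but inside no block of the partition.

```python
from fractions import Fraction

# BBA of Table I (theta_i is encoded as the integer i).
M = {
    frozenset({2, 3, 4, 5, 7}): Fraction(1, 100),   # X1
    frozenset({1, 2, 3, 4}): Fraction(2, 100),      # X2
    frozenset({3, 5, 6}): Fraction(3, 100),         # X3
    frozenset({4, 7}): Fraction(4, 100),            # X4
    frozenset({2}): Fraction(20, 100),              # X5
    frozenset({6, 7}): Fraction(30, 100),           # X6
    frozenset({2, 3, 7}): Fraction(20, 100),        # X7
    frozenset({1, 4, 6}): Fraction(15, 100),        # X8
    frozenset({6}): Fraction(5, 100),               # X9
}
PARTITION = [frozenset({1, 3, 4, 7}), frozenset({2, 5}), frozenset({6})]
B = frozenset({4, 5, 6, 7})


def bel(A):
    """Belief: total mass of the focal elements included in A."""
    return sum((v for X, v in M.items() if X <= A), Fraction(0))


def u_star(partition, subset):
    """U(A* n B) of (42): focal elements inside B but inside no block A_i."""
    return sum(
        (v for X, v in M.items()
         if X <= subset and not any(X <= block for block in partition)),
        Fraction(0),
    )


lhs = bel(B)
terms = [bel(block & B) for block in PARTITION]
u = u_star(PARTITION, B)
rhs = sum(terms, Fraction(0)) + u

print(float(lhs), [float(t) for t in terms], float(u), float(rhs))
```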


C. Special case: A partition with only two elements


If we consider any simple partition $\{A, \bar{A}\}$ of the FoD $\Theta$ and any subset $B$ of $\Theta$, then the TBT and TPlT formulas (41) and (44) reduce to¹⁴

$$Bel(B) = Bel(A \cap B) + Bel(\bar{A} \cap B) + U(A^* \cap B), \tag{45}$$

$$Pl(\bar{B}) = Pl(\bar{A} \cup \bar{B}) + Pl(A \cup \bar{B}) - 1 - U(A^* \cap B). \tag{46}$$

¹⁴ Take $k = 2$, and set $A \triangleq A_1$ and $\bar{A} \triangleq A_2$ in (41) and (44).

Remark: If the BBA $m(\cdot)$ is Bayesian then $U(A^* \cap B) = 0$. Therefore the previous formulas reduce to

$$Bel(B) = Bel(A \cap B) + Bel(\bar{A} \cap B), \tag{47}$$

$$Pl(\bar{B}) = Pl(\bar{A} \cup \bar{B}) + Pl(A \cup \bar{B}) - 1. \tag{48}$$

$m(\cdot)$ being a Bayesian BBA, $Bel(\cdot)$ and $Pl(\cdot)$ are homogeneous to the same (possibly subjective) probability measure $P(\cdot)$. Therefore, the previous equalities can be rewritten as

$$P(B) = P(A \cap B) + P(\bar{A} \cap B), \tag{49}$$

$$P(\bar{B}) = P(\bar{A} \cup \bar{B}) + P(A \cup \bar{B}) - 1. \tag{50}$$

The formula (49) is valid because $\{A, \bar{A}\}$ is a partition of $\Theta$ and because of the TPT. The formula (50) is nothing but a dual form of the TPT formula. It is also valid because

$$\begin{aligned}
P(\bar{A} \cup \bar{B}) + P(A \cup \bar{B}) - 1 &= P(\bar{A}) + P(\bar{B}) - P(\bar{A} \cap \bar{B}) + P(A) + P(\bar{B}) - P(A \cap \bar{B}) - 1 \\
&= (P(A) + P(\bar{A}) - 1) + 2P(\bar{B}) - (P(\bar{A} \cap \bar{B}) + P(A \cap \bar{B})) \\
&= 0 + 2P(\bar{B}) - P(\bar{B}) = P(\bar{B}).
\end{aligned}$$
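The plausibility counterparts (44) and (46) can be checked numerically on the BBA of Table I. The sketch below is illustrative only: subsets are encoded as `frozenset` objects of integers, and the helper names `pl` and `u_star` are assumptions of this sketch. It evaluates the right-hand side of (44) for the three-block partition of Example 4 and the right-hand side of (46) for the two-block partition $\{A_1, \bar{A}_1\}$, and compares both with $Pl(\bar{B})$.

```python
from fractions import Fraction

THETA = frozenset(range(1, 8))
# BBA of Table I (theta_i is encoded as the integer i).
M = {
    frozenset({2, 3, 4, 5, 7}): Fraction(1, 100),
    frozenset({1, 2, 3, 4}): Fraction(2, 100),
    frozenset({3, 5, 6}): Fraction(3, 100),
    frozenset({4, 7}): Fraction(4, 100),
    frozenset({2}): Fraction(20, 100),
    frozenset({6, 7}): Fraction(30, 100),
    frozenset({2, 3, 7}): Fraction(20, 100),
    frozenset({1, 4, 6}): Fraction(15, 100),
    frozenset({6}): Fraction(5, 100),
}
PARTITION = [frozenset({1, 3, 4, 7}), frozenset({2, 5}), frozenset({6})]
B = frozenset({4, 5, 6, 7})
NOT_B = THETA - B


def pl(A):
    """Plausibility: total mass of the focal elements intersecting A."""
    return sum((v for X, v in M.items() if X & A), Fraction(0))


def u_star(partition, subset):
    """U(A* n B): focal elements inside B but inside no block of the partition."""
    return sum(
        (v for X, v in M.items()
         if X <= subset and not any(X <= block for block in partition)),
        Fraction(0),
    )


lhs = pl(NOT_B)

# Formula (44) with the three-block partition.
rhs_44 = (sum(pl((THETA - A) | NOT_B) for A in PARTITION)
          + 1 - len(PARTITION) - u_star(PARTITION, B))

# Formula (46) with the two-block partition {A1, complement of A1}.
A = PARTITION[0]
NOT_A = THETA - A
rhs_46 = pl(NOT_A | NOT_B) + pl(A | NOT_B) - 1 - u_star([A, NOT_A], B)

print(float(lhs), float(rhs_44), float(rhs_46))
```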


D. Generalization of TBT 


Previously, the TBT formula was established when the partition $\{A_1, \ldots, A_k\}$ was related to a given FoD $\Theta$ and $B$ was a subset of the same FoD $\Theta$. We can generalize TBT by considering $\{A_1, \ldots, A_k\}$ as any partition of a FoD $\Theta_1 = \{x_1, \ldots, x_m\} = \{x_p, p = 1, 2, \ldots, m\}$, and $B$ as a subset of another FoD $\Theta_2 = \{y_1, \ldots, y_n\} = \{y_q, q = 1, 2, \ldots, n\}$ with $\Theta_1 \neq \Theta_2$. For this, we need to work within the Cartesian product space $\Theta \triangleq \Theta_1 \times \Theta_2$. In the space $\Theta = \Theta_1 \times \Theta_2$, $x_p$ is not an elementary element but a subset of $n$ elements of $\Theta$, i.e. $\{x_p\} = \{(x_p, y_1), \ldots, (x_p, y_n)\}$. Similarly, $y_q$ is not an elementary element but a subset of $m$ elements of $\Theta$, i.e. $\{y_q\} = \{(x_1, y_q), \ldots, (x_m, y_q)\}$. If $A_i \subseteq \Theta_1$ and $B \subseteq \Theta_2$, then $A_i \times B = \{(x_p, y_q) | x_p \in A_i, y_q \in B\} \subseteq \Theta$. Because $\{A_1, \ldots, A_k\}$ is a partition of $\Theta_1$, then $\{A_1 \times \Theta_2, \ldots, A_k \times \Theta_2\}$ defines a partition of $\Theta = \Theta_1 \times \Theta_2$. Because $\Theta_1 \times B = \cup_{i=1,\ldots,k} ((\Theta_1 \times B) \cap (A_i \times \Theta_2))$, we can apply TBT in the Cartesian space $\Theta$. More precisely,

$$\begin{aligned}
Bel(\Theta_1 \times B) &= Bel(\cup_i ((\Theta_1 \times B) \cap (A_i \times \Theta_2))) \\
&= Bel((\Theta_1 \times B) \cap (A_1 \times \Theta_2)) + \ldots + Bel((\Theta_1 \times B) \cap (A_k \times \Theta_2)) \\
&\quad + U((A^* \times \Theta_2) \cap (\Theta_1 \times B)),
\end{aligned}$$

where the quantity $U((A^* \times \Theta_2) \cap (\Theta_1 \times B))$ is now defined by

$$U((A^* \times \Theta_2) \cap (\Theta_1 \times B)) \triangleq \sum_{X \in \mathcal{F}_{A^* \times \Theta_2}(m) | X \in \mathcal{F}_{\Theta_1 \times B}(m)} m(X). \tag{51}$$

The previous TBT formula, when working in the Cartesian space $\Theta = \Theta_1 \times \Theta_2$, can be written more concisely as

$$Bel(\Theta_1 \times B) = \sum_{i=1,\ldots,k} Bel(A_i \times B) + U(A^* \times B), \tag{52}$$

because $(\Theta_1 \times B) \cap (A_i \times \Theta_2) = (A_i \times \Theta_2) \cap (\Theta_1 \times B) = A_i \times B$, and by notation convention $U(A^* \times B) = U((A^* \times \Theta_2) \cap (\Theta_1 \times B))$.

Note that the formula (52) can be used if and only if one knows the joint BBA $m(\cdot)$ (or equivalently the joint belief) defined over the powerset of the Cartesian space $\Theta = \Theta_1 \times \Theta_2$.


VIII. CONDITIONAL BELIEF FUNCTIONS BASED ON TBT


In this section we show how the Fagin-Halpern belief conditioning formulas can be established directly from TBT. This result is important because it provides a solid construction of the FH formulas and it justifies their use for applications where belief conditioning is necessary. For deriving the FH formulas from TBT we consider a partition $\{A_i, \bar{A}_i\}$ of the FoD $\Theta$ and a subset $B$ of $\Theta$. Using TBT, one has

$$Bel(B) = Bel(A_i \cap B) + Bel(\bar{A}_i \cap B) + U(A^* \cap B), \tag{53}$$

where $\mathcal{F}_{A^*}(m) = \mathcal{F}_\Theta(m) - \mathcal{F}_{A_i}(m) - \mathcal{F}_{\bar{A}_i}(m)$ and

$$U(A^* \cap B) \triangleq \sum_{X \in \mathcal{F}_{A^*}(m) | X \in \mathcal{F}_B(m)} m(X). \tag{54}$$

Hence

$$Bel(B) - U(A^* \cap B) = Bel(A_i \cap B) + Bel(\bar{A}_i \cap B). \tag{55}$$

At this stage, one may be tempted to divide the right and left sides of the previous equality by $Bel(B) - U(A^* \cap B)$ (assuming its positiveness) to get

$$1 = \frac{Bel(A_i \cap B)}{Bel(B) - U(A^* \cap B)} + \frac{Bel(\bar{A}_i \cap B)}{Bel(B) - U(A^* \cap B)},$$

which would suggest to define $Bel(A_i|B)$ by taking

$$Bel(A_i|B) = Bel(A_i \cap B)/(Bel(B) - U(A^* \cap B)). \tag{56}$$

Unfortunately, it can be seen from Ellsberg's urn example that the conditional belief defined by (56) is inconsistent with the bounds of imprecise conditional probabilities. Therefore, we need to go one step further in the calculus for defining consistent conditional belief and plausibility functions. Because by definition $U((\bar{A}_i \cap B)^*) \triangleq Pl(\bar{A}_i \cap B) - Bel(\bar{A}_i \cap B)$, we have

$$Bel(\bar{A}_i \cap B) = Pl(\bar{A}_i \cap B) - U((\bar{A}_i \cap B)^*). \tag{57}$$

Putting this expression of $Bel(\bar{A}_i \cap B)$ into (55) and rearranging terms, we get

$$Bel(B) + \Delta(U) = Bel(A_i \cap B) + Pl(\bar{A}_i \cap B), \tag{58}$$

with $\Delta(U) \triangleq U((\bar{A}_i \cap B)^*) - U(A^* \cap B)$ and $\Delta(U) \in [0, 1]$ (see proof in appendix).

Assuming $Bel(B) > 0$, and dividing each side of (58) by $Bel(B) + \Delta(U)$, we get

$$1 = \frac{Bel(A_i \cap B)}{Bel(B) + \Delta(U)} + \frac{Pl(\bar{A}_i \cap B)}{Bel(B) + \Delta(U)}, \tag{59}$$

or equivalently

$$\frac{Bel(A_i \cap B)}{Bel(B) + \Delta(U)} = 1 - \frac{Pl(\bar{A}_i \cap B)}{Bel(B) + \Delta(U)}. \tag{60}$$

Because the general relationship $Bel(X) = 1 - Pl(\bar{X})$ between the belief and the plausibility must always be satisfied for any $X \subseteq \Theta$, the equality (60) allows us to define the conditional belief $Bel(A_i|B)$ and the conditional plausibility $Pl(\bar{A}_i|B)$ by taking

$$Bel(A_i|B) \triangleq \frac{Bel(A_i \cap B)}{Bel(B) + \Delta(U)}, \tag{61}$$

$$Pl(\bar{A}_i|B) \triangleq \frac{Pl(\bar{A}_i \cap B)}{Bel(B) + \Delta(U)}. \tag{62}$$

Using equality (58), the previous conditioning formulas can be rewritten equivalently as

$$Bel(A_i|B) = \frac{Bel(A_i \cap B)}{Bel(A_i \cap B) + Pl(\bar{A}_i \cap B)}, \tag{63}$$

$$Pl(\bar{A}_i|B) = \frac{Pl(\bar{A}_i \cap B)}{Bel(A_i \cap B) + Pl(\bar{A}_i \cap B)}. \tag{64}$$

In replacing $\bar{A}_i$ by $A_i$ in the notations of formulas (62)-(64), we get the conditional plausibility $Pl(A_i|B)$ as

$$Pl(A_i|B) \triangleq \frac{Pl(A_i \cap B)}{Bel(B) + U((A_i \cap B)^*) - U(A^* \cap B)} = \frac{Pl(A_i \cap B)}{Bel(\bar{A}_i \cap B) + Pl(A_i \cap B)}. \tag{65}$$

Formulas (63) and (65) coincide with the Fagin-Halpern formulas [4], which were originally proposed from essentially a very good intuition. In this work, we have derived the Fagin-Halpern formulas only from TBT, using the proper decomposition of the set of focal elements of the a priori BBA. Note that the definition of $Bel(A_i|B)$ given in (61) satisfies the conditions $Bel(\emptyset|B) = 0$, $Bel(\Theta|B) = 1$, and $Bel(A_i|B) \in [0, 1]$. To prove that $Bel(A_i|B)$ defined by (63) is a belief function, one must prove that it is also an $n$-monotone ($n \geq 2$) Choquet capacity [36] on the finite set $\Theta$, or equivalently that the following inequality holds for any $B \subseteq \Theta$ with $Bel(B) > 0$ and for any collection $A_1, \ldots, A_n$ of subsets of $\Theta$

$$Bel(A_1 \cup \ldots \cup A_n | B) \geq \sum_{I \subseteq \{1,\ldots,n\},\, I \neq \emptyset} (-1)^{|I|+1} Bel(\cap_{i \in I} A_i | B).$$

The proof of this inequality is complicated. However, three very different proofs have already been given by Fagin and Halpern [3], Jaffray [6], and Sundberg and Wagner [7], the latter being the clearest.


IX. GENERALIZATION OF BAYES’ THEOREM 


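In this section, and thanks to the previous results, we generalize Bayes' Theorem (BT) in the framework of belief functions. Assuming $Bel(B) > 0$, we have shown that the Fagin-Halpern expression of $Bel(A_i|B)$ given by

$$Bel(A_i|B) = \frac{Bel(A_i \cap B)}{Bel(A_i \cap B) + Pl(\bar{A}_i \cap B)} \tag{66}$$

is equal to the formula (61), i.e.

$$Bel(A_i|B) = \frac{Bel(A_i \cap B)}{Bel(B) + U((\bar{A}_i \cap B)^*) - U(A^* \cap B)}. \tag{67}$$

In replacing $Bel(B)$ by the expression (41) of TBT we get

$$Bel(A_i|B) = \frac{Bel(A_i \cap B)}{\sum_{i=1,\ldots,k} Bel(A_i \cap B) + U((\bar{A}_i \cap B)^*)}. \tag{68}$$

Assuming $Bel(A_i) > 0$, the Fagin-Halpern expression of $Bel(B|A_i)$ given by

$$Bel(B|A_i) = \frac{Bel(B \cap A_i)}{Bel(B \cap A_i) + Pl(\bar{B} \cap A_i)} \tag{69}$$

is equal to

$$Bel(B|A_i) = \frac{Bel(B \cap A_i)}{Bel(A_i) + U((\bar{B} \cap A_i)^*) - U(B^* \cap A_i)}, \tag{70}$$

where

$$U((\bar{B} \cap A_i)^*) \triangleq Pl(\bar{B} \cap A_i) - Bel(\bar{B} \cap A_i) \tag{71}$$

$$= \sum_{X \in \mathcal{F}_{(\bar{B} \cap A_i)^*}(m)} m(X), \tag{72}$$

with $\mathcal{F}_{(\bar{B} \cap A_i)^*}(m) = \mathcal{F}_\Theta(m) - \mathcal{F}_{\bar{B} \cap A_i}(m) - \mathcal{F}_{B \cup \bar{A}_i}(m)$, and where

$$U(B^* \cap A_i) \triangleq \sum_{X \in \mathcal{F}_{B^*}(m) | X \in \mathcal{F}_{A_i}(m)} m(X), \tag{73}$$

with $\mathcal{F}_{B^*}(m) = \mathcal{F}_\Theta(m) - \mathcal{F}_B(m) - \mathcal{F}_{\bar{B}}(m)$.

From (70), one obtains

$$Bel(A_i \cap B) = Bel(B|A_i)[Bel(A_i) + U((\bar{B} \cap A_i)^*) - U(B^* \cap A_i)].$$

By replacing the above expression of $Bel(A_i \cap B)$ into (68), we obtain

$$Bel(A_i|B) = \frac{Bel(B|A_i) q(A_i, B)}{\sum_{i=1}^{k} Bel(B|A_i) q(A_i, B) + U((\bar{A}_i \cap B)^*)}, \tag{74}$$

where the factor $q(A_i, B)$, introduced here for notation conciseness, is defined by

$$q(A_i, B) \triangleq Bel(A_i) + U((\bar{B} \cap A_i)^*) - U(B^* \cap A_i). \tag{75}$$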

This result allows us to establish the following Generalized Bayes' Theorem (GBT).

Generalized Bayes' Theorem (GBT): For any partition $\{A_i, i = 1, \ldots, k\}$ of a FoD $\Theta$, any belief function $Bel(\cdot) : 2^\Theta \mapsto [0, 1]$, and any subset $B$ of $\Theta$ with $Bel(B) > 0$, one has

$$Bel(A_i|B) = \frac{Bel(B|A_i) q(A_i, B)}{\sum_{i=1}^{k} Bel(B|A_i) q(A_i, B) + U((\bar{A}_i \cap B)^*)}, \tag{76}$$

where

$$U((\bar{A}_i \cap B)^*) \triangleq \sum_{X \in \mathcal{F}_{(\bar{A}_i \cap B)^*}(m)} m(X) = Pl(\bar{A}_i \cap B) - Bel(\bar{A}_i \cap B),$$

and

$$q(A_i, B) \triangleq Bel(A_i) + U((\bar{B} \cap A_i)^*) - U(B^* \cap A_i).$$

Lemma: GBT reduces to Bayes' Theorem if $Bel(\cdot) : 2^\Theta \mapsto [0, 1]$ is a Bayesian belief function.

Proof: See appendix.


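When $A_i \subseteq \Theta_1$ and $B \subseteq \Theta_2$ with $\Theta_1 \neq \Theta_2$, we must work in the Cartesian product space $\Theta = \Theta_1 \times \Theta_2$, and the GBT formula is similar to (76) in replacing $A_i$ by $A_i \times \Theta_2$, $B$ by $\Theta_1 \times B$, and where

$$\begin{aligned}
U((\bar{A}_i \cap B)^*) &\triangleq \sum_{X \in \mathcal{F}_{((\bar{A}_i \times \Theta_2) \cap (\Theta_1 \times B))^*}(m)} m(X) \\
&= Pl((\bar{A}_i \times \Theta_2) \cap (\Theta_1 \times B)) - Bel((\bar{A}_i \times \Theta_2) \cap (\Theta_1 \times B)),
\end{aligned} \tag{77}$$

and where the factor $q(A_i, B)$ must be replaced by

$$q(A_i \times \Theta_2, \Theta_1 \times B) \triangleq Bel(A_i \times \Theta_2) + U((\bar{B} \cap A_i)^*) - U(B^* \cap A_i), \tag{78}$$

with

$$\begin{aligned}
U((\bar{B} \cap A_i)^*) &\triangleq \sum_{X \in \mathcal{F}_{((\Theta_1 \times \bar{B}) \cap (A_i \times \Theta_2))^*}(m)} m(X) \\
&= Pl((\Theta_1 \times \bar{B}) \cap (A_i \times \Theta_2)) - Bel((\Theta_1 \times \bar{B}) \cap (A_i \times \Theta_2)),
\end{aligned} \tag{79}$$

$$U(B^* \cap A_i) \triangleq \sum_{X \in \mathcal{F}_{\Theta_1 \times B^*}(m) | X \in \mathcal{F}_{A_i \times \Theta_2}(m)} m(X), \tag{80}$$

and $\mathcal{F}_{\Theta_1 \times B^*}(m) = \mathcal{F}_{\Theta_1 \times \Theta_2}(m) - \mathcal{F}_{\Theta_1 \times B}(m) - \mathcal{F}_{\Theta_1 \times \bar{B}}(m)$.

In the formulas (77)-(80), $X$ is an element of the Cartesian space $\Theta = \Theta_1 \times \Theta_2$, and $m(X)$ is the (joint) BBA value of $X$ defined on the power set of the Cartesian product space.

The application of the GBT formula when working with $A_i \subseteq \Theta_1$ and $B \subseteq \Theta_2$ with $\Theta_1 \neq \Theta_2$ is not easy in general, because it requires the knowledge of the joint BBA $m(\cdot)$ defined over $2^{\Theta_1 \times \Theta_2}$, which is rarely known in practice. If the joint BBA $m(\cdot)$ can be expressed (or approximated) as a function of two marginal BBAs $m^{\Theta_1}(\cdot)$ and $m^{\Theta_2}(\cdot)$ (assumed to be known) defined respectively over $\Theta_1$ and $\Theta_2$, then the GBT formula should become tractable.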


X. ILLUSTRATIVE EXAMPLE OF GBT 


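In this section, we provide a complete and quite simple illustrative example to show how the belief conditioning formulas work and how to apply GBT.

Let us consider the FoD $\Theta = \{\theta_i, i = 1, \ldots, 7\}$ and the set of focal elements $\mathcal{F}_\Theta(m) = \{X_1, X_2, \ldots, X_9\}$ of a BBA $m(\cdot)$ defined over $2^\Theta$ given in Table III. Let's consider the partition $\{A_1, A_2, A_3\}$ of $\Theta$ with $A_1 = \theta_1 \cup \theta_3 \cup \theta_4 \cup \theta_7$, $A_2 = \theta_2 \cup \theta_5$ and $A_3 = \theta_6$, and let us consider the subset $B = \theta_4 \cup \theta_5 \cup \theta_6 \cup \theta_7$ of $\Theta$ having positive belief $Bel(B) = m(X_4) + m(X_6) + m(X_9) = 0.39$. Table IV summarizes the belief and plausibility values of the different subsets of $\Theta$ which are needed in the derivations.


Table III
FOCAL ELEMENTS AND THEIR MASSES.

Focal element $X$                                                          BBA $m(X)$
$X_1 = \theta_2 \cup \theta_3 \cup \theta_4 \cup \theta_5 \cup \theta_7$   $m(X_1) = 0.01$
$X_2 = \theta_1 \cup \theta_2 \cup \theta_3 \cup \theta_4$                 $m(X_2) = 0.02$
$X_3 = \theta_3 \cup \theta_5 \cup \theta_6$                               $m(X_3) = 0.03$
$X_4 = \theta_4 \cup \theta_7$                                             $m(X_4) = 0.04$
$X_5 = \theta_2$                                                           $m(X_5) = 0.20$
$X_6 = \theta_6 \cup \theta_7$                                             $m(X_6) = 0.30$
$X_7 = \theta_2 \cup \theta_3 \cup \theta_7$                               $m(X_7) = 0.20$
$X_8 = \theta_1 \cup \theta_4 \cup \theta_6$                               $m(X_8) = 0.15$
$X_9 = \theta_6$                                                           $m(X_9) = 0.05$


Table IV
BELIEF AND PLAUSIBILITY VALUES USED FOR THE DERIVATIONS.

Subsets of $\Theta$                                              $Bel(\cdot)$                        $Pl(\cdot)$
$B = \theta_4 \cup \theta_5 \cup \theta_6 \cup \theta_7$         $Bel(B) = 0.39$                     $Pl(B) = 0.80$
$A_1 = \theta_1 \cup \theta_3 \cup \theta_4 \cup \theta_7$       $Bel(A_1) = 0.04$                   $Pl(A_1) = 0.75$
$A_2 = \theta_2 \cup \theta_5$                                   $Bel(A_2) = 0.20$                   $Pl(A_2) = 0.46$
$A_3 = \theta_6$                                                 $Bel(A_3) = 0.05$                   $Pl(A_3) = 0.53$
$A_1 \cap B = \theta_4 \cup \theta_7$                            $Bel(A_1 \cap B) = 0.04$            $Pl(A_1 \cap B) = 0.72$
$A_2 \cap B = \theta_5$                                          $Bel(A_2 \cap B) = 0$               $Pl(A_2 \cap B) = 0.04$
$A_3 \cap B = \theta_6$                                          $Bel(A_3 \cap B) = 0.05$            $Pl(A_3 \cap B) = 0.53$
$\bar{A}_1 \cap B = \theta_5 \cup \theta_6$                      $Bel(\bar{A}_1 \cap B) = 0.05$      $Pl(\bar{A}_1 \cap B) = 0.54$
$\bar{A}_2 \cap B = \theta_4 \cup \theta_6 \cup \theta_7$        $Bel(\bar{A}_2 \cap B) = 0.39$      $Pl(\bar{A}_2 \cap B) = 0.80$
$\bar{A}_3 \cap B = \theta_4 \cup \theta_5 \cup \theta_7$        $Bel(\bar{A}_3 \cap B) = 0.04$      $Pl(\bar{A}_3 \cap B) = 0.75$
$A_1 \cap \bar{B} = \theta_1 \cup \theta_3$                      $Bel(A_1 \cap \bar{B}) = 0$         $Pl(A_1 \cap \bar{B}) = 0.41$
$A_2 \cap \bar{B} = \theta_2$                                    $Bel(A_2 \cap \bar{B}) = 0.20$      $Pl(A_2 \cap \bar{B}) = 0.43$
$A_3 \cap \bar{B} = \emptyset$                                   $Bel(A_3 \cap \bar{B}) = 0$         $Pl(A_3 \cap \bar{B}) = 0$


In this example, one has

$$\begin{aligned}
\mathcal{F}_B(m) &= \{X_4, X_6, X_9\}, \\
\mathcal{F}_{\bar{B}}(m) &= \{X_5\}, \\
\mathcal{F}_{B^*}(m) &= \mathcal{F}_\Theta(m) - \mathcal{F}_B(m) - \mathcal{F}_{\bar{B}}(m) = \{X_1, X_2, X_3, X_7, X_8\}, \\
\mathcal{F}_{A_1}(m) &= \{X_4\}, \\
\mathcal{F}_{\bar{A}_1}(m) &= \{X_5, X_9\}, \\
\mathcal{F}_{A_2}(m) &= \{X_5\}, \\
\mathcal{F}_{\bar{A}_2}(m) &= \{X_4, X_6, X_8, X_9\}, \\
\mathcal{F}_{A_3}(m) &= \{X_9\}, \\
\mathcal{F}_{\bar{A}_3}(m) &= \{X_1, X_2, X_4, X_5, X_7\}, \\
\mathcal{F}_{A^*}(m) &= \mathcal{F}_\Theta(m) - \mathcal{F}_{A_1}(m) - \mathcal{F}_{A_2}(m) - \mathcal{F}_{A_3}(m) \\
&= \{X_1, X_2, X_3, X_6, X_7, X_8\}.
\end{aligned}$$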
• Results with Fagin-Halpern conditioning formulas

Using the Fagin-Halpern conditioning formulas (63) and (69) and the fact that $Pl(A_i|B) = 1 - Bel(\bar{A}_i|B)$ and $Pl(B|A_i) = 1 - Bel(\bar{B}|A_i)$, we obtain in this example the conditional belief and plausibility values given in Tables V-VI.


Table V
$Bel(A_i|B)$ AND $Pl(A_i|B)$ WITH FAGIN-HALPERN CONDITIONING.

Subsets of $\Theta$   $Bel(A_i|B)$                  $Pl(A_i|B)$
$A_1$                 $Bel(A_1|B) \approx 0.0690$   $Pl(A_1|B) \approx 0.9351$
$A_2$                 $Bel(A_2|B) = 0$              $Pl(A_2|B) \approx 0.0930$
$A_3$                 $Bel(A_3|B) \approx 0.0625$   $Pl(A_3|B) \approx 0.9298$

Table VI
$Bel(B|A_i)$ AND $Pl(B|A_i)$ WITH FAGIN-HALPERN CONDITIONING.

Subsets of $\Theta$   $Bel(B|A_i)$                  $Pl(B|A_i)$
$A_1$                 $Bel(B|A_1) \approx 0.0889$   $Pl(B|A_1) = 1$
$A_2$                 $Bel(B|A_2) = 0$              $Pl(B|A_2) \approx 0.1667$
$A_3$                 $Bel(B|A_3) = 1$              $Pl(B|A_3) = 1$
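Tables V and VI can be regenerated from Table III with a short script. The following sketch is only an illustration: the integer encoding of the $\theta_i$ and the helper names (`bel`, `pl`, `fagin_halpern`) are assumptions of this sketch, and the conditional values are computed with the FH formulas (36)-(37).

```python
from fractions import Fraction

THETA = frozenset(range(1, 8))
# BBA of Table III (theta_i is encoded as the integer i).
M = {
    frozenset({2, 3, 4, 5, 7}): Fraction(1, 100),
    frozenset({1, 2, 3, 4}): Fraction(2, 100),
    frozenset({3, 5, 6}): Fraction(3, 100),
    frozenset({4, 7}): Fraction(4, 100),
    frozenset({2}): Fraction(20, 100),
    frozenset({6, 7}): Fraction(30, 100),
    frozenset({2, 3, 7}): Fraction(20, 100),
    frozenset({1, 4, 6}): Fraction(15, 100),
    frozenset({6}): Fraction(5, 100),
}
PARTITION = [frozenset({1, 3, 4, 7}), frozenset({2, 5}), frozenset({6})]
B = frozenset({4, 5, 6, 7})


def bel(A):
    return sum((v for X, v in M.items() if X <= A), Fraction(0))


def pl(A):
    return sum((v for X, v in M.items() if X & A), Fraction(0))


def fagin_halpern(A, C):
    """(Bel(A|C), Pl(A|C)) with the FH formulas (36)-(37)."""
    not_a = THETA - A
    lower = bel(A & C) / (bel(A & C) + pl(not_a & C))
    upper = pl(A & C) / (pl(A & C) + bel(not_a & C))
    return lower, upper


table_v = [fagin_halpern(A, B) for A in PARTITION]    # Bel(Ai|B), Pl(Ai|B)
table_vi = [fagin_halpern(B, A) for A in PARTITION]   # Bel(B|Ai), Pl(B|Ai)

for i, (row_v, row_vi) in enumerate(zip(table_v, table_vi), start=1):
    print(f"A{i}",
          [round(float(x), 4) for x in row_v],
          [round(float(x), 4) for x in row_vi])
```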


To apply and verify GBT on this example, one needs to compute $Bel(A_i)$, $U((\bar{B} \cap A_i)^*)$ and $U(B^* \cap A_i)$ to calculate the $q(A_i, B)$ factors, and also $U((\bar{A}_i \cap B)^*)$, because they enter in the GBT formula (76). These values are listed in Table VII for convenience.


Table VII
VALUES OF $q(A_i, B)$ AND $U((\bar{A}_i \cap B)^*)$ FOR GBT FORMULA.

Subsets of $\Theta$   $q(A_i, B)$   $U((\bar{A}_i \cap B)^*)$
$A_1$                 0.45          0.49
$A_2$                 0.43          0.41
$A_3$                 0.05          0.71


The value $q(A_1, B) = 0.45$ appearing in Table VII has been calculated as follows

$$q(A_1, B) = Bel(A_1) + U((\bar{B} \cap A_1)^*) - U(B^* \cap A_1) = 0.45,$$

because

$$\begin{aligned}
Bel(A_1) &= 0.04, \\
U((\bar{B} \cap A_1)^*) &= Pl(\bar{B} \cap A_1) - Bel(\bar{B} \cap A_1) = 0.41, \\
U(B^* \cap A_1) &= \sum_{X \in \mathcal{F}_{A_1}(m) | X \in \mathcal{F}_{B^*}(m)} m(X) = 0.
\end{aligned}$$

The value $U((\bar{A}_1 \cap B)^*) = 0.49$ appearing in Table VII is calculated as follows

$$U((\bar{A}_1 \cap B)^*) = Pl(\bar{A}_1 \cap B) - Bel(\bar{A}_1 \cap B) = 0.54 - 0.05 = 0.49.$$

The other values of Table VII are calculated similarly.


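One verifies that the GBT formula (76) works because we retrieve the correct values obtained with the FH formula, given in Table V. Indeed, one has

$$\begin{aligned}
Bel(A_1|B) &= \frac{Bel(B|A_1) q(A_1, B)}{\sum_{i=1}^{3} Bel(B|A_i) q(A_i, B) + U((\bar{A}_1 \cap B)^*)} \\
&\approx \frac{0.0889 \cdot 0.45}{(0.0889 \cdot 0.45) + (0 \cdot 0.43) + (1 \cdot 0.05) + 0.49} \approx 0.0690, \\
Bel(A_2|B) &= \frac{Bel(B|A_2) q(A_2, B)}{\sum_{i=1}^{3} Bel(B|A_i) q(A_i, B) + U((\bar{A}_2 \cap B)^*)} \\
&\approx \frac{0 \cdot 0.43}{(0.0889 \cdot 0.45) + (0 \cdot 0.43) + (1 \cdot 0.05) + 0.41} = 0, \\
Bel(A_3|B) &= \frac{Bel(B|A_3) q(A_3, B)}{\sum_{i=1}^{3} Bel(B|A_i) q(A_i, B) + U((\bar{A}_3 \cap B)^*)} \\
&\approx \frac{1 \cdot 0.05}{(0.0889 \cdot 0.45) + (0 \cdot 0.43) + (1 \cdot 0.05) + 0.71} \approx 0.0625.
\end{aligned}$$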


• Results with Shafer's conditioning formulas

Using Shafer's conditioning formulas (31) and (32), we obtain in this example the conditional belief and plausibility values given in Tables VIII and IX.


Table VIII
$Bel(A_i|B)$ AND $Pl(A_i|B)$ WITH SHAFER'S CONDITIONING.

Subsets of $\Theta$   $Bel(A_i|B)$             $Pl(A_i|B)$
$A_1$                 $Bel(A_1|B) = 0.3250$    $Pl(A_1|B) = 0.9000$
$A_2$                 $Bel(A_2|B) = 0$         $Pl(A_2|B) = 0.0500$
$A_3$                 $Bel(A_3|B) = 0.0625$    $Pl(A_3|B) = 0.6625$

Table IX
$Bel(B|A_i)$ AND $Pl(B|A_i)$ WITH SHAFER'S CONDITIONING.

Subsets of $\Theta$   $Bel(B|A_i)$                  $Pl(B|A_i)$
$A_1$                 $Bel(B|A_1) \approx 0.4533$   $Pl(B|A_1) \approx 0.9600$
$A_2$                 $Bel(B|A_2) \approx 0.0652$   $Pl(B|A_2) \approx 0.0870$
$A_3$                 $Bel(B|A_3) = 1$              $Pl(B|A_3) = 1$


As shown in the previous Ellsberg's urn example, Shafer's belief conditioning formulas are inconsistent with the lower and upper bounds of imprecise conditional probabilities, and this example shows that Shafer's belief conditioning is also incompatible with the GBT formula (76). We emphasize that GBT has been established in a constructive manner from TBT using a direct and relatively simple calculus¹⁶ without the need of a rule of combination of basic belief assignments. When using Shafer's belief conditioning formulas, one sees that the conditional values are not coherent since they do not verify GBT, because we obtain in this example

$$\begin{aligned}
Bel(A_1|B) &= 0.3250 \quad \text{(from the results in Table VIII using eq. (32))} \\
&\neq \frac{Bel(B|A_1) q(A_1, B)}{\sum_{i=1}^{3} Bel(B|A_i) q(A_i, B) + U((\bar{A}_1 \cap B)^*)} \\
&\approx \frac{0.4533 \cdot 0.45}{(0.4533 \cdot 0.45) + (0.0652 \cdot 0.43) + (1 \cdot 0.05) + 0.49} \approx 0.2642, \\
Bel(A_2|B) &= 0 \quad \text{(from the results in Table VIII using eq. (32))} \\
&\neq \frac{Bel(B|A_2) q(A_2, B)}{\sum_{i=1}^{3} Bel(B|A_i) q(A_i, B) + U((\bar{A}_2 \cap B)^*)} \\
&\approx \frac{0.0652 \cdot 0.43}{(0.4533 \cdot 0.45) + (0.0652 \cdot 0.43) + (1 \cdot 0.05) + 0.41} \approx 0.0405, \\
Bel(A_3|B) &= 0.0625 \quad \text{(from the results in Table VIII using eq. (32))} \\
&\neq \frac{Bel(B|A_3) q(A_3, B)}{\sum_{i=1}^{3} Bel(B|A_i) q(A_i, B) + U((\bar{A}_3 \cap B)^*)} \\
&\approx \frac{1 \cdot 0.05}{(0.4533 \cdot 0.45) + (0.0652 \cdot 0.43) + (1 \cdot 0.05) + 0.71} \approx 0.0504.
\end{aligned}$$

¹⁶ Assuming $Bel(B)$ and $Bel(A_i)$ are positive so that the expressions are well defined, as is the case in this example.
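The two GBT checks of this section (success with FH conditioning, failure with Shafer's conditioning) can be reproduced with the sketch below. It is an illustration under stated assumptions: the $\theta_i$ are encoded as integers, the helper names (`fh_bel`, `shafer_bel`, `q`, `u_bar`, `gbt_rhs`) are hypothetical, and Shafer's conditional belief is written in the plausibility form of (33). The function `gbt_rhs` evaluates the right-hand side of (76) for a given conditioning rule.

```python
from fractions import Fraction

THETA = frozenset(range(1, 8))
# BBA of Table III (theta_i is encoded as the integer i).
M = {
    frozenset({2, 3, 4, 5, 7}): Fraction(1, 100),
    frozenset({1, 2, 3, 4}): Fraction(2, 100),
    frozenset({3, 5, 6}): Fraction(3, 100),
    frozenset({4, 7}): Fraction(4, 100),
    frozenset({2}): Fraction(20, 100),
    frozenset({6, 7}): Fraction(30, 100),
    frozenset({2, 3, 7}): Fraction(20, 100),
    frozenset({1, 4, 6}): Fraction(15, 100),
    frozenset({6}): Fraction(5, 100),
}
PARTITION = [frozenset({1, 3, 4, 7}), frozenset({2, 5}), frozenset({6})]
B = frozenset({4, 5, 6, 7})


def bel(A):
    return sum((v for X, v in M.items() if X <= A), Fraction(0))


def pl(A):
    return sum((v for X, v in M.items() if X & A), Fraction(0))


def fh_bel(A, C):
    """Bel(A|C) with the FH formula (36)."""
    return bel(A & C) / (bel(A & C) + pl((THETA - A) & C))


def shafer_bel(A, C):
    """Bel(A|C) with Shafer's conditioning, plausibility form."""
    return (pl(C) - pl(C & (THETA - A))) / pl(C)


def u_bar(A, C):
    """U((not A n C)*) = Pl(not A n C) - Bel(not A n C)."""
    return pl((THETA - A) & C) - bel((THETA - A) & C)


def q(A, C):
    """Factor q(A, C) of (75)."""
    not_c = THETA - C
    # U(C* n A): focal elements inside A that are inside neither C nor not C.
    u_cross = sum(
        (v for X, v in M.items() if X <= A and not X <= C and not X <= not_c),
        Fraction(0),
    )
    return bel(A) + (pl(not_c & A) - bel(not_c & A)) - u_cross


def gbt_rhs(i, cond_bel):
    """Right-hand side of (76) for block A_i and a given conditioning rule."""
    total = sum(cond_bel(B, A) * q(A, B) for A in PARTITION)
    A_i = PARTITION[i]
    return cond_bel(B, A_i) * q(A_i, B) / (total + u_bar(A_i, B))


q_values = [q(A, B) for A in PARTITION]
fh_direct = [fh_bel(A, B) for A in PARTITION]
fh_gbt = [gbt_rhs(i, fh_bel) for i in range(3)]
shafer_direct = [shafer_bel(A, B) for A in PARTITION]
shafer_gbt = [gbt_rhs(i, shafer_bel) for i in range(3)]

print("FH     :", [round(float(x), 4) for x in fh_direct],
      [round(float(x), 4) for x in fh_gbt])
print("Shafer :", [round(float(x), 4) for x in shafer_direct],
      [round(float(x), 4) for x in shafer_gbt])
```

With exact fractions, the FH values satisfy (76) exactly, whereas the Shafer values differ from the corresponding right-hand sides.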


Ellsberg's urn example and this example show clearly that Dempster's rule of combination, used by Shafer to establish his belief and conditioning formulas, does not provide coherent and satisfactory results, since they are inconsistent with the lower and upper bounds of imprecise conditional probabilities and do not satisfy GBT either.


XI. CONCLUSION 


In this paper, new important results for reasoning with belief functions were obtained and discussed. The Total Belief Theorem (TBT) was established from a simple decomposition of the set of focal elements of any basic belief assignment. TBT is a generalization of the Total Probability Theorem for belief functions, and based on it we are able to derive conditional belief and conditional plausibility functions that coincide with the Fagin-Halpern conditioning formulas, which are coherent with the lower and upper bounds of imprecise conditional probability. Hence, this work provides a solid justification of the establishment of the formulas presented by Fagin and Halpern. The TBT has also been generalized for dealing with different frames of discernment thanks to the Cartesian product space. Also, as a direct consequence of TBT, we have presented a generalization of the well-known Bayes' Theorem for the framework of belief functions, called the Generalized Bayes' Theorem (GBT). We have proved that TBT and GBT reduce to TPT and BT respectively as soon as we work with a Bayesian belief function, because in this case the Bayesian belief function is homogeneous to a probability measure. On the basis of Ellsberg's urn example and an illustrative example, we have shown that Dempster's rule of combination, used by Shafer to establish his belief and conditioning formulas, does not provide coherent and satisfactory results, because they are inconsistent with the lower and upper bounds of imprecise conditional probabilities and because they do not satisfy GBT either. These new theoretical results should (we hope) reconcile the Bayesian reasoning practitioners with the evidential reasoning practitioners, and bring new foundations for reasoning under uncertainty thanks to belief functions.


APPENDIX


Proof of TBT 


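From the basic definition of $Bel(B)$ one has, for any $B \subseteq \Theta$, $Bel(B) = \sum_{X \in \mathcal{F}_\Theta(m) | X \subseteq B} m(X)$. Because the set of focal elements $\mathcal{F}_\Theta(m)$ can always be decomposed as the union $\mathcal{F}_\Theta(m) = \mathcal{F}_{A_1}(m) \cup \ldots \cup \mathcal{F}_{A_k}(m) \cup \mathcal{F}_{A^*}(m)$, one can always decompose the previous sum as follows

$$
\begin{aligned}
Bel(B) &= \sum_{X \in \mathcal{F}_\Theta(m) | X \subseteq B} m(X) \\
&= \sum_{X \in \mathcal{F}_{A_1}(m) | X \in \mathcal{F}_B(m)} m(X) + \ldots + \sum_{X \in \mathcal{F}_{A_k}(m) | X \in \mathcal{F}_B(m)} m(X) + \sum_{X \in \mathcal{F}_{A^*}(m) | X \in \mathcal{F}_B(m)} m(X) \\
&= Bel(A_1 \cap B) + \ldots + Bel(A_k \cap B) + \sum_{X \in \mathcal{F}_{A^*}(m) | X \in \mathcal{F}_B(m)} m(X) \\
&= \sum_{i=1,\ldots,k} Bel(A_i \cap B) + U(A^* \cap B),
\end{aligned}
$$

where $U(A^* \cap B) \triangleq \sum_{X \in \mathcal{F}_{A^*}(m) | X \in \mathcal{F}_B(m)} m(X)$, which completes the proof of TBT.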


Proof of the corollary of TBT 


If $m(\cdot)$ is Bayesian then any focal element $X$ of $\mathcal{F}_\Theta(m)$ is a singleton of $2^\Theta$ which either belongs to $A_i$ or to $\bar{A}_i$ (but it cannot belong to both). Therefore, $\mathcal{F}_\Theta(m) = \mathcal{F}_{A_1}(m) \cup \ldots \cup \mathcal{F}_{A_k}(m)$ and $\mathcal{F}_{A^*}(m) = \emptyset$. The TBT formula applies with^17 $U(A^* \cap B) = \sum_{X \in \mathcal{F}_{A^*}(m) | X \in \mathcal{F}_B(m)} m(X) = \sum_{X \in \emptyset | X \in \mathcal{F}_B(m)} m(X) = 0$, and thanks to TBT one has in this case, for any partition $\{A_1, \ldots, A_k\}$ of $\Theta$ and any subset $B$ of $\Theta$, the following equality

$$Bel(B) = \sum_{i=1,\ldots,k} Bel(A_i \cap B). \tag{81}$$

When $m(\cdot)$ is Bayesian, its corresponding belief function $Bel(\cdot)$ is homogeneous to a probability measure $P(\cdot)$ [1], and therefore the previous equality is consistent with the TPT formula (14), which completes the proof of the corollary of the TBT.

^17 We recall that if a summation has no term then its value is set to zero.

Proof of TPlT
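From the equality $Pl(B) = 1 - Bel(\bar{B})$ and TBT applied to $\bar{B}$, one has

$$
\begin{aligned}
Pl(B) &= 1 - \sum_{i=1,\ldots,k} Bel(A_i \cap \bar{B}) - U(A^* \cap \bar{B}) \\
&= 1 - \sum_{i=1,\ldots,k} \big(Bel(A_i \cap \bar{B}) + 1 - 1\big) - U(A^* \cap \bar{B}) \\
&= 1 - \sum_{i=1,\ldots,k} \big(-1 + Bel(A_i \cap \bar{B}) + 1\big) - U(A^* \cap \bar{B}) \\
&= 1 - \sum_{i=1,\ldots,k} \big(-(1 - Bel(A_i \cap \bar{B})) + 1\big) - U(A^* \cap \bar{B}) \\
&= 1 + \sum_{i=1,\ldots,k} Pl(\overline{A_i \cap \bar{B}}) - k - U(A^* \cap \bar{B}) \\
&= \sum_{i=1,\ldots,k} Pl(\bar{A}_i \cup B) + 1 - k - U(A^* \cap \bar{B}),
\end{aligned}
$$

which completes the proof of TPlT.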
Proof that Δ(U) ∈ [0,1]
A(U) = U((Ai NM B)") — U(A* NB) 
= [P1(A;N B) — Bel(A;n B)| 
— [Bel(A;N B) + Bel(A;N B) — Bel(B)] 
= Pl(A;N B) — Bel(AiN B) 
+ Bel(B) — Bel(A;N B) — Bel(A;N B). 


To prove that A(U) > 0, one must prove equivalently that 


Pl(A; B) — Bel(A; NB) + Bel(B) > 


Bel(A;N B) + Bel(A;N B). (82) 


Using TBT, one has 


Bel(B) = Bel(Ain B) + Bel(Ain B) + U(A* NB). 


Replacing expression of Bel(B) in inequality (82), one must verify 
if the following equality is satisfied 


Pl(A;NB)—Bel(AiNB)+Bel(AiNB)+Bel(AiNB)+U(A*nB) 
> Bel(AiN B) + Bel(Ain B). 


After simplification, we have to prove that the following inequality 
holds 


PU(A; B) + U(A* 1B) > Bel(Ain B). 


Because PI(A; B) = Bel(AiN B) +U((AiN B)"), one has to 
verify if the following inequality holds 


Bel(A; NB) + U((AiN B)*) + U(A* NB) > Bel(Ai NB). 


After simplification (omitting both Bel(A;M B) in left and right side 
of the previous inequality), one has to prove that the inequality below 
is satisfied to prove that A(U) > 0 


U((AiN B)*) + U(A* NB) > 0. 


Because U((Ai ∩ B)*) ∈ [0,1] and U(A* ∩ B) ∈ [0,1], the previous inequality always holds, which proves that U((Ai ∩ B)*) − U(A* ∩ B) ≥ 0. Moreover, because U(A* ∩ B) ∈ [0,1], then −U(A* ∩ B) ∈ [−1,0]. Because U((Ai ∩ B)*) ∈ [0,1], one deduces that U((Ai ∩ B)*) − U(A* ∩ B) ≤ 1. This completes the proof.


Proof of Lemma 


If $Bel(\cdot): 2^\Theta \mapsto [0,1]$ is a Bayesian belief function, then all focal elements of its corresponding BBA $m(\cdot)$ are singletons of $2^\Theta$. In this case the $Bel(\cdot)$ and $Pl(\cdot)$ functions coincide, and therefore one has $U((\bar{A}_i \cap B)^*) = Pl(\bar{A}_i \cap B) - Bel(\bar{A}_i \cap B) = 0$ and $U((\bar{B} \cap A_i)^*) = Pl(\bar{B} \cap A_i) - Bel(\bar{B} \cap A_i) = 0$. Any focal element (singleton) of $m(\cdot)$ is either a subset of $B$ or a subset of the complement $\bar{B}$ of $B$ in the FoD $\Theta$. Therefore, $\mathcal{F}_{B^*}(m) = \emptyset$, which implies $U(B^* \cap A_i) = 0$, so that $q(A_i, B) = Bel(A_i)$. The GBT formula (76), with in this case $q(A_i, B) = Bel(A_i)$ and $U((\bar{A}_i \cap B)^*) = 0$, reduces to the formula

$$Bel(A_i|B) = \frac{Bel(B|A_i)\,Bel(A_i)}{\sum_{j=1}^{k} Bel(B|A_j)\,Bel(A_j)},$$

which coincides with formula (21) because $Bel(\cdot)$ (being a Bayesian belief function) is homogeneous to a probability measure $P(\cdot)$. This completes the proof that the GBT formula is consistent with the Bayes' Theorem formula when the belief function is Bayesian.

Abstract: In this paper we present a simple formulation of the Generalized Bayes' Theorem (GBT) which extends Bayes' theorem in the framework of belief functions. We also present the condition under which this new formulation is valid. We illustrate our theoretical results with simple examples.


Keywords: Generalized Bayes’ Theorem (GBT), Simplified 
GBT (SGBT), Total Belief Theorem (TBT), belief functions. 


I. INTRODUCTION 


Building on Dempster's works [1], [2], Shafer introduced Belief Functions (BF) in 1976 to model epistemic uncertainty^1 and to reason under uncertainty [3]; this framework is referred to as Dempster-Shafer Theory (DST) in the literature. Belief functions are mathematically well defined, and they are appealing from a theoretical standpoint because of their ability to model uncertainty interpreted as imprecise probability measures, as in Dempster's original works.

Since the end of the 1970s, however, DST has been called into question because Dempster's rule of combination of Basic Belief Assignments (BBAs) yields counter-intuitive results not only in highly conflicting situations but also in low-conflict cases [4]-[6], and because Shafer's conditioning formulas based on Dempster's rule are not consistent with conditional probability calculus [7], [8]. Discussions on the validity of DST can be found, for instance, in [4], [5], [9]-[13]. These two major concerns make DST quite risky for applications involving randomness and epistemic uncertainties, and it should be replaced by better techniques for reasoning under uncertainty with belief functions.

In 2018 we established in [8], [14] two new general results for reasoning with belief functions: the Total Belief Theorem (TBT) and the Generalized Bayes' Theorem (GBT). TBT and GBT generalize the well-known Total Probability Theorem (TPT) and Bayes' Theorem (BT) of Probability Theory (PT). Thanks to these theorems we now have at hand a generalized Bayesian inference mechanism for working with imprecise probability measures in the belief functions framework. Just as probability theory requires a good estimate of the pdf (or pmf) involved in Bayes' formula to make a good inference, the major difficulty in applying GBT is the knowledge (or good estimation) of all^2 the BBAs required in GBT. For a given size of the frame of discernment, GBT requires more computations than Bayes' formula (if we would prefer to work with probabilities) because we need to work with BBAs defined on the power set of the frame of discernment.

^1 Also sometimes called the cognitive uncertainty by some authors.

The general formulation of GBT presented in detail in [8] is not easy to apply, which is why we present in this paper a simpler and more convenient formulation of GBT that provides elegant and useful mathematical expressions. These new GBT formulas are derived from a dichotomous partitioning of the frame of discernment.

This paper is organized as follows. After a short reminder of the basics of belief functions and of their construction based on Dempster's multi-valued mapping in Section II, we briefly present the Total Belief Theorem and Fagin-Halpern conditioning in Section III, and the Generalized Bayes' Theorem in Section IV. In Section V we establish the Simplified GBT (SGBT), drawn from GBT, for working with a dichotomous partitioning of the frame of discernment. Section VI presents and discusses two examples of SGBT results. Section VII concludes this paper.


II. BASICS OF BELIEF FUNCTIONS 
A. Basic belief assignment 


We consider a finite discrete frame of discernment (FoD) $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$, with $n > 1$, where all exhaustive and exclusive elements of $\Theta$ represent the set of the potential solutions of the problem under concern. The set of all subsets of $\Theta$ (including the empty set $\emptyset$ and $\Theta$ itself) is the power set of $\Theta$, denoted by $2^\Theta$. The number of elements (i.e. the cardinality) of $2^\Theta$ is $2^{|\Theta|}$. A Basic Belief Assignment (BBA) associated with a given source of evidence is defined as the mapping $m(\cdot): 2^\Theta \to [0,1]$ satisfying the conditions $m(\emptyset) = 0$ and $\sum_{A \in 2^\Theta} m(A) = 1$. The quantity $m(A)$ is the mass of belief for subset $A$ committed by the Source of Evidence (SoE).


B. Focal elements 


A focal element $X$ of a BBA $m(\cdot)$ is an element of $2^\Theta$ such that $m(X) > 0$. Note that the empty set is not a focal element of a BBA because $m(\emptyset) = 0$ (closed-world assumption of Shafer's model for the FoD). The set of all focal elements of $m(\cdot)$ is denoted

$$\mathcal{F}_\Theta(m) \triangleq \{X \subseteq \Theta \,|\, m(X) > 0\} = \{X \in 2^\Theta \,|\, m(X) > 0\}. \tag{1}$$

The set of focal elements of $m(\cdot)$ included in $A \subseteq \Theta$ is denoted, where $\triangleq$ means equal by definition, by

$$\mathcal{F}_A(m) \triangleq \{X \in \mathcal{F}_\Theta(m) \,|\, X \cap A = X\}. \tag{2}$$

$\mathcal{F}_\Theta(m)$ can be partitioned as $\{\mathcal{F}_A(m), \mathcal{F}_{\bar{A}}(m), \mathcal{F}_{A^*}(m)\}$ with

$$\mathcal{F}_{A^*}(m) \triangleq \mathcal{F}_\Theta(m) - \mathcal{F}_A(m) - \mathcal{F}_{\bar{A}}(m), \tag{3}$$

which represents the set of focal elements of $m(\cdot)$ which are not subsets of $A$, and not subsets of the complement of $A$ in $\Theta$, which is $\bar{A} \triangleq \Theta - \{A\}$. The minus symbol in $\Theta - \{A\}$ denotes the set difference operator.

^2 Possibly joint BBAs if we work on Cartesian product spaces [8].
C. Belief, plausibility and uncertainty 
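Belief and plausibility functions are defined as^3

$$Bel(A) \triangleq \sum_{X \in 2^\Theta,\, X \subseteq A} m(X) = \sum_{X \in \mathcal{F}_A(m)} m(X), \tag{4}$$

$$Pl(A) \triangleq \sum_{X \in 2^\Theta,\, X \cap A \neq \emptyset} m(X) = \sum_{X \in \mathcal{F}_\Theta(m),\, X \cap A \neq \emptyset} m(X) = 1 - \sum_{X \in \mathcal{F}_{\bar{A}}(m)} m(X) = 1 - Bel(\bar{A}). \tag{5}$$

The length of the belief interval $[Bel(A), Pl(A)]$ is usually called, by abuse of terminology, the uncertainty on $A$ committed by the SoE. In fact it represents the imprecision on the (possibly subjective) probability of $A$ granted by the SoE which provides the BBA $m(\cdot)$. We denote it $U(A^*)$, and it is defined as

$$U(A^*) \triangleq Pl(A) - Bel(A) = \sum_{X \in \mathcal{F}_{A^*}(m)} m(X). \tag{6}$$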

If all the elements of $\mathcal{F}_\Theta(m)$ are singletons, $m(\cdot)$ is called a Bayesian BBA [3], and its corresponding $Bel(\cdot)$ and $Pl(\cdot)$ functions are homogeneous to the same (subjective) probability measure $P(\cdot)$. In this case $\mathcal{F}_{A^*}(m) = \mathcal{F}_{\bar{A}^*}(m) = \emptyset$. Shafer proved in [3] (p. 39) that $m(\cdot)$, $Bel(\cdot)$ and $Pl(\cdot)$ are in one-to-one correspondence, and for any $A \subseteq \Theta$, $m(\cdot)$ is obtained from $Bel(\cdot)$ by the Möbius inverse formula

$$m(A) = \sum_{B \subseteq A \subseteq \Theta} (-1)^{|A - B|}\, Bel(B). \tag{7}$$

^3 By convention, a sum of non-existing terms (if it occurs in formulas depending on the given BBA) is always set to zero.
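As a quick sanity check of the Möbius inverse formula (7), the following sketch rebuilds a BBA from its belief function and compares it with the original. The frame, the masses and the helper names (`powerset`, `mobius_inverse`) are hypothetical and chosen here for illustration; exact rational arithmetic is used so that the comparison is not blurred by rounding.

```python
from fractions import Fraction
from itertools import combinations

def powerset(theta):
    """All subsets of theta, as frozensets."""
    items = sorted(theta)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def bel(m, A):
    """Bel(A): total mass of the focal elements included in A."""
    return sum((mass for X, mass in m.items() if X <= A), Fraction(0))

def mobius_inverse(bel_table, theta):
    """m(A) = sum over B included in A of (-1)^{|A - B|} Bel(B), as in (7)."""
    return {
        A: sum(((-1) ** len(A - B) * bel_table[B] for B in powerset(A)),
               Fraction(0))
        for A in powerset(theta)
    }

# Hypothetical BBA (illustrative values, not from the paper).
THETA = frozenset({"a", "b", "c"})
m = {
    frozenset({"a"}): Fraction(1, 2),
    frozenset({"a", "b"}): Fraction(1, 5),
    THETA: Fraction(3, 10),
}

bel_table = {A: bel(m, A) for A in powerset(THETA)}
recovered = mobius_inverse(bel_table, THETA)
for A in powerset(THETA):
    print(sorted(A), recovered[A])
```

Every subset that is not a focal element receives mass zero, and the three focal elements recover their original masses.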
D. Interpretation and construction of belief functions 


In Dempster's original works [1], belief $Bel(A)$ and plausibility $Pl(A)$ are interpreted as lower and upper bounds of an unknown probability $P(A)$, so that $Bel(A) \le P(A) \le Pl(A)$. The constructions of $m(A)$, $Bel(A)$ and $Pl(A)$ are mathematically well defined from an underlying random variable with a known probability measure and a given multi-valued mapping, as follows:

• Consider a random variable $x$ with its set of possible values in $\mathcal{X} = \{x_1, \ldots, x_m\}$ with known probabilities $P(x = x_j)$, $j = 1, \ldots, m$;

• Consider a FoD $\Theta = \{\theta_1, \ldots, \theta_n\}$ for the variable $\theta$ under concern;

• Consider/learn a multi-valued mapping $\Gamma: \mathcal{X} \mapsto 2^\Theta$ such that if $x = x_j$ then $\theta \in A$, so that $A = \Gamma(x_j) \in 2^\Theta$;

• The belief (lower probability) and plausibility (upper probability) that $\theta \in A$ are given by [1]

$$P_*(A) = Bel(A) = Bel(\theta \in A) = P(\{x \in \mathcal{X} \,|\, \Gamma(x) \subseteq A\}), \tag{8}$$

$$P^*(A) = Pl(A) = Pl(\theta \in A) = P(\{x \in \mathcal{X} \,|\, \Gamma(x) \cap A \neq \emptyset\}). \tag{9}$$

The mass of belief that $\theta$ belongs to $A$ is given by

$$m(A) = P(\{x \in \mathcal{X} \,|\, \Gamma(x) \neq \emptyset, \Gamma(x) = A\}). \tag{10}$$


Example of a multi-valued mapping: Paul has been killed and the police ask a witness W: Who did you see killing Paul? The witness answers Mary. To estimate the confidence of this testimony one has to consider whether this witness W is more or less precise when he is reliable, or whether he is not reliable. So the state of W can belong to $\mathcal{X} = \{x_1, x_2, x_3\}$, where $x_1$ means W is precise, $x_2$ means W is approximate, and $x_3$ means W is not reliable. We suppose that the a priori probabilities of the state of W are $P(x_1) = 0.3$, $P(x_2) = 0.1$ and $P(x_3) = 0.6$. As FoD $\Theta$, we consider a set of three suspects that includes the unknown killer

$$\Theta = \{\theta_1 = \text{Mary}, \theta_2 = \text{Peter}, \theta_3 = \text{John}\}.$$

We define the multi-valued mapping $\Gamma(\cdot)$ as follows:

$\Gamma(x_1 = \text{W is precise}) = \theta_1$,

$\Gamma(x_2 = \text{W is approximate}) = \{\theta_1, \theta_2\}$,

$\Gamma(x_3 = \text{W is not reliable}) = \{\theta_1, \theta_2, \theta_3\} = \Theta$.

$\Gamma(x_1) = \theta_1$ means that if W is precise then Mary has killed Paul. $\Gamma(x_2) = \{\theta_1, \theta_2\}$ means that if W is less precise then Mary or Peter has killed Paul. $\Gamma(x_3) = \Theta$ means that if W is not reliable then we have no useful information about the killer.

Applying formulas (8) and (9), one gets


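$$
\begin{aligned}
Bel(\emptyset) &= P(\{x | \Gamma(x) \subseteq \emptyset\}) = P(\emptyset) = 0 = 1 - Pl(\Theta), \\
Bel(\theta_1) &= P(\{x | \Gamma(x) \subseteq \theta_1\}) = P(x_1) = 0.3 = 1 - Pl(\theta_2 \cup \theta_3), \\
Bel(\theta_2) &= P(\{x | \Gamma(x) \subseteq \theta_2\}) = 0 = 1 - Pl(\theta_1 \cup \theta_3), \\
Bel(\theta_3) &= P(\{x | \Gamma(x) \subseteq \theta_3\}) = 0 = 1 - Pl(\theta_1 \cup \theta_2), \\
Bel(\theta_1 \cup \theta_2) &= P(\{x | \Gamma(x) \subseteq \theta_1 \cup \theta_2\}) = P(\{x_1, x_2\}) = P(x_1) + P(x_2) = 0.4 = 1 - Pl(\theta_3), \\
Bel(\theta_1 \cup \theta_3) &= P(\{x | \Gamma(x) \subseteq \theta_1 \cup \theta_3\}) = P(x_1) = 0.3 = 1 - Pl(\theta_2), \\
Bel(\theta_2 \cup \theta_3) &= P(\{x | \Gamma(x) \subseteq \theta_2 \cup \theta_3\}) = 0 = 1 - Pl(\theta_1), \\
Bel(\Theta) &= P(\{x | \Gamma(x) \subseteq \Theta\}) = P(\{x_1, x_2, x_3\}) = P(x_1) + P(x_2) + P(x_3) = 1 = 1 - Pl(\emptyset),
\end{aligned}
$$

and

$$
\begin{aligned}
Pl(\emptyset) &= P(\{x | \Gamma(x) \cap \emptyset \neq \emptyset\}) = P(\emptyset) = 0 = 1 - Bel(\Theta), \\
Pl(\theta_1) &= P(\{x | \Gamma(x) \cap \theta_1 \neq \emptyset\}) = P(\{x_1, x_2, x_3\}) = P(x_1) + P(x_2) + P(x_3) = 1 = 1 - Bel(\theta_2 \cup \theta_3), \\
Pl(\theta_2) &= P(\{x | \Gamma(x) \cap \theta_2 \neq \emptyset\}) = P(\{x_2, x_3\}) = P(x_2) + P(x_3) = 0.7 = 1 - Bel(\theta_1 \cup \theta_3), \\
Pl(\theta_3) &= P(\{x | \Gamma(x) \cap \theta_3 \neq \emptyset\}) = P(x_3) = 0.6 = 1 - Bel(\theta_1 \cup \theta_2), \\
Pl(\theta_1 \cup \theta_2) &= P(\{x | \Gamma(x) \cap (\theta_1 \cup \theta_2) \neq \emptyset\}) = P(\{x_1, x_2, x_3\}) = 1 = 1 - Bel(\theta_3), \\
Pl(\theta_1 \cup \theta_3) &= P(\{x | \Gamma(x) \cap (\theta_1 \cup \theta_3) \neq \emptyset\}) = P(\{x_1, x_2, x_3\}) = 1 = 1 - Bel(\theta_2), \\
Pl(\theta_2 \cup \theta_3) &= P(\{x | \Gamma(x) \cap (\theta_2 \cup \theta_3) \neq \emptyset\}) = P(\{x_2, x_3\}) = P(x_2) + P(x_3) = 0.7 = 1 - Bel(\theta_1), \\
Pl(\Theta) &= P(\{x | \Gamma(x) \cap (\theta_1 \cup \theta_2 \cup \theta_3) \neq \emptyset\}) = P(\{x_1, x_2, x_3\}) = 1 = 1 - Bel(\emptyset).
\end{aligned}
$$

Applying formula (10), one finally gets the BBA

$$
\begin{aligned}
m(\theta_1) &= P(\{x | \Gamma(x) = \theta_1\}) = P(x_1) = 0.3, \\
m(\theta_2) &= P(\{x | \Gamma(x) = \theta_2\}) = 0, \\
m(\theta_3) &= P(\{x | \Gamma(x) = \theta_3\}) = 0, \\
m(\theta_1 \cup \theta_2) &= P(\{x | \Gamma(x) = \theta_1 \cup \theta_2\}) = P(x_2) = 0.1, \\
m(\theta_1 \cup \theta_3) &= P(\{x | \Gamma(x) = \theta_1 \cup \theta_3\}) = 0, \\
m(\theta_2 \cup \theta_3) &= P(\{x | \Gamma(x) = \theta_2 \cup \theta_3\}) = 0, \\
m(\Theta) &= P(\{x | \Gamma(x) = \Theta\}) = P(x_3) = 0.6.
\end{aligned}
$$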
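The witness example can be reproduced mechanically from the prior $P$ and the mapping $\Gamma$. The following is a minimal sketch of formulas (8)-(10); the variable and function names (`P`, `GAMMA`, `bel`, `pl`, `mass`) are choices made here for illustration.

```python
from math import isclose

THETA = frozenset({"Mary", "Peter", "John"})

# Prior probabilities of the state of the witness W, and the mapping Gamma.
P = {"x1": 0.3, "x2": 0.1, "x3": 0.6}
GAMMA = {
    "x1": frozenset({"Mary"}),           # W is precise
    "x2": frozenset({"Mary", "Peter"}),  # W is approximate
    "x3": THETA,                         # W is not reliable
}

def bel(A):
    """Lower probability (8): states whose (non-empty) image is included in A."""
    return sum(P[x] for x, image in GAMMA.items() if image and image <= A)

def pl(A):
    """Upper probability (9): states whose image intersects A."""
    return sum(P[x] for x, image in GAMMA.items() if image & A)

def mass(A):
    """Mass (10): states whose (non-empty) image is exactly A."""
    return sum(P[x] for x, image in GAMMA.items() if image and image == A)

mary = frozenset({"Mary"})
peter = frozenset({"Peter"})
john = frozenset({"John"})

for name, A in [("Mary", mary), ("Peter", peter), ("John", john),
                ("Mary or Peter", mary | peter)]:
    print(f"{name}: Bel={bel(A):.1f}  Pl={pl(A):.1f}  m={mass(A):.1f}")
```

The printed values match the list above, for instance $Bel(\theta_1 \cup \theta_2) = 0.4$ and $Pl(\theta_2) = 0.7$.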


Some authors have proposed different interpretations of belief functions to escape the probabilistic framework introduced by Dempster, in order to save DST from its inherent contradiction, which is mainly due to the choice of Dempster's rule of combination and to Shafer's conditioning approach based on Dempster's rule. The most important attempt was made in the 1990s by Smets in [15] with his axiomatic Transferable Belief Model (TBM). It remains disputable, however, because of the ambiguous (or inconsistent/double) interpretation of the empty set.

In this paper we adopt Dempster's original interpretation and construction of belief functions because it is mathematically well defined, clear and consistent.


III. TBT AND FAGIN-HALPERN CONDITIONING
A. Total Belief Theorem 


In [8], we generalized the Total Probability Theorem (TPT) [16] for working with belief functions and we proved the following simple and important theorem.

Total Belief Theorem (TBT): Let us consider a FoD $\Theta$ with $|\Theta| \ge 2$ elements and a BBA $m(\cdot)$ defined on $2^\Theta$ with the set of focal elements $\mathcal{F}_\Theta(m)$. For any chosen partition $\{A_1, \ldots, A_k\}$ of $\Theta$ and for any $B \subseteq \Theta$, one has

$$Bel(B) = \sum_{i=1,\ldots,k} Bel(A_i \cap B) + U(A^* \cap B), \tag{11}$$

where

$$U(A^* \cap B) \triangleq \sum_{X \in \mathcal{F}_{A^*}(m) | X \in \mathcal{F}_B(m)} m(X), \tag{12}$$

and $\mathcal{F}_{A^*}(m) \triangleq \mathcal{F}_\Theta(m) - \mathcal{F}_{A_1}(m) - \ldots - \mathcal{F}_{A_k}(m)$.

Proof of TBT: see [8], with example.

From (12), one sees that $U(A^* \cap B) \in [0,1]$. If one applies TBT with $B = \Theta$, we get $\sum_{i=1,\ldots,k} Bel(A_i) + U(A^*) = 1$ where $U(A^*) \triangleq \sum_{X \in \mathcal{F}_{A^*}(m)} m(X)$. This equality corresponds to TPT if $U(A^*) = 0$ (i.e. there is no imprecision on the value of the probabilities of $A_i$, $i = 1, \ldots, k$).
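TBT is easy to check numerically. The sketch below evaluates both sides of (11) on the BBA that the paper uses later in Example 1 of Section VI; the helper names (`fs`, `bel`, `tbt_rhs`) and the dictionary encoding are assumptions made here.

```python
from math import isclose

def fs(*names):
    """Shorthand for building a frozenset of element names."""
    return frozenset(names)

THETA = fs("x1", "x2", "x3", "x4")
# BBA of Example 1 (Section VI).
m = {
    fs("x1"): 0.05, fs("x2"): 0.03, fs("x1", "x2"): 0.02,
    fs("x3"): 0.04, fs("x4"): 0.06, fs("x3", "x4"): 0.10,
    fs("x2", "x3"): 0.30, THETA: 0.40,
}

def bel(S):
    """Bel(S): total mass of the focal elements included in S."""
    return sum(mass for X, mass in m.items() if X <= S)

def tbt_rhs(partition, B):
    """Right-hand side of (11): sum of Bel(A_i ∩ B) plus U(A* ∩ B)."""
    inside = sum(bel(A_i & B) for A_i in partition)
    # F_{A*}(m): focal elements included in no block of the partition;
    # U(A* ∩ B) keeps those that are also included in B, as in (12).
    u_star = sum(mass for X, mass in m.items()
                 if X <= B and not any(X <= A_i for A_i in partition))
    return inside + u_star

A, A_bar = fs("x1", "x2"), fs("x3", "x4")
B = fs("x2", "x3")
singletons = [fs(x) for x in sorted(THETA)]

print(bel(B), tbt_rhs([A, A_bar], B))
print(bel(THETA), tbt_rhs(singletons, THETA))
```

For the dichotomous partition the decomposition reads $0.37 = 0.03 + 0.04 + 0.30$, where the last term is the mass of $x_2 \cup x_3$, the only focal element of $\mathcal{F}_{A^*}(m)$ included in $B$.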

In spite of its apparent simplicity, the TBT is very important because it provides a strong theoretical justification of the Fagin-Halpern (FH) belief and plausibility conditioning formulas [7], [17], proposed in the 1990s as a serious alternative to Shafer's conditioning formulas. Indeed, it can easily be proved with a simple counter-example (e.g. Ellsberg's urn example, see [8]) that the conditioning formulas established by Shafer from Dempster's rule of combination are not consistent with the bounds of the conditional probabilities. The main advantage of the FH conditioning formulas is that they provide exact bounds of the imprecise conditional probability, and they coincide exactly with the conditional probability when the belief functions involved in the FH formulas are Bayesian.


B. Fagin-Halpern belief conditioning formulas 
In [8] we proved that the TBT justifies the following FH conditioning formulas (assuming $Bel(B) > 0$)

$$Bel(A|B) = \frac{Bel(A \cap B)}{Bel(A \cap B) + Pl(\bar{A} \cap B)}, \tag{13}$$

$$Pl(A|B) = \frac{Pl(A \cap B)}{Pl(A \cap B) + Bel(\bar{A} \cap B)}. \tag{14}$$

Fagin and Halpern proved in [7] that $Bel(\cdot|B)$ is a true belief function, and so FH belief conditioning is an appealing solution for belief and plausibility conditioning. A proof that FH formulas are belief functions has also been given by Sundberg and Wagner in [18]. Hence TBT provides a complete justification of the FH formulas, which offer full compatibility with the conditional probability calculus [18], [19].

Similarly, by interchanging the notations $A$ and $B$ and assuming $Bel(A) > 0$, the previous FH formulas can be expressed as

$$Bel(B|A) = \frac{Bel(A \cap B)}{Bel(A \cap B) + Pl(\bar{B} \cap A)}, \tag{15}$$

$$Pl(B|A) = \frac{Pl(A \cap B)}{Pl(A \cap B) + Bel(\bar{B} \cap A)}. \tag{16}$$

When $m(\cdot)$ is Bayesian, $Bel(\cdot) = Pl(\cdot) = P(\cdot)$, and so $Pl(A \cap B) = Bel(A \cap B) = P(A \cap B)$, $Pl(\bar{A} \cap B) = Bel(\bar{A} \cap B) = P(\bar{A} \cap B)$ and $Pl(\bar{B} \cap A) = Bel(\bar{B} \cap A) = P(\bar{B} \cap A)$. The FH formulas above reduce to

$$Bel(A|B) = Pl(A|B) = \frac{P(A \cap B)}{P(A \cap B) + P(\bar{A} \cap B)}.$$

From TPT [16], $P(A \cap B) + P(\bar{A} \cap B) = P(B)$, thus

$$Bel(A|B) = Pl(A|B) = P(A \cap B)/P(B) = P(A|B). \tag{17}$$

Similarly, one can also easily verify that

$$Bel(B|A) = Pl(B|A) = P(A \cap B)/P(A) = P(B|A). \tag{18}$$

Hence from (17) and (18) one obtains the well-known equality

$$P(A \cap B) = P(A|B)P(B) = P(B|A)P(A). \tag{19}$$
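The reduction of the FH formulas to ordinary conditional probability can be illustrated with a small sketch. The Bayesian BBA below is hypothetical (its values are chosen here, not taken from the paper), as are the function names `fh_bel` and `fh_pl`.

```python
from math import isclose

THETA = frozenset({"x1", "x2", "x3", "x4"})

def bel(m, S):
    """Bel(S): total mass of the focal elements included in S."""
    return sum(mass for X, mass in m.items() if X <= S)

def pl(m, S):
    """Pl(S): total mass of the focal elements intersecting S."""
    return sum(mass for X, mass in m.items() if X & S)

def fh_bel(m, A, B):
    """Bel(A|B) by the FH formula (13); assumes a nonzero denominator."""
    a_and_b, abar_and_b = A & B, (THETA - A) & B
    return bel(m, a_and_b) / (bel(m, a_and_b) + pl(m, abar_and_b))

def fh_pl(m, A, B):
    """Pl(A|B) by the FH formula (14); assumes a nonzero denominator."""
    a_and_b, abar_and_b = A & B, (THETA - A) & B
    return pl(m, a_and_b) / (pl(m, a_and_b) + bel(m, abar_and_b))

# Hypothetical Bayesian BBA: all focal elements are singletons.
bayes = {
    frozenset({"x1"}): 0.1, frozenset({"x2"}): 0.2,
    frozenset({"x3"}): 0.3, frozenset({"x4"}): 0.4,
}
A = frozenset({"x1", "x2"})
B = frozenset({"x2", "x3"})

# Classical conditional probability P(A|B) = P(A ∩ B) / P(B).
p_a_given_b = bel(bayes, A & B) / bel(bayes, B)

print(fh_bel(bayes, A, B), fh_pl(bayes, A, B), p_a_given_b)
```

All three printed numbers are equal to 0.4 up to rounding, as predicted by (17). With a non-Bayesian BBA the first two values generally differ and bracket the imprecise conditional probability.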
IV. GENERALIZED BAYES’ THEOREM 


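In [8] we also established from TBT the following Generalized Bayes' Theorem (GBT) and lemma.

Generalized Bayes' Theorem (GBT): For any partition $\{A_1, \ldots, A_k\}$ of a FoD $\Theta$, any belief function $Bel(\cdot): 2^\Theta \mapsto [0,1]$, and any subset $B$ of $\Theta$ with $Bel(B) > 0$, one has for $i \in \{1, \ldots, k\}$

$$Bel(A_i|B) = \frac{Bel(B|A_i)\, q(A_i, B)}{\sum_{j=1}^{k} Bel(B|A_j)\, q(A_j, B) + U((\bar{A}_i \cap B)^*)}, \tag{20}$$

where

$$q(A_i, B) \triangleq Bel(A_i) + U((\bar{B} \cap A_i)^*) - U(B^* \cap A_i), \tag{21}$$

with

$$U((\bar{B} \cap A_i)^*) \triangleq Pl(\bar{B} \cap A_i) - Bel(\bar{B} \cap A_i), \tag{22}$$

$$U(B^* \cap A_i) \triangleq \sum_{X \in \mathcal{F}_{B^*}(m) | X \in \mathcal{F}_{A_i}(m)} m(X), \tag{23}$$

and where

$$U((\bar{A}_i \cap B)^*) \triangleq Pl(\bar{A}_i \cap B) - Bel(\bar{A}_i \cap B). \tag{24}$$

Lemma 1: GBT degenerates to Bayes' theorem formula if $Bel(\cdot)$ is a Bayesian BF, that is

$$P(A_i|B) = \frac{P(B|A_i)\, P(A_i)}{\sum_{j=1}^{k} P(B|A_j)\, P(A_j)}. \tag{25}$$

V. SIMPLIFIED FORMULATION OF GBT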


In this section we establish a simplified formulation of GBT, denoted SGBT for short in the sequel. Because the GBT formula (20) is not very easy to use and is quite difficult to compute in applications, we propose a more useful simplified formulation which is drawn from (20) when considering only a simple dichotomous partitioning of the frame of discernment $\Theta$. More precisely, we consider a partition $\{A, \bar{A}\}$ of $\Theta$ with $A \subseteq \Theta$, where $\bar{A}$ is the complement of $A$ in $\Theta$, that is $\bar{A} = \Theta - \{A\}$. We establish the following theorem, which is the main contribution of this paper.

Simplified Generalized Bayes' Theorem (SGBT): For any partition $\{A, \bar{A}\}$ of a FoD $\Theta$, any belief function $Bel(\cdot): 2^\Theta \mapsto [0,1]$, and any subset $B$ of $\Theta$, one has

• If $Pl(A \cap \bar{B}) > 0$ (Condition $C_1$)

$$Bel(A|B) = \frac{Bel(B|A)\, Pl(A \cap \bar{B})}{Bel(B|A)\, Pl(A \cap \bar{B}) + Pl(\bar{B}|A)\, Pl(\bar{A} \cap B)}, \tag{26}$$

• If $Bel(A \cap \bar{B}) > 0$ (Condition $C_2$)

$$Pl(A|B) = \frac{Pl(B|A)\, Bel(A \cap \bar{B})}{Pl(B|A)\, Bel(A \cap \bar{B}) + Bel(\bar{B}|A)\, Bel(\bar{A} \cap B)}, \tag{27}$$

and if the denominators involved in formulas (26) and (27) are strictly positive.

Note that if condition $C_2$ is satisfied then condition $C_1$ is also satisfied (because $Pl(A \cap \bar{B}) \ge Bel(A \cap \bar{B})$), but the converse does not necessarily hold.
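The role of condition $C_1$ can be made concrete with a short sketch of formula (26) that checks the condition first and falls back on the FH formula (13) when it fails. The BBA below is hypothetical, and the function names (`sgbt_bel`, `bel_a_given_b`) and the choice of returning `None` on violation are assumptions made here for illustration only.

```python
from math import isclose

THETA = frozenset({"a", "b", "c"})
# Hypothetical BBA (illustrative values, not from the paper).
m = {
    frozenset({"a"}): 0.2, frozenset({"b"}): 0.1,
    frozenset({"a", "b"}): 0.3, THETA: 0.4,
}

def bel(S):
    return sum(mass for X, mass in m.items() if X <= S)

def pl(S):
    return sum(mass for X, mass in m.items() if X & S)

def fh_bel(A, B):
    """Bel(A|B) by the FH formula (13)."""
    return bel(A & B) / (bel(A & B) + pl((THETA - A) & B))

def fh_pl(A, B):
    """Pl(A|B) by the FH formula (14)."""
    return pl(A & B) / (pl(A & B) + bel((THETA - A) & B))

def sgbt_bel(A, B):
    """Bel(A|B) by SGBT (26); returns None when (26) is not applicable."""
    A_bar, B_bar = THETA - A, THETA - B
    if pl(A & B_bar) <= 0:          # condition C1 violated
        return None
    num = fh_bel(B, A) * pl(A & B_bar)              # Bel(B|A) Pl(A ∩ B̄)
    den = num + fh_pl(B_bar, A) * pl(A_bar & B)     # + Pl(B̄|A) Pl(Ā ∩ B)
    return num / den if den > 0 else None

def bel_a_given_b(A, B):
    """Use SGBT when it applies, otherwise work back with the FH formula."""
    value = sgbt_bel(A, B)
    return fh_bel(A, B) if value is None else value

B = frozenset({"a", "b"})
ok_case = frozenset({"a", "c"})    # A ∩ B̄ = {c}, so Pl(A ∩ B̄) = 0.4 > 0
bad_case = frozenset({"a"})        # A ∩ B̄ is empty, so Pl(A ∩ B̄) = 0

print(sgbt_bel(ok_case, B), fh_bel(ok_case, B))
print(sgbt_bel(bad_case, B), bel_a_given_b(bad_case, B))
```

In the first case (26) and (13) give the same value; in the second case (26) would produce a 0/0 indetermination, so the sketch reports `None` and the fallback uses (13).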


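Proof of SGBT: From the GBT formula (20), we replace the terms by their expressions to obtain the SGBT formulas (26)-(27). For notational convenience, we denote $A_1 \triangleq A$ and $A_2 \triangleq \bar{A}$. Hence the GBT formula reduces to

$$Bel(A|B) = \frac{Num}{Den} = \frac{Bel(B|A_1)\, q(A_1, B)}{\sum_{j=1}^{2} Bel(B|A_j)\, q(A_j, B) + U((\bar{A}_1 \cap B)^*)},$$

where

$$Num \triangleq Bel(B|A_1)\, q(A_1, B),$$

$$Den \triangleq Bel(B|A_1)\, q(A_1, B) + Bel(B|A_2)\, q(A_2, B) + U((\bar{A}_1 \cap B)^*),$$

and

$$
\begin{aligned}
q(A_1, B) &= Bel(A_1) + U((\bar{B} \cap A_1)^*) - U(B^* \cap A_1) \\
&= Bel(A_1) + \underbrace{Pl(\bar{B} \cap A_1) - Bel(\bar{B} \cap A_1)}_{U((\bar{B} \cap A_1)^*)} - U(B^* \cap A_1).
\end{aligned}
$$

Because $\mathcal{F}_{B^*}(m) = \mathcal{F}_\Theta(m) - \mathcal{F}_B(m) - \mathcal{F}_{\bar{B}}(m)$ one has

$$
\begin{aligned}
U(B^* \cap A_1) &= \sum_{X \in \mathcal{F}_{B^*}(m) | X \in \mathcal{F}_{A_1}(m)} m(X) \\
&= \sum_{X \in \mathcal{F}_\Theta(m) - \mathcal{F}_B(m) - \mathcal{F}_{\bar{B}}(m) | X \in \mathcal{F}_{A_1}(m)} m(X) \\
&= \sum_{X \in \mathcal{F}_\Theta(m) | X \in \mathcal{F}_{A_1}(m)} m(X) - \sum_{X \in \mathcal{F}_B(m) | X \in \mathcal{F}_{A_1}(m)} m(X) - \sum_{X \in \mathcal{F}_{\bar{B}}(m) | X \in \mathcal{F}_{A_1}(m)} m(X) \\
&= Bel(A_1) - Bel(A_1 \cap B) - Bel(A_1 \cap \bar{B}).
\end{aligned}
$$

Therefore

$$
\begin{aligned}
q(A_1, B) &= Bel(A_1) + \underbrace{Pl(\bar{B} \cap A_1) - Bel(\bar{B} \cap A_1)}_{U((\bar{B} \cap A_1)^*)} - \underbrace{[Bel(A_1) - Bel(A_1 \cap B) - Bel(A_1 \cap \bar{B})]}_{U(B^* \cap A_1)} \\
&= Pl(A_1 \cap \bar{B}) + Bel(A_1 \cap B).
\end{aligned}
$$

Similarly, one has

$$
\begin{aligned}
q(A_2, B) &= Bel(A_2) + U((\bar{B} \cap A_2)^*) - U(B^* \cap A_2) \\
&= Bel(A_2) + \underbrace{Pl(\bar{B} \cap A_2) - Bel(\bar{B} \cap A_2)}_{U((\bar{B} \cap A_2)^*)} - \underbrace{[Bel(A_2) - Bel(A_2 \cap B) - Bel(A_2 \cap \bar{B})]}_{U(B^* \cap A_2)} \\
&= Pl(A_2 \cap \bar{B}) + Bel(A_2 \cap B).
\end{aligned}
$$

The value $U((\bar{A}_1 \cap B)^*)$ is given by

$$U((\bar{A}_1 \cap B)^*) = Pl(\bar{A}_1 \cap B) - Bel(\bar{A}_1 \cap B).$$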


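Therefore the numerator and denominator of $Bel(A|B)$ are

$$
\begin{aligned}
Num &\triangleq Bel(B|A_1)\, q(A_1, B) \\
&= Bel(B|A_1)[Pl(A_1 \cap \bar{B}) + Bel(A_1 \cap B)] \\
&= Bel(B|A)[Pl(A \cap \bar{B}) + Bel(A \cap B)],
\end{aligned}
$$

$$
\begin{aligned}
Den &\triangleq Bel(B|A_1)\, q(A_1, B) + Bel(B|A_2)\, q(A_2, B) + U((\bar{A}_1 \cap B)^*) \\
&= Bel(B|A_1)[Pl(A_1 \cap \bar{B}) + Bel(A_1 \cap B)] + Bel(B|A_2)[Pl(A_2 \cap \bar{B}) + Bel(A_2 \cap B)] + [Pl(\bar{A}_1 \cap B) - Bel(\bar{A}_1 \cap B)] \\
&= Bel(B|A)[Pl(A \cap \bar{B}) + Bel(A \cap B)] + Bel(B|\bar{A})[Pl(\bar{A} \cap \bar{B}) + Bel(\bar{A} \cap B)] + [Pl(\bar{A} \cap B) - Bel(\bar{A} \cap B)].
\end{aligned}
$$

Because $Bel(B|A) = Bel(A \cap B)/[Bel(A \cap B) + Pl(A \cap \bar{B})]$ and $Bel(B|\bar{A}) = Bel(\bar{A} \cap B)/[Bel(\bar{A} \cap B) + Pl(\bar{A} \cap \bar{B})]$ based on the FH formulas, after basic algebra one can verify that $Num = Bel(A \cap B)$ and $Den = Bel(A \cap B) + Pl(\bar{A} \cap B)$.

Because $Bel(B|\bar{A}) = Bel(\bar{A} \cap B)/[Bel(\bar{A} \cap B) + Pl(\bar{A} \cap \bar{B})]$, the term $Bel(B|\bar{A})[Pl(\bar{A} \cap \bar{B}) + Bel(\bar{A} \cap B)]$ involved in $Den$ equals $Bel(\bar{A} \cap B)$. Hence the expression of $Den$ reduces to

$$
\begin{aligned}
Den &= Bel(B|A)[Pl(A \cap \bar{B}) + Bel(A \cap B)] + \underbrace{Bel(B|\bar{A})[Pl(\bar{A} \cap \bar{B}) + Bel(\bar{A} \cap B)]}_{Bel(\bar{A} \cap B)} + [Pl(\bar{A} \cap B) - Bel(\bar{A} \cap B)] \\
&= Bel(B|A)[Pl(A \cap \bar{B}) + Bel(A \cap B)] + Pl(\bar{A} \cap B).
\end{aligned}
$$

If $Pl(\bar{B}|A) = Pl(A \cap \bar{B})/[Pl(A \cap \bar{B}) + Bel(A \cap B)] > 0$ and if we multiply the expressions of $Num$ and $Den$ by $Pl(\bar{B}|A)$, one gets

$$
\begin{aligned}
Bel(A|B) = \frac{Num}{Den} &= \frac{Num \cdot Pl(\bar{B}|A)}{Den \cdot Pl(\bar{B}|A)} \\
&= \frac{Num \cdot \frac{Pl(A \cap \bar{B})}{Pl(A \cap \bar{B}) + Bel(A \cap B)}}{Den \cdot \frac{Pl(A \cap \bar{B})}{Pl(A \cap \bar{B}) + Bel(A \cap B)}} \\
&= \frac{Bel(B|A)\, Pl(A \cap \bar{B})}{Bel(B|A)\, Pl(A \cap \bar{B}) + Pl(\bar{B}|A)\, Pl(\bar{A} \cap B)},
\end{aligned}
$$

which corresponds exactly to the SGBT formula (26).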


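The SGBT formula (27) can be obtained similarly from GBT by first expressing $Bel(\bar{A}|B) = Bel(A_2|B)$ as

$$
\begin{aligned}
Bel(\bar{A}|B) = \frac{Num'}{Den'} &= \frac{Bel(B|A_2)\, q(A_2, B)}{\sum_{j=1}^{2} Bel(B|A_j)\, q(A_j, B) + U((\bar{A}_2 \cap B)^*)} \\
&= \frac{Bel(B|A_2)\, q(A_2, B)}{Bel(B|A_1)\, q(A_1, B) + Bel(B|A_2)\, q(A_2, B) + U((\bar{A}_2 \cap B)^*)},
\end{aligned}
$$

where^4

$$
\begin{aligned}
Num' &\triangleq Bel(B|A_2)\, q(A_2, B) \\
&= Bel(B|\bar{A})[Pl(\bar{A} \cap \bar{B}) + Bel(\bar{A} \cap B)] \\
&= Bel(\bar{A} \cap B),
\end{aligned}
$$

$$
\begin{aligned}
Den' &\triangleq Bel(B|A_1)\, q(A_1, B) + Bel(B|A_2)\, q(A_2, B) + U((\bar{A}_2 \cap B)^*) \\
&= Bel(B|A)[Pl(A \cap \bar{B}) + Bel(A \cap B)] + Bel(B|\bar{A})[Pl(\bar{A} \cap \bar{B}) + Bel(\bar{A} \cap B)] + [Pl(A \cap B) - Bel(A \cap B)] \\
&= Bel(\bar{A} \cap B) + Pl(A \cap B).
\end{aligned}
$$

^4 Here $U((\bar{A}_2 \cap B)^*) = Pl(\bar{A}_2 \cap B) - Bel(\bar{A}_2 \cap B) = Pl(A \cap B) - Bel(A \cap B)$ because $\bar{A}_2 = A$.

If $Bel(\bar{B}|A) = Bel(A \cap \bar{B})/[Bel(A \cap \bar{B}) + Pl(A \cap B)] > 0$ and if we multiply $Num'$ and $Den'$ by $Bel(\bar{B}|A)$, one gets^5

$$
\begin{aligned}
Bel(\bar{A}|B) = \frac{Num'}{Den'} &= \frac{Num' \cdot Bel(\bar{B}|A)}{Den' \cdot Bel(\bar{B}|A)} \\
&= \frac{Bel(\bar{A} \cap B)\, Bel(\bar{B}|A)}{Bel(\bar{A} \cap B)\, Bel(\bar{B}|A) + Pl(A \cap B)\, Bel(\bar{B}|A)} \\
&= \frac{Bel(\bar{A} \cap B)\, Bel(\bar{B}|A)}{Bel(\bar{A} \cap B)\, Bel(\bar{B}|A) + Pl(B|A)\, Bel(A \cap \bar{B})}.
\end{aligned}
$$

Hence

$$Pl(A|B) = 1 - Bel(\bar{A}|B) = \frac{Pl(B|A)\, Bel(A \cap \bar{B})}{Pl(B|A)\, Bel(A \cap \bar{B}) + Bel(\bar{B}|A)\, Bel(\bar{A} \cap B)},$$

which corresponds to the SGBT formula (27).

Therefore, one has proved that expression (26) can be obtained from GBT if $Pl(A \cap \bar{B}) > 0$, and expression (27) can be obtained from GBT if $Bel(A \cap \bar{B}) > 0$. This completes the proof of SGBT.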


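Lemma 2: SGBT formulas (26) and (27) coincide with the conditional probability formula $P(A|B) = P(B|A)P(A)/P(B) = P(A \cap B)/P(B)$ if the belief function is Bayesian.

Proof: Replacing $Bel(\cdot)$ and $Pl(\cdot)$ by $P(\cdot)$ in (26) and (27) we get

$$
\begin{aligned}
P(A|B) &= \frac{P(B|A)\, P(A \cap \bar{B})}{P(B|A)\, P(A \cap \bar{B}) + P(\bar{B}|A)\, P(\bar{A} \cap B)} \\
&= \frac{P(B|A)\, P(A \cap \bar{B})}{\frac{P(A \cap B)}{P(A)}\, P(A \cap \bar{B}) + \frac{P(A \cap \bar{B})}{P(A)}\, P(\bar{A} \cap B)} \\
&= \frac{P(B|A)\, P(A)}{P(A \cap B) + P(\bar{A} \cap B)} = \frac{P(B|A)\, P(A)}{P(B)},
\end{aligned}
$$

because $P(A \cap B) + P(\bar{A} \cap B) = P(B)$. This completes the proof of Lemma 2.

In the appendix we also prove that $Bel(A|B) \le Pl(A|B)$ when using SGBT formulas (26) and (27).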


VI. EXAMPLES 


In this section we give two simple examples of application of SGBT. Example 1 shows that GBT and SGBT both work fine because conditions $C_1$ and $C_2$ are satisfied, whereas Example 2 shows a case where GBT works fine but SGBT does not work because condition $C_1$ is violated.


A. Example 1 


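We consider $\Theta = \{x_1, x_2, x_3, x_4\}$ and the BBA chosen as follows: $m(x_1) = 0.05$, $m(x_2) = 0.03$, $m(x_1 \cup x_2) = 0.02$, $m(x_3) = 0.04$, $m(x_4) = 0.06$, $m(x_3 \cup x_4) = 0.10$, $m(x_2 \cup x_3) = 0.30$ and $m(x_1 \cup x_2 \cup x_3 \cup x_4) = m(\Theta) = 0.40$. We also consider the partition $\{A = \{x_1, x_2\}, \bar{A} = \{x_3, x_4\}\}$ of $\Theta$ and the subset $B = \{x_2, x_3\}$. Hence one has

$$\Theta = \{\underbrace{x_1, \overbrace{x_2}}_{A}, \underbrace{\overbrace{x_3}^{B}, x_4}_{\bar{A}}\},$$

with $A = \{x_1, x_2\} = x_1 \cup x_2$, $\bar{A} = \{x_3, x_4\} = x_3 \cup x_4$, $B = \{x_2, x_3\} = x_2 \cup x_3$, and $\bar{B} = \{x_1, x_4\} = x_1 \cup x_4$.

The set of focal elements in this example is

$$\mathcal{F}_\Theta(m) = \{x_1, x_2, x_1 \cup x_2, x_3, x_4, x_3 \cup x_4, x_2 \cup x_3, x_1 \cup x_2 \cup x_3 \cup x_4\}.$$

The sets of focal elements included in $A$ and in $\bar{A}$ are $\mathcal{F}_A(m) = \{x_1, x_2, x_1 \cup x_2\}$ and $\mathcal{F}_{\bar{A}}(m) = \{x_3, x_4, x_3 \cup x_4\}$, and one has $\mathcal{F}_{A^*}(m) = \mathcal{F}_\Theta(m) - \mathcal{F}_A(m) - \mathcal{F}_{\bar{A}}(m) = \{x_2 \cup x_3, x_1 \cup x_2 \cup x_3 \cup x_4\}$. The sets of focal elements included in $B$ and in $\bar{B}$ are $\mathcal{F}_B(m) = \{x_2, x_3, x_2 \cup x_3\}$ and $\mathcal{F}_{\bar{B}}(m) = \{x_1, x_4\}$, and one has $\mathcal{F}_{B^*}(m) = \mathcal{F}_\Theta(m) - \mathcal{F}_B(m) - \mathcal{F}_{\bar{B}}(m) = \{x_1 \cup x_2, x_3 \cup x_4, x_1 \cup x_2 \cup x_3 \cup x_4\}$. From the BBA $m(\cdot)$ we get the belief and plausibility values listed in Table I, which are useful for the derivations of the FH, GBT and SGBT formulas.

^5 From the FH formulas, $Pl(A \cap B)\, Bel(\bar{B}|A) = Pl(B|A)\, Bel(A \cap \bar{B})$.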


Subsets of Θ                 Bel     Pl
$A$                          0.10    0.80
$\bar{A}$                    0.20    0.90
$B$                          0.37    0.89
$\bar{B}$                    0.11    0.63
$A \cap B$                   0.03    0.75
$A \cap \bar{B}$             0.05    0.47
$\bar{A} \cap B$             0.04    0.84
$\bar{A} \cap \bar{B}$       0.06    0.56

Table I
BELIEF AND PLAUSIBILITY VALUES USED FOR THE DERIVATIONS.


• Application of FH formulas: with (13)-(14) one gets

$$Bel(A|B) = \frac{Bel(A \cap B)}{Bel(A \cap B) + Pl(\bar{A} \cap B)} = \frac{0.03}{0.03 + 0.84} = 0.03448275,$$

$$Pl(A|B) = \frac{Pl(A \cap B)}{Pl(A \cap B) + Bel(\bar{A} \cap B)} = \frac{0.75}{0.75 + 0.04} = 0.94936708.$$

With FH formulas (15)-(16), one gets

$$Bel(B|A) = \frac{Bel(A \cap B)}{Bel(A \cap B) + Pl(\bar{B} \cap A)} = \frac{0.03}{0.03 + 0.47} = 0.06,$$

$$Pl(B|A) = \frac{Pl(A \cap B)}{Pl(A \cap B) + Bel(\bar{B} \cap A)} = \frac{0.75}{0.75 + 0.05} = 0.9375.$$

• Application of SGBT formulas: with (26) and (27) one gets^6

$$
\begin{aligned}
Bel(A|B) &= \frac{Bel(B|A)\, Pl(A \cap \bar{B})}{Bel(B|A)\, Pl(A \cap \bar{B}) + Pl(\bar{B}|A)\, Pl(\bar{A} \cap B)} \\
&= \frac{Bel(B|A)\, Pl(A \cap \bar{B})}{Bel(B|A)\, Pl(A \cap \bar{B}) + [1 - Bel(B|A)]\, Pl(\bar{A} \cap B)} \\
&= \frac{0.06 \cdot 0.47}{0.06 \cdot 0.47 + [1 - 0.06] \cdot 0.84} = \frac{0.0282}{0.0282 + 0.7896} = 0.03448275,
\end{aligned}
$$

$$
\begin{aligned}
Pl(A|B) &= \frac{Pl(B|A)\, Bel(A \cap \bar{B})}{Bel(\bar{B}|A)\, Bel(\bar{A} \cap B) + Pl(B|A)\, Bel(A \cap \bar{B})} \\
&= \frac{Pl(B|A)\, Bel(A \cap \bar{B})}{[1 - Pl(B|A)]\, Bel(\bar{A} \cap B) + Pl(B|A)\, Bel(A \cap \bar{B})} \\
&= \frac{0.9375 \cdot 0.05}{[1 - 0.9375] \cdot 0.04 + 0.9375 \cdot 0.05} = \frac{0.046875}{0.0025 + 0.046875} = 0.94936708.
\end{aligned}
$$

^6 It is worth noting that conditions $C_1$ and $C_2$ are satisfied in this example because $Pl(A \cap \bar{B}) = 0.47$ and $Bel(A \cap \bar{B}) = 0.05$.

• Application of GBT formula (20): we denote $A_1 \triangleq A = x_1 \cup x_2$ and $A_2 \triangleq \bar{A} = x_3 \cup x_4$. Here GBT formula (20) becomes

$$Bel(A_1|B) = \frac{Bel(B|A_1)\, q(A_1, B)}{\sum_{j=1}^{2} Bel(B|A_j)\, q(A_j, B) + U((\bar{A}_1 \cap B)^*)},$$

where the $Bel(B|A_1)$ and $Bel(B|A_2)$ terms are given by

$$Bel(B|A_1) = Bel(B|A) = \frac{Bel(A \cap B)}{Bel(A \cap B) + Pl(\bar{B} \cap A)} = \frac{0.03}{0.03 + 0.47} = 0.06,$$

$$Bel(B|A_2) = Bel(B|\bar{A}) = \frac{Bel(\bar{A} \cap B)}{Bel(\bar{A} \cap B) + Pl(\bar{B} \cap \bar{A})} = \frac{0.04}{0.04 + 0.56} = 0.06666667.$$

The terms $q(A_1, B)$ and $q(A_2, B)$ are given by

$$q(A_1, B) = Bel(A_1) + U((\bar{B} \cap A_1)^*) - U(B^* \cap A_1) = 0.10 + 0.42 - 0.02 = 0.50,$$

$$q(A_2, B) = Bel(A_2) + U((\bar{B} \cap A_2)^*) - U(B^* \cap A_2) = 0.20 + 0.50 - 0.10 = 0.60,$$

because

$$U((\bar{B} \cap A_1)^*) = Pl(\bar{B} \cap A_1) - Bel(\bar{B} \cap A_1) = Pl(\bar{B} \cap A) - Bel(\bar{B} \cap A) = 0.47 - 0.05 = 0.42,$$

$$U(B^* \cap A_1) = \sum_{X \in \mathcal{F}_{B^*}(m) | X \in \mathcal{F}_{A_1}(m)} m(X) = \sum_{X \in \{x_1 \cup x_2,\, x_3 \cup x_4,\, \Theta\} | X \in \mathcal{F}_A(m)} m(X) = m(x_1 \cup x_2) = 0.02,$$

and

$$U((\bar{B} \cap A_2)^*) = Pl(\bar{B} \cap A_2) - Bel(\bar{B} \cap A_2) = Pl(\bar{B} \cap \bar{A}) - Bel(\bar{B} \cap \bar{A}) = 0.56 - 0.06 = 0.50,$$

$$U(B^* \cap A_2) = \sum_{X \in \mathcal{F}_{B^*}(m) | X \in \mathcal{F}_{A_2}(m)} m(X) = \sum_{X \in \{x_1 \cup x_2,\, x_3 \cup x_4,\, \Theta\} | X \in \mathcal{F}_{\bar{A}}(m)} m(X) = m(x_3 \cup x_4) = 0.10.$$

The value $U((\bar{A}_1 \cap B)^*)$ involved in the denominator of the $Bel(A|B)$ expression is given by

$$U((\bar{A}_1 \cap B)^*) = Pl(\bar{A} \cap B) - Bel(\bar{A} \cap B) = 0.84 - 0.04 = 0.80.$$

Replacing all these values in the GBT formula of $Bel(A|B)$ we get

$$
\begin{aligned}
Bel(A|B) = Bel(A_1|B) &= \frac{Bel(B|A_1)\, q(A_1, B)}{\sum_{j=1}^{2} Bel(B|A_j)\, q(A_j, B) + U((\bar{A}_1 \cap B)^*)} \\
&= \frac{0.06 \cdot 0.50}{0.06 \cdot 0.50 + 0.06666667 \cdot 0.60 + 0.80} = \frac{0.03}{0.870000002} \approx 0.03448275.
\end{aligned}
$$

As shown, the values of $Bel(A|B)$ calculated by the GBT and SGBT formulas are consistent with the value calculated directly from the FH formulas. For calculating $Pl(A|B)$, we first calculate $Bel(\bar{A}|B) = Bel(A_2|B)$ and then $Pl(A|B) = 1 - Bel(\bar{A}|B)$. Applying the GBT formula for calculating $Bel(A_2|B)$, one has

$$Bel(\bar{A}|B) = Bel(A_2|B) = \frac{Bel(B|A_2)\, q(A_2, B)}{\sum_{j=1}^{2} Bel(B|A_j)\, q(A_j, B) + U((\bar{A}_2 \cap B)^*)}.$$

The values of $Bel(B|A_j)$ and $q(A_j, B)$ for $j = 1, 2$ have been calculated previously, and $U((\bar{A}_2 \cap B)^*)$ is given by

$$U((\bar{A}_2 \cap B)^*) = Pl(\bar{A}_2 \cap B) - Bel(\bar{A}_2 \cap B) = Pl(A \cap B) - Bel(A \cap B) = 0.75 - 0.03 = 0.72.$$

Therefore,

$$
\begin{aligned}
Bel(\bar{A}|B) = Bel(A_2|B) &= \frac{Bel(B|A_2)\, q(A_2, B)}{\sum_{j=1}^{2} Bel(B|A_j)\, q(A_j, B) + U((\bar{A}_2 \cap B)^*)} \\
&= \frac{0.06666667 \cdot 0.60}{0.06 \cdot 0.50 + 0.06666667 \cdot 0.60 + 0.72} = \frac{0.040000002}{0.790000002} \approx 0.05063292,
\end{aligned}
$$

and finally we get

$$Pl(A|B) = 1 - Bel(\bar{A}|B) \approx 0.94936708.$$

From this very simple example we have verified that the FH formulas, the GBT formula and the simplified GBT formulas are all consistent because conditions $C_1$ and $C_2$ are satisfied.

B. Example 2

We consider the example of [8] (Section VIII). We verify that SGBT formula (26) works because $Bel(B|A_1) = 0.0889$, $Pl(A_1 \cap \bar{B}) = 0.41$, $Pl(\bar{B}|A_1) = 1 - Bel(B|A_1) = 1 - 0.0889 = 0.9111$ and $Pl(\bar{A}_1 \cap B) = 0.54$, so that

$$
\begin{aligned}
Bel(A_1|B) &= \frac{Bel(B|A_1)\, Pl(A_1 \cap \bar{B})}{Bel(B|A_1)\, Pl(A_1 \cap \bar{B}) + Pl(\bar{B}|A_1)\, Pl(\bar{A}_1 \cap B)} \\
&= \frac{0.0889 \cdot 0.41}{0.0889 \cdot 0.41 + 0.9111 \cdot 0.54} = 0.0690,
\end{aligned}
$$


which is the same value as the one obtained by applying directly the FH formula, or the GBT formula (20). The SGBT formula (26) works because condition $C_1$ (i.e. $Pl(A_1 \cap \bar{B}) = 0.41 > 0$) is satisfied. Similarly, using (26), one has for $Bel(A_2|B)$

$$
\begin{aligned}
Bel(A_2|B) &= \frac{Bel(B|A_2)\, Pl(A_2 \cap \bar{B})}{Bel(B|A_2)\, Pl(A_2 \cap \bar{B}) + Pl(\bar{B}|A_2)\, Pl(\bar{A}_2 \cap B)} \\
&= \frac{0 \cdot 0.43}{0 \cdot 0.43 + 1 \cdot 0.80} = 0,
\end{aligned}
$$

which is the same value as the one obtained by applying directly the FH formula, or the GBT formula (20). Here SGBT formula (26) works because condition $C_1$ (i.e. $Pl(A_2 \cap \bar{B}) = 0.43 > 0$) is satisfied.

For the value $Bel(A_3|B) = 0.0625$ computed by the FH conditioning formula, or by the GBT formula (20), things are different because applying SGBT formula (26) yields a 0/0 indetermination. Indeed,

$$
\begin{aligned}
Bel(A_3|B) &= \frac{Bel(B|A_3)\, Pl(A_3 \cap \bar{B})}{Bel(B|A_3)\, Pl(A_3 \cap \bar{B}) + Pl(\bar{B}|A_3)\, Pl(\bar{A}_3 \cap B)} \\
&= \frac{1 \cdot 0}{1 \cdot 0 + 0 \cdot 0.75} = \frac{0}{0}.
\end{aligned}
$$

So one sees that SGBT formula (26) does not work for computing $Bel(A_3|B)$ in this case because condition $C_1$ (i.e. $Pl(A_3 \cap \bar{B}) > 0$) is not satisfied, which is expected. In this case the correct value $Bel(A_3|B) = 0.0625$ must be calculated by the GBT or FH formulas.

Therefore, in practice, special attention must always be paid to conditions $C_1$ and $C_2$ before applying the SGBT formulas; if one of these conditions is violated, one needs to work back directly with the FH or GBT formulas.
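The computations of Example 1 can be reproduced with a short script that evaluates the FH formulas (13)-(14), the SGBT formulas (26)-(27) and the GBT formula (20) on the same BBA. This is a minimal sketch: the function names and the dictionary encoding of the BBA are assumptions made here, and no check of conditions $C_1$ and $C_2$ is included because both hold for this BBA.

```python
from math import isclose

def fs(*names):
    return frozenset(names)

THETA = fs("x1", "x2", "x3", "x4")
m = {  # BBA of Example 1
    fs("x1"): 0.05, fs("x2"): 0.03, fs("x1", "x2"): 0.02,
    fs("x3"): 0.04, fs("x4"): 0.06, fs("x3", "x4"): 0.10,
    fs("x2", "x3"): 0.30, THETA: 0.40,
}

def bel(S):
    return sum(mass for X, mass in m.items() if X <= S)

def pl(S):
    return sum(mass for X, mass in m.items() if X & S)

def fh_bel(A, B):
    """Bel(A|B) by FH formula (13)."""
    return bel(A & B) / (bel(A & B) + pl((THETA - A) & B))

def fh_pl(A, B):
    """Pl(A|B) by FH formula (14)."""
    return pl(A & B) / (pl(A & B) + bel((THETA - A) & B))

def sgbt_bel(A, B):
    """Bel(A|B) by SGBT formula (26); assumes condition C1 holds."""
    A_bar, B_bar = THETA - A, THETA - B
    num = fh_bel(B, A) * pl(A & B_bar)
    return num / (num + fh_pl(B_bar, A) * pl(A_bar & B))

def sgbt_pl(A, B):
    """Pl(A|B) by SGBT formula (27); assumes condition C2 holds."""
    A_bar, B_bar = THETA - A, THETA - B
    num = fh_pl(B, A) * bel(A & B_bar)
    return num / (num + fh_bel(B_bar, A) * bel(A_bar & B))

def q(A_i, B):
    """q(A_i, B) of (21), built from (22) and (23)."""
    B_bar = THETA - B
    u_bbar = pl(B_bar & A_i) - bel(B_bar & A_i)
    u_bstar = sum(mass for X, mass in m.items()
                  if X <= A_i and not X <= B and not X <= B_bar)
    return bel(A_i) + u_bbar - u_bstar

def gbt_bel(i, partition, B):
    """Bel(A_i|B) by the GBT formula (20)."""
    A_i = partition[i]
    A_i_bar = THETA - A_i
    u = pl(A_i_bar & B) - bel(A_i_bar & B)          # (24)
    total = sum(fh_bel(B, A_j) * q(A_j, B) for A_j in partition)
    return fh_bel(B, A_i) * q(A_i, B) / (total + u)

A, B = fs("x1", "x2"), fs("x2", "x3")
partition = [A, THETA - A]

print("Bel(A|B):", fh_bel(A, B), sgbt_bel(A, B), gbt_bel(0, partition, B))
print("Pl(A|B): ", fh_pl(A, B), sgbt_pl(A, B), 1 - gbt_bel(1, partition, B))
```

The three values on each printed line agree up to floating-point rounding, which is the consistency claimed in Example 1.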


VII. CONCLUSION 


The main contribution of this paper is the derivation of a simplified formulation of the Generalized Bayes' Theorem, called SGBT, which extends Bayes' Theorem in the framework of belief functions. The simplification is motivated by the fact that the general formulation of GBT is not easy to apply in real-world applications. It is drawn from GBT for working with a dichotomous partitioning of the frame of discernment. The conditions under which this new formulation is valid are presented. The theoretical results obtained are illustrated with simple examples. The challenging question of applying GBT and SGBT to solve real-world problems is under investigation.


APPENDIX 


A. Proof that Bel(A|B) ≤ Pl(A|B) from SGBT formula


To prove that $Bel(A|B) \le Pl(A|B)$ from SGBT formulas
(26)-(27) one needs to prove the following inequality

$$\frac{Bel(B|A)\,Pl(A \cap \bar{B})}{Bel(B|A)\,Pl(A \cap \bar{B}) + Pl(\bar{B}|A)\,Pl(\bar{A} \cap B)} \le \frac{Pl(B|A)\,Bel(A \cap \bar{B})}{Bel(\bar{B}|A)\,Bel(\bar{A} \cap B) + Pl(B|A)\,Bel(A \cap \bar{B})}.$$

After basic algebraic manipulations on the previous inequality,
one has to prove that $R_1 \le R_2 \cdot R_3 \cdot R_4$, where,
for notational convenience, $R_1 = Bel(B|A)/Pl(B|A)$,
$R_2 = Bel(A \cap \bar{B})/Pl(A \cap \bar{B})$, $R_3 = Pl(\bar{B}|A)/Bel(\bar{B}|A)$
and $R_4 = Pl(\bar{A} \cap B)/Bel(\bar{A} \cap B)$. Our proof is done by
contradiction as follows.

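Let us assume that $R_2 \cdot R_3 \cdot R_4 < R_1$ is valid, that is

$$\underbrace{\frac{Bel(A \cap \bar{B})}{Pl(A \cap \bar{B})}}_{R_2} \cdot \underbrace{\frac{Pl(\bar{B}|A)}{Bel(\bar{B}|A)}}_{R_3} \cdot \underbrace{\frac{Pl(\bar{A} \cap B)}{Bel(\bar{A} \cap B)}}_{R_4} < \underbrace{\frac{Bel(B|A)}{Pl(B|A)}}_{R_1}. \quad (28)$$

Because $R_2 \le 1$, one has necessarily $R_2 \cdot R_3 \cdot R_4 \le R_3 \cdot R_4$,
so we must have (if our assumption is valid) $R_3 \cdot R_4 < R_1$,
that is

$$\underbrace{\frac{1 - Bel(B|A)}{1 - Pl(B|A)}}_{R_3} \cdot \underbrace{\frac{Pl(\bar{A} \cap B)}{Bel(\bar{A} \cap B)}}_{R_4} < \underbrace{\frac{Bel(B|A)}{Pl(B|A)}}_{R_1}, \quad (29)$$

or equivalently

$$[1 - Bel(B|A)]\,Pl(B|A) < Bel(B|A)\,[1 - Pl(B|A)]\,\underbrace{\frac{Bel(\bar{A} \cap B)}{Pl(\bar{A} \cap B)}}_{1/R_4}.$$

Because $Bel(\bar{A} \cap B)/Pl(\bar{A} \cap B) \le 1$, then

$$Bel(B|A)\,[1 - Pl(B|A)]\,\underbrace{\frac{Bel(\bar{A} \cap B)}{Pl(\bar{A} \cap B)}}_{1/R_4} \le Bel(B|A)\,[1 - Pl(B|A)].$$

So we must have (if our assumption is valid)

$$[1 - Bel(B|A)]\,Pl(B|A) < Bel(B|A)\,[1 - Pl(B|A)], \quad (30)$$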


which is (after rearranging terms) equivalent to having the
inequality $Pl(B|A) < Bel(B|A)$ satisfied. However, from the
Fagin-Halpern definitions of conditional belief functions
and the properties of belief functions, the previous inequality
$Pl(B|A) < Bel(B|A)$ is never satisfied. Therefore our
assumption $R_2 \cdot R_3 \cdot R_4 < R_1$ is not valid and one has
necessarily $R_1 \le R_2 \cdot R_3 \cdot R_4$, which completes the proof
that $Bel(A|B) \le Pl(A|B)$ when $Bel(A|B)$ and $Pl(A|B)$
are calculated by the SGBT formulas (26) and (27).


Abstract: Image denoising is a fundamental problem in image
processing. Switching filtering is a popular approach to
reduce the impulse noise. It faces two challenges: the
impulse noise detection and the filter design. The traditional detection
methods based on a single criterion or multiple criteria encounter
uncertainty problems and produce many miss-detections and
false alarms, especially when the image is severely corrupted.
In this paper, the uncertainties encountered in the impulse noise
detection are addressed using the theory of belief functions, and
a multi-criteria detection strategy for the impulse noise based on
evidential reasoning is proposed. Based on the pre-detection, an
adaptive median filter is designed, which adaptively determines
the size of the filtering window according to the estimated global
noise density and the degree of local corruption. Experimental
results and related analyses show that our proposed image
denoising method for the impulse noise has superior performance
compared with several state-of-the-art denoising methods.


Keywords: image denoising, impulse noise, multi-criteria 
detection, evidential reasoning, adaptive median filtering. 


I. INTRODUCTION 


Digital images can be corrupted by various types of noise 
during the image acquisition and transmission. The impulse 
noise is one of the most common types, which is encountered 
in cases with quick transients, e.g., faulty switching during 
imaging [1]. The intensity of a pixel corrupted by the impulse 
noise tends to be much higher or lower than those of its uncor- 
rupted neighbors. The impulse noise dramatically influences 
the image quality and makes images unsuitable for subsequent 
human understanding or image processing such as the edge 
detection [2], segmentation [3], object recognition [4], image 
analysis [5] and image understanding [6]. 

Until now, the impulse noise reduction problem has not been
well solved and it has attracted extensive research interest.
Median filtering is one of the most popular approaches for impulse
noise reduction. The standard median (SM) filter [7] replaces
the target pixel's intensity by the median of the intensities of
its neighbors. Various modifications of the SM filter have
been proposed, such as the weighted median (WM) filter [8]
and the center weighted median (CWM) filter [9]. However,
all these filters apply the median operation to each pixel,
ignoring whether the target pixel is corrupted or not. This
might destroy the details contributed from uncorrupted pixels 
and lead to image quality degradation. To deal with this 
problem, switching median filters [10] were proposed, which 
introduce the noise detection prior to the filtering. Since only 
the corrupted pixels will be filtered and the uncorrupted pixels 
remain intact, more details can be preserved and better filtering 
performance can be achieved if the pre-detection result is 
accurate enough. 

In recent years, sparse representation (SR) [11] has been widely
used in image denoising [12], [13], [14], especially for
Gaussian noise. For the impulse noise, a noise detector
is incorporated into the SR model, and weighted dictionary
learning methods have been proposed for impulse noise denoising
[15], [16], [17]. Both median filtering and SR based methods
face the challenge of designing the noise detector.

Two major criteria have emerged for the impulse noise
detection: the extreme property and the discontinuity
property. Some detectors only use a single criterion, which
may involve some uncertainty problems. For example, the
boundary discriminative noise detection (BDND) [18] and the 
efficient improvements on the BDND (IBDND) [19] use the 
criterion of extreme property. Both algorithms label a pixel 
as the noise if it is assigned to the low-intensity range or 
high-intensity range according to the histogram distribution in 
a local window centered at that given pixel. However, these 
detectors easily lead to false alarms since not all the pixels 
with low-intensity or high-intensity are noise. There are other 
detectors that only use the criterion of discontinuity property. 
Such detectors can be found in the adaptive impulse detection 
using center-weighted median (ACWM) filter [20], directional 
weighted median (DWM) filter [21], contrast enhancement-based
(CEF) filter [22], adaptive switching median (ASWM)
filter [23], weighted couple sparse representation (WCSR)
model [24] and the denoising framework combining the detection
mechanism based on the robust outlyingness ratio with the
NL-means (ROR-NLM) [25]. They label a pixel as the noise if
its similarity with its neighbors is lower than a preset threshold.
However, when the noise density is high, the impulse noise
pixels might not show the discontinuity property since there 
are too many noise pixels in their neighbors. Since pixels 
with extreme or discontinuity property may not always be the 
noise, the detectors based on a single criterion will involve 
uncertainty problems and tend to yield incorrect detection 
results. Both criteria have their own rationalities; however, 
they are one-sided. It should be better to jointly use them when 
detecting the impulse noise. Therefore, some approaches using 
the above two criteria jointly have been proposed, e.g., the 
detector in noise adaptive soft-switching median (NASM) filter 
[26] and the detector based on the cloud model (CM) [27]. 
These two-step detection methods first recognize the suspected 
noise pixels using the extreme criterion, and then distinguish 
the noise pixels from the suspected noise pixels using the 
discontinuity criterion. However, they can easily produce
miss-detections when some noise pixels are not flagged as
suspected noise in the first step. Therefore, the two-step type
of joint use is not preferred.

To deal with the uncertainties encountered in the impulse 
noise detection and avoid the drawbacks of the two-step- 
type joint use of detection criteria, in this paper, a new 
detection approach for the impulse noise is proposed, which 
uses the two criteria simultaneously based on the theory of 
belief functions [28]. In our detection approach, the extreme 
property is described using the interval data distance between 
the target pixel’s intensity and the intensity range of the 
whole noisy image (expressed as an interval number). The 
discontinuity property is described using the rank-ordered 
absolute differences (ROAD) statistic [29]. The uncertainty 
problem encountered in the impulse noise detection, e.g., 
pixels with extreme or discontinuity property may not always 
be the noise, are modeled by belief functions and are further 
handled through the evidence combination. 

The impulse noise detector implementation is the main 
work of this paper. Based on the detection result, an adaptive 
median filter is designed, which adaptively determines the size 
of filtering window according to the estimated global noise 
density and the degree of local corruption. Experimental results show
that our proposed adaptive switching median filter with
pre-detection based on evidential reasoning (ASMF-DBER) has
superior performance compared with several state-of-the-art
switching median filters and the SR based method.


II. BASIS OF IMPULSE NOISE AND UNCERTAINTY 
PROBLEMS ENCOUNTERED IN IMPULSE NOISE DETECTION 


A. Impulse noise model 


When an image is corrupted by the impulse noise, some
pixels are changed and their intensities are extremely high
or extremely low. We use the same impulse noise model
as used in BDND [18]. Assume that the noise pixels take
values in two fixed sets $S_1 = \{0, 1, \ldots, a\}$ and
$S_2 = \{255 - a, 255 - (a - 1), \ldots, 255\}$ for an 8-bit monochrome
image. Let $s_{i,j}$ and $x_{i,j}$ be the pixels' intensities at location
$(i,j)$ in the original and noisy images, respectively. Let $n_{i,j}$
be the noise, which is independent of $s_{i,j}$ and corresponds to a
random value uniformly distributed in the sets $S_1$ and $S_2$. Let $p$
denote the probability that a pixel is corrupted. The probability
mass function (pmf) [30] of $x_{i,j}$ is given by

$$P(x_{i,j}) = \begin{cases} p, & \text{for } x_{i,j} = n_{i,j}, \\ 1 - p, & \text{for } x_{i,j} = s_{i,j}. \end{cases} \quad (1)$$

In particular, if $a = 0$, the intensities of noise pixels can only
take the two extreme values 0 or 255. This type of impulse
noise is also called the salt-and-pepper noise. Since $n_{i,j}$ is
independent of $s_{i,j}$, it is possible that $n_{i,j} = s_{i,j}$. This kind
of pixel should be regarded as uncorrupted.
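
To make the noise model concrete, the following pure-Python sketch corrupts a small grayscale image according to (1). It is an illustration only: the function name `add_impulse_noise`, the list-of-lists image representation and the `seed` argument are assumptions introduced here, and the noise value is drawn uniformly from the union of $S_1$ and $S_2$ (assuming $a$ is small enough that the two sets do not overlap).

```python
import random


def add_impulse_noise(image, p, a=0, seed=None):
    """Return a noisy copy of an 8-bit image given as a list of rows.

    Each pixel is replaced with probability p by a value drawn uniformly
    from S1 = {0, ..., a} and S2 = {255 - a, ..., 255}; otherwise the
    original intensity is kept. With a = 0 this is salt-and-pepper noise.
    """
    rng = random.Random(seed)
    noise_values = list(range(0, a + 1)) + list(range(255 - a, 256))
    noisy = []
    for row in image:
        noisy.append(
            [rng.choice(noise_values) if rng.random() < p else s for s in row]
        )
    return noisy


clean = [[100, 120, 130], [110, 125, 135], [105, 115, 140]]
print(add_impulse_noise(clean, p=0.3, a=10, seed=0))
```

Note that a drawn noise value may coincide with the original intensity, in which case the pixel is effectively uncorrupted, as remarked above.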


B. Uncertainties encountered in impulse noise detection 
The impulse noise has two properties: 


1) Extreme property: The intensity of an impulse noise 
pixel is usually an extreme value (0 or 255) or close 
to an extreme value. 

2) Discontinuity property: The intensity of an impulse noise 
pixel tends to be much higher or lower than those of its 
neighbors. 


These two properties are often used as detection criteria for 
the impulse noise. Some detectors only use one of the criteria: 


1) Detectors based on the criterion of extreme property:
These detectors label a pixel as the noise if it is
assigned to the low-intensity range or high-intensity
range according to the histogram distribution in a local
window centered at that pixel, e.g., the BDND [18] and
IBDND [19] detectors.

2) Detectors based on the criterion of discontinuity prop- 
erty: These detectors label a pixel as the noise, if its 
dissimilarity with its neighbors is larger than a preset 
threshold, such as ACWM [20], DWM [21], CEF [22], 
ASWM [23], ROR-NLM [25] and WCSR [24]. 


However, such single criterion based detectors may involve 
the following uncertainty problems: 


1) Uncertainty in extreme criterion: Some signal pixels may 
also be detected as the noise, since their intensities are 
very close to extreme values, e.g., some edge pixels and 
texture pixels. Moreover, in some bright or dark area, 
the intensity range of signal pixels may overlap with that 
of the impulse noise pixels. Therefore, when using the 
extreme criterion alone, it is uncertain to judge those 
signal pixels with extreme property to be the impulse 
noise or not. 

2) Uncertainty in discontinuity criterion: The discontinuity 
property of the impulse noise pixels becomes weaker 
with the increase of noise density since there are many 
noise pixels in their neighbors. At the same time, some 
signal pixels may show discontinuity. Therefore, with 
only the discontinuity criterion, it is uncertain to judge 
a pixel to be the impulse noise or not. 


Due to these uncertainties, the single criterion based detec- 
tors are to some extent one-sided and tend to yield incorrect 
detection results. Hence, it should be better to jointly use the 
two criteria to implement a more comprehensive detection. 
Some two-step detection methods, like NASM [26] and
CM [27], jointly use these two criteria in two consecutive
steps. They first recognize suspected noise pixels according
to the extreme criterion, and then distinguish noise pixels
from suspected noise pixels according to the discontinuity
criterion, as illustrated in Fig. 1. In the first step, which
uses only the extreme criterion, some noise pixels may not be
flagged as suspected noise and are therefore missed
outright. These miss-detected pixels will not undergo the
filtering, so these two-step methods can easily lead to
poor noise-reduction capabilities. Therefore, when detecting
the impulse noise, it should be better to use these two criteria
simultaneously rather than in two consecutive steps.


[Figure 1 diagram: Corrupted image → (extreme criterion) → Suspected noise pixels → (discontinuity criterion) → Noise pixels]

Figure 1. The two-step detection method.


To deal with these uncertainties encountered in the single
criterion based detections and to implement a more comprehensive
detection by using these two criteria simultaneously,
we propose an evidential reasoning based impulse noise
detection approach, relying on the ability of belief functions to
model uncertainty and to reason under uncertainty. The
theory of belief functions [28] is briefly recalled first below.


III. IMPULSE NOISE DETECTION BASED ON EVIDENTIAL 
REASONING 


A. Basics of evidence theory


The theory of belief functions, also called Dempster-Shafer 
evidence theory (DST) [28], is a theoretical framework for 
uncertainty modeling and reasoning. 

In DST, elements in the frame of discernment (FOD)
$\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$ are mutually exclusive and exhaustive.
The power set $2^\Theta$ of the FOD $\Theta$ is the set of all subsets of
$\Theta$. Define a function $m$ from $2^\Theta$ to $[0,1]$ as a basic belief
assignment (BBA, also called a mass function) satisfying

$$\sum_{A \subseteq \Theta} m(A) = 1, \quad m(\emptyset) = 0. \quad (2)$$

$m(A)$ depicts the evidence support to the proposition $A$. If
$m(A) > 0$, $A$ is called a focal element.

The plausibility function ($Pl$) and belief function ($Bel$) are
defined as:

$$Pl(A) = \sum_{B \cap A \neq \emptyset} m(B), \quad (3)$$

$$Bel(A) = \sum_{B \subseteq A} m(B). \quad (4)$$

The belief interval [28], [31] $[Bel(A), Pl(A)]$ represents the
imprecision of the support to the proposition $A$.

Dempster's rule of combination [28], which is used for
combining two distinct pieces of evidence, is defined as

$$(m_1 \oplus m_2)(A) = \begin{cases} 0, & A = \emptyset, \\ \dfrac{1}{1-K} \sum\limits_{B \cap C = A} m_1(B)\,m_2(C), & A \neq \emptyset, \end{cases} \quad (5)$$

where $K = \sum_{B \cap C = \emptyset} m_1(B)\,m_2(C)$ represents the total
conflict or contradictory mass assignments.

For probabilistic decision-making, Smets defined the pignistic
probability transformation [32] to transform a BBA into
a probability measure BetP:

$$BetP(\theta_i) = \sum_{\theta_i \in A} \frac{m(A)}{|A|}, \quad \forall \theta_i \in \Theta, \quad (6)$$

where $|A|$ denotes the cardinality of $A$. The decision is made
by choosing the element in the FOD which has the highest BetP
value. Note that there are still other probability transformations
of BBA, see [33] for details.
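
As an illustration of Dempster's rule (5) and the pignistic transformation (6), here is a minimal Python sketch on the two-element frame {N, S} used later for noise detection. The function names `dempster` and `betp`, the dictionary-of-frozensets representation and the numerical masses are assumptions made for this example only.

```python
from itertools import product


def dempster(m1, m2):
    """Combine two BBAs given as {frozenset: mass} with Dempster's rule (5)."""
    combined = {}
    conflict = 0.0  # K in (5): mass assigned to empty intersections
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        intersection = b & c
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_b * mass_c
        else:
            conflict += mass_b * mass_c
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {a: v / (1.0 - conflict) for a, v in combined.items()}


def betp(m):
    """Pignistic probability (6): share each mass equally among its elements."""
    probabilities = {}
    for a, mass in m.items():
        for theta in a:
            probabilities[theta] = probabilities.get(theta, 0.0) + mass / len(a)
    return probabilities


# Hypothetical masses on the frame {N (noise), S (signal)}.
N, S = frozenset({"N"}), frozenset({"S"})
THETA = frozenset({"N", "S"})
m_a = {N: 0.6, S: 0.1, THETA: 0.3}
m_b = {N: 0.5, S: 0.2, THETA: 0.3}

fused = dempster(m_a, m_b)
print(fused)
print(betp(fused))
```

Representing focal elements as frozensets lets set intersection implement the condition $B \cap C = A$ directly, at the cost of enumerating all pairs of focal elements, which is negligible for a frame with two elements.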


B. Evidential modeling for uncertainties and fusion based 
detection 


To deal with the uncertainties encountered in the impulse 
noise detection, we propose a detection method based on 
evidential reasoning, which uses the extreme criterion and 
discontinuity criterion simultaneously. The flow chart of the 
detection algorithm is illustrated in Fig. 2. 


Here we propose two methods for uncertainty modeling.
The first method models the uncertainties of the extreme
criterion and the discontinuity criterion with two BBAs, respectively
(denoted by method I). The second method treats the
impulse noise detection with two criteria as a multi-criteria
decision making problem, and uses the cautious ordered weighted
averaging with evidential reasoning (COWA-ER) method [34]
to generate BBAs (denoted by method II). In both methods,
we use the distance of interval numbers to describe the
extreme property and use the rank-ordered absolute differences
(ROAD) statistic [29] to describe the discontinuity property.


B-I Evidential modeling method I and fusion based detection


1) Evidential modeling for the uncertainty in extreme criterion

According to the extreme property of the impulse noise, the 
intensity of an impulse noise pixel must be an extreme value 
or close to an extreme value. 

Since all of the pixels' intensities in an image are within a
range ([0, 255] for an 8-bit image), the intensity information of
an image can be represented by an interval number. An interval
number $\tilde{a}$ in $\mathbb{R}$ is a set of real numbers that lie between two real
numbers, i.e., $\tilde{a} = [a_1, a_2] = \{x \mid a_1 \le x \le a_2\}$, $a_1, a_2 \in \mathbb{R}$
and $a_1 \le a_2$. The intensity information of an image can be
expressed as an interval number $\tilde{I} = [I_{min}, I_{max}]$, where $I_{min}$
and $I_{max}$ denote the minimum and maximum intensities of the
image, respectively. Furthermore, a single pixel's intensity $x$
can also be viewed as an interval number $[x, x]$, whose upper
bound and lower bound are equal.

The distance of interval numbers is a measure of dissimilarity
between interval numbers. Here we use it to describe the
closeness between a pixel's intensity and the extreme values.
Various types of distance for interval numbers [35] have been
proposed. Here, we use the following strict distance metric.




[Figure 2 diagram: Extreme criterion and discontinuity criterion → Uncertainty modeling (generate BBAs) → Combine BBAs using the combination rule → Transfer the combined BBA into BetP → Target pixel is detected as noise if BetP(N) ≥ 0.5]

Figure 2. Noise detection algorithm based on evidential reasoning.


Given two interval numbers $\tilde{a} = [a_1, a_2]$ and $\tilde{b} = [b_1, b_2]$, the
distance between $\tilde{a}$ and $\tilde{b}$ is defined as:

$$d(\tilde{a}, \tilde{b}) = \sqrt{\left[\frac{a_1 + a_2}{2} - \frac{b_1 + b_2}{2}\right]^2 + \frac{1}{3}\left[\frac{a_2 - a_1}{2} - \frac{b_2 - b_1}{2}\right]^2}. \quad (7)$$

According to (7), the distance between $\tilde{I}$ and $[x, x]$, which
describes the closeness between a pixel's intensity $x$ and the
extreme values $I_{min}$ and $I_{max}$, is:

$$d(\tilde{I}, [x, x]) = \sqrt{\left(\frac{I_{min} + I_{max}}{2} - x\right)^2 + \frac{1}{3}\left(\frac{I_{max} - I_{min}}{2}\right)^2}. \quad (8)$$

Here, $x$ takes values in the interval $[I_{min}, I_{max}]$. An illustration
of $d(\tilde{I}, [x, x])$ is shown in Fig. 3, where the intensity range
of the image is set as [0, 255]. When $x$ takes the median of
[0, 255], i.e., 127 or 128, $d(\tilde{I}, [x, x])$ reaches the minimum
value. The closer $x$ is to an extreme value, e.g., 0 or
255, the higher the value of $d(\tilde{I}, [x, x])$. Thus, $d(\tilde{I}, [x, x])$ can
be used to describe the extreme property.
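
The behaviour of this distance can be checked numerically with a short Python sketch. It assumes that the distance between interval numbers is the square root of the squared midpoint difference plus one third of the squared half-width difference; the function names `interval_distance` and `pixel_distance` are hypothetical.

```python
import math


def interval_distance(a, b):
    """Distance between two interval numbers a = (a1, a2) and b = (b1, b2)."""
    (a1, a2), (b1, b2) = a, b
    midpoint_diff = (a1 + a2) / 2 - (b1 + b2) / 2
    half_width_diff = (a2 - a1) / 2 - (b2 - b1) / 2
    return math.sqrt(midpoint_diff ** 2 + half_width_diff ** 2 / 3)


def pixel_distance(x, i_min=0, i_max=255):
    """Distance between the image range [i_min, i_max] and the pixel [x, x]."""
    return interval_distance((i_min, i_max), (x, x))


# The distance is smallest near the middle of the range and largest
# at the two extreme values.
for intensity in (0, 10, 127, 128, 245, 255):
    print(intensity, round(pixel_distance(intensity), 3))
```

Because a degenerate interval $[x, x]$ has zero width, the second term is constant for a fixed image range, so only the distance from $x$ to the midpoint of the range varies from pixel to pixel.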


Figure 3. An illustration of $d(\tilde{I}, [x, x])$.


Suppose that the only possible type of the noise existing in 
the given image is the impulse noise. The pixel whose intensity 
is far from the extreme values must be the signal, but the pixel 
whose intensity is close to an extreme value may not be the 
noise. For example, in some bright or dark area, the intensity 
range of signal pixels may overlap with that of the impulse 
noise pixels. Here, we use the belief function to describe this 
uncertainty. 


We set a detection window with a size of $w_D \times w_D$ centered
at the given pixel at $(i, j)$:

$$W_D(i,j) = \left\{ x_{i-s,j-t} \;\middle|\; -\frac{w_D - 1}{2} \le s, t \le \frac{w_D - 1}{2} \right\}, \quad (9)$$

where $x_{i-s,j-t}$ is the intensity of the pixel at $(i - s, j - t)$.


We focus on two distances in $W_D$:


1) $d_c$ denotes the distance between the center pixel's intensity
and the interval number $\tilde{I}$, where $\tilde{I}$ expresses the
intensity range of the pixels in the whole image.

2) $d_0$ denotes the minimum distance in $W_D$ between a
pixel's intensity and $\tilde{I}$, where the pixel is the one whose
intensity is the farthest one in $W_D$ from the extreme
values. Thus, this pixel is the least likely to be the noise
in $W_D$ according to the extreme criterion.


We also focus on another two distances in the whole image:


1) $d_{ext}$ denotes the distance between $\tilde{I}$ and the extreme value
$I_{min}$ or $I_{max}$. It is the maximum distance in the image
between a pixel's intensity and $\tilde{I}$. If a pixel's intensity
is close to the extreme value, its distance to $\tilde{I}$ is close
to $d_{ext}$.

2) $d_{med}$ denotes the distance between $\tilde{I}$ and the median of $\tilde{I}$. It
is the minimum possible distance in the image between
a pixel's intensity and $\tilde{I}$. If a pixel's intensity is
far from the extreme values, its distance to $\tilde{I}$ is close to
$d_{med}$.


Finally, we construct a BBA $m_1$ using the above distances to
model the uncertainty of whether the center pixel is corrupted
by the impulse noise or not according to the extreme criterion:

$$m_1(N) = \frac{d_c - d_0}{d_{ext} - d_0 + \varepsilon}, \quad m_1(S) = 1 - \frac{d_c - d_{med}}{d_{ext} - d_{med}}, \quad m_1(\Theta) = 1 - m_1(N) - m_1(S). \quad (10)$$

Here, the FOD is $\Theta = \{N, S\}$, where $N$ denotes the noise and
$S$ denotes the signal. The parameter $\varepsilon$ is a small positive real
number that prevents $m_1(N)$ from being 1 when the intensity of the
center pixel is an extreme value. It means that a pixel with
an extreme value should not be absolutely recognized as the
noise because it might actually be the signal. Furthermore,
Dempster's rule of combination in (5) has the problem of one
ballot veto when one BBA assigns 1 to one singleton $\theta_i$
($\theta_i \in \Theta$) and 0 to the other singletons. That is, if $m_1(N) = 1$,
i.e., $m_1(S) = 0$, no matter what $m_2$ is, the combined BBA
has $m(S) = 0$, which indicates the center pixel cannot be the
signal.

Since $d_{ext} \ge d_c \ge d_0 \ge d_{med}$, $0 \le (d_c - d_0)/(d_{ext} - d_0 + \varepsilon) < 1$
and $0 \le (d_c - d_{med})/(d_{ext} - d_{med}) \le 1$. That is,
$0 \le m_1(N) < 1$ and $0 \le m_1(S) \le 1$. Besides, as $d_{med} \le d_0$,

$$m_1(N) \le \frac{d_c - d_0}{d_{ext} - d_0} \le \frac{d_c - d_{med}}{d_{ext} - d_{med}},$$

which means $m_1(N) \le 1 - m_1(S)$ so that $m_1(N) + m_1(S) \le 1$.
Therefore, $m_1$ satisfies the constraint in (2), and $m_1$ is a
legitimate BBA.

Figure 4. Case a and the corresponding $m_1$ of the center pixel. (a) A detection window for case a. (b) Corresponding $m_1$ when $x_{i,j}$ takes different values.

Figure 5. Case b and the corresponding $m_1$ of the center pixel. (a) A detection window for case b. (b) Corresponding $m_1$ when $x_{i,j}$ takes different values.

According to $m_1$, the center pixel will have a large value of
$m_1(N)$ only when its intensity is close to the extreme value,
and at the same time far from the intensity being the closest
to the median of $\tilde{I}$ in $W_D$. The center pixel will have a large
value of $m_1(S)$ when its intensity is close to the median of $\tilde{I}$.

Here, we consider two different cases about the detection
window and the corresponding BBA $m_1$ of the center
pixel when $\tilde{I} = [0, 255]$, $\varepsilon = 0.1$ and $a = 10$, i.e., noise
pixels take values in the sets $S_1 = \{0, 1, \ldots, 10\}$ and
$S_2 = \{245, 246, \ldots, 255\}$.

1) For the most common case, the intensities of signal
pixels in a detection window $W_D$ are far from the
extreme values, as in the example shown in Fig. 4(a), where
the intensity range of signal pixels is [100, 150]. The
pixel with the intensity of 127 is the least likely to
be the noise in $W_D$, since 127 is the farthest intensity
in $W_D$ from the extreme values. If the center pixel's
intensity $x_{i,j}$ is in the range of [0, 10] or [245, 255], it
is close to the extreme value, and at the same time far
from 127. Thus, the center pixel is assigned a large value
of $m_1(N)$. If the center pixel's intensity is in the range
of [100, 150], it is close to the median of $\tilde{I}$. Thus, it is
assigned a large value of $m_1(S)$.

2) In some cases, all of the signal pixels' intensities in a
detection window are close to an extreme value, as in the
example shown in Fig. 5(a), where the intensity range of
signal pixels is [230, 250]. The pixel with the intensity
of 230 is the one that is the least likely to be the noise
in $W_D$, since 230 is the farthest intensity in $W_D$ from
the extreme values. If the center pixel's intensity $x_{i,j}$
is in the range of [0, 10] or [241, 255], it is close to
the extreme value and far from 230. Thus, the center
pixel is assigned a large value of $m_1(N)$. If the center
pixel's intensity $x_{i,j}$ is in the range of [230, 240], the
center pixel is assigned a small value of $m_1(N)$ since its
intensity is close to 230. At the same time, it is assigned
a small value of $m_1(S)$ since $x_{i,j}$ is far from the median
of $\tilde{I}$. Therefore, $m_1(\Theta)$ is large, which indicates it is
hard to say whether the center pixel is the noise or signal.
In summary, in this case, it is hard to get a crisp
description of the beliefs for the corresponding decisions
according to the extreme criterion, since the intensity
range of signal pixels overlaps with that of the noise
pixels. However, our BBA $m_1$ keeps the large uncertainty
(large $m_1(\Theta)$), which is helpful to avoid an arbitrary
detection decision.


From the above we can see that when using the extreme
criterion for noise detection, our evidential method uses the
BBA to describe the beliefs of the corresponding detection
decisions, which does not make a hard decision like the
two-step methods but keeps the uncertainty. This is more cautious
and can reduce the information loss for the final fusion-based
detection.
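
The construction of $m_1$ can be illustrated with the following Python sketch for the second case discussed above (signal intensities close to an extreme value). It assumes that $m_1(N)$ and $m_1(S)$ are built from the four distances $d_c$, $d_0$, $d_{ext}$ and $d_{med}$ as described, with the remaining mass assigned to $\Theta$, and that $I_{max} > I_{min}$. The function names and the window values are hypothetical.

```python
import math


def dist_to_range(x, i_min, i_max):
    """Distance between the image range [i_min, i_max] and the pixel [x, x]."""
    return math.sqrt(((i_min + i_max) / 2 - x) ** 2
                     + ((i_max - i_min) / 2) ** 2 / 3)


def extreme_bba(window, center, i_min=0, i_max=255, eps=0.1):
    """Return (m1(N), m1(S), m1(Theta)) for the center pixel of a window."""
    d_c = dist_to_range(center, i_min, i_max)
    # d_0: smallest distance in the window (the center pixel is included).
    d_0 = min(dist_to_range(x, i_min, i_max) for x in list(window) + [center])
    d_ext = dist_to_range(i_min, i_min, i_max)               # extreme value
    d_med = dist_to_range((i_min + i_max) / 2, i_min, i_max)  # middle of range
    m_noise = (d_c - d_0) / (d_ext - d_0 + eps)
    m_signal = 1 - (d_c - d_med) / (d_ext - d_med)
    return m_noise, m_signal, 1 - m_noise - m_signal


# Hypothetical bright window whose signal intensities lie in [230, 250].
bright_window = [230, 232, 235, 238, 240, 244, 246, 248, 250]
print(extreme_bba(bright_window, center=235))  # most mass stays on Theta
print(extreme_bba(bright_window, center=255))  # m1(N) close to, but below, 1
```

In this sketch the small constant `eps` keeps $m_1(N)$ strictly below 1 even for a center pixel at an extreme value, which avoids the one ballot veto behaviour of Dempster's rule mentioned earlier.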


2) Evidential modeling for the uncertainty in discontinuity criterion

According to the discontinuity property of the impulse
noise, the intensity of an impulse noise pixel tends to be
much higher or lower than the intensities of its neighbors.
Some signal pixels, e.g., edge pixels and signal pixels in
a bright or dark area, are easily detected as the noise
according to the extreme criterion. However, they differ
from the noise pixels according to the discontinuity
criterion. The intensity of an edge pixel is higher or lower than
only a portion of the intensities of its neighbors. The intensity
of a signal pixel in a bright or dark area is similar to the
intensities of its neighbors.

Figure 6. One-dimensional illustration of the differences between some signal pixels and the impulse noise. (a) Intensities of the impulse noise and its
neighbors. (b) Intensities of an edge pixel and its neighbors. (c) Intensities of signal pixels in the bright area.

For a pixel at $(i,j)$, we consider its neighbors' intensity
information in the same detection window $W_D(i,j) =
\{x_{i-s,j-t} \mid -(w_D - 1)/2 \le s, t \le (w_D - 1)/2\}$ as used
in modeling the uncertainty in the extreme criterion. The
differences between the impulse noise, edge pixels and bright-area pixels
reflected in the discontinuity property are illustrated in Fig. 6.
Here, for simplicity, one-dimensional expressions of the
pixels' intensities in the detection windows are used, where
$q = (w_D - 1)/2$.

We use the rank-ordered absolute differences (ROAD) statistic
[29] to describe such differences reflected in the discontinuity
property. Define $dif(x_{i,j}, x_{i-s,j-t}) = |x_{i,j} - x_{i-s,j-t}|$
as the absolute difference of the intensities between the center
pixel at $(i,j)$ and its neighbor at $(i-s, j-t)$, where $x_{i-s,j-t} \in
W_D(i,j)$. If the size of $W_D(i,j)$ is $M = w_D \times w_D$, there will
be $M - 1$ neighbors in the window, and therefore the number
of $dif(x_{i,j}, \cdot)$ values is $M - 1$. These $dif(x_{i,j}, \cdot)$ can describe the
dissimilarity between the center pixel and its neighbors. To
further analyze this dissimilarity, sort these $M - 1$ $dif(x_{i,j}, \cdot)$
values in ascending order and denote $r_g(x_{i,j})$ as the $g$-th
smallest $dif(x_{i,j}, \cdot)$. Finally, calculate the sum of the first $n$
smallest $dif(x_{i,j}, \cdot)$ as

$$ROAD_n(x_{i,j}) = \sum_{g=1}^{n} r_g(x_{i,j}), \quad (11)$$

where $2 \le n \le M - 1$.

If the center pixel is the noise, $dif(x_{i,j}, \cdot)$ is small when its
neighbor is a noise pixel whose intensity is close to the same
extreme value as the center pixel.

If the center pixel is the signal without extreme property,
$dif(x_{i,j}, \cdot)$ is large when its neighbor is the noise.

If the center pixel is the signal with extreme property,
$dif(x_{i,j}, \cdot)$ is large when its neighbor is a noise pixel whose
intensity is close to the other extreme value.

Therefore, when the noise density is low, the impulse noise
has a large value of $ROAD_{M-1}(x_{i,j})$ as well as of the sum of
its smallest $n$ $dif(x_{i,j}, \cdot)$ values, i.e., $ROAD_n(x_{i,j})$. The
signal pixel has a small value of $ROAD_{M-1}(x_{i,j})$ as well as of
$ROAD_n(x_{i,j})$.

With the increase of the noise density, the number of
impulse noise pixels increases in a detection window. If the
center pixel is the noise, the number of small $dif(x_{i,j}, \cdot)$ values
will increase since there are more noise neighbors having
intensities similar to that of the center pixel. At the same time, the
number of very large $dif(x_{i,j}, \cdot)$ values also increases since there
are more noise neighbors having intensities close to the
other extreme value. Thus, $ROAD_n(x_{i,j})$ becomes smaller but
$ROAD_{M-1}(x_{i,j})$ has no significant change.

If the center pixel is the signal, the number of large
$dif(x_{i,j}, \cdot)$ values will increase since there are more noise neighbors.
Thus, $ROAD_{M-1}(x_{i,j})$ becomes larger but $ROAD_n(x_{i,j})$ has
no significant change.

In summary, $ROAD_n(x_{i,j})$ is large only when the center
pixel is the noise and the noise density is small;
$ROAD_{M-1}(x_{i,j})$ is small only when the center pixel is the
signal and the noise density is small. As the noise density
increases, the differences between the signal and the
noise reflected in $ROAD_n(x_{i,j})$ and $ROAD_{M-1}(x_{i,j})$ narrow.
This means that the discontinuity property of the impulse noise
pixels becomes weaker with the increase of the noise density. It
is unreasonable to use the discontinuity criterion to make hard
decisions for detection. Therefore, we construct a BBA $m_2$ to
describe the beliefs of the corresponding detection decisions
according to the discontinuity criterion.

As we have discussed above, only the noise pixel can have
a large $ROAD_n(x_{i,j})$ and only the signal pixel can have a small
$ROAD_{M-1}(x_{i,j})$. Thus, for a given center pixel, the larger
the value of $ROAD_n(x_{i,j})$, the larger the belief that should be
assigned to its being detected as the noise; the smaller the value of
$ROAD_{M-1}(x_{i,j})$, the larger the belief that should be assigned
to its being detected as the signal. For a center pixel with the
intensity of $x_{i,j}$, its $m_2$ is constructed as follows. We take $n =
(M - 1)/2$, which means that we focus on the smaller half of the
$dif(x_{i,j}, \cdot)$ values when considering the belief that a pixel should
be detected as the noise.

$$m_2(N) = \frac{ROAD_{(M-1)/2}(x_{i,j})}{\frac{M-1}{2} \times (I_{max} - I_{min})}, \quad m_2(S) = 1 - \frac{ROAD_{M-1}(x_{i,j})}{(M-1) \times (I_{max} - I_{min})}, \quad m_2(\Theta) = 1 - m_2(N) - m_2(S). \quad (12)$$

Here, $I_{max}$ and $I_{min}$ denote the maximum and minimum
intensities of the whole image, respectively.
Figure 7. Visual representation of $m_2$. (a) A detection window. (b) The illustration of $r_g(2)$. (c) Graphical representation of belief assignments.


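Since $r_g(x_{i,j}) \le I_{max} - I_{min}$, $ROAD_{(M-1)/2}(x_{i,j}) \le
\frac{M-1}{2} \times (I_{max} - I_{min})$ and $ROAD_{M-1}(x_{i,j}) \le (M - 1) \times
(I_{max} - I_{min})$. That is, $0 \le m_2(N) \le 1$ and $0 \le m_2(S) \le 1$.
Besides, since

$$m_2(N) + m_2(S) = 1 - \frac{ROAD_{M-1}(x_{i,j})}{(M-1) \times (I_{max} - I_{min})} + \frac{ROAD_{(M-1)/2}(x_{i,j})}{\frac{M-1}{2} \times (I_{max} - I_{min})} = 1 - \frac{\sum_{g=(M+1)/2}^{M-1} r_g(x_{i,j}) - ROAD_{(M-1)/2}(x_{i,j})}{(M-1) \times (I_{max} - I_{min})},$$

and $\sum_{g=(M+1)/2}^{M-1} r_g(x_{i,j}) \ge ROAD_{(M-1)/2}(x_{i,j})$, we have
$m_2(N) + m_2(S) \le 1$. Thus, $m_2$ satisfies the constraint of
BBA in (2) and $m_2$ is a legitimate BBA.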

For a given pixel, the mass values $m_2(N)$, $m_2(S)$ and
$m_2(\Theta)$ of the BBA $m_2$ can be represented as the areas of the
regions shown in Fig. 7.

Fig. 7(a) illustrates an example of a detection window.
The intensity of the center pixel is $x_{i,j} = 2$. We suppose that
the largest intensity difference $I_{\max}-I_{\min}$ in the image is
255. Since the window size $M$ is 25, we get 24 values of
$\mathrm{dif}(x_{i,j},\cdot)$. These values sorted in ascending order,
i.e., $r_g(x_{i,j})$, $g = 1, 2, \ldots, 24$, are shown as the
histogram in Fig. 7(b). We assign an area of 1 to the rectangular
region in Fig. 7(b) with the vertices (0,0), (24,0), (0,255) and
(24,255). That is, the value of $(M-1)\times(I_{\max}-I_{\min})$ in
(12) is represented by a region with an area of 1. Thus, the value of
$\mathrm{ROAD}_{M-1}(x_{i,j})/[(M-1)\times(I_{\max}-I_{\min})]$ is
represented by the region determined by $r_g(x_{i,j})$,
$g = 1, 2, \ldots, 24$, in Fig. 7(b), and the value of $m_2(S)$ in
(12), i.e., $1-\mathrm{ROAD}_{M-1}(x_{i,j})/[(M-1)\times(I_{\max}-I_{\min})]$,
is represented by the blue area in Fig. 7(c). Similarly, the value of
$\mathrm{ROAD}_{(M-1)/2}(x_{i,j})/[(M-1)\times(I_{\max}-I_{\min})]$ is
represented by the region determined by $r_g(x_{i,j})$,
$g = 1, 2, \ldots, 12$, in Fig. 7(b), so the value of $m_2(N)$ in
(12), i.e., $2\times\mathrm{ROAD}_{(M-1)/2}(x_{i,j})/[(M-1)\times(I_{\max}-I_{\min})]$,
is represented by the pink area¹ in Fig. 7(c). The value of
$m_2(\Theta)$ is then represented by the remaining area, i.e., the
green area in Fig. 7(c).

¹ Since the value of $m_2(N)$ is twice
$\mathrm{ROAD}_{(M-1)/2}(x_{i,j})/[(M-1)\times(I_{\max}-I_{\min})]$,
$m_2(N)$ is represented by twice the region determined by
$r_g(x_{i,j})$, $g = 1, 2, \ldots, 12$.

Fig. 8 shows the graphical representations of $m_2$ for different
kinds of center pixels at different noise density levels (25%, 50%
and 75%). The center pixels include the impulse noise pixel (the first
column in Fig. 8), signal pixels with the extreme property, such as
the edge pixel (the second column in Fig. 8) and the signal pixel in a
bright or dark area (the third column in Fig. 8), and the common
signal pixel with no extreme property (the last column in Fig. 8).
Here, we assume that the largest intensity difference
$I_{\max}-I_{\min}$ of the whole image is 255 in all cases.

In Fig. 8, the signal pixels (from the second column to the 
last column) have large values of m2(S) indicating that for 
signal pixels, large beliefs are assigned to being detected as 
the signal under all noise density levels. 

For the impulse noise pixel, when the noise density is no larger than
50% (Fig. 8(a) or Fig. 8(e)), it has a large value of $m_2(N)$,
indicating that a large belief is assigned to its being detected as
noise. However, when the noise density is larger than 50%
(Fig. 8(i)), the impulse noise pixel has a small value of $m_2(N)$,
indicating that only a small belief is assigned to its being detected
as noise. At the same time, its value of $m_2(\Theta)$ is large,
indicating that the uncertainty of the discontinuity criterion is
large when the image is severely corrupted.

Our evidential method uses the BBA to describe the beliefs 
of the corresponding detection decisions according to the 
discontinuity criterion. We do not make the hard decision 
directly but keep the uncertainty for the time being, which is 
more cautious. Particularly, our modeling method here keeps 
the large uncertainty of discontinuity criterion when the noise 
intensity is large. This will be helpful for the final fusion-based 
detection to decrease the miss-detections and false alarms. 

3) Fusion based detection

The generated BBAs $m_1$ and $m_2$ can be combined, e.g., using
Eq. (5), to obtain $m(\cdot) = [m_1 \oplus m_2](\cdot)$, which is a
combined BBA for the noise detection representing the simultaneous use
of the extreme property and the discontinuity property. Once $m$ is
obtained, we use the pignistic probability transformation in Eq. (6)
to transform $m$ into a probability measure BetP. If BetP(N) > 0.5,
the center pixel should be detected as noise.

Figure 8. Illustration of $m_2$ in cases with different noise densities. (From top to bottom, from left to right) The first to third rows show the cases when
the noise densities are 25%, 50% and 75%, respectively. The first to fourth columns show the cases when the center pixels are the impulse noise, edge
pixel, signal pixel in the bright or dark area and common signal pixel without extreme property, respectively.

Here we use an example to illustrate our detection procedure.
A detection window is shown in Fig. 9.


202 | 203 | 203 | 201 | 206
201 | 202 |   8 | 204 | 204
203 | 200 |   8 |   0 | 202
200 |   9 | 201 | 204 | 204
200 | 201 | 200 | 204 | 202


Figure 9. The illustration of a detection window. 


In this window, the intensities of the signal pixels range from 200
to 206 and several pixels are corrupted by impulse noise with
intensities of 0, 8 or 9. Assume the intensity range of the whole
image is [0, 255]. The value of $\epsilon$ in (10) is 0.1.

According to the modeling method I, the generated BBAs are:

\[ m_1(N) = 0.8416,\quad m_1(S) = 0.0933,\quad m_1(\Theta) = 0.0651, \]

and

\[ m_2(N) = 0.5696,\quad m_2(S) = 0.3320,\quad m_2(\Theta) = 0.0984. \]

The combined BBA is:

\[ m(N) = 0.8978,\quad m(S) = 0.0926,\quad m(\Theta) = 0.0096. \]

Then, we obtain the pignistic probability BetP(N) = 0.9026
and the center pixel is finally detected as impulse noise
since BetP(N) > 0.5.
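
The fusion step of this example can be reproduced with a few lines of Python. The sketch below assumes that Eq. (5) is Dempster's rule of combination and that Eq. (6) is the usual pignistic transformation, which on the frame {N, S} splits the mass of Θ equally between N and S. Under these assumptions the numbers above are recovered to within rounding. The function names are hypothetical.

```python
def dempster(m1, m2):
    """Dempster's rule on the frame {N, S}; focal elements are N, S and Theta."""
    conflict = m1["N"] * m2["S"] + m1["S"] * m2["N"]   # mass on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    norm = 1.0 - conflict
    return {
        "N": (m1["N"] * m2["N"] + m1["N"] * m2["Theta"] + m1["Theta"] * m2["N"]) / norm,
        "S": (m1["S"] * m2["S"] + m1["S"] * m2["Theta"] + m1["Theta"] * m2["S"]) / norm,
        "Theta": m1["Theta"] * m2["Theta"] / norm,
    }


def betp_noise(m):
    """Pignistic probability of N: Theta = {N, S} contributes half of its mass."""
    return m["N"] + m["Theta"] / 2.0


# Method-I BBAs of the Fig. 9 example.
m1 = {"N": 0.8416, "S": 0.0933, "Theta": 0.0651}
m2 = {"N": 0.5696, "S": 0.3320, "Theta": 0.0984}
m = dempster(m1, m2)
is_noise = betp_noise(m) > 0.5
print({name: round(mass, 4) for name, mass in m.items()}, round(betp_noise(m), 4), is_noise)
```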


B-2 Evidential modeling method II and fusion based detection


Impulse noise detection with two criteria, the extreme property and
the discontinuity property, can be viewed as a multi-criteria (more
precisely, bi-criteria) decision making problem. Therefore, we can use
the cautious ordered weighted averaging with evidential reasoning
(COWA-ER) method [34] to generate BBAs and to implement the
fusion-based noise detection.



1) COWA-ER method 

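In the noise detection problem, for a given pixel, the finite
set of alternatives is $\Theta = \{\theta_1, \theta_2\} = \{N, S\}$.
According to the analyses in method I, the pessimistic and optimistic
valuations of the expected payoffs of these alternatives obtained from
the two detection criteria (extreme criterion and discontinuity
criterion) are:

\[
\begin{aligned}
e_{\min}(N) &= \min\{m_1^{\mathrm{I}}(N),\; m_2^{\mathrm{I}}(N)\},\\
e_{\max}(N) &= \max\{m_1^{\mathrm{I}}(N),\; m_2^{\mathrm{I}}(N)\},\\
e_{\min}(S) &= \min\{m_1^{\mathrm{I}}(S),\; m_2^{\mathrm{I}}(S)\},\\
e_{\max}(S) &= \max\{m_1^{\mathrm{I}}(S),\; m_2^{\mathrm{I}}(S)\}.
\end{aligned}
\tag{13}
\]

Here $m_1^{\mathrm{I}}(\cdot)$ and $m_2^{\mathrm{I}}(\cdot)$ denote the
supports given in method I by the extreme criterion in Eq. (10) (with
its parameter $\epsilon$) and by the discontinuity criterion in
Eq. (12), respectively; that is,
$m_2^{\mathrm{I}}(N) = \mathrm{ROAD}_{(M-1)/2}(x_{i,j})/[\frac{M-1}{2}\times(I_{\max}-I_{\min})]$
and
$m_2^{\mathrm{I}}(S) = 1-\mathrm{ROAD}_{M-1}(x_{i,j})/[(M-1)\times(I_{\max}-I_{\min})]$.
Then, the expected payoff matrix is constructed as:

\[
E = \begin{bmatrix} E[N] \\ E[S] \end{bmatrix}
  = \begin{bmatrix} [e_{\min}(N),\, e_{\max}(N)] \\ [e_{\min}(S),\, e_{\max}(S)] \end{bmatrix}.
\tag{14}
\]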


Here, the expected payoffs $E[N]$ and $E[S]$ are imprecise
since they belong to the interval $[e_{\min}(\cdot), e_{\max}(\cdot)]$,
where the lower and upper bounds represent the pessimistic and
optimistic attitudes, respectively.

Then, divide each bound of the intervals by the maximum of the
bounds, i.e., $e_{\max} = \max\{e_{\max}(N), e_{\max}(S)\}$, to get a
new normalized imprecise expected payoff vector $E^{Imp}$:

\[
E^{Imp} = \begin{bmatrix}
[e_{\min}(N)/e_{\max},\; e_{\max}(N)/e_{\max}] \\
[e_{\min}(S)/e_{\max},\; e_{\max}(S)/e_{\max}]
\end{bmatrix}
= \begin{bmatrix} [a_1, b_1] \\ [a_2, b_2] \end{bmatrix}.
\tag{15}
\]

Finally, convert the normalized imprecise expected payoff vector
$E^{Imp}$ into BBAs according to a very natural and simple
transformation [34], [36]. The BBA associated with the hypothesis
$\theta_i$ ($\theta_1 = N$, $\theta_2 = S$) is generated from any
imprecise value $[a_i, b_i] \subseteq [0, 1]$ as:

\[
\begin{cases}
m_i(\theta_i) = a_i,\\
m_i(\bar{\theta}_i) = 1-b_i,\\
m_i(\theta_i \cup \bar{\theta}_i) = b_i-a_i.
\end{cases}
\tag{16}
\]

Here $\bar{\theta}_i$ is the complement of $\theta_i$ in $\Theta$.
With such a conversion, one sees that $Bel(\theta_i) = a_i$,
$Pl(\theta_i) = b_i$ and the uncertainty is represented by the length
of the interval $[a_i, b_i]$.
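
As an illustration, the normalization in Eq. (15) and the conversion in Eq. (16) can be sketched as below. The function name `cowa_er_bbas` and its two-tuple inputs are hypothetical; the inputs are the supports that the two criteria give to N and to S, i.e., the quantities whose minimum and maximum are taken in Eq. (13). The usage example feeds in the method-I supports of the Fig. 9 window.

```python
def cowa_er_bbas(support_n, support_s):
    """Turn the two criteria's supports for N and S into two BBAs (Eqs. 13-16)."""
    e_min_n, e_max_n = min(support_n), max(support_n)   # pessimistic / optimistic
    e_min_s, e_max_s = min(support_s), max(support_s)
    e_max = max(e_max_n, e_max_s)
    if e_max <= 0.0:
        raise ValueError("all supports are zero: normalization is undefined")
    a1, b1 = e_min_n / e_max, e_max_n / e_max           # interval for theta_1 = N
    a2, b2 = e_min_s / e_max, e_max_s / e_max           # interval for theta_2 = S
    # Eq. (16): m_i(theta_i) = a_i, m_i(complement) = 1 - b_i, m_i(Theta) = b_i - a_i.
    m1 = {"N": a1, "S": 1.0 - b1, "Theta": b1 - a1}
    m2 = {"N": 1.0 - b2, "S": a2, "Theta": b2 - a2}
    return m1, m2


# Supports for the Fig. 9 window: (extreme criterion, discontinuity criterion).
m1, m2 = cowa_er_bbas(support_n=(0.8416, 0.5696), support_s=(0.0933, 0.3320))
print({k: round(v, 4) for k, v in m1.items()})
print({k: round(v, 4) for k, v in m2.items()})
```

The first BBA is built around N and the second around S, so each of them leaves the width of its interval on Θ.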


2) Fusion based detection 

By using the COWA-ER method, we obtain two BBAs, $m_1$ and $m_2$.
The generated BBAs can be combined using Eq. (5), that is,
$m(\cdot) = [m_1 \oplus m_2](\cdot)$. Once $m$ is computed, we use
the pignistic probability transformation in (6) to transform $m$ into
a probability measure BetP. If BetP(N) > 0.5, the center pixel should
be detected as impulse noise.

Here we consider the same example shown in Fig. 9. The value of
$\epsilon$ in Eq. (13) is 0.1. The BBAs generated from modeling
method II are:

\[ m_1(N) = 0.6768,\quad m_1(S) = 0,\quad m_1(\Theta) = 0.3232, \]

and

\[ m_2(N) = 0.6055,\quad m_2(S) = 0.1109,\quad m_2(\Theta) = 0.2836. \]

The combined BBA is:

\[ m(N) = 0.8622,\quad m(S) = 0.0388,\quad m(\Theta) = 0.0990. \]

Then, we get the pignistic probability BetP(N) = 0.9117 and
the center pixel is finally detected as impulse noise because
BetP(N) > 0.5.

When modeling the uncertainty of noise detection, the two 
proposed methods use the same information (extreme criterion 
and discontinuity criterion) but generate belief functions in dif- 
ferent ways. Either of these two methods can be an alternative 
to the other in many cases but they might generate different 
detection results in some special situations. 


B-3 Different detection results with contradictory evidences 

In many cases, the two proposed methods generate the same detection
results, since they describe the extreme criterion and the
discontinuity criterion in very similar ways and both of their
combined evidences assign the larger belief to the same candidate
(noise or signal). However, when the two evidence sources are highly
contradictory (the extreme criterion and the discontinuity criterion
give very different supports to the target pixel), the two proposed
methods might generate different detection results, as illustrated by
the following two examples with highly contradictory evidence sources.


1) Detection results for situation 1 

Situation 1 describes the situation when the target pixel is 
a signal in a dark area close to an edge where the pixels at 
the other side of the edge have higher intensities as shown in 
Fig. 10. 


50 41 28 18 LS 


26 34 44 45 41 


Figure 10. Highly contradictory situation 1. 


According to the extreme criterion and the discontinuity criterion,
the two BBAs generated by evidential modeling method I using Eq. (10)
and Eq. (12) are:

\[ m_1(N) = 0.9548,\quad m_1(S) = 0.0235,\quad m_1(\Theta) = 0.0217, \]

and

\[ m_2(N) = 0.0039,\quad m_2(S) = 0.9440,\quad m_2(\Theta) = 0.0521. \]

In this situation, the proposition that the target pixel should be
detected as noise obtains very different supports from the extreme
criterion ($m_1(N)$ is large) and the discontinuity criterion
($m_2(N)$ is small), since the target pixel's intensity is very close
to the extreme value 0 but, at the same time, many of its neighbors
have similar intensities. After the evidence combination and
probability transformation, we finally get BetP(N) = 0.5491 and the
target pixel is wrongly detected as noise (a false alarm) since
BetP(N) > 0.5.
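
The closeness of this decision to the threshold can be traced to the amount of conflict between the two sources. As a worked check, assuming that Eq. (5) is Dempster's rule and Eq. (6) the usual pignistic transformation on the frame {N, S} (an assumption that agrees with the reported value up to the rounding of the four-digit inputs):

\[
\begin{aligned}
K &= m_1(N)\,m_2(S) + m_1(S)\,m_2(N) = 0.9548\times 0.9440 + 0.0235\times 0.0039 \approx 0.9014,\\
m(N) &= \frac{m_1(N)\,m_2(N) + m_1(N)\,m_2(\Theta) + m_1(\Theta)\,m_2(N)}{1-K} \approx \frac{0.053553}{0.098577} \approx 0.5433,\\
m(\Theta) &= \frac{m_1(\Theta)\,m_2(\Theta)}{1-K} \approx \frac{0.001131}{0.098577} \approx 0.0115,\\
BetP(N) &= m(N) + \tfrac{1}{2}\,m(\Theta) \approx 0.549.
\end{aligned}
\]

About 90% of the combined mass is conflicting and is discarded by the normalization, so the decision rests on the remaining 10%, most of which comes from the small uncertainty masses $m_1(\Theta)$ and $m_2(\Theta)$.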

The evidential modeling method II deals with these two highly
contradictory evidences in a different way. According to Eq. (13) and
Eq. (14), the expected payoff matrix is generated as:

\[
E = \begin{bmatrix} E[N] \\ E[S] \end{bmatrix}
  = \begin{bmatrix} [0.0039,\, 0.9548] \\ [0.0235,\, 0.9440] \end{bmatrix}.
\]

Then, we get the normalized expected payoff vector:

\[
E^{Imp} = \begin{bmatrix} [0.0041,\, 1.0000] \\ [0.0246,\, 0.9887] \end{bmatrix}.
\]

The generated BBAs are:

\[ m_1(N) = 0.0041,\quad m_1(S) = 0,\quad m_1(\Theta) = 0.9959, \]

and

\[ m_2(N) = 0.0113,\quad m_2(S) = 0.0246,\quad m_2(\Theta) = 0.9641. \]

After the evidence combination and probability transformation, we
finally get BetP(N) = 0.4954 and the target pixel is successfully
detected as signal since BetP(N) < 0.5. For this example, the
detection result generated by the proposed method II is more
reasonable.


2) Detection results for situation 2 

Situation 2 describes a highly corrupted case in which the target
pixel is noise and the neighboring signal pixels (colored in green)
have intensities similar to that of the target pixel, as shown in
Fig. 11.


249 | 223 | 252 |   7 |   7
220 | 219 | 248 | 249 | 253
  6 | 252 | 246 |   0 |   9
254 | 246 |   2 | 209 | 245
219 | 219 | 251 | 245 | 247


Figure 11. Highly contradictory situation 2. 


According to Eq. (10) and Eq. (12), the two BBAs generated
by evidential modeling method I are:

\[ m_1(N) = 0.7914,\quad m_1(S) = 0.1049,\quad m_1(\Theta) = 0.1037, \]

and

\[ m_2(N) = 0.0141,\quad m_2(S) = 0.7296,\quad m_2(\Theta) = 0.2564. \]

In this situation, the proposition that the target pixel should be
detected as noise obtains very different supports from the extreme
criterion ($m_1(N)$ is large) and the discontinuity criterion
($m_2(N)$ is small), since the target pixel's intensity is close to
the extreme value 255 but, at the same time, many of its neighbors
have similar intensities. After the evidence combination and
probability transformation, we finally get BetP(N) = 0.5432 and the
target pixel is successfully detected as noise since BetP(N) > 0.5.

The evidential modeling method II deals with these two highly
contradictory evidences in a different way. According to Eq. (13) and
Eq. (14), the expected payoff matrix is generated as:

\[
E = \begin{bmatrix} E[N] \\ E[S] \end{bmatrix}
  = \begin{bmatrix} [0.0141,\, 0.7914] \\ [0.1049,\, 0.7296] \end{bmatrix}.
\]

Then, we get the normalized expected payoff vector:

\[
E^{Imp} = \begin{bmatrix} [0.0178,\, 1.0000] \\ [0.1325,\, 0.9219] \end{bmatrix}.
\]

The generated BBAs are:

\[ m_1(N) = 0.0178,\quad m_1(S) = 0,\quad m_1(\Theta) = 0.9822, \]

and

\[ m_2(N) = 0.0781,\quad m_2(S) = 0.1325,\quad m_2(\Theta) = 0.7894. \]

After the evidence combination and probability transformation, we
finally get BetP(N) = 0.4809 and the target pixel is miss-detected as
signal since BetP(N) < 0.5. For this example, the detection result
generated by the proposed method I is more reasonable.

From the above, the two proposed methods are likely to 
generate different detection results in highly contradictory 
situations. 


IV. ADAPTIVE MEDIAN FILTERING 


After the noise detection, we focus on the filter implementation.
Ideally, only the corrupted pixels should undergo filtering. The size
of the filtering window strongly influences the filtering performance,
and the optimal window size usually depends on the detection result.
Therefore, in this paper we further propose an adaptive switching
median filtering method, which adaptively determines the size of the
filtering window according to the detection result.

For a given pixel at $(i, j)$, the filtering window of size
$w_F \times w_F$ centered at it is:

\[
W_F(i,j) = \left\{ x_{i-s,j-t} \;\middle|\; -\frac{w_F-1}{2} \le s, t \le \frac{w_F-1}{2} \right\},
\tag{17}
\]

where $x_{i-s,j-t}$ is the intensity of the pixel at $(i-s, j-t)$.

Generally, to better preserve details, the filtering window should be
as small as possible, provided that it contains enough signal pixels
to help determine the filtered value. In the current filtering window
$W_F(i,j)$, the proportion of the detected signal pixels is:

\[
S_{pro}^{W_F} = \frac{S_{num}^{W_F}}{w_F \times w_F},
\tag{18}
\]

where $S_{num}^{W_F}$ is the number of the detected signal pixels in
$W_F(i,j)$. If the proportion of signal pixels in the current
filtering window, i.e., $S_{pro}^{W_F}$, is small, the filtering
window should be expanded to see whether the proportion is large
enough in a larger filtering window.

When the noise density is large, $S_{pro}^{W_F}$ is likely to be
small. Thus, the minimum $S_{pro}^{W_F}$ required for not extending
the filtering window should be reduced as the noise density increases,
to avoid over-smoothing. Therefore, the noise density should be
estimated first.

Table I
ESTIMATION RESULTS OF NOISE DENSITY (%).

29.99 | 40.00
29.98 | 39.99 | 49.99     80.00 | 90.00


A. Noise density estimation 


The noise density is estimated according to the noise
detection result:

\[
d_N = \frac{N_{num}}{P_{num}}.
\tag{19}
\]

Here, $N_{num}$ is the total number of the detected noise pixels
and $P_{num}$ is the total number of the pixels in the image.

The performance of the noise density estimation for corrupted Lena
images is presented in Table I, where method I and method II
represent the two proposed evidential modeling methods, respectively.
In this experiment, noise pixels take values in the sets
$S_1 = \{0, 1, \ldots, 10\}$ and $S_2 = \{245, 246, \ldots, 255\}$,
i.e., $a = 10$. The values of $\epsilon$ in Eq. (10) and Eq. (13) are
0.1. The size of the detection window $W_D$ is 11 × 11, chosen after
extensive tests. According to Table I, the estimation results are very
close to the actual noise densities, indicating that our detection
methods are effective and that they can be used to determine the size
of the filtering window.
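
To experiment with this setup, a corrupted test image and the density estimate of Eq. (19) can be produced as in the sketch below. The sampling scheme is an assumption made for illustration only: exactly `round(density * P)` pixels are corrupted, and each takes a value drawn uniformly from $S_1 \cup S_2$. The function names and the `seed` parameter are hypothetical.

```python
import random


def add_impulse_noise(image, density, a=10, seed=0):
    """Corrupt round(density * P) pixels with values from {0..a} or {255-a..255}."""
    rng = random.Random(seed)
    rows, cols = len(image), len(image[0])
    positions = [(r, c) for r in range(rows) for c in range(cols)]
    chosen = rng.sample(positions, round(density * rows * cols))
    levels = list(range(0, a + 1)) + list(range(255 - a, 256))   # S1 and S2
    noisy = [row[:] for row in image]
    mask = [[False] * cols for _ in range(rows)]
    for r, c in chosen:
        noisy[r][c] = rng.choice(levels)   # assumed: uniform over S1 and S2
        mask[r][c] = True
    return noisy, mask


def estimate_density(noise_mask):
    """Eq. (19): d_N = N_num / P_num, computed from a boolean detection mask."""
    n_num = sum(sum(row) for row in noise_mask)
    p_num = sum(len(row) for row in noise_mask)
    return n_num / p_num


clean = [[128] * 10 for _ in range(10)]
noisy, true_mask = add_impulse_noise(clean, density=0.3)
print(estimate_density(true_mask))
```

With a perfect detector the estimate equals the true density; with a real detector it deviates by the net effect of false alarms and miss-detections.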


B. Filtering method 


According to the estimated noise density $d_N$ and the proportion of
the detected signal pixels $S_{pro}^{W_F}$, the condition for deciding
whether the current filtering window should be expanded or not is set
as:

\[
S_{pro}^{W_F} > (1-d_N) \times \beta.
\tag{20}
\]

Here, $\beta$ is a scale factor taking its value in the range (0, 1).
We set it to 1/4 based on extensive tests on various images. When the
noise density is small, the minimum $S_{pro}^{W_F}$ required for not
expanding the current filtering window is large; when the noise
density is large, the minimum required $S_{pro}^{W_F}$ is small.


pro 
Our filtering method can be outlined as follows:

• Step 1: Set the initial size of the filtering window
$w_F \times w_F$ to 3 × 3 and set the maximum window size to
$w_F^{\max} \times w_F^{\max}$.

• Step 2: Set a filtering window $W_F(i,j)$ centered at the target
pixel at $(i, j)$ with the current size of $w_F \times w_F$.

• Step 3: If the proportion of the detected signal pixels in the
filtering window, i.e., $S_{pro}^{W_F}$, satisfies the criterion in
Eq. (20), go to Step 5.

• Step 4: Extend the filtering window size to
$(w_F + 1) \times (w_F + 1)$ and repeat Steps 2 and 3 until the
current filtering window size reaches $w_F^{\max} \times w_F^{\max}$.

• Step 5: Apply median filtering to the current filtering window. The
output intensity is
$Y_{i,j} = \mathrm{median}\{x_{i-s,j-t} \mid x_{i-s,j-t} \in W_F(S)\}$,
where $W_F(S)$ is the set of all detected signal pixels in the current
filtering window.
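
The five steps above can be sketched in Python as follows. This is an illustrative reading under several labeled assumptions, not the authors' code: the window side grows by 2 so that the window stays centered on the target pixel as in Eq. (17) (the text states $w_F + 1$); windows are clipped at the image border while the denominator of Eq. (18) stays $w_F \times w_F$; a noise pixel is left unchanged if even the largest window contains no detected signal pixel; and the default `w_max=7` is a placeholder, since the recommended maxima are those of Table II. The function name is hypothetical.

```python
from statistics import median


def adaptive_median_filter(image, noise_mask, beta=0.25, w_max=7):
    """Switching median filter: only detected noise pixels are replaced."""
    rows, cols = len(image), len(image[0])
    n_num = sum(flag for row in noise_mask for flag in row)
    d_n = n_num / (rows * cols)                 # Eq. (19): estimated noise density
    threshold = (1.0 - d_n) * beta              # right-hand side of Eq. (20)
    out = [row[:] for row in image]
    for i in range(rows):
        for j in range(cols):
            if not noise_mask[i][j]:
                continue                        # detected signal pixels are kept
            w = 3                               # Step 1: initial 3 x 3 window
            while True:
                h = w // 2
                # Steps 2-3: detected signal pixels inside the current window.
                signal = [
                    image[r][c]
                    for r in range(max(0, i - h), min(rows, i + h + 1))
                    for c in range(max(0, j - h), min(cols, j + h + 1))
                    if not noise_mask[r][c]
                ]
                if len(signal) / (w * w) > threshold or w >= w_max:
                    break
                w += 2                          # Step 4 (assumed step of 2)
            if signal:                          # Step 5: median of signal pixels
                out[i][j] = median(signal)
    return out


# A flat 7 x 7 image with a 3 x 3 block of detected noise in the middle.
image = [[100] * 7 for _ in range(7)]
mask = [[False] * 7 for _ in range(7)]
for r in range(2, 5):
    for c in range(2, 5):
        image[r][c] = 255
        mask[r][c] = True
restored = adaptive_median_filter(image, mask)
```

In this example the 3 × 3 window around the central pixel contains no signal pixel, so the window is expanded once before the median is taken.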


The maximum window size is given empirically in Table II, based on
a large number of tests on various images. In Table II, different
window sizes are suggested for different noise density levels.


Table II
RECOMMENDED MAXIMUM SIZE OF FILTERING WINDOW.

Estimated noise density | $w_F^{\max} \times w_F^{\max}$
0% < $d_N$ ≤ 30% | 3 × 3
30% < $d_N$ ≤ 50% |
50% < $d_N$ ≤ 70% |
$d_N$ > 70% |

V. EXPERIMENTS 


The proposed adaptive switching median filtering method includes two
components: the impulse noise detection and the adaptive filtering
process. Since the noise detection plays a key role in the final
denoising performance, we first evaluate the performance of the noise
detection. Then, we evaluate the filtering performance of our proposed
adaptive median filtering and the whole denoising performance of our
proposed ASMF-DBER method, respectively. Furthermore, the
computational cost of ASMF-DBER and its sensitivity to the parameter
settings are discussed. We also check the adaptability of ASMF-DBER
to the value of $a$ in the impulse noise model, which controls the
intensity range in which the noise pixels take values.

Experiments are carried out using several monochrome 
images (Fig. 12). The experiment results of several existing 
methods, i.e., BDND [18], IBDND [19], ACWM [20], ASWM 
[23], ROR-NLM [25] and WCSR [24] are also provided for 
comparison. 


A. Performance evaluation of noise detection 


For the two proposed noise detection methods (method I 
and method II) based on two different evidential modeling 
methods respectively, we evaluate their performances using 
corrupted Lena image and the results are shown in TABLE III. 
The performances of ACWM, BDND, ASWM and ROR-NLM 
methods are also provided for comparison. The performance 
evaluation indices used here include the false alarm rate (FAR), 
miss-detection rate (MDR) and accuracy rate (AR): 


\[
FAR = \frac{FA_{num}}{SA_{num}},
\tag{21}
\]

\[
MDR = \frac{MD_{num}}{NA_{num}},
\tag{22}
\]

\[
AR = \frac{P_{num}-FA_{num}-MD_{num}}{P_{num}}.
\tag{23}
\]

Here, $FA_{num}$ is the number of the actual signal pixels being
detected as the noise, $MD_{num}$ is the number of the actual
noise pixels being detected as the signal, $SA_{num}$ is the number
of the actual signal pixels, $NA_{num}$ is the number of the actual
noise pixels, and $P_{num}$ is the number of pixels in the image.

In this experiment, $a = 10$, i.e., noise pixels take values
in $S_1 = \{0, 1, \ldots, 10\}$ and $S_2 = \{245, 246, \ldots, 255\}$.
The values of $\epsilon$ in Eq. (10) and Eq. (13) are 0.1. The size of
the detection window $W_D$ is empirically set to 11 × 11 after
extensive tests.

Figure 12. Monochrome images for experiments. (a) Lena. (b) Barbara. (c) Baboon. (d) Boat. (e) Cameraman.


Table III
COMPARISON OF THE NOISE DETECTION PERFORMANCES FOR CORRUPTED LENA IMAGES (%).

Noise density | Index | ACWM | BDND | ASWM | ROR-NLM | Method I | Method II
10% | FAR | 0.291 | 0.003 | 0.654 | 1.020 | 0.004 | 0.003
10% | MDR | 0.302 | 0.023 | 0.317 | 0.019 | 0.237 | 0.191
10% | AR | 99.708 | 99.995 | 99.380 | 99.080 | 99.973 | 99.978
20% | FAR | 0.608 | 0.002 | 0.927 | 1.176 | 0.004 | 0.002
20% | MDR | 0.967 | 0.264 | 0.485 | 0.086 | 0.116 | 0.122
20% | AR | 99.320 | 99.946 | 99.161 | 99.042 | 99.974 | 99.974
30% | FAR | 1.263 | 0.004 | 1.372 | 1.341 | 0.003 | 0.002
30% | MDR | 2.995 | 1.123 | 0.855 | 0.157 | 0.046 | 0.107
30% | AR | 98.217 | 99.660 | 98.783 | 99.014 | 99.984 | 99.967
40% | FAR | 2.923 | 0.003 | 2.370 | 1.433 | 0.006 | 0.002
40% | MDR | 6.975 | 2.941 | 1.924 | 0.414 | 0.004 | 0.042
40% | AR | 95.456 | 98.822 | 97.808 | 98.975 | 99.995 | 99.982
50% | FAR | 6.191 | 0.010 | 13.893 | 5.663 | 0.014 | 0.002
50% | MDR | 12.864 | 6.057 | 3.492 | 1.634 | 0.003 | 0.027
50% | AR | 90.473 | 96.967 | 91.308 | 96.351 | 99.992 | 99.986
60% | FAR | 12.063 | 0.014 | 13.892 | 5.665 | 0.012 | 0.002
60% | MDR | 20.921 | 10.387 | 11.785 | 6.513 | 0 | 0.005
60% | AR | 82.622 | 93.762 | 87.372 | 93.826 | 99.995 | 99.996
70% | FAR | 21.176 | 0.047 | 29.166 | 18.784 | 0.037 | 0.004
70% | MDR | 30.789 | 15.587 | 22.981 | 18.742 | 0 | 0.002
70% | AR | 72.095 | 89.075 | 75.164 | 81.245 | 99.989 | 99.997
80% | FAR | 33.869 | 0.137 | 52.535 | 43.370 | 0.064 | 0.006
80% | MDR | 41.782 | 21.976 | 36.742 | 35.514 | 0 | 0.001
80% | AR | 59.801 | 82.392 | 60.099 | 62.915 | 99.987 | 99.998
90% | FAR | 50.239 | 0.852 | 76.716 | 74.130 | 0.114 | 0.004
90% | MDR | 52.852 | 29.431 | 38.678 | 48.875 | 0.001 | 0.001
90% | AR | 47.409 | 73.427 | 57.518 | 48.600 | 99.988 | 99.999

As shown in Table III, when the noise density is no larger
than 60%, the accuracy rates of these methods are all larger
than 80%. When the noise density is larger than 60%, the
accuracy rates of the ACWM, BDND, ASWM and ROR-NLM
methods drop rapidly. However, our proposed methods still
achieve high accuracy rates (> 90%).

B. Performance of filtering 


To evaluate the filtering performance, we compare the proposed
adaptive median filtering method with the standard median filtering
(SMF) used in ACWM and ASWM, and with the filters used in BDND, IBDND
(adaptive weighted median filter), ROR-NLM and WCSR, respectively. In
this experiment, $a = 10$ and all the filters are applied to the
detected noise pixels generated by the proposed noise detection
method I. The experimental result is shown in Fig. 13, where
$F_{ROR-NLM}$, $F_{BDND}$, $F_{IBDND}$ and $F_{WCSR}$ denote the
filters used in the ROR-NLM, BDND, IBDND and WCSR algorithms,
respectively.

According to Fig. 13, when the noise density is no larger than 30%,
the proposed filter performs similarly to the filters used in IBDND
and WCSR. As the noise density increases, the proposed filter
outperforms the other filters.


C. Performance of denoising 


To verify the whole denoising performance of our proposed ASMF-DBER,
we compare it with ACWM, ASWM, ROR-NLM, BDND, IBDND and WCSR using
PSNR and SSIM, as shown in Fig. 14 and Fig. 15, respectively. In this
experiment, $a = 10$ and the size of the detection window $W_D$ is
11 × 11. ASMF-DBER I and II represent the denoising results based on
the two proposed detection methods, respectively.

Figure 13. Comparison of filtering performances using PSNR for corrupted
Lena images.

When the noise density is low (< 20%), BDND, IBDND and the proposed
methods have similar denoising performances, since they all obtain
high detection accuracy rates in lightly corrupted situations (as
illustrated in Table III for the Lena image) and their filters perform
similarly when the noise detection results are accurate enough (as
shown in Fig. 13 when only the actual noise pixels are filtered).

BDND and IBDND perform better for the Lena, Baboon and Boat images
when the noise density is 10%. These images have no intensities close
to the extreme values (0 and 255), so BDND and IBDND easily obtain
better performance, since they use only the extreme criterion when
detecting impulse noise.

The WCSR method performs very well on the Barbara image when the noise
density is 10%. The reason is that the Barbara image has large areas
with regular texture. When the noise density is low, the noise
detection result is accurate enough for WCSR to reconstruct the
texture very well using the trained dictionaries.

As the noise density increases, the PSNR values of BDND, IBDND and
WCSR become much lower than those they achieve in Fig. 13, where only
the actual noise pixels, rather than the detected noise pixels, are
filtered. This means that, when the filtering is carried out on the
detected impulse noise pixels, the detection result significantly
affects the whole denoising performance. BDND, IBDND and WCSR fail to
achieve satisfactory filtering performance because of their poor
detection results.

The subjective quality comparisons of the filtered images are
illustrated in Fig. 16 to Fig. 19. The false-alarm pixels and
miss-detected pixels of the two proposed methods are colored in red
and green, respectively. Except for the Cameraman image, the test
images do not have many false alarms or miss-detections. To highlight
the colored pixels in these images, we circled them using the
corresponding colors (red for false alarms and green for
miss-detections).

From the comparisons of the quantitative results and the subjective
visual quality, we can see that the proposed ASMF-DBER algorithms
obtain superior denoising results compared with the other switching
median filters and the sparse representation based method.
Particularly, in the high noise density cases, ASMF-DBER has obvious
advantages over the others.

For the Cameraman image, the pixels around the edge of the
“cameraman” obtain highly contradictory supports from the extreme
criterion and the discontinuity criterion, as in the highly
contradictory situation 1 (Fig. 10), and the two proposed detection
methods are likely to produce different detection results. In
Fig. 17, ASMF-DBER I has more false alarms than ASMF-DBER II at these
pixels, so that the denoising performance of ASMF-DBER I for the
Cameraman image is not as good as that of ASMF-DBER II, as shown in
Fig. 14(e) and Fig. ?2(e).

Among these algorithms, WCSR has the most parameters to be determined
(8 parameters) and some of them are sensitive to the noise density,
which makes it difficult for WCSR to obtain a satisfactory denoising
result.

From the colored incorrect detections of the two proposed methods
shown above and from Table III, we find that ASMF-DBER I generates
more false alarms than ASMF-DBER II, and ASMF-DBER II generates more
miss-detections than ASMF-DBER I. Therefore, in practical
applications, we suggest ASMF-DBER I if a low miss-detection rate
matters more to the user, and ASMF-DBER II if a low false-alarm rate
matters more.


D. Sensitivity of parameters’ setting 


There are two parameters to determine in our method. One is the
detection window size and the other is $\beta$ in Eq. (20), which is
used for deciding whether the current filtering window should be
expanded or not. To discuss the sensitivity to the setting of these
two parameters, we compare the denoising performances of all the
combinations of the two parameters. The comparison results for the two
proposed denoising methods are shown in Table IV and Table V,
respectively. In this experiment, $\beta$ changes from 1/8 to 7/8 with
an incremental step of 1/8. The detection window size is set to 5 × 5,
7 × 7, 9 × 9, 11 × 11 or 13 × 13.

From Table IV and Table V, the filtering performance is not very
sensitive to the setting of $\beta$. When the noise density is 10% or
20%, all the values of $\beta$ generate the same performance, since
the maximum filtering window is limited to 3 × 3 when the estimated
noise density is no larger than 30% according to Table II. As the
noise density increases, the denoising performance becomes poorer when
a small detection window is selected. When the noise density is higher
than 70%, large detection windows (larger than 7 × 7) achieve
obviously better denoising performance than small detection windows
(no larger than 7 × 7). When the size of the detection window is set
to 11 × 11 and $\beta$ is set to 1/4, we usually obtain the best
denoising performance.

Figure 14. Comparisons of denoising performances using PSNR. (a) Lena. (b) Barbara. (c) Baboon. (d) Boat. (e) Cameraman.

Figure 15. Comparisons of denoising performances using SSIM. (a) Lena. (b) Barbara. (c) Baboon. (d) Boat. (e) Cameraman.

Table IV
DENOISING PERFORMANCES OF ASMF-DBER I FOR DIFFERENT DETECTION WINDOW SIZE AND $\beta$.

Figure 16. Denoising results for Lena image (noise density is 30%). (a) Corrupted image. (b) ASWM. (c) ROR-NLM. (d) BDND. (e) IBDND. (f) WCSR.
(g) ASMF-DBER I. (h) ASMF-DBER II. (i) Colored detection results of ASMF-DBER I. (j) Colored detection results of ASMF-DBER II.

Figure 17. Denoising results for Cameraman image (noise density is 40%). (a) Corrupted image. (b) ASWM. (c) ROR-NLM. (d) BDND. (e) IBDND. (f)
WCSR. (g) ASMF-DBER I. (h) ASMF-DBER II. (i) Colored detection results of ASMF-DBER I. (j) Colored detection results of ASMF-DBER II.


E. Computational cost 


The computational cost is an important index to evaluate 
an algorithm. We timed the computational costs of ACWM, 
ASWM, ROR-NML, BDND, IBDND, WCSR and the pro- 
posed methods by running the algorithms on a Windows 7 
Enterprise system equipped with Intel Core 17-4790 CPU at 
3.60 GHz and 8.00 GB DDR-III memory. The comparison 


of their average execution time for corrupted Lena images 
with size of 512 x 512 are shown in Table VI. Each average 
execution time is calculated from 10 runs of experiments. 
According to Table VI, the computational cost of the proposed 
methods varies from 80 to 130 seconds with the increase of 
noise density. The sparse representation based method WCSR 
is most time consuming (more than 3000 seconds) and the 
proposed methods need more computational cost compared 
with ACWM, ASWM, BDND and IBDND algorithms. 
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(h) (i) 


Figure 18. Denoising results for Barbara image (noise density is 70%). (a) Corrupted image. (b) ASWM. (c) ROR-NLM. (d) BDND. (e) IBDND. (f) WCSR. 
(g) ASMF-DBER I. (h) ASMF-DBER II. (i) Colored detection results of ASMF-DBER I. (j) Colored detection results of ASMF-DBER II. 


ES etre Baars 


a) 


eh 


(g) (h) (i) (j) 


Figure 19. Denoising results for Baboon image (noise density is 90%). (a) Corrupted image. (b) ASWM. (c) ROR-NLM. (d) BDND. (e) IBDND. (f) WCSR. 
(g) ASMF-DBER I. (h) ASMF-DBER IL. (i) Colored detection results of ASMF-DBER I. (j) Colored detection results of ASMF-DBER II. 

Table VI
AVERAGE EXECUTION TIME OF EIGHT ALGORITHMS FOR CORRUPTED LENA IMAGE WITH DIFFERENT NOISE DENSITIES (UNIT: SECOND).

Figure 20. Comparisons of denoising performance using PSNR for Lena images corrupted by the impulse noise with various values of a. (a) a = 0. (b) a = 5. (c) a = 15. (d) a = 20.


To some extent, the better denoising performance of the
proposed methods comes at the price of a higher computational
cost.


F. Adaptability for different impulse noise models 


In order to further check the adaptability of ASMF-DBER
to different values of a in the noise model, i.e., different
intensity ranges for the impulse noise, we compute the PSNR
for Lena images corrupted by impulse noise with other values
of a. The quantitative results are shown in Fig. 20.
Furthermore, we also compare the results obtained using a
recent alternative fusion rule, PCR6 [37] (Proportional
Conflict Redistribution rule No. 6), when combining the two
generated BBAs in our ASMF-DBER I and ASMF-DBER II methods.
These two results are denoted by ASMF-DBER I (PCR6) and
ASMF-DBER II (PCR6), respectively.

As shown in Fig. 20, the PSNR values of the ASMF-DBER
results are relatively high when a varies between 0 and 15.
Although they drop slightly when a = 20, they are still
generally higher than those of the other methods.


VI. CONCLUSION 


To deal with the problem of impulse noise reduction, we
first propose two impulse noise detection methods based on
evidential reasoning, applied before filtering. Second, we
design an adaptive switching median filtering method, which
adaptively determines the size of the filtering window
according to the detection results. The subjective and
objective analyses of our experimental results verify that
the proposed detection approaches and related filtering
algorithms have superior performance compared with existing
algorithms.

The generation of BBAs is crucial in evidential reasoning;
however, there is no general theoretical method for BBA
generation. In this paper, we use two types of BBA generation
methods in the evidential modeling of the uncertainties
encountered in impulse noise detection and have evaluated
their performance. In future work, we will focus on other BBA
generation methods that can better depict the uncertainty
encountered in impulse noise detection. Other evidence
combination rules will also be used for comparison. We will
also carry out more theoretical analyses on the determination
of the parameters used in our algorithm. Furthermore, we will
apply our impulse noise detection method to sparse
representation based filtering approaches to deal with more
complicated noise models, such as mixed impulse/Gaussian
noise.

Abstract—Recently, a measure of total uncertainty (TU) in
Dempster-Shafer Theory (DST) based on the pignistic
distribution, called the Ambiguity Measure (AM), has been
modified. The resulting new measure is simply referred to as
the Modified Ambiguity Measure (MAM). In the literature, it
has been shown that AM, in addition to showing some
undesirable behaviors, has important drawbacks related to two
essential properties for such measures: subadditivity and
monotonicity. The MAM measure has been developed to solve the
AM subadditivity problem, but this paper demonstrates that
MAM suffers from the same drawback as AM with respect to
monotonicity. A measure of uncertainty that cannot meet the
monotonicity requirement has a major drawback for its
exploitation in operational contexts such as analytics,
information fusion and decision support. This paper aims at
identifying and discussing drawbacks of this type of measure
(AM, MAM). Our main motivation is to insist upon the
important requirement of monotonicity that a TU measure
should possess to improve its potential for being used and
trusted in applications. This discussion is timely, since the
monotonicity problem needs to be solved first to avoid
building excessive expectations about the usefulness and
potential exploitation of such measures in operational
communities.

Keywords: Imprecise probabilities, theory of evidence,
measures of uncertainty, conflict, non-specificity, pignistic
probability.


I. INTRODUCTION


Dempster-Shafer theory (DST) extends classical probability
theory (PT). In DST, more types of uncertainty can be
represented than in PT. The types of uncertainty found in DST
are called conflict (also randomness or discord) and
non-specificity (see Yager [25]). Klir and Wierman [18]
present a total uncertainty (TU) measure in DST that has been
justified by an axiomatic approach taking TU in probability
theory as a reference. They also attach to that TU definition
a set of five desired properties that a TU measure must
verify. Abellán and Masegosa [7] extend that set by adding
the important property of monotonicity, as well as other
behavioral properties related to TU.

In DST, the upper (or maximum) entropy is the only function
that verifies all the basic required properties listed later
in Section III: P1-P5. Jousselme et al. [15] presented a new
TU measure in DST, called AM, based on the pignistic
distribution. In 2006, the authors claimed to prove that AM
verifies the needed properties (P1-P5) and that AM resolves
other shortcomings of upper entropy. However, Klir and Lewis
[19] found that the AM function does not, in fact, verify
requirement P4: subadditivity. Abellán and Masegosa [7]
presented an extension of the set of required properties
(P1-P5) that a TU measure in DST must verify. They extended
the set (P1-P5) from Klir and Wierman [18] and added some
desirable behaviors that a TU measure should have. In 2008,
Abellán and Masegosa [7] showed that AM does not verify the
important property of monotonicity (P6), in addition to
presenting some undesirable behaviors. Recently, Shahpari and
Seyedin [22] have presented a modified function of AM, called
MAM. They claim that MAM verifies the required properties
(P1-P5) and behaves correctly in applications. That claim
motivates our discussion here about the drawbacks of such
measures.

This paper aims at identifying drawbacks of such measures 
of uncertainty (AM, MAM) based on the pignistic trans- 
formation of a basic probability assignment in DST. Our 
main motivation is to insist upon the important requirement 
of monotonicity (P6) that a TU measure should possess to 
improve its potential to be used and trusted in applications 
such as in analytics, information fusion and decision support. 
Defined as they are in Jousselme et al. [15] and Shahpari and 
Seyedin [22], AM and MAM will produce incorrect results 
and undesirable behaviors if used in applications. 

The paper is organized as follows. Section II reviews briefly 
the representation of information and uncertainty within the 
framework of the Dempster-Shafer Theory (DST). Section III 
discusses the drawbacks of AM and MAM with respect to the 
required properties of a total uncertainty (TU) in DST. Section 
IV presents a discussion on desirable behavioral requirements 
of a TU measure in DST. We conclude in Section V. 


II. INFORMATION REPRESENTATION IN THEORY OF 
EVIDENCE 


A. Dempster-Shafer theory of Evidence 


Let $X$ be a finite set considered as a set of possible
situations, $|X| = n$, $\wp(X)$ the power set of $X$ and $x$
any element in $X$. Dempster-Shafer theory of evidence
(Dempster [9], Shafer [23]) is based on a function called
basic probability assignment (b.p.a.), that is a mapping
$m : \wp(X) \to [0,1]$ such that $m(\emptyset) = 0$ and
$\sum_{A \subseteq X} m(A) = 1$. A set $A$ such that
$m(A) > 0$ is called a focal element of $m$.

Let $X, Y$ be finite sets. Let $X \times Y$ be the product
space of those sets and $m$ a b.p.a. on $X \times Y$. We use
$m^{\downarrow X}$ to denote the marginal b.p.a. on $X$ (and
similarly $m^{\downarrow Y}$ on $Y$), which is defined as
follows:

$$m^{\downarrow X}(A) = \sum_{R \,|\, A = R^{\downarrow X}} m(R), \quad \forall A \subseteq X, \qquad (1)$$

where $R^{\downarrow X}$ is the set projection of $R$ on $X$.
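Eq. (1) can be sketched in a few lines of Python. This is only an illustration: a b.p.a. is assumed to be stored as a dictionary from `frozenset` focal elements to masses, elements of $X \times Y$ are assumed to be `(x, y)` tuples, and the names `project_on_x`, `marginal_on_x` and the toy joint b.p.a. are hypothetical.

```python
from collections import defaultdict


def project_on_x(r):
    """Set projection of a subset R of X x Y onto X (pairs are (x, y) tuples)."""
    return frozenset(x for x, _ in r)


def marginal_on_x(m_xy):
    """Marginal b.p.a. on X, Eq. (1): m(R) is added to the projection of R."""
    out = defaultdict(float)
    for r, mass in m_xy.items():
        out[project_on_x(r)] += mass
    return dict(out)


# Hypothetical joint b.p.a. on X = {a, b}, Y = {0, 1}.
m_xy = {
    frozenset({("a", 0), ("a", 1)}): 0.5,  # projects onto {a}
    frozenset({("a", 0), ("b", 1)}): 0.5,  # not a product set; projects onto {a, b}
}
m_x = marginal_on_x(m_xy)
print(m_x)
```

The second focal element is not of the form $A \times B$, which is the kind of joint focal set for which projection discards part of the structure.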

Associated with each basic probability assignment, there
exist two functions: a belief function, Bel, and a
plausibility function, Pl:

$$Bel(A) = \sum_{B \subseteq A} m(B), \qquad Pl(A) = \sum_{A \cap B \neq \emptyset} m(B).$$

They can be considered as the lower and upper probability of
$A$, respectively.

We may note that belief and plausibility are interrelated for
all $A \in \wp(X)$ by $Pl(A) = 1 - Bel(A^c)$, where $A^c$
denotes the complement of $A$. Furthermore,
$Bel(A) \leq Pl(A)$.
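The definitions above translate directly into code. The following minimal sketch assumes a b.p.a. is represented as a dictionary from `frozenset` focal elements to masses; the helper names (`powerset`, `is_bpa`, `bel`, `pl`) and the sample b.p.a. are illustrative choices, not notation from the paper.

```python
from itertools import chain, combinations


def powerset(frame):
    """All subsets of `frame` as frozensets (including the empty set)."""
    items = list(frame)
    subsets = chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))
    return [frozenset(s) for s in subsets]


def is_bpa(m, tol=1e-9):
    """Check m(empty set) = 0, masses in [0, 1], and total mass 1."""
    if m.get(frozenset(), 0.0) != 0.0:
        return False
    if any(v < 0.0 or v > 1.0 for v in m.values()):
        return False
    return abs(sum(m.values()) - 1.0) <= tol


def bel(m, a):
    """Bel(A): total mass of the focal elements contained in A."""
    return sum(v for b, v in m.items() if b <= a)


def pl(m, a):
    """Pl(A): total mass of the focal elements that intersect A."""
    return sum(v for b, v in m.items() if b & a)


X = frozenset({"x1", "x2", "x3"})
m = {frozenset({"x1", "x2"}): 0.6, frozenset({"x3"}): 0.1, X: 0.3}

# Duality Pl(A) = 1 - Bel(complement of A) holds for every subset A.
for a in powerset(X):
    assert abs(pl(m, a) - (1.0 - bel(m, X - a))) < 1e-12
```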

For each b.p.a. on a finite set $X$, there exists a set of
associated probability distributions $p$ on $X$, defined in
the following way:

$$K_m = \{p \mid Bel(A) \leq p(A), \ \forall A \in \wp(X)\}. \qquad (2)$$

We remark that requiring $Bel(A) \leq p(A)$ for all $A$ is,
in this case, equivalent to requiring
$Bel(A) \leq p(A) \leq Pl(A)$ for all $A$. $K_m$ is a closed
and convex set of probability distributions, also called a
credal set in the literature.


B. Measures of uncertainty in DST 


The entropy function (Shannon [24]) in probability theory
is defined by the following continuous function:

$$S(p) = -\sum_{x \in X} p(x) \log_2 p(x), \qquad (3)$$

where $p = (p(x))_{x \in X}$ is a probability distribution on
$X$, $p(x)$ is the probability of value $x$, and $\log_2$ is
used to quantify the value in bits, although $\log$ and
$\log_2$ are used interchangeably in the literature. The value
$S(p)$ (also written $S(p(x_1), p(x_2), \ldots)$) quantifies
the only type of uncertainty present in probability theory.
This measure in PT verifies a large set of properties (see
Shannon [24], Klir and Wierman [18]).

In DST, Yager [25] identified two different types of
uncertainty: the first one appears when a b.p.a. has positive
masses on sets with empty intersections, and the other one
appears when the b.p.a. has positive masses on sets with
cardinality greater than one. Those types of uncertainty are
normally called conflict and non-specificity, respectively.

Dubois and Prade [10] introduced in DST a function based
on the classical Hartley measure (Hartley [11]) on classical
set theory. That measure, denoted $I$, represents a measure
of non-specificity for a b.p.a. It is expressed in the
following way:

$$I(m) = \sum_{A \subseteq X} m(A) \log(|A|). \qquad (4)$$

This measure $I(m)$ has its minimum value (zero) for a
b.p.a. $m$ that is a probability distribution. Its maximum
value, $\log(|X|)$, is obtained for a b.p.a. $m$ where
$m(X) = 1$ and $m(A) = 0$, $\forall A \subset X$. We can see
in the literature that $I$ verifies a large set of needed
properties for such a type of measure.

In the 1990s, some measures were introduced with the aim of
quantifying the degree of conflict that a b.p.a. expresses
(see Klir and Wierman [18]). Yager [25] introduced the
following function:

$$E(m) = -\sum_{A \subseteq X} m(A) \log Pl(A). \qquad (5)$$

But this function does not verify all the required properties
in DST.
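As a small numerical illustration of Eqs. (4) and (5), the sketch below evaluates $I$ and $E$ on the two extreme cases: the vacuous b.p.a. ($m(X) = 1$) and a uniform probability distribution. The dictionary representation, the function names and the use of the natural logarithm are assumptions made for this example.

```python
import math


def nonspecificity(m):
    """I(m): sum of m(A) * log|A| over the focal elements, Eq. (4)."""
    return sum(v * math.log(len(a)) for a, v in m.items() if v > 0)


def plausibility(m, a):
    """Pl(A): total mass of the focal elements that intersect A."""
    return sum(v for b, v in m.items() if b & a)


def dissonance(m):
    """E(m): minus the sum of m(A) * log Pl(A) over the focal elements, Eq. (5)."""
    return -sum(v * math.log(plausibility(m, a)) for a, v in m.items() if v > 0)


X = frozenset({"x1", "x2", "x3"})
vacuous = {X: 1.0}                              # total ignorance
bayesian = {frozenset({x}): 1 / 3 for x in X}   # uniform probability

# Vacuous: all uncertainty is non-specificity, I = log|X| and E = 0.
# Bayesian: all uncertainty is conflict, I = 0 and E equals the Shannon entropy.
print(nonspecificity(vacuous), dissonance(vacuous))
print(nonspecificity(bayesian), dissonance(bayesian))
```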

The measure $S^*(m)$, equal to the maximum (upper) entropy
on the set of probability distributions verifying
$Bel(A) \leq \sum_{x \in A} p(x) \leq Pl(A)$,
$\forall A \subseteq X$, was proposed by Harmanec and Klir
[12], [13]. This set of probability distributions is the
credal set associated with a b.p.a. $m$, which we have denoted
$K_m$ in Eq. (2).

$S^*$ is considered a total uncertainty measure in DST: a
measure that quantifies both types of uncertainty, conflict
and non-specificity. In Harmanec and Klir [13], however, it
was not separated into those corresponding parts. It can be
seen in Abellán, Klir and Moral [6] that this measure can be
separated coherently into conflict and non-specificity parts.
In DST those parts are similar to the ones for general credal
sets. We can write

$$S^*(m) = S_*(m) + (S^* - S_*)(m), \qquad (6)$$

with $S^*(m)$ the maximum entropy and $S_*(m)$ the minimum
entropy on the credal set $K_m$ associated with a b.p.a. $m$;
$S_*(m)$ coherently quantifies the conflict part and
$(S^* - S_*)(m)$ the non-specificity part. That measure has
been successfully used in applications (see Abellán and Moral
[3]).

To quantify conflict and non-specificity (ambiguity) in DST,
Jousselme et al. [15] presented a measure based on the
pignistic distribution in DST. Let $m$ be a b.p.a. on a finite
set $X$; then the pignistic probability distribution $BetP_m$
is defined on all the subsets $A$ of $X$ as follows:

$$BetP_m(A) = \sum_{B \subseteq X} m(B) \frac{|A \cap B|}{|B|}. \qquad (7)$$

For a singleton set $A = \{x\}$, we have
$BetP_m(\{x\}) = \sum_{x \in B} [m(B)/|B|]$. Hence, as the
authors say [15], the Ambiguity Measure (AM) for a b.p.a. $m$
on a finite set $X$ can be defined as:

$$AM(m) = -\sum_{x \in X} BetP_m(x) \log(BetP_m(x)), \qquad (8)$$

i.e., the entropy of the $BetP_m$ probability distribution.
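Eqs. (7) and (8) can be sketched as follows, assuming a b.p.a. stored as a dictionary from `frozenset` focal elements to masses. The function names and the toy b.p.a. are illustrative, and the natural logarithm is an assumption (the base only rescales the result).

```python
import math


def pignistic(m):
    """BetP_m({x}): each focal element B shares m(B) equally among its elements."""
    betp = {}
    for b, mass in m.items():
        for x in b:
            betp[x] = betp.get(x, 0.0) + mass / len(b)
    return betp


def ambiguity_measure(m):
    """AM(m), Eq. (8): Shannon entropy (natural log) of the pignistic distribution."""
    return -sum(p * math.log(p) for p in pignistic(m).values() if p > 0)


# Hypothetical b.p.a. on X = {a, b}: half the mass on {a}, half on {a, b}.
m = {frozenset({"a"}): 0.5, frozenset({"a", "b"}): 0.5}
betp = pignistic(m)          # a receives 0.5 + 0.25, b receives 0.25
am = ambiguity_measure(m)
print(betp, am)
```

Because the pignistic transformation collapses the b.p.a. into a single probability distribution before the entropy is taken, different b.p.a.s with the same $BetP_m$ receive the same AM value.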
Recently, Shahpari and Seyedin [22] have presented a
modified function of AM, based on the AM drawbacks identified
in Klir and Lewis [19]. This function is called MAM and
unfortunately presents some mathematical shortcomings in its
definition, as explained below. Shahpari and Seyedin stated
that MAM coincides with AM on a 1-dimensional space, but in
the case of a 2-dimensional space they used a different
definition of the pignistic distribution without providing
the essential justification. Here are the details. Let $X, Y$
be finite sets, and $m$ a b.p.a. on $X \times Y$. On
$X \times Y$, MAM coincides with the AM function, via the
probability distribution $MBet_{m_{XY}} = Bet_{m_{XY}}$ (MAM
is the entropy of that probability distribution). In the case
of a 2-dimensional space, Shahpari and Seyedin use the
following function on each marginal b.p.a.:


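$$MBet_{m_X}(x_i) = \sum_{\substack{B \in \wp(X),\ B \ni x_i; \\ A \in \wp(X \times Y),\ B = A^{\downarrow X}}} \frac{m_{XY}(A)\, \sharp(x_i \in A)}{|A|}, \quad \forall x_i \in X, \qquad (9)$$

with $\sharp(x_i \in A)$ the number of appearances of $x_i$ in
the set $A$, and $|A|$ the cardinal of $A$. Similarly they
define the values $MBet_{m_Y}(y_i)$, $\forall y_i \in Y$.

To simplify, this expression can be reduced to the following
one:

$$MBet_{m_X}(x_i) = \sum_{A \subseteq X \times Y} \frac{m_{XY}(A)\, \sharp(x_i \in A)}{|A|}, \quad \forall x_i \in X. \qquad (10)$$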
ape EX 


The MAM function is then the entropy of those probability
distributions on $X$ and $Y$ respectively. The authors
supposed that $MBet_{m_X}$ is a probability distribution on
$X$ that belongs to the credal set associated with the
marginal of $m$ on $X$ ($K_{m^{\downarrow X}}$). That
essential assertion has not been verified or proved by the
authors. The following proposition is our way to prove that
assertion.

Proposition 1: With the above notation, the probability
distribution expressed by Eq. (9) (or Eq. (10)) belongs to the
associated credal set of the marginal b.p.a. on the space $X$.

As $MBet_{m_X}$ is a probability distribution on $X$, it is
only necessary to prove the following expression
$\forall B \subseteq X$:

$$Bel_{m^{\downarrow X}}(B) \leq MBet_{m_X}(B) = \sum_{x_i \in B} MBet_{m_X}(x_i).$$

Without loss of generality, let $B$ be a set in $\wp(X)$ such
that $B = \{x_1, x_2, \ldots, x_r\}$, and let
$A_1, A_2, \ldots, A_t$ be all the focal sets of $m$, $A_j$ in
$\wp(X \times Y)$, such that $A_j^{\downarrow X} \subseteq B$.
We know that

$$MBet_{m_X}(x_1) = \frac{m(A_1)\,\sharp(x_1 \in A_1)}{|A_1|} + \cdots + \frac{m(A_t)\,\sharp(x_1 \in A_t)}{|A_t|} + a_1,$$
$$\vdots$$
$$MBet_{m_X}(x_r) = \frac{m(A_1)\,\sharp(x_r \in A_1)}{|A_1|} + \cdots + \frac{m(A_t)\,\sharp(x_r \in A_t)}{|A_t|} + a_r,$$

with all $a_i \geq 0$, $i = 1, \ldots, r$. We remark the
following considerations in the above expressions of
$MBet_{m_X}(x_i)$, $i = 1, \ldots, r$:
- The values $\sharp(x_i \in A_j)$ can be 0 if $x_i \notin A_j$.
- The $a_i$ values come from the sets $C \in \wp(X \times Y)$
such that $x_i \in C$ but $C^{\downarrow X} \not\subseteq B$.

If we sum all the above expressions, the first terms sum to
$m(A_1)$, the second ones to $m(A_2)$, and so on with all the
rest, because $A_j^{\downarrow X} \subseteq B$ implies
$\sum_{i=1}^{r} \sharp(x_i \in A_j) = |A_j|$. Hence,

$$MBet_{m_X}(B) = \sum_{x_i \in B} MBet_{m_X}(x_i) = \sum_{j=1}^{t} m(A_j) + a,$$

with $a = \sum_{i=1}^{r} a_i \geq 0$. Finally,

$$Bel_{m^{\downarrow X}}(B) = \sum_{j=1}^{t} m(A_j) \leq MBet_{m_X}(B).$$

With Proposition 1 we show that $MBet_{m_X}$ is compatible
with the correct definition of a marginal b.p.a. of $m$ on
$X$. But, as we will show in Section III, it would require
some modification in the definition of the MAM measure to be
mathematically correct.


III. DRAWBACKS OF MEASURES BASED ON PIGNISTIC
TRANSFORMATION IN DST


Klir and Wierman [18] state requirements for uncertainty
measures in DST that quantify both types of uncertainty, i.e.
total uncertainty measures. These requirements, in the form
of properties, are the following:

(P1) Probabilistic consistency: A total uncertainty measure
must be equal to the Shannon entropy in the case that all the
focal elements are singletons:

$$TU(m) = -\sum_{x \in X} m(x) \log m(x). \qquad (11)$$

(P2) Set consistency: When there exists a set $A$ such that
$m(A) = 1$, then TU must collapse to the Hartley measure:

$$TU(m) = \log |A|. \qquad (12)$$

(P3) Range: The range of $TU(m)$ must be $[0, \log |X|]$.

(P4) Subadditivity: Let $m$ be a b.p.a. on the space
$X \times Y$, and $m^{\downarrow X}$ and $m^{\downarrow Y}$
its marginal b.p.a.s on $X$ and $Y$ respectively; then TU must
satisfy the following inequality:

$$TU(m) \leq TU(m^{\downarrow X}) + TU(m^{\downarrow Y}). \qquad (13)$$

(P5) Additivity: Let $m$ be a b.p.a. on the space
$X \times Y$, and $m^{\downarrow X}$ and $m^{\downarrow Y}$
its marginal b.p.a.s on $X$ and $Y$ respectively, such that
these marginals are not interactive
($m(A \times B) = m^{\downarrow X}(A)\, m^{\downarrow Y}(B)$,
with $A \subseteq X$, $B \subseteq Y$ and $m(C) = 0$ if
$C \neq A \times B$); then TU must satisfy the equality:

$$TU(m) = TU(m^{\downarrow X}) + TU(m^{\downarrow Y}). \qquad (14)$$

These requirements are intended to extend those of Shannon's
entropy in probability theory. In DST there are two different
types of uncertainty, one more than in classical PT. The
requirement of range is in some way debatable. In the
literature, one can find arguments in favor of a larger range.
In PT we can never find a probability distribution that
contains the information of another probability distribution.
But this situation can appear in DST: one b.p.a. can contain
the information of another b.p.a., as we can see in the
following example.

Example 1: Suppose that, in the analysis of the mortality of a
traffic accident, the outcome can be related to 3 main causes:
(1) external causes (EC), i.e. poor visibility, bad road
and/or atmospheric conditions; (2) overspeed (OS); (3) driver
ill conditions due to consumption of harmful substances (D).
Let us assume the following two situations in the analysis of
a particular accident.

Situation 1: we have 3 evidences ($E_1$, $E_2$ and $E_3$) on
the causes of the mortality of an accident. An expert
expresses his knowledge in the form of the following b.p.a. on
the universal set $X = \{EC, OS, D\}$:

$E_1 \to m_1(\{EC, OS\}) = \beta_1$,
$E_2 \to m_1(\{EC, D\}) = \beta_2$,
$E_3 \to m_1(\{OS, D\}) = \beta_3$,

where $\beta_i > 0$, $i = 1, 2, 3$, and
$\sum_i \beta_i = 1$.

Situation 2: thinking further about the evidences, the expert
now finds that there exist some reasons not to discard D in
the evidence $E_1$, i.e. a loss of information is produced.
Hence, he thinks that he must change his b.p.a. to the
following one:

$E_1 \to m_2(\{EC, OS, D\}) = \beta_1$,
$E_2 \to m_2(\{EC, D\}) = \beta_2$,
$E_3 \to m_2(\{OS, D\}) = \beta_3$.

It is clear that Situation 2 represents a situation with a
greater level of uncertainty than Situation 1. Here, we have
$Bel_2(A) \leq Bel_1(A)$ and $Pl_1(A) \leq Pl_2(A)$,
$\forall A \subseteq X$, implying a greater level of
uncertainty for $m_2$.

Example 1 expresses a situation that must be taken into
account by a total uncertainty measure in DST. Hence, a TU
measure in DST needs to verify the following property:

(P6) Monotonicity: A total uncertainty measure in DST must
take into account decreases or increases in information.

Formally, let $m_1$ and $m_2$ be 2 b.p.a.s on a finite set $X$
verifying that $Bel_2(A) \leq Bel_1(A)$,
$\forall A \subseteq X$; then it must be verified that:

$$TU(m_1) \leq TU(m_2). \qquad (15)$$

The above definition of monotonicity is called weak inclusion.
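The premise of P6 can be checked by brute force over the subsets of a small frame. The sketch below does so for Example 1; the values $\beta = (0.3, 0.5, 0.2)$ are the ones used for this example in Section III-B, while the dictionary representation and the function names are assumptions of this illustration.

```python
from itertools import chain, combinations


def nonempty_subsets(frame):
    """All non-empty subsets of `frame` as frozensets."""
    items = sorted(frame)
    return (frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(1, len(items) + 1)))


def bel(m, a):
    """Bel(A): total mass of the focal elements contained in A."""
    return sum(v for b, v in m.items() if b <= a)


def weakly_included(m1, m2, frame, tol=1e-12):
    """True if Bel2(A) <= Bel1(A) for every A, i.e. m2 is the less informative b.p.a."""
    return all(bel(m2, a) <= bel(m1, a) + tol for a in nonempty_subsets(frame))


X = {"EC", "OS", "D"}
m1 = {frozenset({"EC", "OS"}): 0.3,
      frozenset({"EC", "D"}): 0.5,
      frozenset({"OS", "D"}): 0.2}
m2 = {frozenset({"EC", "OS", "D"}): 0.3,
      frozenset({"EC", "D"}): 0.5,
      frozenset({"OS", "D"}): 0.2}

print(weakly_included(m1, m2, X))  # premise of P6 holds from m1 to m2
print(weakly_included(m2, m1, X))  # but not in the other direction
```

A TU measure satisfying P6 must then return a value for `m2` at least as large as for `m1`.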

Using the results of Klir and Wierman [18], Jousselme
et al. [15], Klir and Lewis [19], Abellán and Masegosa [7]
and Shahpari and Seyedin [22], we have the following initial
properties for the S*, AM and MAM functions:


S*: P1, P2, P3, P4, P5 and P6.
AM: P1, P2, P3 and P5.
MAM: P1, P2, P3, P4~ and P5.


S* is the only measure that satisfies all the proposed
properties. We mark P4 for MAM as P4~ because that property
is satisfied in a controversial way, as we explain below.

A. Drawbacks of the AM measure 


In Jousselme et al. [15] it is claimed that the AM function
satisfies the P4 requirement. In Klir and Lewis [19] it is
shown that there is an error in their proof of the P4
requirement, and that the AM function does not verify that
property.

Abellán and Masegosa [7] proved that the AM function does not
verify the P6 property either. Extended details can be found
in those two references.


B. Drawbacks of the MAM measure 


The MAM measure coincides with the AM measure on a
1-dimensional space, so the example provided in Abellán and
Masegosa [7] about the non-monotonicity of AM is also valid
for MAM. Consequently, MAM does not verify the P6 property.
Moreover, if we consider Example 1 with values
$\beta_1 = 0.3$, $\beta_2 = 0.5$ and $\beta_3 = 0.2$, we have
the following values for MAM:


$$1.049 = MAM(m_2) < MAM(m_1) = 1.081$$


These values imply that, when the real information decreases,
MAM can give lower values of uncertainty. This drawback must
not be allowed for such a type of measure.
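The two values above can be reproduced with a few lines of Python. This sketch relies on the statement that MAM coincides with AM on a 1-dimensional space, and assumes the natural logarithm, which is the base that matches the reported figures; the function names are illustrative.

```python
import math


def pignistic(m):
    """Pignistic distribution: each focal set shares its mass equally."""
    betp = {}
    for b, mass in m.items():
        for x in b:
            betp[x] = betp.get(x, 0.0) + mass / len(b)
    return betp


def am(m):
    """Entropy (natural log) of the pignistic distribution; equals MAM in 1-D."""
    return -sum(p * math.log(p) for p in pignistic(m).values() if p > 0)


b1, b2, b3 = 0.3, 0.5, 0.2
m1 = {frozenset({"EC", "OS"}): b1,
      frozenset({"EC", "D"}): b2,
      frozenset({"OS", "D"}): b3}
m2 = {frozenset({"EC", "OS", "D"}): b1,
      frozenset({"EC", "D"}): b2,
      frozenset({"OS", "D"}): b3}

print(round(am(m1), 3))  # 1.081
print(round(am(m2), 3))  # 1.049
```

The pignistic distributions are (0.40, 0.25, 0.35) for $m_1$ and (0.35, 0.20, 0.45) for $m_2$ over (EC, OS, D): moving mass from $\{EC, OS\}$ to $X$ makes the distribution less uniform, so its entropy drops even though information was lost.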

The MAM measure satisfies Property P4 in a controversial
way, as discussed below.

When looking for a measure that verifies the P4 property, the
definition of the MAM function is not entirely correct
mathematically, as alluded to in the previous section. The
authors state that “In cases where the projection process is
not used, the modified pignistic probability is computed
employing (13)”, where “(13)” is the equation of the AM
function (Eq. (8)).

It means that on the (1-dimensional) $m^{\downarrow X}$ one
would use the AM function, but when the pignistic probability
comes from a joint b.p.a. one must use Eq. (9). That is
questionable.

Let us discuss this question through the following example:

Example 2: Let $m_1$ be a b.p.a. on the space
$X = \{x_1, x_2, x_3\}$, with masses


$$m_1(\{x_1, x_2\}) = 0.6, \quad m_1(\{x_3\}) = 0.1, \quad m_1(X) = 0.3.$$


Then MAM quantifies the information of $m_1$ as the entropy,
$S$, of the probability distribution (0.4, 0.4, 0.2).

On the other hand, consider the space $Y = \{y_1, y_2\}$
and $m$ on $X \times Y$ with masses


$$m(\{z_{11}, z_{12}, z_{21}\}) = 0.6, \quad m(\{z_{31}, z_{32}\}) = 0.1, \quad m(X \times Y) = 0.3,$$


where $z_{ij} = (x_i, y_j)$. Here the MAM function quantifies
the information of $m$ on $X$ as $S(0.5, 0.3, 0.2)$. But, in
this case, $m^{\downarrow X} = m_1$. This incongruence in the
definition of MAM could be resolved by a different definition
of the marginal, as discussed below.
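The two distributions of Example 2 can be reproduced numerically. The sketch below computes, from the same joint b.p.a., the pignistic distribution of the standard marginal on $X$ and the distribution given by Eq. (10); pairs $(x_i, y_j)$ are encoded as tuples, and the function names are hypothetical.

```python
import math
from itertools import product


def entropy(p):
    """Shannon entropy (natural log) of a collection of probabilities."""
    return -sum(v * math.log(v) for v in p if v > 0)


def betp_from_marginal(m_xy):
    """Standard route: project each focal set on X, then split its mass equally."""
    betp = {}
    for a, mass in m_xy.items():
        proj = {x for x, _ in a}
        for x in proj:
            betp[x] = betp.get(x, 0.0) + mass / len(proj)
    return betp


def mbet_on_x(m_xy):
    """Eq. (10): weight x_i by its number of appearances in the joint focal set A."""
    mbet = {}
    for a, mass in m_xy.items():
        for x, _ in a:
            mbet[x] = mbet.get(x, 0.0) + mass / len(a)
    return mbet


X, Y = ("x1", "x2", "x3"), ("y1", "y2")
m = {
    frozenset({("x1", "y1"), ("x1", "y2"), ("x2", "y1")}): 0.6,
    frozenset({("x3", "y1"), ("x3", "y2")}): 0.1,
    frozenset(product(X, Y)): 0.3,
}

bet = betp_from_marginal(m)   # approximately (0.4, 0.4, 0.2)
mbet = mbet_on_x(m)           # approximately (0.5, 0.3, 0.2)
print(bet, entropy(bet.values()))
print(mbet, entropy(mbet.values()))
```

Both distributions describe uncertainty about the same marginal b.p.a. $m_1$, yet their entropies differ, which is the incongruence the example points out.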

Shahpari and Seyedin [22] do not use the standard definition
of the marginal of a b.p.a. and its properties, with the aim
that the MAM function verifies property P4. They do not use
the standard definition of subadditivity for a measure $M$,
which requires that, for $m$ a b.p.a. on $X \times Y$,
$M(m) \leq M(m^{\downarrow X}) + M(m^{\downarrow Y})$, with
correct definitions of the marginal b.p.a.s. As said in
Shahpari and Seyedin [22]: “some probabilistic information is
lost in the classic DST projection process”. One avenue would
have been to change the standard definition of marginal in
DST. The standard definition of marginal is an extension of
the one used in PT. Shahpari and Seyedin choose instead to
define a different 'pignistic' probability distribution on the
marginal b.p.a.s on $X$ and $Y$. It is clear that
$Bet_{m^{\downarrow X}} \in K_{m^{\downarrow X}}$, but that
$MBet_{m_X} \in K_{m^{\downarrow X}}$ has not been verified in
the work of Shahpari and Seyedin. We prove that assertion in
Section II.

Suppose that we have a general b.p.a. $m$ on the space
$X \times Y \times T$. If we need to work on the
$X \times Y$ space, we use the standard marginal on
$X \times Y$, $m^{\downarrow X \times Y}$. But to use the
'pignistic' values in this case, should we use
$MBet_{m_{XY}}$ or $Bet_{m_{XY}}$, i.e. the one obtained from
$m$ or the one obtained directly from
$m^{\downarrow X \times Y}$? If we need to use the similar
values on $X$ or $Y$, should we use the values from $m$ or the
values from $m^{\downarrow X \times Y}$? These questions
appear because of the controversial definition of MAM.

From Shahpari and Seyedin [22], with $S$ the entropy
function, we know that:


$$S(MBet_m) \leq S(MBet_{m_{XY}}) + S(MBet_{m_T}), \qquad (16)$$

$$S(MBet_m) \leq S(MBet_{m_X}) + S(MBet_{m_Y}) + S(MBet_{m_T}). \qquad (17)$$


But how does $S(MBet_{m_{XY}})$ compare with
$S(MBet_{m_X}) + S(MBet_{m_Y})$? It should be checked that a
similar inequality is verified.

What is the relation between $MBet_{m_X}$ obtained from $m$
and the similar one obtained from
$m^{\downarrow X \times Y}$? They can be different.

None of the questions raised above would appear if, for
instance:

- A new definition of the marginal of a b.p.a., compatible
with PT and the definition of the MAM function, were used.

- The definition of MAM on each b.p.a. were independent of
its origin, i.e. it should not matter whether the b.p.a.
comes from a joint b.p.a. or not.

The problem with the marginal found by Shahpari and Seyedin
is intrinsic to DST as an extension of PT, because on a joint
space $X \times Y$ it makes sense for a b.p.a. to have as
focal element a set $C$ with $C \neq A \times B$ for any
$A \subseteq X$ and $B \subseteq Y$.

Hence, from our point of view, the definition of the MAM
function has some drawbacks, but the critical one is that it
does not verify the monotonicity property P6, which
considerably limits its usage in applications.


IV. BEHAVIORAL REQUIREMENTS FOR TOTAL 
UNCERTAINTY MEASURES IN DST 


Jousselme et al. [15] discuss some shortcomings of the S*
function (upper entropy) in DST and provide a comparison with
the behavior of the AM function. These shortcomings have been
presented by Klir and collaborators in the literature and can
be expressed in the following way:

(1) The measure must be computable directly or via
algorithms.

(2) The measure must be separable coherently into the two
types of uncertainty coexisting in DST: conflict and
non-specificity.

(3) The measure must be sensitive to changes of evidence.

These considerations are analyzed in Abellán and Masegosa
[7], adding for (3) that a TU should be sensitive to changes
in evidence directly or via its parts. They found that an
increase in conflict can cause a decrease in non-specificity
and vice versa. It is shown that we can have similar total
uncertainty values with different conflict and
non-specificity parts. Hence, Abellán and Masegosa [7] showed
that a set of Behavioral Requirements (BR) for a TU in DST
could be stated in the following way:

(BR1) A TU should be computable directly or via an
algorithm.

(BR2) A TU must not conceal the two types of uncertainty
coexisting in the evidence theory: conflict and
non-specificity.

(BR3) A TU must be sensitive to changes of evidence,
directly or via its parts of conflict and non-specificity.

Based on the “Generalized Information Theory” of Klir [16],
we know that there exist situations where the information can
be expressed by more general models. Hence, the principle of
uncertainty invariance states that “the amount of uncertainty
(and information) must be preserved when a representation of
uncertainty in one mathematical theory is transformed into
its counterpart in another theory”. Via this principle,
Abellán and Masegosa [7] considered another requirement for a
TU in DST, which one can call Extensibility:

(BR4) The extension of a TU in DST to more general theories
must be possible.

The above requirements (BR1-BR4) have been analyzed in
Abellán and Masegosa [7] for the S* and AM functions. Here,
we verify them for MAM, comparing with AM and S*. As these
requirements do not involve a joint space, the case of MAM
coincides with that of AM:

BR1: The AM and MAM functions have a simpler calculation
than S*; it is only necessary to obtain the pignistic
probability distribution of a b.p.a. The calculation of S* in
DST has a high computational complexity. The algorithm of
Meyerowitz et al. [21] was the first to obtain this value.
Recently, the computational cost of this algorithm has been
reduced by Liu et al. [20]. Hence, we could conclude that all
these TU measures in DST can be computed in a straightforward
way, though with different complexity.

BR2: Abellán, Klir and Moral [6] separate S* into two parts:
$S_*$ (minimum entropy) as a conflict measure and
$S^* - S_*$ as a non-specificity measure. In Abellán and
Moral [4], a branch and bound algorithm is presented to
obtain $S_*$ in DST and in more general theories. The AM and
MAM functions do not present a clear separation between
conflict and non-specificity. In Jousselme et al. [15], AM is
presented as a special case of the function
$\delta S^* + (1 - \delta) I$, for an unknown
$\delta \in (0,1)$. So, using the $AM(m)$ value, one cannot
know which quantity corresponds to conflict and which one to
non-specificity.

BR3: In Abellán and Masegosa [7], we see that the problem of
the sensitivity of S* to changes of evidence, raised in Klir
and Smith [17], is not totally justified if we consider its
parts. In the examples shown in Klir and Smith [17], an
increase in the conflict part produces a decrease in the
non-specificity part and vice versa. In Jousselme et al.
[15], we see that this problem does not appear for AM (and
hence for MAM), but we do not know what happens with the
variations of conflict or non-specificity in those measures.

BR4: Using the results of Abellán and Moral [1], [2] and of
Abellán, Klir and Moral [6], S* can be extended to more
general theories than DST, verifying similar sets of
properties. In addition, algorithms exist in these theories
to obtain $S^*$ and $S_*$, as we can see in Abellán and Moral
[4], [5]. The extension of the AM and MAM functions to more
general theories is still an open question. One possibility
for this extension could be to use the Möbius transform, as
for the $I$ function (see Abellán and Moral [1]), though its
calculation would be more complex. As we can see in Abellán,
Klir and Moral [6], the generalization of some uncertainty
measures defined in DST can run into many problems when they
are extended to more general theories. That is the case of
the $E$ measure in DST. The question here might be: is it
worth it if property P6 is not met?

In addition, Abellán [8] showed, via an example, that in
situations where there is no conflict, S* could present some
questionable behavior that could be considered as not totally
correct. In these situations, $S_*$, which quantifies the
conflict part, is equal to 0. One could think that
intuitively S* does not reflect a proper behavior. In that
example, 2 b.p.a.s with a clear difference in information are
used: $m_1$ and $m_2$, with $K_{m_1} \subset K_{m_2}$ and
$S^*(m_1) = S^*(m_2)$.

If we apply AM or MAM to similar situations, we get an ill
behavior for those measures. The monotonicity property P6 is
broken, as illustrated by the following example.

Example 3: Let $m_1$ and $m_2$ be the following b.p.a.s on a
finite set $X = \{x_1, x_2, x_3\}$, with values:


$$m_1(\{x_1, x_3\}) = 0.5, \quad m_1(\{x_2, x_3\}) = 0.5,$$
$$m_2(\{x_2, x_3\}) = 0.5, \quad m_2(\{x_1, x_2, x_3\}) = 0.5.$$


It can be checked that $K_{m_1} \subset K_{m_2}$
($Bel_2(A) \leq Bel_1(A)$, $\forall A \subseteq X$) and a
clear decrease in information appears when we pass from $m_1$
to $m_2$.

Example 3 presents a case where there is no conflict among
the focal sets in either b.p.a. In this example
$S^*(m_1) = S^*(m_2)$; but


$$1.040 = AM(m_1) > AM(m_2) = 1.028$$


and a similar situation appears for MAM. It clearly shows
again an incorrect behavior for the measures based on the
pignistic probability distribution.
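Example 3 can be checked numerically. One way to see that $S^*(m_1) = S^*(m_2)$ is to verify that the uniform distribution, which maximizes entropy over all distributions on $X$, belongs to both credal sets, so that both upper entropies equal $\log 3$. The sketch below tests this membership by brute force and evaluates AM with the natural logarithm; the function names and the dictionary representation are assumptions of this illustration.

```python
import math
from itertools import combinations


def bel(m, a):
    """Bel(A): total mass of the focal elements contained in A."""
    return sum(v for b, v in m.items() if b <= a)


def in_credal_set(p, m, frame, tol=1e-12):
    """True if Bel(A) <= p(A) for every non-empty subset A of the frame, Eq. (2)."""
    for k in range(1, len(frame) + 1):
        for a in map(frozenset, combinations(frame, k)):
            if bel(m, a) > sum(p[x] for x in a) + tol:
                return False
    return True


def ambiguity_measure(m):
    """Entropy (natural log) of the pignistic distribution of m."""
    betp = {}
    for b, mass in m.items():
        for x in b:
            betp[x] = betp.get(x, 0.0) + mass / len(b)
    return -sum(v * math.log(v) for v in betp.values() if v > 0)


X = ("x1", "x2", "x3")
m1 = {frozenset({"x1", "x3"}): 0.5, frozenset({"x2", "x3"}): 0.5}
m2 = {frozenset({"x2", "x3"}): 0.5, frozenset(X): 0.5}
uniform = {x: 1 / 3 for x in X}

print(in_credal_set(uniform, m1, X), in_credal_set(uniform, m2, X))
print(round(ambiguity_measure(m1), 3), round(ambiguity_measure(m2), 3))
```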


V. CONCLUSIONS 


This paper discussed drawbacks of total uncertainty (TU)
measures defined in Dempster-Shafer Theory (DST) and based
upon the pignistic transformation, namely AM and MAM. The
first step was to recall the basic requirements that those
measures should meet. Previous work presented analyses of the
shortcomings of upper entropy, which is the only measure that
meets those basic requirements within DST. Previous work also
added some desirable behavioral requirements as well as an
important property to the basic set: monotonicity. Both AM
and MAM fail that critical requirement. In addition, we have
demonstrated that AM and MAM, defined as they are in
Jousselme et al. [15] and Shahpari and Seyedin [22], will
produce incorrect results and undesirable behaviors if used
in applications. Some drawbacks have been corrected, but
several questions raised, for instance, in Section III would
require more extensive work not included here. We provided
some hints to solve some issues, but the main problem is that
neither AM nor MAM meets the important requirement of
monotonicity, which considerably limits interest in their
operational application.

Abstract—Evidence combination is a kind of decision-level
information fusion in the theory of belief functions. Given
two basic belief assignments (BBAs) originating from
different sources, one can combine them using some
combination rule, e.g., Dempster’s rule, in the expectation
of a better decision result. If one only has a combined BBA,
how can the two original BBAs be determined? This can be
considered as a defusion of information. This is useful
because, e.g., one can analyze the difference or
dissimilarity between two information sources based on the
BBAs obtained using evidence decombination. Therefore, in
this paper, we study such a defusion in the theory of belief
functions. We find that it is a well-posed problem if one
original BBA and the combined BBA are both available, and an
under-determined problem if both BBAs to combine are unknown.
We propose an optimization-based approach for evidence
decombination according to the criterion of divergence
maximization. Numerical examples are provided to illustrate
and verify our proposed decombination approach, which is
expected to be used in applications such as the analysis of
differences between information sources in information fusion
systems when the original BBAs are discarded, and the
performance evaluation of combination rules.

Index Terms—information fusion, decombination, belief func- 
tions, combination, divergence maximization 


I. INTRODUCTION 


The theory of belief functions, also known as the
Dempster-Shafer evidence theory [1], has been widely used
in many information fusion based applications including
pattern classification [2], [3], multi-criteria decision making
(MCDM) [4], fault diagnosis [5] and image processing [6].

Information fusion in the theory of belief functions is
implemented by evidence combination based on some combination
rule, e.g., the well-known Dempster’s rule. Various alternative
combination rules have also emerged, including
Yager’s rule [7], Dubois & Prade’s rule [8], Smets’ rule [9],
Murphy’s rule [10], Florea’s rule [11], proportional conflict
redistribution 5 (PCR5), and PCR6 [12], [13], etc.

The inverse process of information fusion, which can
also be called information “defusion” or “decombination”,
is also meaningful in information processing and analysis.
Blind source separation (BSS) [14] and independent
component analysis [15], which aim to recover independent
sources given only observations that are unknown linear
mixtures of the unobserved independent source signals, can
likewise be considered as processes of information decombination. One
can analyze the original information sources and judge their
relationship based on the results obtained using decombination.
The belief functions community has seldom studied
the information decombination problem, that is,
given a combined BBA, how to determine the original BBAs
of the combination. In Smets’ work [16], the concept of
decomposition of evidence was proposed, which focuses on
decomposing any BBA (not necessarily a combined
BBA) into simple support functions. He also
proposed the inverse operation of evidence combination, which
only addresses the following case: given a combined BBA and
one BBA participating in the combination, how to restore the other
BBA participating in the combination. In this paper, we focus
on information decombination (separation), or evidence
decombination, in the theory of belief functions. For simplicity,
we only consider evidence decombination for two
information sources. We find that given the combined BBA
together with one original BBA, the problem is well-posed, that is, the
other BBA can be uniquely determined. However, it turns out
to be an under-determined problem (with multiple solutions)
if both BBAs participating in the combination are unknown and
only the combined BBA is given. An optimization (maximization)
based decombination method is proposed accordingly, where
the objective function is the distance between the two original
BBAs (the unknown variables to determine). Examples and
experiments are provided to illustrate and verify the proposed
information decombination method for belief functions.

This work was supported by the National Natural Science Foundation
(No. 61573275, No. 61671370), Postdoctoral Science Foundation of China
(No. 2016M592790), Postdoctoral Science Research Foundation of Shaanxi


II. BASICS OF BELIEF FUNCTIONS THEORY 


The basic concept in the theory of belief functions [1] is the
frame of discernment (FOD), which is determined by what
we want to know and what we know. Elements in an FOD
are mutually exclusive and exhaustive. A basic belief assignment
(BBA, also called a mass function) defined on the FOD $\Theta$
is a function $m : 2^{\Theta} \to [0,1]$ satisfying

$$\sum_{A \subseteq \Theta} m(A) = 1, \quad m(\emptyset) = 0 \qquad (1)$$

where $2^{\Theta}$ denotes the powerset of $\Theta$. If $m(A) > 0$, then $A$
is called a focal element of $m(\cdot)$. If a BBA only has singleton
focal elements, then it is called a Bayesian BBA.

Given a BBA $m(\cdot)$, its corresponding belief function (Bel)
and plausibility function (Pl) are respectively defined as

$$Bel(A) = \sum_{B \subseteq A} m(B) \qquad (2)$$

$$Pl(A) = \sum_{A \cap B \neq \emptyset} m(B) \qquad (3)$$

The belief $Bel(A)$ represents the justified specific support for
the focal element (or proposition) $A$, while the plausibility
$Pl(A)$ represents the potential specific support for $A$. The
length of the belief interval $[Bel(A), Pl(A)]$ represents the
imprecision degree of $A$.
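
As an illustration of Eqs. (2) and (3), the sketch below stores a BBA as a Python dictionary from focal elements (`frozenset`) to masses. The representation, the function names and the numerical BBA are assumptions made for this example only.

```python
def bel(m, a):
    """Belief, Eq. (2): total mass of the focal elements contained in a."""
    return sum(v for b, v in m.items() if b <= a)

def pl(m, a):
    """Plausibility, Eq. (3): total mass of the focal elements intersecting a."""
    return sum(v for b, v in m.items() if b & a)

# Hypothetical BBA on the FOD {t1, t2, t3}, chosen only for illustration.
m = {
    frozenset({"t1"}): 0.6,
    frozenset({"t2"}): 0.2,
    frozenset({"t2", "t3"}): 0.1,
    frozenset({"t1", "t2", "t3"}): 0.1,
}
a = frozenset({"t2", "t3"})
print(bel(m, a), pl(m, a))  # belief interval [Bel(A), Pl(A)] of A = {t2, t3}
```

For this BBA the interval is approximately [0.3, 0.4]: only the mass on the whole frame can still move into $A$ without being committed to it.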

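The evidence combination is the fusion of the BBAs
originating from different sources. Two independent BBAs
$m_1(\cdot)$ and $m_2(\cdot)$ can be combined using Dempster’s rule of
combination [1] defined by

$$(m_1 \oplus m_2)(A) = \begin{cases} 0, & A = \emptyset \\ \dfrac{\sum_{A_i \cap B_j = A} m_1(A_i)\, m_2(B_j)}{1 - \sum_{A_i \cap B_j = \emptyset} m_1(A_i)\, m_2(B_j)}, & A \neq \emptyset \end{cases} \qquad (4)$$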
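
A minimal sketch of Eq. (4) follows, again assuming a dictionary from `frozenset` focal elements to masses (a choice made here, not one prescribed by the paper). The two BBAs used to exercise it are those of Example 2 in Section IV, with $\theta_1, \theta_2, \theta_3$ encoded as 1, 2, 3.

```python
from itertools import product

def dempster(m1, m2):
    """Combine two BBAs with Dempster's rule, Eq. (4); returns (combined BBA, conflict K)."""
    joint, conflict = {}, 0.0
    for (a, va), (b, vb) in product(m1.items(), m2.items()):
        c = a & b
        if c:
            joint[c] = joint.get(c, 0.0) + va * vb
        else:
            conflict += va * vb              # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {c: v / (1.0 - conflict) for c, v in joint.items()}, conflict

m1 = {frozenset({1}): 0.6, frozenset({2}): 0.2,
      frozenset({2, 3}): 0.1, frozenset({1, 2, 3}): 0.1}
m2 = {frozenset({1}): 0.2, frozenset({2}): 0.6,
      frozenset({2, 3}): 0.1, frozenset({1, 2, 3}): 0.1}
m, K = dempster(m1, m2)
print(round(K, 2), {tuple(sorted(a)): round(v, 4) for a, v in m.items()})
```

The conflict is 0.48 and the normalized masses agree with the combined BBA reported in Example 2.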


Dempster’s rule in general can be considered as a multiplicative
and conjunctive fusion rule. Dempster’s rule of combination
has been criticized for its counter-intuitive behaviors
[17], [18], especially in high conflict cases. Many alternative
combination rules have been proposed accordingly. See [12],
[19], [20] for details. Other researchers, like Haenni [21], think
that the conflict results from a fault in the framing of the problem.
Distance of evidence is used for measuring the dissimilarity
between BBAs. The most commonly used strict distance
of evidence is Jousselme’s distance [22], defined as follows:

$$d_J(m_1, m_2) = \sqrt{0.5 \cdot (\mathbf{m}_1 - \mathbf{m}_2)^T\, \mathbf{Jac}\, (\mathbf{m}_1 - \mathbf{m}_2)} \qquad (5)$$

where the elements $Jac(A, B)$ of Jaccard’s weighting matrix
$\mathbf{Jac}$ are defined as

$$Jac(A, B) = |A \cap B| / |A \cup B| \qquad (6)$$

Here $A$, $B$ are focal elements of $m_1$ and $m_2$, respectively.
Jaccard’s matrix has been proved to be positive-definite [23];
therefore, Jousselme’s distance is a strict metric satisfying the
four requirements of a distance metric, namely
non-negativity, non-degeneracy, symmetry, and the triangle inequality.
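
Eqs. (5) and (6) can be sketched as below. The dictionary representation is an assumption of this example; the sum is restricted to the union of the focal elements of both BBAs, which is sufficient because the difference vector is zero everywhere else.

```python
import math

def jousselme_distance(m1, m2):
    """Jousselme's distance, Eq. (5), for BBAs given as dict: frozenset -> mass."""
    focal = set(m1) | set(m2)
    diff = {a: m1.get(a, 0.0) - m2.get(a, 0.0) for a in focal}
    quad = sum(
        diff[a] * diff[b] * len(a & b) / len(a | b)      # Jac(A, B), Eq. (6)
        for a in focal for b in focal
    )
    return math.sqrt(max(0.5 * quad, 0.0))               # guard against tiny negative round-off

# The two BBAs of Example 2 (theta_1, theta_2, theta_3 encoded as 1, 2, 3).
m1 = {frozenset({1}): 0.6, frozenset({2}): 0.2,
      frozenset({2, 3}): 0.1, frozenset({1, 2, 3}): 0.1}
m2 = {frozenset({1}): 0.2, frozenset({2}): 0.6,
      frozenset({2, 3}): 0.1, frozenset({1, 2, 3}): 0.1}
d = jousselme_distance(m1, m2)
print(round(d, 4))  # 0.4
```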


III. EVIDENCE DECOMBINATION IN BELIEF FUNCTIONS
THEORY


The evidence combination can be considered as a procedure
of information fusion$^1$ as shown in Fig. 1.


$^1$ Or information compression, because from two BBAs we get one.


Fig. 1. Evidence combination - Information Fusion.


Given a BBA obtained after the combination, if one wants to 
know the possible original BBAs, then the evidence decombi- 
nation is needed, which can be considered as a procedure of 
information decombination as shown in Fig. 2. 


Fig. 2. Evidence decombination - Information Decombination or “Defusion”.


In this paper, we focus on determining the original BBAs given 
a combined BBA. First, we analyze the relationship between 
the combined BBA and the original ones. For simplicity, we 
only suppose that there are two original BBAs in this paper. 


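A. Relation between Combined BBA and Original Ones according to Dempster’s Rule


According to Dempster’s rule in Eq. (4), one can obtain
the following equations. Suppose that $m_1(\cdot)$ and $m_2(\cdot)$ are
two BBAs defined on the FOD $\Theta = \{\theta_1, \ldots, \theta_n\}$. For each
BBA, there are at most $2^n - 1$ focal elements as shown below.

$$\begin{array}{cc}
\{\theta_1\} & \{\theta_1\} \\
\{\theta_2\} & \{\theta_2\} \\
\{\theta_1, \theta_2\} & \{\theta_1, \theta_2\} \\
\vdots & \vdots \\
\Theta & \Theta
\end{array}$$

Define a matrix $R^{(k)}$ for each $k = 1, \ldots, 2^n - 1$ where

$$R^{(k)}(i, j) = \begin{cases} 1, & \text{if } C_k = A_i \cap B_j \\ 0, & \text{if } C_k \neq A_i \cap B_j \end{cases} \qquad (7)$$

where $A_i$ is the focal element of $m_1(\cdot)$, and $B_j$ is
the focal element of $m_2(\cdot)$. The combined BBA is $m(\cdot) =
m_1(\cdot) \oplus m_2(\cdot)$, and $C_k$ is the focal element of $m(\cdot)$. Note that
$i, j, k = 1, \ldots, 2^n - 1$. According to Dempster’s rule, the mass
assignment of focal element $C_k$ in the combined BBA is

$$m(C_k) = \frac{\begin{bmatrix} m_1(\{\theta_1\}) \\ m_1(\{\theta_2\}) \\ \vdots \\ m_1(\Theta) \end{bmatrix}^T R^{(k)} \begin{bmatrix} m_2(\{\theta_1\}) \\ m_2(\{\theta_2\}) \\ \vdots \\ m_2(\Theta) \end{bmatrix}}{1 - K} \qquad (8)$$

where $k = 1, \ldots, 2^n - 1$ and $K = \sum_{A_i \cap B_j = \emptyset} m_1(A_i)\, m_2(B_j)$
denotes the conflict coefficient. For simplicity in the sequel,
we denote the mass value vectors as

$$\mathbf{m}_1 = \begin{bmatrix} m_1(\{\theta_1\}) \\ m_1(\{\theta_2\}) \\ m_1(\{\theta_1, \theta_2\}) \\ m_1(\{\theta_3\}) \\ m_1(\{\theta_1, \theta_3\}) \\ m_1(\{\theta_2, \theta_3\}) \\ \vdots \\ m_1(\Theta) \end{bmatrix}, \quad \mathbf{m}_2 = \begin{bmatrix} m_2(\{\theta_1\}) \\ m_2(\{\theta_2\}) \\ m_2(\{\theta_1, \theta_2\}) \\ m_2(\{\theta_3\}) \\ m_2(\{\theta_1, \theta_3\}) \\ m_2(\{\theta_2, \theta_3\}) \\ \vdots \\ m_2(\Theta) \end{bmatrix}$$

Then, Eq. (8) can be rewritten as

$$m(C_k) = \frac{\mathbf{m}_1^T R^{(k)} \mathbf{m}_2}{1 - K}$$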

1) Case I: In this case, the combined BBA $m(\cdot)$ is available,
and both original BBAs are unknown. That is, $m_1(A_i)$
($i = 1, \ldots, 2^n - 1$) and $m_2(B_j)$ ($j = 1, \ldots, 2^n - 1$) are unknown
variables to determine, so the number of unknown
variables is $(2^n - 1) \times 2 = 2^{n+1} - 2$. For the BBAs, there exist

$$\sum_{i=1}^{2^n - 1} m_1(A_i) = 1 \qquad (9)$$

$$\sum_{j=1}^{2^n - 1} m_2(B_j) = 1 \qquad (10)$$

Considering Eqs. (8)-(10), we have $2^n - 1 + 2 = 2^n + 1$
simultaneous equations. As aforementioned, to determine all
the mass values of $m_1(\cdot)$ and $m_2(\cdot)$, we have $2^{n+1} - 2$
unknown variables. That is, the number of unknown
variables is larger than that of the equations. Therefore, this
is an under-determined problem with multiple solutions in
general.

2) Case II: In this case, the combined BBA $m(\cdot)$ and one
original BBA (e.g., $m_1(\cdot)$) are available, while the other original
BBA (e.g., $m_2(\cdot)$) is unknown. To determine $m_2(\cdot)$, we have
$2^n - 1$ unknown variables. By considering Eqs. (8) and (10),
we have $2^n$ simultaneous equations. That is, the number of
unknown variables is less than that of the equations. Therefore,
this is an over-determined problem, and then $m_2(\cdot)$ can be
determined uniquely.
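
As a quick check of these counts, take $n = 3$, the FOD size used in the examples of Section IV. The numbers below follow directly from the formulas of the two cases.

```latex
\begin{align*}
\text{Case I:}\quad  & \text{unknowns} = 2^{n+1} - 2 = 2^{4} - 2 = 14, & \text{equations} = 2^{n} + 1 = 9,\\
\text{Case II:}\quad & \text{unknowns} = 2^{n} - 1 = 7,                 & \text{equations} = 2^{n} = 8.
\end{align*}
```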


B. Optimization Based Evidence Decombination 


As aforementioned, given a combined BBA, determining
the two original BBAs is an under-determined problem, for
which an optimization-based approach is feasible. The
key issue is then to select an appropriate criterion to establish the
objective function for the optimization.

In fact, evidence decombination is like blind source
separation (BSS), where the divergence between different
sources is used for the optimization based source separation,
e.g., minimization of the mutual information (MMI)
[24], which represents the largest divergence. Therefore, in
this paper, we borrow the criterion used in BSS to
design the objective function in optimization based evidence
decombination. Here we use the distance of evidence to
describe the divergence between BBAs. Furthermore, we use
the simultaneous equations in Eqs. (8)-(10) together
with inequalities (to assure a legal BBA$^2$ whose mass values
lie in [0,1]) as the constraints for the distance maximization
to implement the evidence decombination, as illustrated in
Eq. (11).

$$\begin{aligned}
\max_{\mathbf{m}_1, \mathbf{m}_2} \quad & d_J(m_1, m_2) = \sqrt{0.5 \cdot (\mathbf{m}_1 - \mathbf{m}_2)^T\, \mathbf{Jac}\, (\mathbf{m}_1 - \mathbf{m}_2)} \\
\text{s.t.} \quad & m(C_k) = \frac{\mathbf{m}_1^T R^{(k)} \mathbf{m}_2}{1 - K}, \quad k = 1, \ldots, 2^n - 1 \\
& \sum_{i=1}^{2^n - 1} m_1(A_i) = 1, \quad \sum_{j=1}^{2^n - 1} m_2(B_j) = 1 \\
& 0 \le m_1(A_i) \le 1, \quad 0 \le m_2(B_j) \le 1
\end{aligned} \qquad (11)$$

By solving$^3$ the constrained maximization problem in Eq. (11),
one can obtain a pair of BBAs that are farthest from each
other and that provide the combined BBA when fused with
Dempster’s rule.


IV. NUMERICAL EXAMPLES OF EVIDENCE 
DECOMBINATION BASED ON OPTIMIZATION 
In this section, we give several examples illustrating how
BBAs can be decombined with the optimization-based
approach.


A. Example 1
Suppose that the FOD is $\Theta = \{\theta_1, \theta_2, \theta_3\}$. A BBA obtained after
the combination of two unknown BBAs is

$m(\{\theta_1\}) = 0.1$, $m(\{\theta_2\}) = 0.2$, $m(\{\theta_1, \theta_2\}) = 0.1$,
$m(\{\theta_3\}) = 0.1$, $m(\{\theta_1, \theta_3\}) = 0.1$,
$m(\{\theta_2, \theta_3\}) = 0.3$, $m(\Theta) = 0.1$.


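The equality constraints for the maximization problem include

$$m(\{\theta_1\}) = 0.1 = \frac{\begin{bmatrix} m_1(\{\theta_1\}) \\ m_1(\{\theta_2\}) \\ m_1(\{\theta_1, \theta_2\}) \\ m_1(\{\theta_3\}) \\ m_1(\{\theta_1, \theta_3\}) \\ m_1(\{\theta_2, \theta_3\}) \\ m_1(\Theta) \end{bmatrix}^T R^{(1)} \begin{bmatrix} m_2(\{\theta_1\}) \\ m_2(\{\theta_2\}) \\ m_2(\{\theta_1, \theta_2\}) \\ m_2(\{\theta_3\}) \\ m_2(\{\theta_1, \theta_3\}) \\ m_2(\{\theta_2, \theta_3\}) \\ m_2(\Theta) \end{bmatrix}}{1 - K}$$

where

$$R^{(1)} = \begin{bmatrix}
1 & 0 & 1 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}$$

It can be rewritten in a simpler form as

$$m(\{\theta_1\}) = 0.1 = \frac{\mathbf{m}_1^T R^{(1)} \mathbf{m}_2}{1 - K}$$

For other focal elements,

$$m(\{\theta_2\}) = 0.2 = \frac{\mathbf{m}_1^T R^{(2)} \mathbf{m}_2}{1 - K}$$

where

$$R^{(2)} = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 1 & 1 \\
0 & 1 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}$$

$$m(\{\theta_1, \theta_2\}) = 0.1 = \frac{\mathbf{m}_1^T R^{(3)} \mathbf{m}_2}{1 - K}$$

where

$$R^{(3)} = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0
\end{bmatrix}$$

$$m(\{\theta_3\}) = 0.1 = \frac{\mathbf{m}_1^T R^{(4)} \mathbf{m}_2}{1 - K}$$

where

$$R^{(4)} = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 \\
0 & 0 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0
\end{bmatrix}$$

$$m(\{\theta_1, \theta_3\}) = 0.1 = \frac{\mathbf{m}_1^T R^{(5)} \mathbf{m}_2}{1 - K}$$

where

$$R^{(5)} = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0
\end{bmatrix}$$

$$m(\{\theta_2, \theta_3\}) = 0.3 = \frac{\mathbf{m}_1^T R^{(6)} \mathbf{m}_2}{1 - K}$$

where

$$R^{(6)} = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & 0
\end{bmatrix}$$

$$m(\Theta) = 0.1 = \frac{\mathbf{m}_1^T R^{(7)} \mathbf{m}_2}{1 - K}$$

where

$$R^{(7)} = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}$$

and the two equations in Eqs. (9) and (10). The inequality
constraints are

$$\begin{aligned}
0 \le m_1(\{\theta_1\}) \le 1 \\
0 \le m_1(\{\theta_2\}) \le 1 \\
0 \le m_1(\{\theta_1, \theta_2\}) \le 1 \\
0 \le m_1(\{\theta_3\}) \le 1 \\
0 \le m_1(\{\theta_1, \theta_3\}) \le 1 \\
0 \le m_1(\{\theta_2, \theta_3\}) \le 1 \\
0 \le m_1(\Theta) \le 1
\end{aligned}$$

and

$$\begin{aligned}
0 \le m_2(\{\theta_1\}) \le 1 \\
0 \le m_2(\{\theta_2\}) \le 1 \\
0 \le m_2(\{\theta_1, \theta_2\}) \le 1 \\
0 \le m_2(\{\theta_3\}) \le 1 \\
0 \le m_2(\{\theta_1, \theta_3\}) \le 1 \\
0 \le m_2(\{\theta_2, \theta_3\}) \le 1 \\
0 \le m_2(\Theta) \le 1
\end{aligned}$$


$^2$ To obtain admissible BBAs with values in [0,1] whose sum equals
one.
$^3$ Here we use the Global Optimization Toolbox in Matlab™.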

According to the constrained maximization in Eq. (11), one
can obtain two BBAs as follows:

$m_a(\{\theta_1\}) = 0$, $m_a(\{\theta_2\}) = 0$, $m_a(\{\theta_1, \theta_2\}) = 0.0323$,
$m_a(\{\theta_3\}) = 0.1612$, $m_a(\{\theta_1, \theta_3\}) = 0.1612$,
$m_a(\{\theta_2, \theta_3\}) = 0.4840$, $m_a(\Theta) = 0.1613$.

and

$m_b(\{\theta_1\}) = 0.0834$, $m_b(\{\theta_2\}) = 0$, $m_b(\{\theta_1, \theta_2\}) = 0.3666$,
$m_b(\{\theta_3\}) = 0$, $m_b(\{\theta_1, \theta_3\}) = 0.0001$,
$m_b(\{\theta_2, \theta_3\}) = 0$, $m_b(\Theta) = 0.5499$.

It is easy to verify that the combination result $m_a(\cdot) \oplus m_b(\cdot)$
is the same as the given BBA $m(\cdot)$.


B. Example 2 


Suppose that there are two BBAs defined on the FOD $\Theta =
\{\theta_1, \theta_2, \theta_3\}$:

$m_1(\{\theta_1\}) = 0.6$, $m_1(\{\theta_2\}) = 0.2$,
$m_1(\{\theta_2, \theta_3\}) = 0.1$, $m_1(\Theta) = 0.1$.

and

$m_2(\{\theta_1\}) = 0.2$, $m_2(\{\theta_2\}) = 0.6$,
$m_2(\{\theta_2, \theta_3\}) = 0.1$, $m_2(\Theta) = 0.1$.

By calculating Jousselme’s distance in Eq. (5), one obtains
that

$$d_J(m_1, m_2) = 0.4.$$

With Dempster’s rule of combination, one obtains $m(\cdot) =
m_1(\cdot) \oplus m_2(\cdot)$ with

$m(\{\theta_1\}) = 0.3846$, $m(\{\theta_2\}) = 0.5385$,
$m(\{\theta_2, \theta_3\}) = 0.0577$, $m(\Theta) = 0.0192$.

According to the evidence decombination approach in Eq. (11),
one obtains

$m_a(\{\theta_1\}) = 0$, $m_a(\{\theta_2\}) = 0.8750$,
$m_a(\{\theta_2, \theta_3\}) = 0.0851$, $m_a(\Theta) = 0.0400$.

and

$m_b(\{\theta_1\}) = 0.9399$, $m_b(\{\theta_2\}) = 0$,
$m_b(\{\theta_2, \theta_3\}) = 0.0131$, $m_b(\Theta) = 0.0470$.

It is easy to verify that the combination result $m_a(\cdot) \oplus m_b(\cdot) =
m(\cdot)$, which is the same as $m_1(\cdot) \oplus m_2(\cdot)$.

By calculating Jousselme’s distance given by Eq. (5),
one can verify that

$$d_J(m_a, m_b) = 0.9265 > d_J(m_1, m_2) = 0.4.$$
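
The claims of this example can be checked numerically. The sketch below (dictionary representation and helper names are assumptions of this illustration) evaluates three pairs that all fuse to the same $m(\cdot)$: the original pair, the pair reported above, and the trivial pair made of $m(\cdot)$ itself and the vacuous BBA. Since the reported masses are rounded to four digits and the conflict between $m_a$ and $m_b$ is high, the fused result is only expected to match $m(\cdot)$ to about three decimals.

```python
import math
from itertools import product

def dempster(m1, m2):
    """Dempster's rule for BBAs given as dict: frozenset -> mass."""
    joint, conflict = {}, 0.0
    for (a, va), (b, vb) in product(m1.items(), m2.items()):
        if a & b:
            joint[a & b] = joint.get(a & b, 0.0) + va * vb
        else:
            conflict += va * vb
    return {c: v / (1.0 - conflict) for c, v in joint.items()}

def jousselme(m1, m2):
    """Jousselme's distance, Eq. (5)."""
    focal = set(m1) | set(m2)
    d = {a: m1.get(a, 0.0) - m2.get(a, 0.0) for a in focal}
    quad = sum(d[a] * d[b] * len(a & b) / len(a | b) for a in focal for b in focal)
    return math.sqrt(max(0.5 * quad, 0.0))

def bba(t1, t2, t23, theta):
    """Build a BBA on {1, 2, 3} from the four focal elements used in Example 2."""
    masses = {frozenset({1}): t1, frozenset({2}): t2,
              frozenset({2, 3}): t23, frozenset({1, 2, 3}): theta}
    return {a: v for a, v in masses.items() if v > 0}

target = bba(0.3846, 0.5385, 0.0577, 0.0192)
pairs = {
    "original (m1, m2)": (bba(0.6, 0.2, 0.1, 0.1), bba(0.2, 0.6, 0.1, 0.1)),
    "reported (ma, mb)": (bba(0.0, 0.8750, 0.0851, 0.0400),
                          bba(0.9399, 0.0, 0.0131, 0.0470)),
    "trivial (m, vacuous)": (target, bba(0.0, 0.0, 0.0, 1.0)),
}
for name, (p, q) in pairs.items():
    fused = dempster(p, q)
    gap = max(abs(fused.get(a, 0.0) - v) for a, v in target.items())
    print(f"{name}: max deviation {gap:.4f}, d_J = {jousselme(p, q):.4f}")
```

All three pairs are feasible points of problem (11) up to rounding, which illustrates the under-determined nature of Case I; the reported pair has the largest distance among them.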
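
C. Example 3
The given combined BBA is the same as that in Example 2:

$m(\{\theta_1\}) = 0.3846$, $m(\{\theta_2\}) = 0.5385$,
$m(\{\theta_2, \theta_3\}) = 0.0577$, $m(\Theta) = 0.0192$.

Moreover, suppose that we have additional information and
we also know $m_1(\cdot)$:

$m_1(\{\theta_1\}) = 0.6$, $m_1(\{\theta_2\}) = 0.2$,
$m_1(\{\theta_2, \theta_3\}) = 0.1$, $m_1(\Theta) = 0.1$.

Then, we use BBA decombination to calculate
$\hat{m}_2(\cdot)$ and check whether it is the same as $m_2(\cdot)$ in
Example 2. This is exactly Case II described in Section
III-A. Therefore, $\hat{m}_2(\cdot)$ should be unique, and we should
have $m_2(\cdot) = \hat{m}_2(\cdot)$. It is an over-determined problem, and
we can still use optimization to solve for $\hat{m}_2(\cdot)$ by modifying
the optimization problem to

$$\begin{aligned}
\max_{\hat{\mathbf{m}}_2} \quad & d_J(m_1, \hat{m}_2) = \sqrt{0.5 \cdot (\mathbf{m}_1 - \hat{\mathbf{m}}_2)^T\, \mathbf{Jac}\, (\mathbf{m}_1 - \hat{\mathbf{m}}_2)} \\
\text{s.t.} \quad & m(C_k) = \frac{\mathbf{m}_1^T R^{(k)} \hat{\mathbf{m}}_2}{1 - K}, \quad k = 1, \ldots, 2^n - 1 \\
& \sum_{j=1}^{2^n - 1} \hat{m}_2(B_j) = 1 \\
& 0 \le \hat{m}_2(B_j) \le 1
\end{aligned} \qquad (12)$$

where $\hat{\mathbf{m}}_2$ is the mass value vector of $\hat{m}_2(\cdot)$.

By solving Eq. (12), one obtains

$\hat{m}_2(\{\theta_1\}) = 0.2$, $\hat{m}_2(\{\theta_2\}) = 0.6$,
$\hat{m}_2(\{\theta_2, \theta_3\}) = 0.1$, $\hat{m}_2(\Theta) = 0.1$.

That is, given a combined BBA and one original BBA, the other
original one can be determined uniquely.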
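
The uniqueness in Case II can also be seen without an optimizer. The sketch below is not the optimization of Eq. (12); it swaps in the classical commonality-function route: for Dempster's rule, $Q(A) \propto Q_1(A)\, Q_2(A)$ with $Q(A) = \sum_{B \supseteq A} m(B)$, so $Q_2$ is obtained by division and $\hat{m}_2$ by Möbius inversion. It assumes $Q_1(A) > 0$ for every non-empty $A$ (true here because $\Theta$ is a focal element of $m_1$), encodes subsets as bitmasks, and uses function names invented for this illustration.

```python
def commonality(m, n):
    """Q(A) = sum of m(B) over all B containing A; subsets are bitmasks 1..2^n-1."""
    size = 2 ** n
    return {a: sum(m.get(b, 0.0) for b in range(1, size) if (a & b) == a)
            for a in range(1, size)}

def from_commonality(q, n):
    """Moebius inversion: m(A) = sum over B containing A of (-1)^(|B|-|A|) Q(B)."""
    size = 2 ** n
    return {a: sum((-1) ** bin(b ^ a).count("1") * q[b]
                   for b in range(1, size) if (a & b) == a)
            for a in range(1, size)}

def remove_source(m, m1, n):
    """Recover m2 from m = m1 (+) m2, assuming Q1(A) > 0 for every non-empty A."""
    q, q1 = commonality(m, n), commonality(m1, n)
    ratio = {a: q[a] / q1[a] for a in q}          # proportional to Q2(A)
    raw = from_commonality(ratio, n)
    total = sum(raw.values())                     # absorbs the unknown factor 1 - K
    return {a: v / total for a, v in raw.items()}

# Bitmasks: 1 = {t1}, 2 = {t2}, 6 = {t2, t3}, 7 = Theta (values from Example 3).
m = {1: 0.3846, 2: 0.5385, 6: 0.0577, 7: 0.0192}
m1 = {1: 0.6, 2: 0.2, 6: 0.1, 7: 0.1}
m2_hat = remove_source(m, m1, 3)
print({a: round(v, 3) for a, v in m2_hat.items() if abs(v) > 1e-3})
```

With the rounded inputs above, the recovered masses agree with $\hat{m}_2$ to about three decimals. In general the division can produce negative "masses" when the given $m(\cdot)$ is not actually a Dempster combination involving $m_1(\cdot)$, so the result should be checked before use.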


V. FURTHER ANALYSIS ON EVIDENCE DECOMBINATION 
A. Divergence Minimization or Maximization? 


In the evidence decombination shown in Eq. (11), distance
maximization is adopted. This is inspired by the minimization
of mutual information (i.e., the maximization of divergence)
between sources in blind source separation (BSS), which
aims to bring out more independent components [24]. One
can also try to implement the evidence decombination based
on distance minimization. Based on our analysis, we find
that if distance minimization is used, the minimum distance
will be zero and the BBAs of the two sources are identical.

Suppose that $m_1(\cdot) = m_2(\cdot) = m_0(\cdot)$; one can rewrite the
constraints in Eq. (11) as

$$\begin{aligned}
& m(C_k) = \frac{\mathbf{m}_0^T R^{(k)} \mathbf{m}_0}{1 - K}, \quad k = 1, \ldots, 2^n - 1 \\
& \sum_{i=1}^{2^n - 1} m_0(A_i) = 1
\end{aligned} \qquad (13)$$

where

$$\mathbf{m}_0 = [m_0(\{\theta_1\}), \ldots, m_0(\Theta)]^T$$

As we see in Eq. (13), there are $2^n - 1$ unknown variables
(mass values for the $2^n - 1$ focal elements of $m_0(\cdot)$) to determine.
There are $2^n - 1 + 1 = 2^n$ simultaneous equations in total.
Therefore, if a solution exists, in general this is an
over-determined problem which has a unique solution.


Here we provide an example to verify this, where the
combined BBA is still the one chosen in Example 2, which is

$m(\{\theta_1\}) = 0.3846$, $m(\{\theta_2\}) = 0.5385$,
$m(\{\theta_2, \theta_3\}) = 0.0577$, $m(\Theta) = 0.0192$.

According to Eq. (11), with the maximization changed to
minimization, we obtain $m_1(\cdot) = m_2(\cdot) = m_0(\cdot)$, which is

$m_0(\{\theta_1\}) = 0.3877$, $m_0(\{\theta_2\}) = 0.3958$,


It is easy to verify that $m_0(\cdot) \oplus m_0(\cdot) = m(\cdot)$.
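
For the identical-sources case, a candidate $m_0(\cdot)$ can be computed directly rather than by minimization. The sketch below relies on the commonality identity for Dempster's rule, which gives $Q(A) \propto Q_0(A)^2$ when $m = m_0 \oplus m_0$, so $Q_0(A) \propto \sqrt{Q(A)}$. This is a swapped-in technique under that identity, not the optimization used in the paper; subsets are encoded as bitmasks and all function names are invented for the illustration.

```python
import math

def commonality(m, n):
    """Q(A) = sum of m(B) over all B containing A; subsets are bitmasks 1..2^n-1."""
    size = 2 ** n
    return {a: sum(m.get(b, 0.0) for b in range(1, size) if (a & b) == a)
            for a in range(1, size)}

def from_commonality(q, n):
    """Moebius inversion of a commonality function back to masses."""
    size = 2 ** n
    return {a: sum((-1) ** bin(b ^ a).count("1") * q[b]
                   for b in range(1, size) if (a & b) == a)
            for a in range(1, size)}

def self_root(m, n):
    """Candidate m0 with m0 (+) m0 = m, using Q0(A) proportional to sqrt(Q(A))."""
    root = {a: math.sqrt(v) for a, v in commonality(m, n).items()}
    raw = from_commonality(root, n)
    total = sum(raw.values())
    return {a: v / total for a, v in raw.items()}

def dempster(p, q):
    """Dempster's rule on bitmask-indexed BBAs."""
    joint, conflict = {}, 0.0
    for a, va in p.items():
        for b, vb in q.items():
            if a & b:
                joint[a & b] = joint.get(a & b, 0.0) + va * vb
            else:
                conflict += va * vb
    return {c: v / (1.0 - conflict) for c, v in joint.items()}

# Bitmasks: 1 = {t1}, 2 = {t2}, 6 = {t2, t3}, 7 = Theta (combined BBA of Example 2).
m = {1: 0.3846, 2: 0.5385, 6: 0.0577, 7: 0.0192}
m0 = self_root(m, 3)
back = dempster(m0, m0)
print({a: round(v, 4) for a, v in m0.items() if abs(v) > 1e-4})
```

The masses obtained for $\{\theta_1\}$ and $\{\theta_2\}$ agree with the two values reported above to about four decimals, and combining the result with itself returns $m(\cdot)$. The square-root construction is not guaranteed to give non-negative masses for an arbitrary $m(\cdot)$, so the output has to be checked for admissibility.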

We prefer the criterion of distance maximization, since it
can bring out more distinct (and likely more independent)
pieces of evidence.

Note that since we select maximization, to be sure of
finding the unique global optimum, the objective function should be
concave (upper-convex). However, the objective function, i.e., the distance
of evidence, does not satisfy this. Therefore, in
this paper, intelligent optimization algorithms [25] (e.g., the
particle swarm algorithm and the genetic algorithm) are adopted
for the maximization to achieve a better solution.


B. Possible Applications 


Note that given a combined BBA $m(\cdot)$, we obtain $m_a(\cdot)$ and $m_b(\cdot)$
after the evidence decombination. However, we do not know
the specific correspondence between $\{m_a(\cdot), m_b(\cdot)\}$ and
$\{m_1(\cdot), m_2(\cdot)\}$. That is, $m_a(\cdot)$ could correspond to $m_1(\cdot)$ or
$m_2(\cdot)$, and $m_b(\cdot)$ could also correspond to $m_1(\cdot)$ or $m_2(\cdot)$.
Therefore, the decombination cannot be used for analyzing or evaluating
a specific single sensor; however, it is
expected to be used in applications like divergence evaluation
between sensors, which is helpful for sensor management.
Given a BBA, if one can decombine it into two BBAs, then
the maximum difference between the corresponding information
sources can be evaluated by calculating the distance between
the two BBAs.

Another possible application is the evaluation of different
combination rules. Here, we only use Dempster’s rule to
construct the evidence decombination. In fact, other
combination rules can also be used for evidence
decombination, where the difference between most of the existing
combination rules available in the literature lies in the
choice of the matrix $R^{(k)}$ in Eq. (7). Then, given a BBA, one
can use the decombination methods corresponding to
different combination rules to bring out different pairs of BBAs.
One can calculate the distance between the two BBAs in each pair
to represent the aggregation capability of the corresponding
combination rule. That is, if an evidence decombination
approach brings out a more divergent BBA pair, then the
corresponding combination rule can
aggregate (combine) a more divergent BBA pair into the same
BBA compared with other rules. We then say that it has a better
aggregation capability.

VI. CONCLUSIONS


In this paper, an evidence decombination approach is proposed,
where the distance maximization criterion is adopted.
Some numerical examples and
related analyses are provided to illustrate the proposed method,
and its possible applications are discussed.

In this paper, the distance of evidence used in the
optimization is Jousselme’s distance. In our future work, we
will try other strict distance metrics [26], [27] in the theory
of belief functions for comparison. Currently, the objective
function is the distance of evidence. In future work, we will
try to use the difference between the BBAs’ uncertainty measure
values [28], [29]. Furthermore, for simplicity we only consider two sources
of evidence for the evidence decombination. In
our future work, we will try to handle more than two
sources in the evidence decombination. This paper is only
a preliminary work on evidence decombination; in future
research, we will try to apply the proposed method in
various appropriate applications.

Abstract—To combine different types of uncertain information
from different sources under different frameworks, we need 
transformations between different frameworks. For the trans- 
formation of a fuzzy membership function (FMF) into a basic 
belief assignment (BBA), several approaches have been proposed. 
Among these approaches, the uncertainty optimization based 
transformations can provide BBAs without predefining focal 
elements. However, these two transformations, which respectively 
use the uncertainty maximization and minimization criteria, 
emphasize the extreme cases of uncertainty. We expect to obtain 
a BBA, which is the trade-off between the two BBAs obtained 
by solving the uncertainty maximization and minimization, to 
avoid extreme attitudinal bias. In this paper, we propose two 
transformations of an FMF into a BBA by using a user-specified 
weighting factor to obtain such a trade-off (or balanced) BBA. 
Some examples and related analyses are provided to show the 
rationality and effectiveness of the proposed transformations. 

Index Terms—evidence theory, basic belief assignment, fuzzy 
membership function, optimization, transformation 


I. INTRODUCTION 


In the information fusion, we need to deal with a large 
amount of uncertain information. Various types of uncertainty 
theories have been proposed to deal with different types of 
uncertainty, e.g., the probability theory, fuzzy set theory [1], 
possibility theory [2], rough set theory [3] and Dempster- 
Shafer evidence theory (DST) [4] etc. When we fuse the 
information from different sources under different theoretical 
frameworks, we need the transformation between different 
frameworks. 

For the information represented by the FMF and BBA, we
can transform an FMF into a BBA. Then, we can combine
the BBAs to implement the information fusion. Many
transformations of an FMF into a BBA have been proposed
[5]-[9]. In [5], Bi et al. proposed a transformation that
normalizes a given FMF to generate a BBA with singleton
focal elements only. By using the α-cut approach, Florea et
al. [6] transformed an FMF into a BBA with focal elements
nested in order. However, these two approaches have to
predefine the focal elements, and thus lack intuitiveness and
objectiveness. Han et al. [7] proposed two approaches without
predefining focal elements. These two approaches provide
BBAs by solving constrained uncertainty maximization and
minimization problems.

This work was supported by the National Natural Science Foundation
(Nos. 61573275, 61671370), Postdoctoral Science Foundation of China
(No. 2016M592790), Postdoctoral Science Research Foundation of Shaanxi

For the two transformations of Han et al. [7], both
objective functions are the ambiguity measure (AM) and their
constraints are mainly constructed based on the given FMF.
Their rationality and effectiveness are both justified in [7].
During the process of solving the optimization problems, these
two transformations emphasize the minimal and maximal
uncertainties of the BBA, respectively. We think that a
BBA being the trade-off (or balance) between the two BBAs
obtained by solving the uncertainty maximization and
minimization is preferable, since it might avoid being “one-sided”
on the uncertainty degree. In this paper, we propose
two approaches that use a user-specified weighting factor to
determine BBAs. One transformation is the weighted average,
with the user-specified weighting factor, of the two
BBAs obtained by the optimization based transformations [7]. The
other transformation brings out a trade-off BBA by solving
a constrained minimization problem. The objective function
is based on the user-specified weighting factor, the distance
of evidence and the two BBAs obtained by uncertainty
optimization. The constraints are mainly based on the given FMF.
That is, each of our proposed transformations can transform
an FMF into a BBA, which can be considered as the trade-off
between the two BBAs obtained with uncertainty optimization.
Some examples and related analyses are provided to justify the
proposed transformations.


II. PRELIMINARY 
A. Basics of the Theory of Belief Functions 


The theory of belief functions [4], introduced historically
by Shafer in DST, is a powerful framework for uncertainty
modeling and reasoning. Let $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$ be the frame
of discernment (FOD). Under the closed world assumption,
the elements of the FOD are mutually exclusive and exhaustive. The BBA (also
called a mass function) is defined on the power set of $\Theta$, and
can be denoted by a function $m : 2^{\Theta} \to [0, 1]$ satisfying

$$\sum_{A \subseteq \Theta} m(A) = 1, \quad m(\emptyset) = 0 \qquad (1)$$

where $\emptyset$ denotes the empty set. $\forall A$, if $m(A) > 0$, then $A$ is
called a focal element. $m(A)$ denotes the evidence support to
the proposition $A$.

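The belief function Bel is defined for all $A \subseteq \Theta$ as:

$$Bel(A) = \sum_{B \subseteq A} m(B) \qquad (2)$$

The plausibility function Pl is defined for all $A \subseteq \Theta$ as:

$$Pl(A) = \sum_{A \cap B \neq \emptyset} m(B) \qquad (3)$$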


Suppose there are two independent BBAs $m_1$ and $m_2$ on the
same FOD. Historically, Shafer proposed Dempster’s rule to
combine two (or more) BBAs. Dempster’s rule of combination
is

$$m(A) = \begin{cases} 0, & A = \emptyset \\ \dfrac{\sum_{B \cap C = A} m_1(B)\, m_2(C)}{1 - K}, & A \neq \emptyset \end{cases} \qquad (4)$$

where $K = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C)$ represents the conflict
coefficient between the two BBAs. There exist other alternative
combination rules [10], [11].


B. Uncertainty Measure of a BBA 


The uncertainty of a BBA includes two types: the discord
and the non-specificity. Different measures of uncertainty
[12]-[16] have been proposed, e.g., the non-specificity
measure [14], the ambiguity measure (AM) [15] and the
aggregated uncertainty (AU) [16]. The definition of AM is
as follows:

$$AM(m) = -\sum_{\theta \in \Theta} BetP_m(\theta) \log_2 (BetP_m(\theta)) \qquad (5)$$

where $BetP_m(\theta) = \sum_{\theta \in A \subseteq \Theta} m(A)/|A|$ is the pignistic
probability [17]. $|A|$ denotes the cardinality of the set $A$.


III. TRANSFORMATION OF FMF INTO BBA 
A. Concept of Fuzzy Set 


Fuzzy sets [1] were proposed by Zadeh to describe
concepts without precise definitions. Let $\Theta$ be the universe
of discourse (equivalent to the FOD in belief functions). A
fuzzy membership function is denoted by $\mu = \mu(\theta)$, $\theta \in \Theta$.
For $\mu : \Theta \to [0,1]$, $\mu(\theta) \in [0,1]$ is called the degree of
membership of $\theta$.


B. Traditional Transformations of FMF into BBA 


a) Transformations with the predefinition of focal elements:
For a given FMF, the two types of transformations
below can provide a BBA, but they have to predefine the
focal elements. Suppose that the FOD is $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$
and the given FMF is $\mu = [\mu(\theta_1), \mu(\theta_2), \ldots, \mu(\theta_n)]$. The
obtained BBA is represented by $m$.

In the work of Bi et al. [5], the BBA is determined as
follows:

$$m(\{\theta_i\}) = \mu(\theta_i) \Big/ \sum_{j=1}^{n} \mu(\theta_j) \qquad (6)$$

This approach predefines all focal elements as singletons, and
it is the result of normalization for the given FMF.

Another transformation with the predefinition of focal
elements is the work of Florea et al. [6], which uses the α-cut
approach. Suppose that $\mu(\theta_1), \mu(\theta_2), \ldots, \mu(\theta_n)$ are sorted into
ascending order as $0 = \alpha_0 < \alpha_1 < \alpha_2 < \ldots < \alpha_M \le 1$, where
$M \le |\Theta|$. The BBA is determined by using the transformation
[6] as follows:

$$m(A_j) = \frac{\alpha_j - \alpha_{j-1}}{\alpha_M} \qquad (7)$$

where $A_j = \{\theta_i \in \Theta \mid \mu(\theta_i) \ge \alpha_j\}$, $i = 1, 2, \ldots, n$, $j =
1, 2, \ldots, M$. This transformation predefines the focal elements
nested in order for the given FMF.
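
The two transformations with predefined focal elements can be sketched in a few lines. The FMF below is hypothetical, the dictionary representation and function names are assumptions of this illustration, and the α-cut code follows Eq. (7) as written above (distinct positive membership values as levels, normalization by the largest level).

```python
def bi_transform(mu):
    """Eq. (6): normalize the memberships into a Bayesian BBA on singletons."""
    total = sum(mu.values())
    return {frozenset({t}): v / total for t, v in mu.items() if v > 0}

def alpha_cut_transform(mu):
    """Eq. (7): nested focal elements A_j = {theta : mu(theta) >= alpha_j}."""
    levels = sorted({v for v in mu.values() if v > 0})   # alpha_1 < ... < alpha_M
    top, previous, bba = levels[-1], 0.0, {}
    for alpha in levels:
        focal = frozenset(t for t, v in mu.items() if v >= alpha)
        bba[focal] = (alpha - previous) / top
        previous = alpha
    return bba

def pl_singleton(m, theta):
    """Plausibility of the singleton {theta}."""
    return sum(v for a, v in m.items() if theta in a)

mu = {"t1": 1.0, "t2": 0.6, "t3": 0.2}      # hypothetical FMF
m_bi = bi_transform(mu)
m_cut = alpha_cut_transform(mu)
print({tuple(sorted(a)): round(v, 3) for a, v in m_cut.items()})
print({t: round(pl_singleton(m_cut, t), 3) for t in mu})
```

For this FMF, whose largest membership is 1, the α-cut BBA has the nested focal elements $\Theta \supset \{\theta_1, \theta_2\} \supset \{\theta_1\}$ and its singleton plausibilities coincide with the memberships, whereas the normalized BBA of Eq. (6) keeps only the proportions between them.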

Both approaches can transform an FMF into a BBA.
However, the transformations with the predefinition of focal
elements lack intuitiveness and objectiveness. For a given
FMF, the optimization based transformations can obtain a BBA
without predefining the focal elements.

b) Transformations based on the uncertainty optimization:
In the work of Han et al. [7], the two transformations
that have no predefinition of focal elements are obtained
by solving the uncertainty maximization and minimization.
Suppose that the FOD is $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$ and the given
FMF is $\mu = [\mu(\theta_1), \mu(\theta_2), \ldots, \mu(\theta_n)]$. The obtained BBA is
represented by $m$.

There exists a relationship [18] between the FMF and the
belief function or plausibility function. When $\sum_{i=1}^{n} \mu(\theta_i) \ge 1$,
the FMF is equivalent to a singleton plausibility function,
which is denoted by

$$Pl(\{\theta_i\}) = \sum_{\{\theta_i\} \cap A \neq \emptyset} m(A) = \mu(\theta_i), \quad \forall \{\theta_i\} \subseteq \Theta \qquad (8)$$

It is the necessary and sufficient condition for the FMF to be
a singleton plausibility function.

When $\sum_{i=1}^{n} \mu(\theta_i) \le 1$, the FMF is equivalent to a singleton
belief function, which is denoted by

$$Bel(\{\theta_i\}) = \sum_{A \subseteq \{\theta_i\}} m(A) = \mu(\theta_i), \quad \forall \{\theta_i\} \subseteq \Theta \qquad (9)$$

Similarly, it is the necessary and sufficient condition for the
FMF to be a singleton belief function.

The detailed proofs of the above relationships are given in
[18].

Suppose a BBA is transformed from a given FMF, and the
FMF and BBA satisfy Eq. (8) or Eq. (9). Then, $n$ linear
equations for the corresponding relations can be obtained. In
addition, one has $\sum_{A \subseteq \Theta} m(A) = 1$, so there are $n + 1$ linear
equations. However, since $m(\emptyset) = 0$, in the worst case
there are $2^n - 1$ focal elements to assign belief to. The $n + 1$
linear equations with respect to $2^n - 1$ undetermined variables
form an under-determined problem, i.e., it usually has
multiple solutions.

Therefore, to obtain a unique BBA, Han et al. [7] established
two uncertainty optimization based transformations for the
given FMF. Both objective functions are AM and the
constraints are mainly based on the given FMF.
The objective function of the uncertainty maximization
problem and the corresponding constraints are as follows:
When $\sum_{i=1}^{n} \mu(\theta_i) \ge 1$,

$$\begin{aligned}
\max_{m} \quad & \left\{ -\sum_{i=1}^{n} BetP_m(\theta_i) \log_2 (BetP_m(\theta_i)) \right\} \\
\text{s.t.} \quad & \sum_{\{\theta_i\} \cap A \neq \emptyset} m(A) = \mu(\theta_i), \quad \forall \{\theta_i\} \subseteq \Theta \\
& \sum_{A \subseteq \Theta} m(A) = 1 \\
& 0 \le m(A) \le 1
\end{aligned} \qquad (10)$$

When $\sum_{i=1}^{n} \mu(\theta_i) \le 1$,

$$\begin{aligned}
\max_{m} \quad & \left\{ -\sum_{i=1}^{n} BetP_m(\theta_i) \log_2 (BetP_m(\theta_i)) \right\} \\
\text{s.t.} \quad & \sum_{A \subseteq \{\theta_i\}} m(A) = \mu(\theta_i), \quad \forall \{\theta_i\} \subseteq \Theta \\
& \sum_{A \subseteq \Theta} m(A) = 1 \\
& 0 \le m(A) \le 1
\end{aligned} \qquad (11)$$

In the sequel, this transformation is represented by “$T_{max}$”
for convenience.

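The objective function of the uncertainty minimization
problem and the corresponding constraints are as follows:
When $\sum_{i=1}^{n} \mu(\theta_i) \ge 1$,

$$\begin{aligned}
\min_{m} \quad & \left\{ -\sum_{i=1}^{n} BetP_m(\theta_i) \log_2 (BetP_m(\theta_i)) \right\} \\
\text{s.t.} \quad & \sum_{\{\theta_i\} \cap A \neq \emptyset} m(A) = \mu(\theta_i), \quad \forall \{\theta_i\} \subseteq \Theta \\
& \sum_{A \subseteq \Theta} m(A) = 1 \\
& 0 \le m(A) \le 1
\end{aligned} \qquad (12)$$

When $\sum_{i=1}^{n} \mu(\theta_i) \le 1$,

$$\begin{aligned}
\min_{m} \quad & \left\{ -\sum_{i=1}^{n} BetP_m(\theta_i) \log_2 (BetP_m(\theta_i)) \right\} \\
\text{s.t.} \quad & \sum_{A \subseteq \{\theta_i\}} m(A) = \mu(\theta_i), \quad \forall \{\theta_i\} \subseteq \Theta \\
& \sum_{A \subseteq \Theta} m(A) = 1 \\
& 0 \le m(A) \le 1
\end{aligned} \qquad (13)$$

In the sequel, this transformation is represented by “$T_{min}$” for
convenience.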

A unique BBA can be determined without predefining focal elements by using “$T_{\max}$” or “$T_{\min}$”. The obtained BBA is the optimal solution of the uncertainty maximization or minimization problem. In transforming an FMF into a BBA, “$T_{\max}$” and “$T_{\min}$” emphasize the maximal and minimal uncertainty cases of the obtained BBA, respectively. We consider a BBA that is a trade-off between the two BBAs obtained by “$T_{\max}$” and “$T_{\min}$” to be preferable, since it may avoid bias in terms of uncertainty degree.


IV. TRANSFORMATIONS WITH USER-SPECIFIED 
WEIGHTING FACTOR 


As mentioned above, two BBAs can be obtained by using “$T_{\max}$” and “$T_{\min}$”, respectively. Based on these two BBAs, we aim to construct a transformation that determines a trade-off BBA satisfying the relationship between the FMF and the singleton plausibility or singleton belief. We use a user-specified weighting factor to control how close the trade-off BBA is to each of the two BBAs above. Let the user-specified weighting factor be denoted by $\alpha$, with $0 \le \alpha \le 1$. When $\alpha \to 0$, the trade-off BBA is close to the BBA obtained by using “$T_{\min}$”. When $\alpha \to 1$, the trade-off BBA is close to the BBA obtained by using “$T_{\max}$”. To meet these requirements, we propose two different approaches to determine the trade-off BBA.


A. Weighted Average based Transformation 


Let $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$ be the FOD. The given FMF is represented by $\mu = [\mu(\theta_1), \mu(\theta_2), \ldots, \mu(\theta_n)]$. Suppose that the BBA obtained by using “$T_{\min}$” is denoted by $m_{\min}$, and the BBA obtained by using “$T_{\max}$” is denoted by $m_{\max}$. The user-specified weighting factor is denoted by $\alpha$ ($0 \le \alpha \le 1$). The trade-off BBA is denoted by $m$. The weighted average of $m_{\min}$ and $m_{\max}$ yields a trade-off BBA as follows:

$$m(A) = (1-\alpha)\cdot m_{\min}(A) + \alpha \cdot m_{\max}(A) \tag{14}$$

where $A \subseteq \Theta$. In the sequel, the transformation based on the weighted average (WA) is denoted by “$T_{WA}$” for convenience.

The BBA obtained in (14) is an admissible BBA and it satisfies the constraints established based on the FMF.

According to Eq. (14), the following conditions are satisfied:

$$\sum_{A \subseteq \Theta} m(A) = 1, \quad 0 \le m(A) \le 1 \tag{15}$$


For the transformation of an FMF into a BBA, the obtained BBA must satisfy the relationship between the FMF and the singleton plausibility or singleton belief. Although “$T_{WA}$” is a simple and direct transformation of an FMF into a trade-off BBA, it satisfies this relationship. The proof is provided below.

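When $\sum_{i=1}^{n}\mu(\theta_i) \ge 1$, $m_{\min}$ and $m_{\max}$ both satisfy Eq. (6), i.e., $\mathrm{Pl}_{\min}(\{\theta_i\}) = \mathrm{Pl}_{\max}(\{\theta_i\}) = \mu(\theta_i)$, $i = 1, 2, \ldots, n$. According to Eq. (14),

$$
\begin{aligned}
\mathrm{Pl}(\{\theta_i\}) &= \sum_{\{\theta_i\} \cap A \neq \emptyset} m(A) \\
&= \sum_{\{\theta_i\} \cap A \neq \emptyset} \big[(1-\alpha)\cdot m_{\min}(A) + \alpha\cdot m_{\max}(A)\big] \\
&= (1-\alpha)\cdot \sum_{\{\theta_i\} \cap A \neq \emptyset} m_{\min}(A) + \alpha\cdot \sum_{\{\theta_i\} \cap A \neq \emptyset} m_{\max}(A) \\
&= (1-\alpha)\cdot \mathrm{Pl}_{\min}(\{\theta_i\}) + \alpha\cdot \mathrm{Pl}_{\max}(\{\theta_i\}) \\
&= (1-\alpha)\cdot \mu(\theta_i) + \alpha\cdot \mu(\theta_i) \\
&= \mu(\theta_i)
\end{aligned}
\tag{16}
$$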


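Similarly, when $\sum_{i=1}^{n}\mu(\theta_i) \le 1$, $m_{\min}$ and $m_{\max}$ both satisfy Eq. (7), i.e., $\mathrm{Bel}_{\min}(\{\theta_i\}) = \mathrm{Bel}_{\max}(\{\theta_i\}) = \mu(\theta_i)$, $i = 1, 2, \ldots, n$. According to Eq. (14),

$$
\begin{aligned}
\mathrm{Bel}(\{\theta_i\}) &= \sum_{A \subseteq \{\theta_i\}} m(A) \\
&= \sum_{A \subseteq \{\theta_i\}} \big[(1-\alpha)\cdot m_{\min}(A) + \alpha\cdot m_{\max}(A)\big] \\
&= (1-\alpha)\cdot \sum_{A \subseteq \{\theta_i\}} m_{\min}(A) + \alpha\cdot \sum_{A \subseteq \{\theta_i\}} m_{\max}(A) \\
&= (1-\alpha)\cdot \mathrm{Bel}_{\min}(\{\theta_i\}) + \alpha\cdot \mathrm{Bel}_{\max}(\{\theta_i\}) \\
&= (1-\alpha)\cdot \mu(\theta_i) + \alpha\cdot \mu(\theta_i) \\
&= \mu(\theta_i)
\end{aligned}
\tag{17}
$$

For the trade-off BBA, when $\sum_{i=1}^{n}\mu(\theta_i) \ge 1$, the given FMF is equivalent to the corresponding singleton plausibility. When $\sum_{i=1}^{n}\mu(\theta_i) \le 1$, the given FMF is equivalent to the corresponding singleton belief. That is, “$T_{WA}$” transforms the given FMF into a trade-off BBA that satisfies the relationship between the FMF and the BBA.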
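The weighted average of Eq. (14) and the plausibility-preservation property proved above can be checked numerically. The sketch below is a minimal illustration, assuming a BBA is stored as a dictionary from frozensets to masses; the helper names `weighted_average_bba` and `singleton_plausibility` are hypothetical. The two input BBAs are those listed for $\alpha = 0$ and $\alpha = 1$ in Example 2 (FMF $\mu = [1, 0.2, 0.3, 0.3]$).

```python
def weighted_average_bba(m_min, m_max, alpha):
    """Eq. (14): m(A) = (1 - alpha) * m_min(A) + alpha * m_max(A)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    focal_sets = set(m_min) | set(m_max)
    return {A: (1 - alpha) * m_min.get(A, 0.0) + alpha * m_max.get(A, 0.0)
            for A in focal_sets}

def singleton_plausibility(m, theta):
    """Pl({theta}) = sum of m(A) over all focal elements A containing theta."""
    return sum(mass for A, mass in m.items() if theta in A)

fs = frozenset
# Integers 1..4 stand for theta_1..theta_4.
m_min = {fs({1}): 0.7, fs({1, 3, 4}): 0.1, fs({1, 2, 3, 4}): 0.2}
m_max = {fs({1}): 0.2, fs({1, 2}): 0.2, fs({1, 3}): 0.3, fs({1, 4}): 0.3}

m = weighted_average_bba(m_min, m_max, 0.2)
pl = [singleton_plausibility(m, i) for i in (1, 2, 3, 4)]
print(sorted((sorted(A), round(v, 2)) for A, v in m.items()))
print([round(v, 4) for v in pl])
```

With $\alpha = 0.2$ the masses agree with the corresponding row of Table III, and the singleton plausibilities reproduce the FMF, as Eq. (16) predicts.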


B. User-specified Optimization based Transformation 


Let the FOD be $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$. The given FMF is denoted by $\mu = [\mu(\theta_1), \mu(\theta_2), \ldots, \mu(\theta_n)]$. Suppose that $m_{\min}$ and $m_{\max}$ denote the BBAs obtained by “$T_{\min}$” and “$T_{\max}$”, respectively. The user-specified weighting factor is denoted by $\alpha$ ($0 \le \alpha \le 1$). The trade-off BBA is represented by $m$.

The user-specified weighting factor is used to influence the similarity between the trade-off BBA and $m_{\min}$ (or $m_{\max}$). The degree of similarity between two BBAs is represented by the distance of evidence. We use Jousselme's distance [19], which is a strict metric defined as

$$d_J(m_a, m_b) = \sqrt{\tfrac{1}{2}\,(m_a - m_b)^T D\, (m_a - m_b)} \tag{18}$$

where $D(A, B) = |A \cap B| / |A \cup B|$, $A \subseteq \Theta$, $B \subseteq \Theta$. According to Eq. (18), the obtained BBA is more similar to $m_{\min}$ if $d_J(m, m_{\min})$ is smaller. If $d_J(m, m_{\max})$ is smaller, the obtained BBA is more similar to $m_{\max}$.

To obtain the trade-off BBA between $m_{\min}$ and $m_{\max}$, a relationship between the user-specified weighting factor and the distance of evidence can be constructed. As $\alpha$ varies from 0 to 1, a decrease of $d_J(m, m_{\min})$ is accompanied by an increase of $d_J(m, m_{\max})$. We can then establish the following equation:

$$\frac{d_J(m, m_{\min})}{d_J(m, m_{\max})} = \frac{\alpha}{1-\alpha} \tag{19}$$

A BBA satisfying Eq. (19) may not always exist. If the following function (equivalent to Eq. (19)) achieves its minimum value, then the trade-off BBA is obtained.

$$\mathrm{obj}(m) = \big[(1-\alpha)\cdot d_J(m, m_{\min}) - \alpha\cdot d_J(m, m_{\max})\big]^2 \tag{20}$$
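Eqs. (18) and (20) can be evaluated with a few lines of code. The following Python sketch is an illustration under stated assumptions: BBAs are dictionaries keyed by non-empty frozensets, and the names `jaccard`, `jousselme_distance` and `uso_objective` are introduced here for convenience. Only focal elements of either BBA enter the quadratic form, since all other components of $m_a - m_b$ are zero. The trial point is the weighted average of Eq. (14); because $m - m_{\min} = \alpha\,(m_{\max} - m_{\min})$ for that point, the objective should vanish there, which serves as a sanity check rather than a replacement for the constrained solver.

```python
import math

def jaccard(a, b):
    """D(A, B) = |A ∩ B| / |A ∪ B| for non-empty sets A and B."""
    return len(a & b) / len(a | b)

def jousselme_distance(m_a, m_b):
    """Eq. (18): d_J = sqrt(0.5 * (m_a - m_b)^T D (m_a - m_b))."""
    focal_sets = list(set(m_a) | set(m_b))
    diff = [m_a.get(A, 0.0) - m_b.get(A, 0.0) for A in focal_sets]
    quad = sum(diff[i] * diff[j] * jaccard(focal_sets[i], focal_sets[j])
               for i in range(len(focal_sets))
               for j in range(len(focal_sets)))
    return math.sqrt(max(0.5 * quad, 0.0))  # clamp tiny negative round-off

def uso_objective(m, m_min, m_max, alpha):
    """Eq. (20): [(1 - alpha) * d_J(m, m_min) - alpha * d_J(m, m_max)]^2."""
    return ((1 - alpha) * jousselme_distance(m, m_min)
            - alpha * jousselme_distance(m, m_max)) ** 2

fs = frozenset
# BBAs of Example 2 for alpha = 0 and alpha = 1 (integers stand for theta_1..theta_4).
m_min = {fs({1}): 0.7, fs({1, 3, 4}): 0.1, fs({1, 2, 3, 4}): 0.2}
m_max = {fs({1}): 0.2, fs({1, 2}): 0.2, fs({1, 3}): 0.3, fs({1, 4}): 0.3}

alpha = 0.2
trial = {A: (1 - alpha) * m_min.get(A, 0.0) + alpha * m_max.get(A, 0.0)
         for A in set(m_min) | set(m_max)}
obj_value = uso_objective(trial, m_min, m_max, alpha)
```

In a full implementation, `uso_objective` would be passed to a constrained optimizer together with the linear constraints of the transformation.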


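Once $\alpha$ is given, we can establish a constrained minimization problem to transform an FMF into a BBA. The objective function is Eq. (20) and the constraints are mainly based on Eq. (6) or Eq. (7). The transformation of an FMF into a trade-off BBA is obtained by solving the user-specified optimization problem as follows.

When $\sum_{i=1}^{n}\mu(\theta_i) \ge 1$,

$$
\begin{aligned}
\min_{m}\ & \Big\{ \big[(1-\alpha)\cdot d_J(m, m_{\min}) - \alpha\cdot d_J(m, m_{\max})\big]^2 \Big\} \\
\text{s.t.}\ & \sum_{\{\theta_i\} \cap A \neq \emptyset} m(A) = \mu(\theta_i),\ \forall \{\theta_i\} \subseteq \Theta \\
& \sum_{A \subseteq \Theta} m(A) = 1 \\
& 0 \le m(A) \le 1
\end{aligned}
\tag{21}
$$

When $\sum_{i=1}^{n}\mu(\theta_i) \le 1$,

$$
\begin{aligned}
\min_{m}\ & \Big\{ \big[(1-\alpha)\cdot d_J(m, m_{\min}) - \alpha\cdot d_J(m, m_{\max})\big]^2 \Big\} \\
\text{s.t.}\ & \sum_{A \subseteq \{\theta_i\}} m(A) = \mu(\theta_i),\ \forall \{\theta_i\} \subseteq \Theta \\
& \sum_{A \subseteq \Theta} m(A) = 1 \\
& 0 \le m(A) \le 1
\end{aligned}
\tag{22}
$$

In the sequel, the transformation based on the user-specified optimization (USO) is denoted by “$T_{USO}$” for convenience. For a given FMF, the trade-off BBA can thus be obtained by using “$T_{USO}$”.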


V. EXPERIMENTS 


In this section, we provide some examples to illustrate how to transform an FMF into a trade-off BBA using our approaches. Here, we use the Optimization Toolbox in MATLAB to solve the constrained optimization problems.


A. Example 1 


Let the FOD be $\Theta = \{\theta_1, \theta_2, \theta_3\}$. The given FMF is $\mu(\theta_1) = 0.9$, $\mu(\theta_2) = 0.7$, $\mu(\theta_3) = 0.3$. Suppose that $m_{\min}$ and $m_{\max}$ are the BBAs obtained by using “$T_{\min}$” and “$T_{\max}$”, respectively. We list only the BBAs for $\alpha = 0, 0.3, 0.7$ and $1$.

This FMF satisfies $\sum_{i=1}^{3}\mu(\theta_i) = 1.9 > 1$. Therefore, the given FMF is equivalent to the singleton plausibility. The BBAs obtained by using “$T_{WA}$” and “$T_{USO}$” are listed in Table I and Table II, respectively.

With both “$T_{WA}$” and “$T_{USO}$”, when $\alpha = 0$, the obtained BBAs are identical to $m_{\min}$, and the AM takes its minimum value. When $\alpha \to 0$, the obtained BBA is similar to $m_{\min}$, and its uncertainty is close to the minimum uncertainty.

Similarly, when $\alpha = 1$, the obtained BBAs are identical to $m_{\max}$, and the AM takes its maximal value. When $\alpha \to 1$, the obtained BBA is similar to $m_{\max}$, and its uncertainty is close to the maximal uncertainty.


B. Example 2 


Let the FOD be $\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4\}$. The given FMF is $\mu(\theta_1) = 1$, $\mu(\theta_2) = 0.2$, $\mu(\theta_3) = 0.3$, $\mu(\theta_4) = 0.3$. Suppose that $m_{\min}$ and $m_{\max}$ are the BBAs obtained by using “$T_{\min}$” and “$T_{\max}$”, respectively. We list only the BBAs for $\alpha = 0, 0.2, 0.8$ and $1$.

TABLE I
USING “T_WA” TO OBTAIN BBAS IN EXAMPLE 1.

α       | BBA | AM
α = 0   | m({θ1}) = 0.3, m({θ1,θ2}) = 0.3, m({θ2}) = 0.1, m(Θ) = 0.3 | [illegible]
α = 0.3 | m({θ1}) = 0.21, m({θ1,θ2}) = 0.42, m({θ2}) = 0.07, m({θ1,θ3}) = 0.06, m({θ3}) = 0.03, m(Θ) = 0.21 | 1.4003
α = 0.7 | m({θ1}) = 0.09, m({θ1,θ2}) = 0.58, m({θ2}) = 0.03, m({θ1,θ3}) = 0.14, m({θ3}) = 0.07, m(Θ) = 0.09 | 1.4730
α = 1   | m({θ1,θ2}) = 0.7, m({θ3}) = 0.1, m({θ1,θ3}) = 0.2 | 1.5129

TABLE II
USING “T_USO” TO OBTAIN BBAS IN EXAMPLE 1.

α       | BBA | AM
α = 0   | m({θ1}) = 0.3, m({θ1,θ2}) = 0.3, m({θ2}) = 0.1, m(Θ) = 0.3 | [illegible]
α = 0.3 | m({θ1}) = 0.2027, m({θ1,θ2}) = 0.4145, m({θ2}) = 0.0828, m({θ1,θ3}) = 0.0802, m({θ3}) = 0.0172, m(Θ) = 0.2026 | 1.3969
α = 0.7 | m({θ1}) = 0.1037, m({θ1,θ2}) = 0.5962, m({θ2}) = 0.0001, m({θ1,θ3}) = 0.0965, m({θ3}) = 0.0999, m(Θ) = 0.1036 | 1.4828
α = 1   | m({θ1,θ2}) = 0.7, m({θ3}) = 0.1, m({θ1,θ3}) = 0.2 | 1.5129


Since $\sum_{i=1}^{4}\mu(\theta_i) = 1.8 > 1$, the FMF is equivalent to the singleton plausibility. The BBAs obtained by using “$T_{WA}$” and “$T_{USO}$” are listed in Table III and Table IV, respectively.

When $\alpha = 0$, the obtained BBAs are identical to $m_{\min}$, and the AM takes its minimum value. When $\alpha = 1$, the obtained BBAs are identical to $m_{\max}$, and the AM takes its maximal value.

In Table III and Table IV, when $\alpha \to 0$, the obtained BBA is similar to $m_{\min}$, and its uncertainty is close to the minimum uncertainty. When $\alpha \to 1$, the obtained BBA is similar to $m_{\max}$, and its uncertainty is close to the maximal uncertainty.


TABLE III
USING “T_WA” TO OBTAIN BBAS IN EXAMPLE 2.

α       | BBA | AM
α = 0   | m({θ1}) = 0.7, m(Θ) = 0.2, m({θ1,θ3,θ4}) = 0.1 | 1.0896
α = 0.2 | m({θ1}) = 0.60, m({θ1,θ2}) = 0.04, m({θ1,θ3}) = 0.06, m({θ1,θ4}) = 0.06, m({θ1,θ3,θ4}) = 0.08, m(Θ) = 0.16 | 1.2099
α = 0.8 | m({θ1}) = 0.3, m({θ1,θ2}) = 0.16, m({θ1,θ3}) = 0.24, m({θ1,θ4}) = 0.24, m({θ1,θ3,θ4}) = 0.02, m(Θ) = 0.04 | 1.5122
α = 1   | m({θ1}) = 0.2, m({θ1,θ2}) = 0.2, m({θ1,θ3}) = 0.3, m({θ1,θ4}) = 0.3 | 1.5955

TABLE IV
USING “T_USO” TO OBTAIN BBAS IN EXAMPLE 2.

α       | BBA | AM
α = 0   | m({θ1}) = 0.7, m(Θ) = 0.2, m({θ1,θ3,θ4}) = 0.1 | 1.0896
α = 0.2 | m({θ1}) = 0.6008, m({θ1,θ2}) = 0.0471, m({θ1,θ3}) = 0.0484, m({θ1,θ2,θ3}) = 0.0037, m({θ1,θ4}) = 0.0515, m({θ1,θ2,θ4}) = 0.0006, m({θ1,θ3,θ4}) = 0.0993, m(Θ) = 0.1486 | 1.2133
α = 0.8 | m({θ1}) = 0.2990, m({θ1,θ2}) = 0.1665, m({θ1,θ3}) = 0.2345, m({θ1,θ4}) = 0.2010, m({θ1,θ2,θ4}) = 0.0335, m({θ1,θ3,θ4}) = 0.0655 | 1.5227
α = 1   | m({θ1}) = 0.2, m({θ1,θ2}) = 0.2, m({θ1,θ3}) = 0.3, m({θ1,θ4}) = 0.3 | 1.5955


C. Example 3 


Let the FOD be $\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4\}$. The given FMF is $\mu(\theta_1) = 0.6$, $\mu(\theta_2) = 0.1$, $\mu(\theta_3) = 0.2$, $\mu(\theta_4) = 0.1$. We list only the BBAs for $\alpha = 0, 0.4, 0.9$ and $1$.

This FMF satisfies $\sum_{i=1}^{4}\mu(\theta_i) = 1$. The FMF is therefore equivalent to both the singleton plausibility and the singleton belief. The BBAs obtained by using “$T_{WA}$” and “$T_{USO}$” are listed in Table V and Table VI, respectively.

In Table V and Table VI, the BBAs obtained when $\alpha = 0$ are identical to the BBAs obtained when $\alpha = 1$, i.e., the two BBAs obtained by using “$T_{\min}$” and “$T_{\max}$” are the same. Therefore, $\forall \alpha \in [0, 1]$, the obtained BBAs do not depend on $\alpha$. As $\alpha$ varies from 0 to 1, all the obtained BBAs are Bayesian belief functions and are identical.


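TABLE V
USING “T_WA” TO OBTAIN BBAS IN EXAMPLE 3.

α       | BBA | AM
α = 0   | m({θ1}) = 0.6, m({θ2}) = 0.1, m({θ3}) = 0.2, m({θ4}) = 0.1 | 1.5710
α = 0.4 | m({θ1}) = 0.6, m({θ2}) = 0.1, m({θ3}) = 0.2, m({θ4}) = 0.1 | 1.5710
α = 0.9 | m({θ1}) = 0.6, m({θ2}) = 0.1, m({θ3}) = 0.2, m({θ4}) = 0.1 | 1.5710
α = 1   | m({θ1}) = 0.6, m({θ2}) = 0.1, m({θ3}) = 0.2, m({θ4}) = 0.1 | 1.5710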

D. Example 4 


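Let the FOD be $\Theta = \{\theta_1, \theta_2, \theta_3\}$. The given FMF is $\mu(\theta_1) = 0.6$, $\mu(\theta_2) = 0.2$, $\mu(\theta_3) = 0.1$. Suppose that $m_{\min}$ and $m_{\max}$ are the BBAs obtained by using “$T_{\min}$” and “$T_{\max}$”, respectively. We list only the BBAs for $\alpha = 0, 0.3, 0.8$ and $1$.

This FMF satisfies $\sum_{i=1}^{3}\mu(\theta_i) = 0.9 < 1$. The FMF is equivalent to the singleton belief. The BBAs obtained by using “$T_{WA}$” and “$T_{USO}$” are listed in Table VII and Table VIII, respectively.

TABLE VI
USING “T_USO” TO OBTAIN BBAS IN EXAMPLE 3.

α       | BBA | AM
α = 0   | m({θ1}) = 0.6, m({θ2}) = 0.1, m({θ3}) = 0.2, m({θ4}) = 0.1 | 1.5710
α = 0.4 | m({θ1}) = 0.6, m({θ2}) = 0.1, m({θ3}) = 0.2, m({θ4}) = 0.1 | 1.5710
α = 0.9 | m({θ1}) = 0.6, m({θ2}) = 0.1, m({θ3}) = 0.2, m({θ4}) = 0.1 | 1.5710
α = 1   | m({θ1}) = 0.6, m({θ2}) = 0.1, m({θ3}) = 0.2, m({θ4}) = 0.1 | 1.5710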


In Table VII and Table VIII, when $\alpha = 0$, the obtained BBAs are identical to $m_{\min}$, and the AM takes its minimum value. When $\alpha = 1$, the obtained BBAs are identical to $m_{\max}$, and the AM takes its maximal value. When $\alpha \to 0$, the obtained BBA is similar to $m_{\min}$, and its uncertainty is close to the minimum uncertainty. When $\alpha \to 1$, the obtained BBA is similar to $m_{\max}$, and its uncertainty is close to the maximal uncertainty.


TABLE VII
USING “T_WA” TO OBTAIN BBAS IN EXAMPLE 4.

α       | BBA | AM
α = 0   | m({θ1}) = 0.6, m({θ2}) = 0.2, m({θ3}) = 0.1, m({θ1,θ2}) = 0.1 | [illegible]
α = 0.3 | m({θ1}) = 0.6, m({θ1,θ2}) = 0.07, m({θ2}) = 0.2, m({θ2,θ3}) = 0.03, m({θ3}) = 0.1 | 1.2749
α = 0.8 | m({θ1}) = 0.6, m({θ1,θ2}) = 0.02, m({θ2}) = 0.2, m({θ2,θ3}) = 0.08, m({θ3}) = 0.1 | 1.3321
α = 1   | m({θ1}) = 0.6, m({θ2}) = 0.2, m({θ3}) = 0.1, m({θ2,θ3}) = 0.1 | [illegible]

TABLE VIII
USING “T_USO” TO OBTAIN BBAS IN EXAMPLE 4.

α       | BBA | AM
α = 0   | m({θ1}) = 0.6, m({θ2}) = 0.2, m({θ3}) = 0.1, m({θ1,θ2}) = 0.1 | [illegible]
α = 0.3 | m({θ1}) = 0.6, m({θ1,θ2}) = 0.0697, m({θ2}) = 0.2, m({θ1,θ3}) = 0.0006, m({θ3}) = 0.1, m({θ2,θ3}) = 0.0297 | 1.2748
α = 0.8 | m({θ1}) = 0.6, m({θ1,θ2}) = 0.02, m({θ2}) = 0.2, m({θ2,θ3}) = 0.08, m({θ3}) = 0.1 | 1.3321
α = 1   | m({θ1}) = 0.6, m({θ2}) = 0.2, m({θ3}) = 0.1, m({θ2,θ3}) = 0.1 | [illegible]


E. Example 5


Consider a classification system with three sensors: a displacement sensor $S_1$, a pressure sensor $S_2$ and an image sensor $S_3$. Let the FOD be $\Theta = \{\theta_1, \theta_2, \theta_3\}$. The three sensors are used to measure the size, weight and state of the sample, respectively. The measurements of the sensors are used to obtain two FMFs and a BBA. According to the parameters and the measurements of the sensor, the FMF is defined as

$$
\mu(\theta_j) =
\begin{cases}
\dfrac{x - min_{ji}}{ave_{ji} - min_{ji}}, & x \in [min_{ji}, ave_{ji}] \\[2mm]
\dfrac{x - max_{ji}}{ave_{ji} - max_{ji}}, & x \in (ave_{ji}, max_{ji}] \\[2mm]
0, & \text{otherwise}
\end{cases}
\tag{23}
$$

where $i = 1, 2$, and $x$ is the measurement of the sample. $min_{ji}$ and $max_{ji}$ are the minimum and maximal values of the class $\theta_j$ ($j = 1, 2, 3$), respectively. $ave_{ji}$ is the average value of the class $\theta_j$.


TABLE IX
THE PARAMETERS AND THE MEASUREMENTS OF SENSORS.

Class  | S1: min_j1 | S1: max_j1 | S1: ave_j1 | S2: min_j2 | S2: max_j2 | S2: ave_j2
θ1     | 43.3 | 58.4 | 50.8 | 2.9 | 4.1 | 3.4
θ2     | 50.9 | 70.1 | 59.5 | 2.0 | 3.4 | 2.8
θ3     | 49.4 | 79.3 | 65.7 | 2.2 | 3.8 | 2.9
Sample | 56 (S1) | | | 3.2 (S2) | |


Table IX lists the parameters of $S_1$ and $S_2$ and the measurements of a sample. The class of this sample is $\theta_2$. According to (23), the two FMFs are as follows:

$S_1$: $\mu(\theta_1) = 0.3158$, $\mu(\theta_2) = 0.5930$, $\mu(\theta_3) = 0.4049$;
$S_2$: $\mu(\theta_1) = 0.6$, $\mu(\theta_2) = 0.6667$, $\mu(\theta_3) = 0.3333$.
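The piecewise-linear membership of Eq. (23) can be sketched in a few lines. The Python snippet below is an illustration only: the function name `triangular_membership` and its argument order are assumptions, and it presumes $min < ave < max$ for every class. It recomputes the FMF of sensor $S_1$ from the $S_1$ parameters and the sample value 56 listed in Table IX.

```python
def triangular_membership(x, lo, avg, hi):
    """Eq. (23): rises linearly on [lo, avg], falls linearly on (avg, hi], 0 elsewhere."""
    if lo <= x <= avg:
        return (x - lo) / (avg - lo)
    if avg < x <= hi:
        return (x - hi) / (avg - hi)
    return 0.0

# (min, ave, max) of sensor S1 for classes theta_1, theta_2, theta_3 (Table IX).
s1_params = [(43.3, 50.8, 58.4), (50.9, 59.5, 70.1), (49.4, 65.7, 79.3)]
sample = 56.0

mu_s1 = [round(triangular_membership(sample, lo, avg, hi), 4)
         for lo, avg, hi in s1_params]
print(mu_s1)
```

The rounded values agree with the FMF reported for $S_1$ above.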

According to the image of $S_3$, the expert determined the BBA directly as follows:
$m_{S_3}(\{\theta_3\}) = 0.51$, $m_{S_3}(\{\theta_2, \theta_3\}) = 0.38$, $m_{S_3}(\Theta) = 0.11$.


TABLE X
USING “T_WA” TO OBTAIN BBAS IN EXAMPLE 5.

α       | m_S1 | m_S2
α = 0   | m_S1({θ1}) = 0.0021, m_S1({θ2}) = 0.5930, m_S1({θ3}) = 0.0912, m_S1({θ1,θ3}) = 0.3137 | m_S2({θ1}) = 0.2667, m_S2({θ2}) = 0.4, m_S2({θ1,θ3}) = 0.0666, m_S2(Θ) = 0.2667
α = 0.3 | m_S1({θ1}) = 0.0759, m_S1({θ2}) = 0.4989, m_S1({θ1,θ2}) = 0.0203, m_S1({θ3}) = 0.1115, m_S1({θ1,θ3}) = 0.2196, m_S1({θ2,θ3}) = 0.0738 | m_S2({θ1}) = 0.1967, m_S2({θ2}) = 0.3, m_S2({θ1,θ2}) = 0.17, m_S2({θ3}) = 0.09, m_S2({θ1,θ3}) = 0.0466, m_S2({θ2,θ3}) = 0.01, m_S2(Θ) = 0.1867
α = 0.8 | m_S1({θ1}) = 0.1989, m_S1({θ2}) = 0.3420, m_S1({θ1,θ2}) = 0.0542, m_S1({θ3}) = 0.1454, m_S1({θ1,θ3}) = 0.0627, m_S1({θ2,θ3}) = 0.1968 | m_S2({θ1}) = 0.08, m_S2({θ2}) = 0.1334, m_S2({θ1,θ2}) = 0.4534, m_S2({θ3}) = 0.24, m_S2({θ1,θ3}) = 0.0133, m_S2({θ2,θ3}) = 0.0266, m_S2(Θ) = 0.0533
α = 1   | m_S1({θ1}) = 0.2480, m_S1({θ2}) = 0.2793, m_S1({θ1,θ2}) = 0.0687, m_S1({θ3}) = 0.1590, m_S1({θ2,θ3}) = 0.2459 | m_S2({θ1}) = 0.0333, m_S2({θ2}) = 0.0667, m_S2({θ1,θ2}) = 0.5667, m_S2({θ3}) = 0.3, m_S2({θ2,θ3}) = 0.0333

Suppose that $m_{S_1}$ and $m_{S_2}$ denote the BBAs transformed from the two FMFs of $S_1$ and $S_2$, respectively.


TABLE XI
USING “T_USO” TO OBTAIN BBAS IN EXAMPLE 5.

α       | m_S1 | m_S2
α = 0   | m_S1({θ1}) = 0.0021, m_S1({θ2}) = 0.5930, m_S1({θ3}) = 0.0912, m_S1({θ1,θ3}) = 0.3137 | m_S2({θ1}) = 0.2667, m_S2({θ2}) = 0.4, m_S2({θ1,θ3}) = 0.0666, m_S2(Θ) = 0.2667
α = 0.3 | m_S1({θ1}) = 0.0354, m_S1({θ2}) = 0.4927, m_S1({θ1,θ2}) = 0.0670, m_S1({θ3}) = 0.1654, m_S1({θ1,θ3}) = 0.2062, m_S1({θ2,θ3}) = 0.0260, m_S1(Θ) = 0.0073 | m_S2({θ1}) = 0.1902, m_S2({θ2}) = 0.2641, m_S2({θ1,θ2}) = 0.2124, m_S2({θ3}) = 0.0291, m_S2({θ1,θ3}) = 0.1140, m_S2({θ2,θ3}) = 0.1068, m_S2(Θ) = 0.0834
α = 0.8 | m_S1({θ1}) = 0.1918, m_S1({θ2}) = 0.3190, m_S1({θ1,θ2}) = 0.0844, m_S1({θ3}) = 0.2152, m_S1({θ2,θ3}) = 0.15, m_S1(Θ) = 0.0396 | m_S2({θ1}) = 0.0942, m_S2({θ2}) = 0.1193, m_S2({θ1,θ2}) = 0.4532, m_S2({θ3}) = 0.2379, m_S2({θ1,θ3}) = 0.0012, m_S2({θ2,θ3}) = 0.0428, m_S2(Θ) = 0.0514
α = 1   | m_S1({θ1}) = 0.2480, m_S1({θ2}) = 0.2793, m_S1({θ1,θ2}) = 0.0687, m_S1({θ3}) = 0.1590, m_S1({θ2,θ3}) = 0.2459 | m_S2({θ1}) = 0.0333, m_S2({θ2}) = 0.0667, m_S2({θ1,θ2}) = 0.5667, m_S2({θ3}) = 0.3, m_S2({θ2,θ3}) = 0.0333

TABLE XII
THE COMBINED BBAS IN EXAMPLE 5.

α       | “T_WA” | “T_USO”
α = 0   | m({θ1}) = 0.0276, m({θ2}) = 0.5731, m({θ3}) = 0.3652, m({θ1,θ3}) = 0.0340 | m({θ1}) = 0.0276, m({θ2}) = 0.5731, m({θ3}) = 0.3652, m({θ1,θ3}) = 0.0340
α = 0.3 | m({θ1}) = 0.0429, m({θ2}) = 0.5531, m({θ1,θ2}) = 0.0024, m({θ3}) = 0.3637, m({θ1,θ3}) = 0.0168, m({θ2,θ3}) = 0.0212 | m({θ1}) = 0.0408, m({θ2}) = 0.5547, m({θ1,θ2}) = 0.0069, m({θ3}) = 0.375, m({θ1,θ3}) = 0.0135, m({θ2,θ3}) = 0.0089, m(Θ) = 0.0002
α = 0.8 | m({θ1}) = 0.0513, m({θ2}) = 0.5409, m({θ1,θ2}) = 0.0089, m({θ3}) = 0.3747, m({θ1,θ3}) = 0.0014, m({θ2,θ3}) = 0.0228 | m({θ1}) = 0.0418, m({θ2}) = 0.5341, m({θ1,θ2}) = 0.0199, m({θ3}) = 0.3780, m({θ2,θ3}) = 0.0255, m(Θ) = 0.0007
α = 1   | m({θ1}) = 0.0487, m({θ2}) = 0.5435, m({θ1,θ2}) = 0.0124, m({θ3}) = 0.3837, m({θ2,θ3}) = 0.0118 | m({θ1}) = 0.0487, m({θ2}) = 0.5435, m({θ1,θ2}) = 0.0124, m({θ3}) = 0.3837, m({θ2,θ3}) = 0.0118

TABLE XIII
THE PIGNISTIC PROBABILITIES IN EXAMPLE 5.

α       | “T_WA” | “T_USO”
α = 0   | BetP({θ1}) = 0.0446, BetP({θ2}) = 0.5731, BetP({θ3}) = 0.3822 | BetP({θ1}) = 0.0446, BetP({θ2}) = 0.5731, BetP({θ3}) = 0.3822
α = 0.3 | BetP({θ1}) = 0.0525, BetP({θ2}) = 0.5648, BetP({θ3}) = 0.3827 | BetP({θ1}) = 0.0511, BetP({θ2}) = 0.5627, BetP({θ3}) = 0.3863
α = 0.8 | BetP({θ1}) = 0.0565, BetP({θ2}) = 0.5567, BetP({θ3}) = 0.3868 | BetP({θ1}) = 0.0520, BetP({θ2}) = 0.5570, BetP({θ3}) = 0.3910
α = 1   | BetP({θ1}) = 0.0549, BetP({θ2}) = 0.5556, BetP({θ3}) = 0.3895 | BetP({θ1}) = 0.0549, BetP({θ2}) = 0.5556, BetP({θ3}) = 0.3895


The BBAs obtained by using “$T_{WA}$” and “$T_{USO}$” are listed in Table X and Table XI, respectively. We then combine the three BBAs (i.e., $m_{S_1}$, $m_{S_2}$ and $m_{S_3}$). The combined BBA is denoted by $m$. The combined BBAs and the pignistic probabilities are listed in Table XII and Table XIII, respectively. We list only the BBAs for $\alpha = 0, 0.3, 0.8$ and $1$.

In Table XIII, all the classification results are $\theta_2$, which is correct. As $\alpha$ increases from 0 to 1, $m_{S_1}(\{\theta_2\})$ and $m_{S_2}(\{\theta_2\})$ decrease in Table X and Table XI. With increasing $\alpha$, $m_{S_1}$ and $m_{S_2}$ become closer to $m_{\max}$ (i.e., the BBA obtained by using “$T_{\max}$”, or equivalently the BBA obtained when $\alpha = 1$), which explains the decreasing value of $\mathrm{BetP}(\{\theta_2\})$ in Table XIII.
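The step from a combined BBA to a classification decision can be illustrated as follows. This Python sketch assumes that the decision is taken as the class with the largest pignistic probability; the helper name `betp` and the dictionary representation are hypothetical. The input is the combined BBA listed for $\alpha = 0$ in Table XII.

```python
def betp(m, frame):
    """Pignistic probability: each focal element shares its mass equally among its elements."""
    p = {theta: 0.0 for theta in frame}
    for focal, mass in m.items():
        for theta in focal:
            p[theta] += mass / len(focal)
    return p

frame = (1, 2, 3)  # integers stand for theta_1, theta_2, theta_3
combined = {
    frozenset({1}): 0.0276,
    frozenset({2}): 0.5731,
    frozenset({3}): 0.3652,
    frozenset({1, 3}): 0.0340,
}

p = betp(combined, frame)
decision = max(p, key=p.get)  # class with the largest pignistic probability
print({k: round(v, 4) for k, v in p.items()}, decision)
```

The resulting probabilities match the $\alpha = 0$ row of Table XIII, and the selected class is $\theta_2$.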


VI. CONCLUSIONS 


In this paper, we have proposed two approaches with a user-specified weighting factor to transform a given FMF into a trade-off BBA. Both approaches are effective for obtaining a trade-off BBA, and users can transform a given FMF into a BBA with their preferred approach. The user-specified weighting factor controls how close the trade-off BBA is to each of the two BBAs obtained by solving the uncertainty maximization and minimization problems. As the cardinality of the FOD increases, the computational complexity of the optimization grows exponentially, which is due to the structure of belief functions. An example of using our transformations in a practical application is provided. The numerical examples indicate that the uncertainty of the obtained BBA lies between the minimum and maximal uncertainties. In future work, we will use and compare different types of distance of evidence as the objective function in order to obtain a better trade-off BBA.

Abstract: Dempster-Shafer evidence theory, also called the
theory of belief function, is widely used for uncertainty mod- 
eling and reasoning. However, when the size and number of 
focal elements are large, the evidence combination will bring 
a high computational complexity. To address this issue, various 
methods have been proposed including the implementation of 
more efficient combination rules and the simplifications or 
approximations of Basic Belief Assignments (BBAs). In this paper, 
a novel principle for approximating a BBA into a simpler one 
is proposed, which is based on the degree of non-redundancy 
for focal elements. More non-redundant focal elements are kept 
in the approximation while more redundant focal elements in 
the original BBA are removed first. Three types of degree 
of non-redundancy are defined based on three different def- 
initions of focal element distance, respectively. Two different 
implementations of this principle for BBA approximations are 
proposed including a batch and an iterative type. Examples, 
experiments, comparisons and related analyses are provided to 
validate proposed approximation approaches. 


Keywords: belief function, combination rule, BBA approxi- 
mation, focal element redundancy. 


I. INTRODUCTION 


Dempster-Shafer Theory (DST) [1] which is also called 
the theory of belief function, has been widely used in many 
uncertainty modeling and reasoning related application fields 
including information fusion [2], pattern classification [3] and 
Multiple Attributes Decision Making (MADM) [4]. How- 
ever, DST was criticized because of its limitations [5]. One 
limitation is its computational complexity [6] in evidence 
combination, which is influenced by the cardinality of the 
frame of discernment and the number of focal elements in 
BBAs to combine. The high computational cost brings a big 
challenge to the practical use of belief functions. 

To reduce the computational cost encountered in evidence 
combination, many approaches were proposed, which can be 
in general categorized into the following types. The first type is 
to design efficient combination algorithms. The representatives 
of this type include Kennes’ method [7], Barnett’s approach 
[8] and Shafer and Logan’s implementation for hierarchical 
evidence [9]. The second type is to simplify the original 
Basic Belief Assignment (BBA), i.e., to obtain a corresponding 
approximated BBA. Two major types can be found in the 
prevailing BBA approximations: (A) To use a BBA with a simpler and special structure to approximate the original one. For example, one can use the Bayesian BBA [10] and the consonant approximation of a BBA [11]; (B) To limit the quantity or size of focal elements by removing some focal elements according to some criteria (focal elements' size or mass value, or both). Tessem's $k-l-x$ method [12], Lowrance et al.'s summarization approach [13], Bauer's D1 approximation [14], Denœux's inner and outer approximations [15], Monte-Carlo approximation [16], etc. are representatives. They remove focal elements and redistribute the corresponding mass assignment values. In our previous works in recent years, a hierarchical proportional redistribution approach [17], a rank-level based BBA approximation [18], and optimization based approximations [19] were proposed. Shou et al. proposed a BBA approximation based on the correlation coefficient [20].

This paper focuses on reducing the computational cost of evidence combination through BBA approximations. As mentioned above, one can limit the number of
focal elements according to some criteria. Intuitively, the 
rational criterion should relate to the importance or non- 
redundancy of the focal elements. A focal element with more 
“common” or “shared information” with other focal elements 
is more redundant and should be removed first if possible. 
However, the available criterion is either the focal element’s 
size (i.e., cardinality) or its mass assignment, which has no 
direct and logical relation with the focal elements’ importance 
or the non-redundancy. Therefore, criteria related to the focal 
elements’ non-redundancy are required for proposing more 
reliable and efficient BBA approximation approaches. This is 
the motivation of our work in this paper. 

We use the average distance between a given focal element 
and all other focal elements to define the non-redundancy. 
Smaller average distance means that the given focal element 
carries more similar information compared with the others, 
i.e., it is less non-redundant and should be removed earlier. 
Different definitions of the distance between focal elements are 
used in this paper to define different non-redundancies of focal 
elements. Two strategies of removal (including a batch and an 
iterative mode) are proposed in the sequel, followed by re-normalization or redistribution. Numerical examples, simulations and related analyses are provided to show the rationality and interest of the novel BBA approximation approaches.

This paper extends our previous ideas briefly introduced in [21], where the non-redundancy of focal elements was preliminarily proposed. In this extended version, more definitions of the distance between focal elements are used to define the non-redundancy of focal elements, and the different distance definitions are analyzed and compared. We also provide more experiments and analyses for a more precise evaluation of these new approximation methods. The rest of the paper is organized as follows. Section II provides the essentials of DST and points out some limitations, especially the computational cost. A brief review of the available works on BBA approximations is also provided in Section II. Section III then proposes the non-redundancy of focal elements based on three different types of distance between focal elements. Numerical examples are provided to illustrate and compare the different definitions of non-redundancy. Simulations and related analyses are provided in Section IV to verify and evaluate our proposed non-redundancy of focal elements and their performance in BBA approximations. Comparisons between the newly proposed approaches and some typical existing ones are also provided. Section V concludes this paper.


II. PRELIMINARIES OF BBA APPROXIMATION
A. Basics of Dempster-Shafer evidence theory 


In Dempster-Shafer evidence theory [1], the elements in the Frame of Discernment (FOD) $\Theta$ are mutually exclusive and exhaustive. A basic belief assignment (BBA, also called mass function) on a FOD is defined by a mapping $m: 2^{\Theta} \to [0, 1]$ satisfying $m(\emptyset) = 0$ and

$$\sum_{A \in 2^{\Theta}} m(A) = 1 \tag{1}$$

If $m(A) > 0$, $A$ is a focal element. Two Bodies of Evidence (BOEs) can be combined using Dempster's rule as

$$
m(A) =
\begin{cases}
0, & \text{for } A = \emptyset, \\
\dfrac{1}{1-K} \displaystyle\sum_{A_i \cap B_j = A} m_1(A_i)\, m_2(B_j), & \text{for } A \neq \emptyset,
\end{cases}
\tag{2}
$$

where $K = \sum_{A_i \cap B_j = \emptyset} m_1(A_i)\, m_2(B_j)$ is the conflict coefficient, which accumulates the conflicting mass assignments between the BOEs to combine. Note that Dempster's rule is both commutative and associative. Dempster's rule has also been seriously criticized for its counter-intuitive behaviors [22]. Various alternative combination rules have been proposed. See [23] for more details. These alternatives focus on suppressing the counter-intuitive behaviors of Dempster's rule. However, they also face the problem of high computational cost [6] as the cardinality of the FOD and the number of focal elements increase.
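Dempster's rule of Eq. (2) can be sketched directly from its definition. The Python code below is a minimal illustration, assuming BBAs are dictionaries keyed by non-empty frozensets; the function name `dempster_combine` and the toy BBAs are made up for this example. The double loop visits every pair of focal elements, which shows why the cost grows with the number of focal elements of the BBAs to combine.

```python
def dempster_combine(m1, m2):
    """Combine two BBAs with Dempster's rule, Eq. (2); returns (combined BBA, K)."""
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b  # contributes to K
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {A: v / (1.0 - conflict) for A, v in combined.items()}, conflict

fs = frozenset
# Toy BBAs on the frame {"a", "b"}, chosen only for illustration.
m1 = {fs({"a"}): 0.6, fs({"a", "b"}): 0.4}
m2 = {fs({"b"}): 0.5, fs({"a", "b"}): 0.5}

m12, K = dempster_combine(m1, m2)
```

Here the only conflicting pair is $\{a\}$ with $\{b\}$, so $K = 0.6 \times 0.5 = 0.3$ and the remaining products are divided by $1 - K = 0.7$.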

To reduce the high computational cost caused by evidence combination, one can try to design simpler combination rules, develop efficient implementations of prevailing rules, or simplify (approximate) the original BBA by a simpler one with fewer focal elements. In this paper, we focus on BBA approximation, which is deemed more intuitive for human beings to grasp [24].


B. Brief review of available BBA approximation approaches 


An approximation $f(\cdot)$ of a BBA aims to find a simpler BBA $m_S$ to represent the original BBA $m$, i.e., $m_S = f(m)$. The available approaches can be categorized into the following two types: using a BBA with a special structure and reducing the number of focal elements.

1) Using BBA with special structure: 


(1) Bayesian BBA approximation 


A Bayesian BBA approximation outputs a Bayesian BBA 
with a special structure where all focal elements are singletons. 
The most representative Bayesian approximation of a BBA 
is the pignistic probability transformation proposed by Smets 
[6] and Kennes [7]. Voorbraak [10] uses the normalization 
of the plausibility for singletons to approximate the original 
BBA. Sudano [25]-[27] proposed a series of Bayesian ap- 
proximations based on the proportion between plausibilities 
or beliefs including the batch mode and the iterative mode. 
Cuzzolin [28] proposed an intersection approximation for BBA 
using the proportional repartition of the total non-specific mass 
assignment for each contribution of the non-specific mass 
assignments involved. Smarandache and Dezert [23] proposed 
a Bayesian BBA approximation in the framework of Dezert- 
Smarandache Theory (DSmT), i.e., the Dezert-Smarandache 
Probability transformation (DSmP), which can also be applied 
in DST model. In our previous work [29], a hierarchical DSmP 
was proposed. More analyses, comparisons and evaluations on 
these Bayesian approximations can be found in [30]. 

Note that the Bayesian approximation is usually used for probabilistic decision-making rather than for reducing the computational cost of evidence combination, since any Bayesian BBA approximation is too lossy.

(2) Consonant approximations
Here, the special structure of the approximated BBA is assumed to be a consonant support, i.e., the available focal elements are nested in order. Representative works on the consonant approximation include [11], [31].


2) Removing focal elements according to some criteria: 


(1) Limiting the maximum allowed cardinality of remain- 
ing focal elements 
In $k$-additive approximations [32], the maximum cardinality of the available focal elements is no greater than a predefined size $k$. In [32], the mass assignments of focal elements with cardinality larger than $k$ are redistributed to those with cardinality no larger than $k$. This redistribution of mass assignments is done according to proportions designed based on the average cardinality. In our previous work [19], such a redistribution of mass assignments is implemented via an optimization approach. In another previous work of ours [17], a BBA approximation with hierarchical redistribution was proposed. These methods aim to remove the focal elements with larger cardinalities, since these generally bring more computational cost in the combination.

(2) Limiting the maximum allowed number of remaining 

focal elements 
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In this type of approaches, the number of focal elements 
is reduced by removing some focal elements according to 
some criteria until the predefined quantity of remaining focal 
elements is reached 


(A) 


(B) 


(C) 


k —l—« method [12]: 

A simplified BBA is obtained according to rules: one 
should keep no less than k focal elements; one should 
keep no more than / focal elements; one should delete 
the masses being no greater than 2. 

In the k—1—z method, all focal elements in the original 
BBA are sorted in a descending order based on their 
mass assignment values. Then, choose the first p focal 
elements such that k < p < / and the summation of 
mass values of those first p focal elements is no less 
than 1 — x. The removed mass values are redistributed 
to remaining focal elements (re-normalization). 
(B) Summarization method [13]:

The summarization method is similar to the classical $k-l-x$
method, in that the focal elements with the highest mass values
are kept. The removed mass values are accumulated
and assigned to the union of the corresponding focal
elements. Suppose that $k$ is the number of focal elements
in the desired simplified BBA $m_S(\cdot)$ of an original BBA
$m(\cdot)$. Let $M$ denote the set of the $k-1$ focal
elements with the highest mass values. One can obtain
the simplified BBA according to

$$
m_S(A) =
\begin{cases}
m(A), & \text{if } A \in M, \\
\sum_{A' \subseteq A,\, A' \notin M} m(A'), & \text{if } A = A_0, \\
0, & \text{otherwise,}
\end{cases}
\tag{3}
$$

where

$$
A_0 \triangleq \bigcup_{A' \notin M,\, m(A') > 0} A'. \tag{4}
$$

(C) D1 method [14]:

Let $m(\cdot)$ be the original BBA and $m_S(\cdot)$ denote the
simplified BBA. The desired number of remaining focal
elements is $k$. Let $M$ denote the set including the $k-1$
focal elements with the highest mass assignment values
in $m(\cdot)$, and $M^-$ be the set including all the other
focal elements of $m(\cdot)$. The D1 method aims to keep all the
members of $M$ and to assign the mass values of the
focal elements in $M^-$ among the focal elements in
$M$. The re-assignment is implemented as follows.
For $A \in M^-$, find all the supersets of $A$ in $M$ to form
the set $M_A$. If $M_A \neq \emptyset$, $m(A)$ is uniformly
re-assigned among those focal elements with the smallest size
in $M_A$. When $M_A = \emptyset$, construct the set $M'_A$:

$$
M'_A = \{ B \in M \mid |B| \geq |A|,\ B \cap A \neq \emptyset \} \tag{5}
$$

If $M'_A \neq \emptyset$, $m(A)$ is assigned among the focal elements
with the smallest size in $M'_A$. The value assigned to a focal
element $B$ depends on $|B \cap A|$. The above procedure
is executed iteratively until all $m(A)$, $A \in M^-$,
have been re-assigned to focal elements in the set $M$.
If $M'_A = \emptyset$ there are two cases: if $\Theta \in M$,
the sum of the mass assignment values of the focal
elements in $M^-$ is added to $m(\Theta)$; if $\Theta \notin M$, one
should set $\Theta$ as a focal element of $m_S(\cdot)$ and assign the
sum of the mass assignment values of the focal elements in
the set $M^-$ to $m_S(\Theta)$.

More details on the D1 method with examples can be found
in [14].

(D) Joint use of cardinality and mass assignment with rank-level fusion:

In our previous work [18], we jointly use the cardinality
and the mass values of focal elements to design a rank-level
fusion based BBA approximation approach, which
is briefly recalled below.

Step 1. Sort all the focal elements of an original
BBA (with $L$ focal elements) in ascending order
according to the mass assignment values (an underlying
assumption: the focal element with small mass should
be deleted first). The rank vector can be obtained as

$$
r_m = [r_m(1), r_m(2), \ldots, r_m(L)] \tag{6}
$$

Here $r_m(i)$ is the rank position of the $i$-th focal element
($i = 1, 2, \ldots, L$) in the original BBA based on mass
values.

Step 2. Sort all focal elements of the original BBA
in descending order according to the cardinalities
(an underlying assumption: the focal element with big
cardinality should be deleted first). The rank vector can
be obtained as

$$
r_c = [r_c(1), r_c(2), \ldots, r_c(L)] \tag{7}
$$

Here $r_c(i)$ denotes the rank position of the $i$-th focal
element in the original BBA based on the focal element
size.

Step 3. By using the rank-level fusion (weighted average),
one can obtain a fused rank vector as

$$
r_f = [r_f(1), r_f(2), \ldots, r_f(L)] \tag{8}
$$

where $r_f(i) = \alpha r_m(i) + (1 - \alpha) r_c(i)$ and $\alpha \in [0, 1]$
denotes the preference between the two criteria. Such a
fused rank can be considered as a relatively comprehensive
criterion reflecting both the information of mass
values and cardinality.

Step 4. Sort $r_f$ in ascending order and find
the focal element with the smallest $r_f$ value, i.e.,
$r_f(j) = \min_i r_f(i)$. Then remove the $j$-th focal element
in the original BBA.

Step 5. Repeat Steps 1-4 until $k$ focal elements are
left. Renormalize the remaining masses of the $k$ focal
elements, and output the approximated BBA.
(E) Correlation coefficient based BBA approximation
(CR-based approximation):

The correlation coefficient, defined as

$$
CR_{BBA}(m_1, m_2) = \frac{c(m_1, m_2)}{\sqrt{c(m_1, m_1)\, c(m_2, m_2)}} \tag{9}
$$

where

$$
c(m_1, m_2) = \sum_{i=1}^{2^n - 1} \sum_{j=1}^{2^n - 1} m_1(A_i)\, m_2(A_j)\, \frac{|A_i \cap A_j|}{|A_i \cup A_j|}, \tag{10}
$$

is used for BBA approximation. Suppose that the original
BBA has $L$ focal elements, and the quantity of desired
remaining focal elements is $k$.

i) Remove one focal element $A_j$ and reassign its
mass value $m(A_j)$ to the related remaining focal
elements according to the redistribution strategy
based on singleton relation proposed in [20]
to generate an approximated BBA $m'_j$. For each $A_j$
($j = 1, 2, \ldots, L$) and the corresponding $m'_j$, calculate
$c(m, m'_j)$ using Eq. (10).

ii) Sort the $A_j$ ($j = 1, 2, \ldots, L$) according to
$c(m, m'_j)$ in ascending order. Remove the $L - k$
focal elements with the top $L - k$ values of the correlation
coefficient $c$.

iii) Reassign the mass values of the removed focal elements
to the remaining focal elements according
to the redistribution strategy based on singleton
relation proposed in [20]. Then, one obtains the
approximated BBA.


Besides the above BBA approximations with a preset quantity
of remaining focal elements, Denœux's inner and outer BBA
approximations [15], which use a distance between focal
elements, also preset such a quantity. See [15] for details.


Note that the Monte-Carlo based BBA approximation can
also be classified among the approximation approaches that use
the strategy of removing focal elements. See [16] for details.
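
As an illustration of the first two methods recalled above, the following is a minimal Python sketch of the $k-l-x$ and summarization procedures. It is not the implementation of [12] or [13]: the function names `klx` and `summarization`, the dictionary representation of a BBA (frozenset to mass), and the tie-breaking by insertion order are assumptions made here for illustration. The masses are those of the five-element BBA used later in Example 4.

```python
from fractions import Fraction

def klx(bba, k, l, x):
    """Keep the p largest masses with k <= p <= l, growing p until the kept
    mass reaches 1 - x (or p hits l), then renormalize. Illustrative sketch."""
    ranked = sorted(bba.items(), key=lambda kv: kv[1], reverse=True)
    p = min(k, len(ranked))
    limit = min(l, len(ranked))
    while p < limit and sum(m for _, m in ranked[:p]) < 1 - x:
        p += 1
    kept = ranked[:p]
    total = sum(m for _, m in kept)
    return {A: m / total for A, m in kept}

def summarization(bba, k):
    """Keep the k-1 largest masses; move the rest onto the union of the
    removed focal elements, as in Eqs. (3)-(4). Illustrative sketch."""
    ranked = sorted(bba.items(), key=lambda kv: kv[1], reverse=True)
    kept = dict(ranked[:k - 1])
    removed = ranked[k - 1:]
    if removed:
        union = frozenset().union(*(A for A, _ in removed))
        kept[union] = kept.get(union, 0) + sum(m for _, m in removed)
    return kept

# Integers 1..5 stand for theta_1..theta_5 (hypothetical encoding).
m = {
    frozenset({1, 2}): Fraction(1, 2),
    frozenset({1, 3, 4}): Fraction(3, 10),
    frozenset({3}): Fraction(1, 10),
    frozenset({3, 4}): Fraction(1, 20),
    frozenset({4, 5}): Fraction(1, 20),
}
klx_result = klx(m, k=3, l=3, x=Fraction(1, 10))
sum_result = summarization(m, k=3)
```

Exact fractions are used so that the renormalized masses can be compared without rounding; with floating-point masses the same code applies unchanged.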


In this paper, we focus on the BBA approximations that
preset the quantity of remaining focal elements. As mentioned
above, existing BBA approximations of this type propose
to remove focal elements that have smaller mass assignment
values, larger cardinalities, or both. Although they have
some rational justification, it is quite risky
to remove focal elements merely because they have small mass
values or large sizes. It may also be unconvincing to remove
focal elements with large cardinality only because they may
bring a high computational cost to the combination.
Therefore, one should be prudent when using a technique
of BBA approximation. It is more convincing to remove
the "unimportant" focal elements. The very redundant focal
elements can reasonably be considered "unimportant" (they carry
duplicate information) and the relatively non-redundant focal
elements can reasonably be deemed important; therefore,
we propose to first define the degree of non-redundancy of a
focal element. From this degree of non-redundancy, we can
then develop new BBA approximation methods by removing
focal elements according to the degree of non-redundancy,
and intuitively, the loss of information in terms of distance
of evidence might be smaller.


III. BBA APPROXIMATIONS BASED ON NON-REDUNDANCY 
OF FOCAL ELEMENTS 


In this section, we define the degree of non-redundancy for 
focal elements based on the distance of focal elements first. 
Then, we design BBA approximations based on the degree of 
non-redundancy. 


A. Non-redundancy of focal elements 


Suppose that a BBA $m(\cdot)$ has $l > 2$ focal elements. If a
focal element $A_i$ has the largest average distance to the other
focal elements $A_j \subseteq \Theta$ ($j \neq i$), then $A_i$ shares the least
common information with the other focal elements in the BBA
$m(\cdot)$, i.e., $A_i$ is the most non-redundant one. Therefore, one
can define the degree of non-redundancy using the average
distance between a focal element and the others. Suppose
that $d^F(A_i, A_j)$ is the distance between two focal elements $A_i$
and $A_j$. First, we can compute the distance matrix for all focal
elements in the BBA $m(\cdot)$ as

$$
Mat_{FE} =
\begin{bmatrix}
d^F(A_1, A_1) & d^F(A_1, A_2) & \cdots & d^F(A_1, A_l) \\
d^F(A_2, A_1) & d^F(A_2, A_2) & \cdots & d^F(A_2, A_l) \\
\vdots & \vdots & \ddots & \vdots \\
d^F(A_l, A_1) & d^F(A_l, A_2) & \cdots & d^F(A_l, A_l)
\end{bmatrix}
\tag{11}
$$

Since $d^F$ is a distance, one should at least have
$d^F(A_i, A_i) = 0$ and $d^F(A_i, A_j) = d^F(A_j, A_i)$, where
$i, j = 1, 2, \ldots, l$. That is, the matrix $Mat_{FE}$ is symmetric
with a zero diagonal. Therefore, it is not necessary to compute
all elements in $Mat_{FE}$.

For focal element $A_i$, we can then define its degree of
non-redundancy as

$$
nRd(A_i) \triangleq \frac{1}{l - 1} \sum_{j = 1,\, j \neq i}^{l} d^F(A_i, A_j) \tag{12}
$$

When $nRd(A_i)$ is larger, $A_i$ has a larger non-redundancy
(less redundancy); when $nRd(A_i)$ takes a smaller value, $A_i$ has
a smaller non-redundancy (larger redundancy). Then, the problem
is how to describe the distance between focal elements.
Strictly speaking, the "distance" used here should be called a
"dissimilarity", since a distance metric must satisfy all
four requirements: non-degeneracy, symmetry, non-negativity,
and the triangle inequality. When no confusion arises, we
still use the term distance in the sequel.
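
To make Eq. (12) concrete, here is a small Python sketch that computes the degree of non-redundancy from a precomputed distance matrix. The function name `non_redundancy` is an assumption for illustration; the matrix values are those given for Example 4 later in this section.

```python
def non_redundancy(dist_matrix):
    """Average distance from each focal element to all the others, Eq. (12)."""
    l = len(dist_matrix)
    return [
        sum(row[j] for j in range(l) if j != i) / (l - 1)
        for i, row in enumerate(dist_matrix)
    ]

# Symmetric distance matrix with zero diagonal (values from Example 4).
mat_fe = [
    [0.00, 1.10, 1.10, 1.10, 1.10],
    [1.10, 0.00, 0.60, 0.30, 0.65],
    [1.10, 0.60, 0.00, 0.05, 0.20],
    [1.10, 0.30, 0.05, 0.00, 0.10],
    [1.10, 0.65, 0.20, 0.10, 0.00],
]
nrd = non_redundancy(mat_fe)

# 0-based index of the most redundant focal element (smallest nRd).
most_redundant = min(range(len(nrd)), key=nrd.__getitem__)
```

Because the matrix is symmetric, only the upper triangle needs to be evaluated in practice; the sketch reads the full matrix for simplicity.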


B. Distance between focal elements 


In general, the distance between two focal elements should
use the two aspects of information in focal elements, namely
the mass assignment and the focal element (set) itself:

$$
d^F(A_i, A_j) = f(m(A_i), A_i, m(A_j), A_j) \tag{13}
$$

The available distances between focal elements are introduced
below.

(1) Erkmen's distance:

Erkmen and Stephanou [33] proposed a distance (denoted by
$d^F_E$ here) between focal elements as

$$
d^F_E(A_i, A_j) \triangleq [m(A_i) - m(A_j)] \log_2 \frac{|A_i \cup A_j|}{|A_i \cap A_j|} \tag{14}
$$

This definition is far from robust and can bring
counter-intuitive results, as shown in the following cases.

Case I: if $A_i \cap A_j = \emptyset$, i.e., $|A_i \cap A_j| = 0$, then
$d^F_E(A_i, A_j)$ cannot be calculated (due to a division
by zero). One can also say that it tends to infinity; however,
this is not reasonable, since in this case the value of the
distance is dominated by the relationship between the focal
elements (sets).

Case II: if $m(A_i) = m(A_j)$, then $d^F_E(A_i, A_j) = 0$. This
is also counter-intuitive, because the distance value is totally
dominated by the mass assignments. That is to say, two different
focal elements with the same mass value are deemed
identical. Therefore, Erkmen's definition is not appropriate for
designing the focal element redundancy.

(2) Denœux's union distance:

Denœux [15] proposed a union-operation based distance as

$$
\delta_\cup(A_i, A_j) = [m(A_i) + m(A_j)]\, |A_i \cup A_j| - m(A_i) |A_i| - m(A_j) |A_j| \tag{15}
$$

(3) Denœux's intersection distance:

Denœux [15] also proposed an intersection-operation based
distance as

$$
\delta_\cap(A_i, A_j) = m(A_i) |A_i| + m(A_j) |A_j| - [m(A_i) + m(A_j)]\, |A_i \cap A_j| \tag{16}
$$

Actually, both $\delta_\cup$ and $\delta_\cap$ can be considered as weighted
sums of the Hamming distance [15]. It is not difficult to
verify that neither $\delta_\cup$ nor $\delta_\cap$ gives counter-intuitive results
in the aforementioned Cases I and II. Therefore, we choose $\delta_\cup$
and $\delta_\cap$ to define the degree of non-redundancy of a focal
element. Here we give further analyses of the two distance
definitions $\delta_\cup$ and $\delta_\cap$.
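
The two distances of Eqs. (15) and (16) translate directly into code. The following Python sketch uses hypothetical function names (`delta_union`, `delta_inter`) and represents focal elements as frozensets of integers; the masses and sets are consistent with Example 4 later in this section.

```python
from fractions import Fraction

def delta_union(m_i, A_i, m_j, A_j):
    """Union-based distance between focal elements, Eq. (15)."""
    return (m_i + m_j) * len(A_i | A_j) - m_i * len(A_i) - m_j * len(A_j)

def delta_inter(m_i, A_i, m_j, A_j):
    """Intersection-based distance between focal elements, Eq. (16)."""
    return m_i * len(A_i) + m_j * len(A_j) - (m_i + m_j) * len(A_i & A_j)

A1, A2 = frozenset({1, 2}), frozenset({1, 3, 4})
A3, A4 = frozenset({3}), frozenset({3, 4})
m1, m2, m3, m4 = Fraction(1, 2), Fraction(3, 10), Fraction(1, 10), Fraction(1, 20)

d12 = delta_inter(m1, A1, m2, A2)    # overlapping sets
d34 = delta_inter(m3, A3, m4, A4)    # nested sets
d13_u = delta_union(m1, A1, m3, A3)  # disjoint sets: still finite (Case I)

# Case II: equal masses on different sets still give a non-zero distance.
d_equal_mass = delta_inter(m3, frozenset({1}), m3, frozenset({2}))
```

Neither function divides by a cardinality, which is why the degenerate Case I of Erkmen's distance cannot occur here.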


C. Analyses on $\delta_\cap$ and $\delta_\cup$


Suppose that $m(\cdot)$ is a BBA defined on the FOD $\Theta$, where
$|\Theta| = n$. To simplify the analysis, we assume that $m(\cdot)$ has
only two focal elements $A_1$ and $A_2$ with mass assignments
$m(A_1) = a$ and $m(A_2) = 1 - a$. The behaviors of $\delta_\cap$ and
$\delta_\cup$ are analyzed under different situations.

1) Focal elements' relation: $A_1 \subset A_2$.

In such a case, for $\delta_\cap$ one gets

$$
\begin{aligned}
\delta_\cap(A_1, A_2) &= m(A_1)|A_1| + m(A_2)|A_2| - [m(A_1) + m(A_2)]\,|A_1 \cap A_2| \\
&= m(A_1)|A_1| + m(A_2)|A_2| - [m(A_1) + m(A_2)]\,|A_1| \\
&= (1 - m(A_1))(|A_2| - |A_1|)
\end{aligned}
\tag{17}
$$

As shown in Eq. (17), if $m(A_1)$ is fixed, $\delta_\cap$ becomes larger
as the difference between the focal elements' cardinalities
$|A_2| - |A_1|$ grows. This makes sense. If the difference of
cardinalities $|A_2| - |A_1|$ is fixed, $\delta_\cap$ becomes smaller with
the increase of the mass assignment of $A_1$ (which is contained
in $A_2$).

For $\delta_\cup$, one gets

$$
\begin{aligned}
\delta_\cup(A_1, A_2) &= [m(A_1) + m(A_2)]\,|A_1 \cup A_2| - m(A_1)|A_1| - m(A_2)|A_2| \\
&= [m(A_1) + m(A_2)]\,|A_2| - m(A_1)|A_1| - m(A_2)|A_2| \\
&= m(A_1)(|A_2| - |A_1|)
\end{aligned}
\tag{18}
$$

As shown in Eq. (18), if $m(A_1)$ is fixed, $\delta_\cup$ becomes larger
as the difference between the focal elements' cardinalities
$|A_2| - |A_1|$ grows. This makes sense. If the difference
of cardinalities $|A_2| - |A_1|$ is fixed, $\delta_\cup$ becomes larger
with the increase of the mass assignment of $A_1$ (contained in
$A_2$). That is, when $A_1 \subset A_2$ and $|A_2| - |A_1|$ is fixed, $\delta_\cup$ is
positively correlated to the mass of the focal element with smaller
cardinality ($A_1$), while $\delta_\cap$ is positively correlated to the mass
of the focal element with larger cardinality ($A_2$).


The analyses above can be supported by Example 1 below.


Example 1. (Focal elements are nested) Suppose that the FOD
is $\Theta = \{\theta_1, \theta_2, \ldots, \theta_5\}$. Four BBAs are defined on $\Theta$, and
each has two focal elements as listed in Table I.


Table I
FOUR BBAS IN EXAMPLE 1.

[Table body not recoverable from the source; the surviving
entries are the focal elements $\{\theta_1, \theta_2, \theta_3\}$ and
$\{\theta_1, \theta_2, \theta_3, \theta_4\}$.]


For each BBA, the mass value of $A_1$ changes from 0.01 to
0.95 with an increase of 0.01 at each step. The values of $\delta_\cap$
and $\delta_\cup$ are shown in Fig. 1.

As shown in Fig. 1, $\delta_\cup$ is positively correlated to the mass
of the focal element with smaller cardinality ($A_1$), while $\delta_\cap$ is
positively correlated to the mass of the focal element with larger
cardinality ($A_2$). Given a fixed $m(A_1)$, with the increase of the
cardinality of $A_1$, i.e., the decrease of $|A_2| - |A_1|$, both $\delta_\cap$
and $\delta_\cup$ become smaller.
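
The closed forms of Eqs. (17) and (18) for nested focal elements can be checked numerically. The sketch below is only a sanity check under assumed names (`delta_union`, `delta_inter`, `check_nested`); it sweeps $m(A_1)$ over the same grid as Example 1, using exact fractions so that the equalities hold without rounding.

```python
from fractions import Fraction

def delta_union(m_i, A_i, m_j, A_j):
    """Union-based distance, Eq. (15)."""
    return (m_i + m_j) * len(A_i | A_j) - m_i * len(A_i) - m_j * len(A_j)

def delta_inter(m_i, A_i, m_j, A_j):
    """Intersection-based distance, Eq. (16)."""
    return m_i * len(A_i) + m_j * len(A_j) - (m_i + m_j) * len(A_i & A_j)

def check_nested(A1, A2):
    """Verify Eqs. (17)-(18) for A1 strictly contained in A2."""
    assert A1 < A2, "A1 must be a proper subset of A2"
    gap = len(A2) - len(A1)
    for step in range(1, 96):        # m(A1) = 0.01, 0.02, ..., 0.95
        a = Fraction(step, 100)
        b = 1 - a                    # m(A2) = 1 - m(A1)
        if delta_inter(a, A1, b, A2) != (1 - a) * gap:
            return False
        if delta_union(a, A1, b, A2) != a * gap:
            return False
    return True

# Hypothetical nested pairs; integers stand for theta_1..theta_5.
ok_small = check_nested(frozenset({1}), frozenset({1, 2, 3, 4, 5}))
ok_large = check_nested(frozenset({1, 2, 3, 4}), frozenset({1, 2, 3, 4, 5}))
```

The check also makes the monotonicity visible: for a fixed gap, the union-based value grows linearly in $m(A_1)$ while the intersection-based value shrinks linearly.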


2) Focal elements' relation: $A_1 \cap A_2 = \emptyset$.


When $A_1 \cap A_2 = \emptyset$, one gets $|A_1 \cap A_2| = 0$ and

$$
\begin{aligned}
\delta_\cap(A_1, A_2) &= m(A_1)|A_1| + m(A_2)|A_2| - [m(A_1) + m(A_2)]\,|A_1 \cap A_2| \\
&= m(A_1)|A_1| + m(A_2)|A_2| \\
&= m(A_1)(|A_1| - |A_2|) + |A_2|
\end{aligned}
\tag{19}
$$

For $\delta_\cup$, one gets

$$
\begin{aligned}
\delta_\cup(A_1, A_2) &= [m(A_1) + m(A_2)]\,|A_1 \cup A_2| - m(A_1)|A_1| - m(A_2)|A_2| \\
&= [m(A_1) + m(A_2)]\,(|A_1| + |A_2|) - m(A_1)|A_1| - m(A_2)|A_2| \\
&= m(A_1)|A_2| + m(A_2)|A_1| \\
&= m(A_1)(|A_2| - |A_1|) + |A_1|
\end{aligned}
\tag{20}
$$

Figure 1. Two distances in Example 1.


It can be seen that when $A_1 \cap A_2 = \emptyset$, if $|A_1|$ is closer to
$|A_2|$, both $\delta_\cap$ and $\delta_\cup$ are smaller (it means that $\{\theta_1\}$ is
farther from $\{\theta_2, \theta_3\}$ than from $\{\theta_3\}$, which makes some
sense). This can be shown in Example 2 below.


Example 2. (Focal elements have an empty intersection) Suppose
that the FOD is $\Theta = \{\theta_1, \theta_2, \ldots, \theta_5\}$. Four BBAs are defined
on $\Theta$, and each has two focal elements as listed in Table II.


Table II
FOUR BBAS IN EXAMPLE 2.

[Table body not recoverable from the source; the surviving
entries are the focal elements $\{\theta_2, \theta_3\}$, $\{\theta_2, \theta_3, \theta_4\}$ and
$\{\theta_2, \theta_3, \theta_4, \theta_5\}$.]


In each BBA, the two focal elements have an empty intersection.
For each BBA, the mass assignment of $A_1$ changes
from 0.01 to 0.95 with an increase of 0.01 at each step. The
values of $\delta_\cap$ and $\delta_\cup$ are shown in Fig. 2.

As shown in Fig. 2, when $|A_1| = |A_2|$ and $|A_1|$ is fixed,
both $\delta_\cap$ and $\delta_\cup$ remain unchanged. Given a fixed $m(A_1)$,
when the difference $|A_2| - |A_1|$ becomes larger, both $\delta_\cap$ and
$\delta_\cup$ become larger. When the difference $|A_2| - |A_1|$ is fixed,
$\delta_\cup$ is positively correlated to the mass of the focal element with
smaller cardinality ($A_1$), while $\delta_\cap$ is positively correlated to
the mass of the focal element with larger cardinality ($A_2$),
i.e., negatively correlated to the mass of the focal element
with smaller cardinality.

Figure 2. Two distances in Example 2.


3) Focal elements' relation: $A_1 \cap A_2 \neq \emptyset$.

Here $A_1 \cap A_2 \neq \emptyset$. Furthermore, $A_1$ is not contained
in $A_2$, and $A_2$ is not contained in $A_1$. We provide an
example to show the behaviors of $\delta_\cap$ and $\delta_\cup$ in this situation.


Example 3. (Focal elements partially overlap) Suppose that
the FOD is $\Theta = \{\theta_1, \theta_2, \ldots, \theta_6\}$. Four BBAs are defined on
$\Theta$, and each has two focal elements as listed in Table III.


Table III
FOUR BBAS IN EXAMPLE 3.

[Table body not recoverable from the source; the surviving
entries are the focal elements $\{\theta_1, \theta_2\}$, $\{\theta_2, \theta_3, \theta_4, \theta_5\}$ and
$\{\theta_2, \theta_3, \theta_4, \theta_5, \theta_6\}$.]


For each BBA, the mass assignment of $A_1$ changes from
0.01 to 0.95 with an increase of 0.01 at each step. The values
of $\delta_\cap$ and $\delta_\cup$ are shown in Fig. 3.

As we see in Fig. 3, when $|A_1| = |A_2|$, $\delta_\cap$ and $\delta_\cup$
equal 1, and they remain unchanged. This is because when
$|A_1| = |A_2| = 2$ one has $\delta_\cup(A_1, A_2) = |A_1 \cup A_2| - |A_2|$
and $\delta_\cap(A_1, A_2) = |A_2| - |A_1 \cap A_2|$. So, $\delta_\cap(A_1, A_2) = 1$ and
$\delta_\cup(A_1, A_2) = 1$.

Given a fixed $m(A_1)$, when the difference $|A_2| - |A_1|$
becomes larger, both $\delta_\cap$ and $\delta_\cup$ become larger, as shown in
Fig. 3. This makes sense, because the uncommon part of $A_1$
and $A_2$ becomes larger. When the difference $|A_2| - |A_1|$ is
fixed, $\delta_\cup$ is positively correlated to the mass of the focal element
$A_1$ having the smaller cardinality, while $\delta_\cap$ is positively
correlated to the mass of the focal element $A_2$ having the larger
cardinality, i.e., negatively correlated to the mass of the focal
element with smaller cardinality.

Figure 3. Two distances in Example 3.


D. Implementation of BBA approximation using degree of
redundancy for focal elements


Based on the degree of non-redundancy in Eq. (12), new
BBA approximation methods are proposed in this paper, where
the more non-redundant focal elements are kept and the more
redundant ones are removed first.


1) Batch-mode approximation method: Given an original
BBA $m(\cdot)$ with $l$ focal elements, we want to keep $k < l$ focal
elements in the approximation. Batch mode means
that the number of focal elements is reduced from $l$ to $k$ in one
run as follows.

Step 1. Compute the matrix $Mat_{FE}$ first, and then for each $A_i$
($i = 1, 2, \ldots, l$) compute its non-redundancy value $nRd(A_i)$.

Step 2. Sort all $nRd(A_i)$ ($i = 1, 2, \ldots, l$) in descending
order.

Step 3. Remove the focal elements ranked in the bottom
$l - k$ positions.

Step 4. Normalize the mass assignments of the kept $k$ focal
elements and obtain the approximated BBA $m_S^{BRd}(\cdot)$.
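
The four batch-mode steps can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: the names `delta_inter`, `non_redundancy` and `batch_approx` are assumptions, a BBA is represented as a dictionary from frozenset to mass, and ties in nRd are broken by insertion order. The test BBA is the one of Example 4 below, with the intersection-based distance of Eq. (16).

```python
from fractions import Fraction

def delta_inter(m_i, A_i, m_j, A_j):
    """Intersection-based distance between focal elements, Eq. (16)."""
    return m_i * len(A_i) + m_j * len(A_j) - (m_i + m_j) * len(A_i & A_j)

def non_redundancy(bba, dist):
    """Step 1: nRd of every focal element, Eq. (12). Needs >= 2 focal elements."""
    items = list(bba.items())
    l = len(items)
    return {
        A: sum(dist(mA, A, mB, B) for B, mB in items if B != A) / (l - 1)
        for A, mA in items
    }

def batch_approx(bba, k, dist=delta_inter):
    """Steps 2-4: keep the k most non-redundant focal elements, renormalize."""
    nrd = non_redundancy(bba, dist)
    keep = sorted(bba, key=lambda A: nrd[A], reverse=True)[:k]
    total = sum(bba[A] for A in keep)
    return {A: bba[A] / total for A in keep}

# BBA of Example 4; integers 1..5 stand for theta_1..theta_5.
example4 = {
    frozenset({1, 2}): Fraction(1, 2),
    frozenset({1, 3, 4}): Fraction(3, 10),
    frozenset({3}): Fraction(1, 10),
    frozenset({3, 4}): Fraction(1, 20),
    frozenset({4, 5}): Fraction(1, 20),
}
approx = batch_approx(example4, k=3)
```

Passing a different `dist` function (for instance a union-based one following Eq. (15)) switches the distance without changing the procedure.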


2) Iterative-mode approximation method: Here, we propose
to iteratively remove the most redundant focal element (the one
with the smallest nRd value) in each step until $k$ focal elements
are kept. This method consists of the following steps:

Step 1. Compute the matrix $Mat_{FE}$ and the nRd values for
each focal element $A_i$ ($i = 1, 2, \ldots, l$).

Step 2. Sort all $nRd(A_i)$ ($i = 1, 2, \ldots, l$) in descending
order.

Step 3. Remove the bottom focal element $A_r$.

Step 4. If the quantity of kept focal elements is larger
than $k$, re-compute $nRd(A_i)$ for the kept focal elements, where
$i \neq r$, and go back to Step 3. Otherwise, go to Step 5.

Step 5. Normalize the mass assignments of the kept $k$ focal
elements and obtain the approximated BBA $m_S^{IRd}(\cdot)$.


In the iterative mode, the matrix and the degrees of
non-redundancy are re-computed in each step after a focal
element has been removed in the preceding step. That is to say,
only the non-redundancy values of the currently remaining focal
elements are involved in each step.
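
A minimal Python sketch of the iterative mode is given below, under the same assumptions as before (hypothetical names, dictionary representation of a BBA, ties broken by insertion order). It also returns the removal order so that the two steps of Example 4 can be traced.

```python
from fractions import Fraction

def delta_inter(m_i, A_i, m_j, A_j):
    """Intersection-based distance between focal elements, Eq. (16)."""
    return m_i * len(A_i) + m_j * len(A_j) - (m_i + m_j) * len(A_i & A_j)

def non_redundancy(bba, dist):
    """nRd of every focal element among those currently kept, Eq. (12)."""
    items = list(bba.items())
    l = len(items)
    return {
        A: sum(dist(mA, A, mB, B) for B, mB in items if B != A) / (l - 1)
        for A, mA in items
    }

def iterative_approx(bba, k, dist=delta_inter):
    """Remove the most redundant focal element one at a time until k remain,
    recomputing nRd after each removal, then renormalize."""
    current = dict(bba)
    removed = []
    while len(current) > max(k, 1):
        nrd = non_redundancy(current, dist)
        victim = min(current, key=lambda A: nrd[A])
        removed.append(victim)
        del current[victim]
    total = sum(current.values())
    return {A: m / total for A, m in current.items()}, removed

# BBA of Example 4; integers 1..5 stand for theta_1..theta_5.
example4 = {
    frozenset({1, 2}): Fraction(1, 2),
    frozenset({1, 3, 4}): Fraction(3, 10),
    frozenset({3}): Fraction(1, 10),
    frozenset({3, 4}): Fraction(1, 20),
    frozenset({4, 5}): Fraction(1, 20),
}
approx, removal_order = iterative_approx(example4, k=3)
```

Compared with the batch mode, the loop recomputes the distances among the surviving focal elements after every removal, which costs more but lets each decision reflect the current set.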


E. Illustrative examples 


Illustrative examples for presenting the procedure of our 
proposed non-redundancy degree based BBA approximation 
approaches are provided here. The specific calculation steps 
of other major BBA approximation approaches with presetting 
the number of focal elements are also provided here for 
comparisons. 


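Example 4. Let us consider a BBA $m(\cdot)$ defined on
$\Theta = \{\theta_1, \theta_2, \ldots, \theta_5\}$ as listed in Table IV.


Table IV
FOCAL ELEMENTS AND MASS ASSIGNMENTS.

Focal element                                Mass assignment
$A_1 = \{\theta_1, \theta_2\}$               0.50
$A_2 = \{\theta_1, \theta_3, \theta_4\}$     0.30
$A_3 = \{\theta_3\}$                         0.10
$A_4 = \{\theta_3, \theta_4\}$               0.05
$A_5 = \{\theta_4, \theta_5\}$               0.05


(1) Using $k-l-x$ [12]:


Parameters $k$ and $l$ are both set to 3, and $x = 0.1$. Focal
elements $A_4 = \{\theta_3, \theta_4\}$ and $A_5 = \{\theta_4, \theta_5\}$ are removed. The
kept total mass value is $1 - 0.05 - 0.05 = 0.9$; therefore,
the constraint on $x$ is not violated. All the remaining focal
elements' mass assignments are divided by 0.9 for the
normalization. The approximated BBA $m_S^{klx}(\cdot)$ obtained using
$k-l-x$ is shown in Table V. Here $A'_i$ ($i = 1, 2, 3$) are the focal
elements¹ in $m_S^{klx}(\cdot)$.


Table V
$m_S^{klx}(\cdot)$ USING $k-l-x$ FOR EXAMPLE 4.

Focal element                                 Mass assignment
$A'_1 = \{\theta_1, \theta_2\}$               0.5556
$A'_2 = \{\theta_1, \theta_3, \theta_4\}$     0.3333
$A'_3 = \{\theta_3\}$                         0.1111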

(2) Using summarization [13]:


Parameter $k = 3$. By using the summarization method, one
removes focal elements $A_3 = \{\theta_3\}$, $A_4 = \{\theta_3, \theta_4\}$ and
$A_5 = \{\theta_4, \theta_5\}$. Their union $\{\theta_3, \theta_4, \theta_5\}$ is set as a new
focal element whose mass assignment is 0.2, since
$m(\{\theta_3\}) + m(\{\theta_3, \theta_4\}) + m(\{\theta_4, \theta_5\}) = 0.2$. The
approximated BBA $m_S^{Sum}(\cdot)$ is shown in Table VI.

¹ In Tables VI-XI, Table XIII, and Tables XV-XVIII we also denote by $A'_i$
the focal elements of the approximate BBAs obtained by the different methods.


Table VI
$m_S^{Sum}(\cdot)$ USING SUMMARIZATION FOR EXAMPLE 4.

Focal element                                 Mass assignment
$A'_1 = \{\theta_1, \theta_2\}$               0.50
$A'_2 = \{\theta_1, \theta_3, \theta_4\}$     0.30
$A'_3 = \{\theta_3, \theta_4, \theta_5\}$     0.20


(3) Using D1 method [14]


The parameter $k$ is still set to 3 here. When we use the D1
method, focal elements $A_1$ and $A_2$ belong to $M$, and $A_3$, $A_4$
and $A_5$ belong to $M^-$. The focal element $A_1 = \{\theta_1, \theta_2\}$ has
no intersection with the focal elements in $M^-$; therefore, its
value remains unchanged. In $M$, $A_2$ is the unique superset
of $A_3$ and $A_4$, so $m(A_3) + m(A_4) = 0.10 + 0.05 = 0.15$
is added to $A_2$'s original mass assignment. $A_2$ covers half of
$A_5$, so $m(A_5)/2 = 0.025$ is further added to the mass of $A_2$.
Finally, the remaining mass is assigned to $\Theta$. The approximated BBA
$m_S^{D1}(\cdot)$ is shown in Table VII.


Table VII
$m_S^{D1}(\cdot)$ USING D1 FOR EXAMPLE 4.

Focal element                                 Mass assignment
$A'_1 = \{\theta_1, \theta_2\}$               0.500
$A'_2 = \{\theta_1, \theta_3, \theta_4\}$     0.475
$A'_3 = \Theta$                               0.025


(4) Using Denœux's inner and outer approximations [15]


Since this method uses the focal element distance definition
in Eq. (14), here we also use it for comparison. When using
the inner approximation [15], the pair of focal elements with the
smallest distance is removed, and their intersection is considered
as a supplemented focal element. Its mass value is the
sum of the two removed focal elements' mass assignments.
Such a procedure is repeated until the preset quantity of focal
elements is reached. The approximated BBA $m_S^{inner}(\cdot)$ is shown
in Table VIII.


Table VIII
$m_S^{inner}(\cdot)$ USING INNER APPROXIMATION FOR EXAMPLE 4.

Focal element                                 Mass assignment
$A'_1 = \{\theta_1, \theta_2\}$               0.50
$A'_2 = \{\theta_1, \theta_3, \theta_4\}$     0.30
$A'_3 = \emptyset$                            0.20


As one sees in Table VIII, the empty set is generated as a
focal element, which is not allowed in the classical DST under
the closed-world assumption.

The outer approximation is similar to the inner approximation
except that the distance used is $\delta_\cup$. The approximated
BBA $m_S^{outer}(\cdot)$ is shown in Table IX.


Table IX
$m_S^{outer}(\cdot)$ USING OUTER APPROXIMATION FOR EXAMPLE 4.

Focal element                                 Mass assignment
$A'_1 = \{\theta_1, \theta_2\}$               0.50
$A'_2 = \{\theta_1, \theta_3, \theta_4\}$     0.45
$A'_3 = \{\theta_4, \theta_5\}$               0.05


(5) Using rank-level fusion based method [18]


The rank of the focal elements in $m(\cdot)$ according to the mass
assignments is $[1, 2, 3, 4, 4]$ (in descending order). Here
$[1, 2, 3, 4, 4]$ means that $A_1$ takes the 1st place; $A_2$ takes the
2nd place; $A_3$ takes the 3rd place; and $A_4$ and $A_5$ both take
the 4th place due to their equal mass values.

The rank of the focal elements according to their cardinalities
in ascending order is $[2, 3, 1, 2, 2]$. Here we set $\alpha = 0.5$, and
the approximated BBA $m_S^{Rank}(\cdot)$ is shown in Table X.


Table X
$m_S^{Rank}(\cdot)$ USING RANK-LEVEL FUSION BASED APPROXIMATION FOR
EXAMPLE 4.

Focal element
$A'_1 = \{\theta_1, \theta_2\}$
$A'_2 = \{\theta_3\}$
$A'_3 = \{\theta_1, \theta_3, \theta_4\}$

[Mass values not recoverable from the source.]


(6) Using CR-based approximation


Using the CR-based approximation, the correlation coefficient
values are

$c(m, m'_1) = 0.7096$,
$c(m, m'_2) = 0.9462$,
$c(m, m'_3) = 0.9912$,
$c(m, m'_4) = 0.9462$,
$c(m, m'_5) = 0.9975$.

Then, remove $A_3$ and $A_5$, since they have the top two
correlation coefficient values. After the redistribution, the
approximated BBA $m_S^{CR}(\cdot)$ is shown in Table XI.


Table XI
$m_S^{CR}(\cdot)$ USING CR-BASED APPROXIMATION FOR EXAMPLE 4.

Focal element
$A'_1 = \{\theta_1, \theta_2\}$
$A'_2 = \{\theta_3, \theta_4\}$
$A'_3 = \{\theta_1, \theta_3, \theta_4\}$

[Mass values not recoverable from the source.]


(7) Using the non-redundancy based batch-mode approximation


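We want to keep three focal elements, i.e., $k = 3$. Calculate
the distance matrix $Mat_{FE}$ (its entries coincide with $\delta_\cap$ of
Eq. (16)) as

$$
Mat_{FE} =
\begin{array}{c|ccccc}
 & A_1 & A_2 & A_3 & A_4 & A_5 \\
\hline
A_1 & 0 & 1.10 & 1.10 & 1.10 & 1.10 \\
A_2 & 1.10 & 0 & 0.60 & 0.30 & 0.65 \\
A_3 & 1.10 & 0.60 & 0 & 0.05 & 0.20 \\
A_4 & 1.10 & 0.30 & 0.05 & 0 & 0.10 \\
A_5 & 1.10 & 0.65 & 0.20 & 0.10 & 0
\end{array}
$$

Using this matrix, the degrees of non-redundancy of all focal
elements of $m(\cdot)$ are obtained as listed in Table XII.


Table XII
NON-REDUNDANCY FOR DIFFERENT FOCAL ELEMENTS.

Focal element                                nRd
$A_1 = \{\theta_1, \theta_2\}$               1.1000
$A_2 = \{\theta_1, \theta_3, \theta_4\}$     0.6625
$A_3 = \{\theta_3\}$                         0.4875
$A_4 = \{\theta_3, \theta_4\}$               0.3875
$A_5 = \{\theta_4, \theta_5\}$               0.5125


Since $A_3$ and $A_4$ have the two smallest nRd values, they
are the two focal elements with the lowest non-redundancy (the
highest redundancy). So, they are removed first, and
the mass assignments of the kept focal elements are renormalized
with the classical normalization step. The approximated BBA
$m_S^{BRd}(\cdot)$ is listed in Table XIII.


Table XIII
$m_S^{BRd}(\cdot)$ USING BATCH APPROXIMATION BASED ON REDUNDANCY FOR
EXAMPLE 4.

Focal element                                 Mass assignment
$A'_1 = \{\theta_1, \theta_2\}$               0.5882
$A'_2 = \{\theta_1, \theta_3, \theta_4\}$     0.3529
$A'_3 = \{\theta_4, \theta_5\}$               0.0588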

(8) Using the redundancy-based iterative approximation


Here $k = 3$, so two focal elements should be removed.
In the iterative mode, we only remove one focal element in
each step. Therefore, two steps are required in this example.

In Step 1, we obtain the same degrees of non-redundancy
as listed in Table XII. Then, $A_4$ is removed.

In Step 2, nRd for $A_i$ ($i = 1, 2, 3, 5$) is recalculated
according to

$$
nRd(A_i) = \frac{1}{3} \sum_{j = 1,\, j \neq i,\, j \neq 4}^{5} d^F(A_i, A_j)
$$

The results are
$nRd(A_1) = 1.1000$,
$nRd(A_2) = 0.7833$,
$nRd(A_3) = 0.6333$,
$nRd(A_5) = 0.6500$.


Then, $A_3$ is removed due to its smallest nRd value (i.e., the
biggest redundancy among the remaining focal elements).
In this example, the BBA $m_S^{IRd}(\cdot)$ obtained is the same as
$m_S^{BRd}(\cdot)$ listed in Table XIII. Note that the batch-mode and
the iterative approximations do not always give the same
results, as illustrated in Example 5.


Example 5. Assume that the FOD is $\Theta = \{\theta_1, \theta_2, \theta_3\}$. An
original BBA $m(\cdot)$ is listed in Table XIV, and the quantity of
remaining focal elements is set to $k = 3$.


Table XIV
FOCAL ELEMENTS AND MASS VALUES.

[Table body not recoverable from the source.]


• Using $\delta_\cap$, we can obtain the distance matrix:

$$
Mat_{FE} =
\begin{array}{c|ccccc}
 & A_1 & A_2 & A_3 & A_4 & A_5 \\
\hline
A_1 & 0 & 0.4257 & 0.1780 & 0.5319 & 0.1662 \\
A_2 & 0.4257 & 0 & 0.2477 & 0.2477 & 0.1662 \\
A_3 & 0.1780 & 0.2477 & 0 & 0.4081 & 0.3324 \\
A_4 & 0.5319 & 0.2477 & 0.4081 & 0 & 0.3324 \\
A_5 & 0.1662 & 0.1662 & 0.3324 & 0.3324 & 0
\end{array}
$$

All focal elements' degrees of non-redundancy are
$nRd(A_1) = 0.3255$,
$nRd(A_2) = 0.2718$,
$nRd(A_3) = 0.2915$,
$nRd(A_4) = 0.3800$,
$nRd(A_5) = 0.2493$.


Using the batch mode method, focal elements $A_2$ and $A_5$ are
removed. The approximated BBA is shown in Table XV.


Table XV
$m_S^{BRd}(\cdot)$ USING BATCH APPROXIMATION BASED ON REDUNDANCY WITH
THE DISTANCE $\delta_\cap$.

Focal element
$A'_1 = \{\theta_1, \theta_2\}$
$A'_2 = \{\theta_2\}$
$A'_3 = \{\theta_3\}$

[Mass values not recoverable from the source.]

By using the iterative mode method, the degrees of
non-redundancy obtained in Step 1 are

$nRd'(A_1) = 0.3255$,
$nRd'(A_2) = 0.2718$,
$nRd'(A_3) = 0.2915$,
$nRd'(A_4) = 0.3800$,
$nRd'(A_5) = 0.2493$.

Then, we first remove the focal element $A_5$, since $nRd'(A_5)$ is
the smallest one. Then we recalculate the nRd values for the
remaining focal elements $A_1$, $A_2$, $A_3$ and $A_4$:

$nRd''(A_1) = 0.3785$,
$nRd''(A_2) = 0.3070$,
$nRd''(A_3) = 0.2779$,
$nRd''(A_4) = 0.3959$.


In Step 2, $nRd''(A_3)$ is the smallest one; therefore, $A_3$ is
removed. After normalization, we obtain the BBA $m_S^{IRd}(\cdot)$ with
the iterative approximation, as shown in Table XVI.


Table XVI
$m_S^{IRd}(\cdot)$ USING ITERATIVE APPROXIMATION BASED ON REDUNDANCY
WITH THE DISTANCE $\delta_\cap$.

Focal element                          Mass assignment
$A'_1 = \{\theta_1, \theta_2\}$        0.2959
$A'_2 = \{\theta_2, \theta_3\}$        0.4117
$A'_3 = \{\theta_3\}$                  0.2924

• Using $\delta_\cup$, the distance matrix is

$$
Mat_{FE} =
\begin{array}{c|ccccc}
 & A_1 & A_2 & A_3 & A_4 & A_5 \\
\hline
A_1 & 0 & 0.4257 & 0.2322 & 0.5298 & 0.1780 \\
A_2 & 0.4257 & 0 & 0.2322 & 0.1759 & 0.2477 \\
A_3 & 0.2322 & 0.2322 & 0 & 0.4081 & 0.4644 \\
A_4 & 0.5298 & 0.1759 & 0.4081 & 0 & 0.3518 \\
A_5 & 0.1780 & 0.2477 & 0.4644 & 0.3518 & 0
\end{array}
$$

All focal elements' degrees of non-redundancy are

$nRd(A_1) = 0.3414$,
$nRd(A_2) = 0.2705$,
$nRd(A_3) = 0.3342$,
$nRd(A_4) = 0.3664$,
$nRd(A_5) = 0.3105$.


By using the batch mode method, the focal elements $A_2$ and
$A_5$ are removed. After applying the normalization, we obtain
the approximated BBA as shown in Table XVII.


Table XVII
$m_S^{BRd}(\cdot)$ USING BATCH APPROXIMATION BASED ON REDUNDANCY WITH
THE DISTANCE $\delta_\cup$.

Focal element
$A'_1 = \{\theta_1, \theta_2\}$
$A'_2 = \{\theta_2\}$
$A'_3 = \{\theta_3\}$

[Mass values not recoverable from the source.]

Using the iterative mode method, the degrees of
non-redundancy obtained in Step 1 are

$nRd'(A_1) = 0.3414$,
$nRd'(A_2) = 0.2705$,
$nRd'(A_3) = 0.3342$,
$nRd'(A_4) = 0.3664$,
$nRd'(A_5) = 0.3105$.

The focal element $A_2$ is removed first, since it has the smallest
nRd value. Then we recalculate all nRd values for the remaining
focal elements $A_1$, $A_3$, $A_4$ and $A_5$:

$nRd''(A_1) = 0.3133$,
$nRd''(A_3) = 0.3682$,
$nRd''(A_4) = 0.4299$,
$nRd''(A_5) = 0.3314$.

In Step 2, the focal element $A_1$ is removed, since $nRd''(A_1)$ is
the smallest one. After normalization, we can obtain the BBA
$m_S^{IRd}(\cdot)$ as shown in Table XVIII.


Table XVIII
$m_S^{IRd}(\cdot)$ USING ITERATIVE APPROXIMATION BASED ON REDUNDANCY
WITH THE DISTANCE $\delta_\cup$.

[Table body not recoverable from the source.]


As we can see in Example 5, the results of the batch
mode and iterative mode approximations are different. In
the next section, we provide experiments and simulations to
evaluate our proposed BBA approximation approaches and
the existing ones.


IV. SIMULATIONS FOR EVALUATION


We use the computational cost caused by the evidence
combination and the average closeness between the approximated
BBA and the original one to evaluate the performance
of the approximations. An approximation with less computational
cost and larger closeness is desirable. To describe the closeness
between BBAs, we use a strict distance of evidence, namely
Jousselme's distance ($d_J$) [34]. One can also use other types
of strict distance in evidence theory, e.g., the belief interval
based distance of evidence [35].

Suppose that $m_1$, $m_2$ are two BBAs defined on $\Theta$, with
$|\Theta| = n$. If $m_1$ and $m_2$ are considered as two vectors denoted
by $\mathbf{m}_1$ and $\mathbf{m}_2$, respectively, Jousselme's distance of evidence
is defined as

$$
d_J(\mathbf{m}_1, \mathbf{m}_2) \triangleq \sqrt{0.5\, (\mathbf{m}_1 - \mathbf{m}_2)^T \, \mathbf{Jac} \, (\mathbf{m}_1 - \mathbf{m}_2)}, \tag{21}
$$

where $\mathbf{Jac}$ is the so-called Jaccard's weighting matrix whose
elements $J_{ij} = Jac(A_i, B_j)$ are defined by

$$
Jac(A_i, B_j) = \frac{|A_i \cap B_j|}{|A_i \cup B_j|} \tag{22}
$$

It is one of the most widely used distances of evidence, and it
has been proven to be a strict distance metric [36].
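
Eqs. (21) and (22) can be evaluated without building the full $2^n - 1$ dimensional vectors, because only the focal elements of either BBA contribute non-zero differences. The Python sketch below relies on that observation; the names `jaccard` and `jousselme_distance` and the dictionary representation are assumptions for illustration, and all focal elements are assumed to be non-empty.

```python
from math import sqrt

def jaccard(A, B):
    """Jaccard index of two non-empty sets, Eq. (22)."""
    return len(A & B) / len(A | B)

def jousselme_distance(m1, m2):
    """Jousselme's distance of Eq. (21) between two BBAs given as
    dictionaries mapping frozenset -> mass."""
    focal = list(set(m1) | set(m2))
    diff = {A: m1.get(A, 0.0) - m2.get(A, 0.0) for A in focal}
    quad = sum(diff[A] * diff[B] * jaccard(A, B) for A in focal for B in focal)
    # Guard against tiny negative values caused by floating-point rounding.
    return sqrt(max(0.5 * quad, 0.0))

# Three categorical BBAs on a two-element frame (hypothetical example).
m_a = {frozenset({1}): 1.0}
m_b = {frozenset({2}): 1.0}
m_c = {frozenset({1, 2}): 1.0}

d_ab = jousselme_distance(m_a, m_b)  # disjoint singletons
d_ac = jousselme_distance(m_a, m_c)  # singleton versus the whole frame
d_aa = jousselme_distance(m_a, m_a)
```

The restriction to non-empty focal elements mirrors the remark made in this section: the Jaccard index is undefined when both sets are empty.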

Our simulation is based on a Monte Carlo simulation
using $M = 200$ random runs. In the $j$-th simulation run, the
original BBA to approximate, $m^j(\cdot)$, is randomly generated
and the different approximation results $\{m^j_{S_i}(\cdot)\}$ are obtained
using the different approximations, where $i$ denotes the $i$-th
approximation approach. Here we use $\oplus$ to denote the
evidence combination. We calculate the computation time
of the original evidence combination $m^j(\cdot) \oplus m^j(\cdot)$ with
Dempster's rule, and the computation time of Dempster's
combination of each approximated BBA, $m^j_{S_i}(\cdot) \oplus m^j_{S_i}(\cdot)$. Here
we compare the $k-l-x$ method ($S_1$), the D1 method ($S_2$), the
Summarization method ($S_3$), Denœux's outer approximation ($S_4$),
the rank-level fusion based approximation ($S_5$), our new degree of
non-redundancy based approximations, including the batch mode
with $\delta_\cap$ ($S_6$), the iterative mode with $\delta_\cap$ ($S_7$), the batch mode
with $\delta_\cup$ ($S_8$) and the iterative mode with $\delta_\cup$ ($S_9$), and the
CR-based approximation ($S_{10}$), since all these methods can set
the quantity of the remaining focal elements, and they never
consider the empty set as a valid focal element (contrary to the
inner approximation, which would cause trouble for the
comparisons, because Jousselme's distance cannot be computed if
one allows positive mass on the empty set, since $|\emptyset| = 0$).
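
The operation being timed is Dempster's rule, which intersects every focal element of one BBA with every focal element of the other, so the number of set intersections is the product of the two numbers of focal elements. The following Python sketch of the rule is illustrative only (the experiments reported here were run in MATLAB); the name `dempster` and the dictionary representation are assumptions.

```python
from fractions import Fraction
from itertools import product

def dempster(m1, m2):
    """Dempster's rule of combination for two BBAs given as dictionaries
    mapping frozenset -> mass. Raises if the sources totally conflict."""
    combined = {}
    conflict = 0
    for (A, a), (B, b) in product(m1.items(), m2.items()):
        C = A & B
        if C:
            combined[C] = combined.get(C, 0) + a * b
        else:
            conflict += a * b   # mass that would fall on the empty set
    if conflict == 1:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {C: v / (1 - conflict) for C, v in combined.items()}

# Hypothetical two-source example on the frame {1, 2}.
frame = frozenset({1, 2})
m1 = {frozenset({1}): Fraction(3, 5), frame: Fraction(2, 5)}
m2 = {frozenset({2}): Fraction(1, 2), frame: Fraction(1, 2)}
fused = dempster(m1, m2)
```

With 15 focal elements per BBA the double loop performs 225 intersections, against 4 when only 2 focal elements are kept, which is the saving that the approximations aim for.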

In our simulations, the cardinality of the FOD $\Theta$ is 4. In
each random generation, there are $2^4 - 1 = 15$ focal elements
in the original BBA. The number of remaining focal elements
for all the approaches used here is varied from 14 down to 2.
We randomly generate BBAs using Algorithm 1 [37] in Table
XIX below.


Table XIX
ALGORITHM 1: RANDOM GENERATION OF BBA.

Input: $\Theta$: frame of discernment;
       $N_{max}$: maximum number of focal elements
Output: $m$: BBA
Generate $P(\Theta)$, which is the power set of $\Theta$;
Generate a random permutation of $P(\Theta)$ → $R(\Theta)$;
Generate an integer between 1 and $N_{max}$ → $l$;
FOR each of the first $l$ elements of $R(\Theta)$ DO
    Generate a value within $[0, 1]$ → $m_i$ ($i = 1, 2, \ldots, l$);
END
Normalize the vector $m = [m_1, m_2, \ldots, m_l]$ → $m'$;
$m(A_i) = m'_i$.

The average (over 200 runs) combination time and the average
(over 200 runs) distance ($d_J$) between the original BBA and the
approximated BBAs, obtained using the different approaches for
each number of remaining focal elements, are shown in Figs. 4
and 5, respectively.

The computation time and distance values averaged over all runs
and all numbers of remaining focal elements are shown in
Table XX.

Note that the experiments were run on a computer with an Intel
i7-8550U CPU, 16 GB of LPDDR3 RAM, Windows 10 and MATLAB 2013b.

Figure 4. Comparisons between different approximations in terms of
computation time.

Figure 5. Comparisons between different approximations in terms of
$d_J$ (horizontal axis: number of remaining focal elements).


Table XX
COMPARISONS BETWEEN DIFFERENT BBA APPROXIMATIONS IN TERMS
OF COMBINATION TIME AND CLOSENESS.

Original BBA
$k-l-x$ ($S_1$)
D1 ($S_2$)
Summarization ($S_3$)
Outer ($S_4$)
Rank-level ($S_5$)
Batch $\delta_\cap$ ($S_6$)
Iterative $\delta_\cap$ ($S_7$)
Batch $\delta_\cup$ ($S_8$)
Iterative $\delta_\cup$ ($S_9$)
CR-based ($S_{10}$)

Table XX and Figs. 4 and 5 show that all the approximations
compared here, including our four proposed approximations based
on the degree of redundancy of focal elements, significantly
reduce the computation time with respect to the original
combination. At the same time, our focal element redundancy based
approximations have smaller distances (less loss of information)
according to all the distances of evidence used here. Among our
four new approximations, the iterative mode with $\delta_\cap$
performs the best.

Here we also compare the computational cost of the approximation
approaches themselves. In each run and for each approximation
approach, we compute the average approximation time over the
numbers of remaining focal elements from 2 to 14. The resulting
time of each approach, averaged over 200 runs, is listed in
Table XXI together with its computational complexity.


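Table XXI
COMPARISON BETWEEN DIFFERENT BBA APPROXIMATIONS IN TERMS OF
COMBINATION COMPLEXITY.

Method | Time | Complexity
$k-l-x$ ($S_1$) | 0.00075 |
D1 ($S_2$) | 0.00052 |
Summarization ($S_3$) | 0.00056 |
Outer ($S_4$) | 0.00150 | $O(n^2 + n^2 \log n)$
Rank-level ($S_5$) | 0.00079 | $O(n \log n)$
Batch $\delta_\cap$ ($S_6$) | 0.00110 | $O(n^2 + n^2 \log n)$
Iterative $\delta_\cap$ ($S_7$) | 0.00540 | $O((n-k)(n^2 + n^2 \log n))$
Batch $\delta_\cup$ ($S_8$) | 0.00100 | $O(n^2 + n^2 \log n)$
Iterative $\delta_\cup$ ($S_9$) | 0.00500 | $O((n-k)(n^2 + n^2 \log n))$
CR-based ($S_{10}$) | 1.78150 | $O((2n)^{n-k})$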


As shown in Fig. 5, the approximated BBA obtained using the
CR-based method can have a smaller distance to the original BBA
when the number of remaining focal elements is not too small
(from 14 down to 9). However, this comes at the price of
computational cost: its computation time is about $10^3$ times
that of the other approaches compared.

The CR-based method selects the focal elements to remove in a way
that resembles a traversal. It is not a real traversal, however,
since it removes the $L - k$ focal elements in a batch rather than
one by one. Therefore, when the number of remaining focal elements
is small, its distance is no longer so small.

By comparison, according to the experimental results, our
proposed approximation approach achieves a smaller distance while
keeping an acceptable time cost.

Note that as computing capability improves, the importance of
mass function approximation will decrease. However, there still
exist resource-restricted environments and platforms, for example
embedded systems for real-time tasks, where computational
resources such as CPU and RAM are limited and an approximation
that saves computation time remains important.

On the other hand, the BBA approximation can be considered as a
preprocessing of the "data" that reduces the computational cost.
Even if the computational resources are sufficient, further
reducing the computational cost is still desirable, especially
for real-time applications.

Note that our current performance evaluation on different 
approximation approaches is based on the experimental results 
in terms of the statistical averaging combination computational 
time, and the distance between the approximated BBA and 
the original one. This makes sense from the engineering or 
application viewpoints. To comprehensively evaluate different 
approximation approaches, theoretical analysis and proof are 
needed, which is also one of the research focuses in our work 
in the future. 


V. CONCLUSION 


Novel methods for BBA approximation are proposed in this paper,
in which the most redundant focal elements are removed first. The
degree of non-redundancy is defined based on the distance between
focal elements. Batch and iterative implementations of the BBA
approximations are provided. It is shown experimentally that our
new BBA approximations reduce the computational cost of evidence
combination with less loss of information, as measured by the
distance of evidence. At the same time, the computation time of
the approximations themselves is acceptable.

In our future work, we will focus on designing a more
comprehensive and rational distance of focal elements, based on
which the degree of non-redundancy of focal elements can be
calculated. In fact, the non-redundancy represents a type of
"importance" of focal elements. We will also try to define new
types of "importance", based on which the removal of focal
elements can be carried out more rationally. As shown in this
paper, we evaluate the performance of different BBA
approximations using the computation time and the distance of
evidence. In future work, we will also explore more comprehensive
evaluation criteria and a theoretical, mathematical evaluation of
the BBA approximation approaches. This is crucial for the design
of more effective BBA approximations.

When we use some criterion (e.g., the non-redundancy proposed in
this paper) to determine the "unimportant" or "redundant" focal
elements, we can also combine these focal elements into a new one
instead of removing them. For example, we can merge the two most
redundant focal elements into a new focal element by using
operations such as intersection or union, which would replace the
current removal of redundant focal elements. Furthermore, methods
like PCA could be used for the combination of focal elements in
the design of BBA approximations, in the hope of obtaining better
approximation performance in future research work.

Abstract—The theory of belief functions is an important tool
in the field of information fusion. However, the fusion of Basic 
Belief Assignments (BBAs) requires high computational cost and 
long computing time when a large number of focal elements are 
involved in the fusion rules. This problem becomes a bottleneck 
of application of Belief Functions (BF) in high-dimensional real 
problems. To overcome this drawback, many approaches were 
proposed to approximate BBAs to reduce the computational 
complexity in the fusion process. In this paper, we present a 
novel method based on the compatibility of focal elements to 
approximate a BBA by removing some focal elements of the 
original BBA. Besides, a new mass assignment strategy based 
on the distance of focal elements is proposed. Several examples, 
simulations and related analyses are provided to illustrate the 
interest and efficiency of the proposed method. 

Keywords—Information fusion, Belief functions, Basic belief 
assignment, Approximation 


I. INTRODUCTION 


The evidence theory was proposed by Dempster in the study 
of multivalued mapping in 1967 [1] and later promoted by 
Shafer in 1976 [2] with the introduction of Belief Functions 
(BF). The theory of belief functions is also called
Dempster-Shafer Theory (DST) in the literature. Belief Functions provide
an effective method for dealing with the expression and 
synthesis of uncertain information and they have been widely 
used in many fields such as image processing [3, 4], target 
tracking [5], and fault diagnosis [6, 7]. 

However, the evidence combination will encounter high 
computational cost when the frame of discernment (FoD) is 
large. To overcome this drawback, one effective approach to 
reduce the computational complexity is the BBA approxi- 
mation. The BBA approximation aims to obtain a simpler 
BBA by removing some focal elements according to different 
simplification criteria. In existing works, the simplification 
criteria can be divided into the following three categories: 


1) Simplification based on the mass assignment of a
focal element. The focal elements with smaller mass
assignments are deemed unimportant and should be
removed first. $k-l-x$ [8], Summarization [9] and
D1 [10] are representatives of this criterion.

2) Simplification based on the cardinality of a focal
element. The focal elements with larger cardinalities
may cause more computational cost. The $k$-additive
approach [11] and the hierarchical proportional
redistribution approach [12] accomplish the
simplification according to this criterion.

3) Hybrid simplification mixing the two previous ones.
The previous two criteria are used jointly to determine
which focal elements should be removed first. Methods
like inner and outer approximation [13], rank-level
fusion approximation [14], non-redundancy approximation
[15], iterative approximation based on distance of
evidence [16] and correlation coefficient approximation
[17] belong to this hybrid simplification strategy.

In general, hybrid simplification is the preferable way to
approximate a BBA, because the first and the second
simplification criteria are each one-sided.

In this paper, we propose a novel approach using the notion
of focal element compatibility. In our method, each focal
element has a compatible focal element, which it can replace
owing to the compatibility (based on a similarity measure)
between them. To quantify the notion of compatibility, we jointly
use the mass value and the cardinality of the set containing
all the focal elements that can replace the given focal
element. The focal element with the highest degree of
compatibility should be removed first. Users can preset the
number of remaining focal elements. After removing a focal
element, the removed mass is redistributed to the remaining focal
elements according to our new mass assignment strategy before
the next iteration is executed. Experimental results based on
comparisons with other approximation strategies, together with
related analyses, show that our approach is rational and effective.

This paper is organized as follows. After brief preliminaries
on Belief Functions in Section II and on classical BBA
approximation methods in Section III, we present the new
approximation method based on focal element compatibility in
Section IV. Its evaluation and a comparative analysis are given
in Section V, with concluding remarks in Section VI.


II. PRELIMINARIES 
A. Basics of Belief Functions 


We consider a frame of discernment (FoD)
$\Theta = \{\theta_1, \ldots, \theta_n\}$ whose elements are mutually
exclusive and exhaustive. A basic belief assignment (BBA) over the
FoD $\Theta$ is defined as

$$\sum_{A \subseteq \Theta} m(A) = 1, \quad m(\emptyset) = 0 \quad (1)$$

If $m(A) > 0$ holds, $A$ is called a Focal Element (FE). The
belief function and plausibility function are defined as follows
[2].

$$Bel(A) = \sum_{B \subseteq A} m(B); \quad Pl(A) = \sum_{A \cap B \neq \emptyset} m(B) \quad (2)$$

In DST, two independent bodies of evidence (BOEs) are
combined by Dempster's rule as follows. $\forall A \in 2^\Theta$:

$$m(A) = \begin{cases} 0, & A = \emptyset \\ \dfrac{1}{1-K} \sum\limits_{A_i \cap B_j = A} m_1(A_i)\, m_2(B_j), & A \neq \emptyset \end{cases} \quad (3)$$

where $K = \sum_{A_i \cap B_j = \emptyset} m_1(A_i)\, m_2(B_j)$ is the
conflict coefficient, which represents the total degree of
conflict. Other rules of combination have also been proposed in
the literature [18], but they are not detailed in this paper
since this is out of its scope.
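The definitions of Eqs. (1)-(3) can be illustrated with a short Python sketch. It assumes that a BBA is stored as a dictionary mapping `frozenset` focal elements to masses; the function names `bel`, `pl` and `dempster` are chosen here for illustration only.

```python
def bel(m, A):
    """Belief of A: total mass of the focal elements included in A."""
    return sum(v for B, v in m.items() if B <= A)

def pl(m, A):
    """Plausibility of A: total mass of the focal elements intersecting A."""
    return sum(v for B, v in m.items() if B & A)

def dempster(m1, m2):
    """Combine two BBAs with Dempster's rule; returns (combined BBA, K)."""
    combined, conflict = {}, 0.0
    for A, a in m1.items():
        for B, b in m2.items():
            inter = A & B
            if inter:
                combined[inter] = combined.get(inter, 0.0) + a * b
            else:
                conflict += a * b          # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {A: v / (1.0 - conflict) for A, v in combined.items()}, conflict

# Two small BBAs over the frame {a, b}.
a, b, ab = frozenset("a"), frozenset("b"), frozenset("ab")
fused, K = dempster({a: 0.6, ab: 0.4}, {b: 0.5, ab: 0.5})
```

The double loop visits every pair of focal elements, which is why the cost of the combination grows with the product of the numbers of focal elements and why removing focal elements reduces it.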


B. Distance of Focal Elements 


We use the definition proposed by Denoeux [13] to measure
the distance between two focal elements, which is defined as

$$\delta_\cap(A_i, A_j) = m(A_i)|A_i| + m(A_j)|A_j| - [m(A_i) + m(A_j)]\,|A_i \cap A_j| \quad (4)$$

For a given focal element $A_i$, if
$\delta_\cap(A_i, A_j) = \min_{t \neq i} \delta_\cap(A_i, A_t)$, we say
that $A_j$ has the highest compatibility degree with $A_i$, and
$A_j$ shares the most similar information with $A_i$.


III. BRIEF REVIEW OF BBA APPROXIMATIONS 


Some existing BBA approximation approaches are briefly 
reviewed in this section for the purpose of comparisons with 
our new method. 

1) $k-l-x$ approximation [8]. This method involves three
parameters and the approximated BBA is obtained by

• keeping no fewer than $k$ focal elements;
• keeping no more than $l$ focal elements;
• deleting masses whose total is no greater than $x$.

In the $k-l-x$ algorithm, all original focal elements are sorted
in decreasing order of their mass assignments. Then, the first
$p$ focal elements are selected such that $k \leq p \leq l$ and
such that the sum of the mass assignments of these $p$ focal
elements is no less than $1 - x$. The removed mass assignments
are redistributed to the remaining focal elements by a classical
normalization procedure.
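A minimal Python sketch of the $k-l-x$ selection described above is given below. The function name `klx` and the dictionary-of-`frozenset` representation are assumptions made for illustration; the BBA used is the one of Table I, with $\theta_i$ written as the integer $i$.

```python
def klx(m, k, l, x):
    """Keep the p largest-mass focal elements (k <= p <= l), then renormalize."""
    ranked = sorted(m.items(), key=lambda item: item[1], reverse=True)
    kept, total = [], 0.0
    for A, mass in ranked:
        if len(kept) >= l:                          # never keep more than l
            break
        if len(kept) >= k and total >= 1.0 - x:     # enough elements and mass
            break
        kept.append((A, mass))
        total += mass
    # Classical normalization of the kept masses.
    return {A: mass / total for A, mass in kept}

# BBA of Table I (theta_i is written as the integer i).
TABLE_I = {
    frozenset({1}): 0.13, frozenset({2, 3, 4, 5}): 0.06,
    frozenset({4, 5}): 0.3, frozenset({3, 5}): 0.15,
    frozenset({1, 2}): 0.14, frozenset({2, 4, 5}): 0.12,
    frozenset({2, 5}): 0.1,
}
klx_result = klx(TABLE_I, k=5, l=5, x=0.2)
```

With $k = l = 5$ and $x = 0.2$, the five largest masses sum to 0.84, so the two smallest focal elements are dropped and the rest are divided by 0.84.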

2) Summarization approximation [9]. Like the $k-l-x$ method,
this method keeps the focal elements having the largest mass
values. The only difference is that the removed mass values are
assigned to the union of the removed focal elements. Suppose
that $m(\cdot)$ is the original BBA and $k$ is the desired number
of remaining focal elements in the approximated BBA
$\hat{m}(\cdot)$. Let $M$ denote the set of $k-1$ focal elements
with the largest mass values in $m(\cdot)$. Then $\hat{m}(\cdot)$
is obtained from $m(\cdot)$ by

$$\hat{m}(A) = \begin{cases} m(A), & A \in M \\ \sum\limits_{A' \subseteq A,\, A' \notin M} m(A'), & A = A_0 \\ 0, & \text{otherwise} \end{cases} \quad (5)$$

where $A_0$ is

$$A_0 \triangleq \bigcup_{A' \notin M,\, m(A') > 0} A' \quad (6)$$
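Eqs. (5) and (6) can be sketched in Python as follows. The function name `summarize` is hypothetical, and the sketch adds the removed mass to the union $A_0$ even when $A_0$ happens to coincide with a kept focal element, which is an assumption not spelled out in the text.

```python
def summarize(m, k):
    """Keep the k-1 largest-mass focal elements; move the rest to their union."""
    ranked = sorted(m, key=m.get, reverse=True)
    kept, rest = ranked[:k - 1], ranked[k - 1:]
    approx = {A: m[A] for A in kept}
    if rest:
        union = frozenset().union(*rest)            # A_0 of Eq. (6)
        approx[union] = approx.get(union, 0.0) + sum(m[A] for A in rest)
    return approx

# BBA of Table I (theta_i is written as the integer i).
TABLE_I = {
    frozenset({1}): 0.13, frozenset({2, 3, 4, 5}): 0.06,
    frozenset({4, 5}): 0.3, frozenset({3, 5}): 0.15,
    frozenset({1, 2}): 0.14, frozenset({2, 4, 5}): 0.12,
    frozenset({2, 5}): 0.1,
}
sum_result = summarize(TABLE_I, k=5)
```

No renormalization is needed here because the removed mass is transferred rather than discarded.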


3) D1 approximation [10]. Suppose that $m(\cdot)$ is the original
BBA and $k$ is the desired number of remaining focal elements
in the approximated BBA $\hat{m}(\cdot)$. Let $M$ denote the set of
$k-1$ focal elements with the largest mass values in $m(\cdot)$ and
$M^-$ be the set including all the other focal elements of
$m(\cdot)$. The D1 method keeps all the members of $M$ as focal
elements of $\hat{m}(\cdot)$ and assigns the mass values of the
focal elements in $M^-$ among the focal elements in $M$ according
to the following procedure.

For a focal element $A \in M^-$, find all the supersets of $A$ in
$M$ to construct a collection $M_A$. If $M_A$ is not empty, the
mass value of $A$ is uniformly assigned among the focal elements
having the smallest cardinality in $M_A$. When $M_A$ is empty,
construct $M'_A$ as

$$M'_A = \{B \in M \mid |B| \geq |A|,\ B \cap A \neq \emptyset\} \quad (7)$$

Then, if $M'_A$ is not empty, $m(A)$ is assigned among the focal
elements with the smallest cardinality in $M'_A$. The value
assigned to a focal element $B$ depends on the value of
$|B \cap A|$. Such a procedure is executed iteratively until all
$m(A)$ have been assigned to the focal elements in $M$.

If $M'_A$ is empty, there are two possible cases:

• If the total set $\Theta \in M$, the sum of the mass values of
  the focal elements in $M^-$ is added to $\Theta$;
• If $\Theta \notin M$, then $\Theta$ becomes a focal element of
  $\hat{m}(\cdot)$ and the sum of the mass values of the focal
  elements in $M^-$ is assigned to $\hat{m}(\Theta)$.

Note that the number of remaining focal elements is $k - 1$ if
$\Theta \in M$.

4) Rank-level fusion approximation [14]. This method
jointly uses the mass assignments and cardinalities of focal
elements to make the simplification. The procedure is as
follows.

• Sort all the focal elements of the original BBA (with $L$
  focal elements) according to their mass assignments, in
  ascending order since the focal element with the smallest mass
  is assumed to be removed first. The rank vector obtained is

$$r_m = [r_m(1), r_m(2), \ldots, r_m(L)] \quad (8)$$

• Sort all the focal elements of the original BBA according
  to their cardinalities, in descending order since the focal
  element with the largest cardinality is assumed to be removed
  first. The rank vector obtained is

$$r_c = [r_c(1), r_c(2), \ldots, r_c(L)] \quad (9)$$

• Execute the rank-level fusion. The comprehensive rank
  vector is

$$r_f = [r_f(1), r_f(2), \ldots, r_f(L)] \quad (10)$$

where

$$r_f(i) = \alpha \cdot r_m(i) + (1 - \alpha) \cdot r_c(i) \quad (11)$$

The parameter $\alpha \in [0, 1]$ weights the two criteria.
Finally, we remove the focal element with the smallest $r_f$
value and renormalize the remaining focal elements. The above
steps are repeated until only $k$ focal elements remain and
the total mass to be deleted is no greater than $x$.
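One removal step of the rank-level fusion of Eqs. (8)-(11) can be sketched in Python as below. The names `competition_ranks`, `fused_ranks` and `rank_level_step` are hypothetical. The treatment of ties (tied items share the lowest rank of their group) is an assumption, chosen because it reproduces the rank vectors quoted for the Table I BBA in Example 2.

```python
def competition_ranks(values, reverse=False):
    """Rank 1 = first in sort order; tied values share the lowest rank."""
    ordered = sorted(values, reverse=reverse)
    return [ordered.index(v) + 1 for v in values]

def fused_ranks(m, alpha=0.5):
    """Comprehensive rank r_f of Eq. (11) for every focal element."""
    focal = list(m)
    r_m = competition_ranks([m[A] for A in focal])                   # ascending mass
    r_c = competition_ranks([len(A) for A in focal], reverse=True)   # descending cardinality
    return {A: alpha * rm + (1 - alpha) * rc
            for A, rm, rc in zip(focal, r_m, r_c)}

def rank_level_step(m, alpha=0.5):
    """Remove the focal element with the smallest r_f and renormalize."""
    r_f = fused_ranks(m, alpha)
    victim = min(r_f, key=r_f.get)
    rest = {A: v for A, v in m.items() if A != victim}
    total = sum(rest.values())
    return {A: v / total for A, v in rest.items()}

# BBA of Table I (theta_i is written as the integer i), in the order A_1..A_7.
TABLE_I = {
    frozenset({1}): 0.13, frozenset({2, 3, 4, 5}): 0.06,
    frozenset({4, 5}): 0.3, frozenset({3, 5}): 0.15,
    frozenset({1, 2}): 0.14, frozenset({2, 4, 5}): 0.12,
    frozenset({2, 5}): 0.1,
}
after_one = rank_level_step(TABLE_I)
after_two = rank_level_step(after_one)
```

The stopping rule (number of remaining focal elements and bound $x$ on the deleted mass) is left to the caller in this sketch; when two focal elements tie for the smallest $r_f$, `min` removes the one listed first.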

5) Correlation coefficient approximation [17]. The correlation
coefficient proposed by Jiang [19] measures the similarity
between two BBAs. In this approximation approach, we remove a
focal element $A_i$ from the original BBA $m(\cdot)$ and
redistribute the mass of $A_i$ to the remaining focal elements to
generate a new BBA $\hat{m}_i(\cdot)$. Then, we calculate the
correlation coefficient between $m$ and $\hat{m}_i$. We perform
the same operation for each focal element and sort all the
focal elements in ascending order of the correlation
coefficient. Finally, we remove the $k$ focal elements with the
largest correlation coefficients from the original BBA and
normalize according to a new assignment strategy.

6) Iterative approximation based on distance of evidence
[16]. In this algorithm, we first remove a focal element
$A_i$ from the original BBA $m(\cdot)$ and normalize the
remaining focal elements to generate a new BBA $\hat{m}_i(\cdot)$.
Then, we calculate Jousselme's distance between $m$ and
$\hat{m}_i$. We perform the same operation for each focal
element. Finally, we remove the focal element whose removal
yields the BBA closest to the original BBA and, after
normalization, proceed to the next iteration. The above steps
are performed iteratively until only $k$ focal elements remain.


IV. NEW BBA APPROXIMATION BASED ON FOCAL 
ELEMENT COMPATIBILITY 


In this section, a novel method for approximating a BBA is
proposed. As briefly shown in the previous section, the existing
approaches remove focal elements according to the mass
assignment, the cardinality, or both criteria. Here we adopt
a different standpoint, in which a specific focal element can
be removed if there exist a number of other focal elements
compatible with it, i.e., if its degree of incompatibility is
small. We therefore first define the degree of incompatibility
of a focal element.


A. Degree of Incompatibility of Focal Elements 
As mentioned before, the distance between two focal elements
is given by Eq. (4). The compatible focal element $A_i^C$ (the
upper index "C" is the first letter of the word "Compatible")
of a given focal element $A_i \subseteq \Theta$ for a BBA
$m(\cdot)$ (with $l$ focal elements) is defined by

$$A_i^C \triangleq \arg\min_{A_j} \delta_\cap(A_i, A_j) \quad \text{s.t.} \quad A_j \subseteq \Theta,\ j = 1, 2, \ldots, l,\ j \neq i \quad (12)$$

$A_i^C$ has the smallest distance to the focal element $A_i$,
i.e., among all focal elements, $A_i^C$ is the most compatible
with $A_i$. It should be noted that $A_i^C$ can be replaced by
$A_i$, but the reverse may not be true.

We define the degree of incompatibility of the focal element
$A_i$ as

$$ICP(A_i) = \begin{cases} \dfrac{m(A_i)}{|M_i^C|}, & M_i^C \neq \emptyset \\ \infty, & M_i^C = \emptyset \end{cases} \quad (13)$$

where

$$M_i^C \triangleq \{A_j \mid A_j^C = A_i,\ j = 1, 2, \ldots, l,\ j \neq i\} \quad (14)$$

The set $M_i^C$ contains all the focal elements which can replace
$A_i$. The $ICP(A_i)$ value describes the average effect on the
$|M_i^C|$ ($M_i^C \neq \emptyset$) focal elements after removing
$A_i$. The smaller the $ICP(A_i)$ value, the smaller the effect,
which is preferred. From another perspective, the effect can be
interpreted as the incompatibility degree of $A_i$: the smaller
the effect, the smaller the incompatibility degree and the more
readily $A_i$ can be removed. $M_i^C = \emptyset$ means that no
focal element can replace $A_i$, so its degree of incompatibility
is infinite.

Here we provide a simple example to show how $M_i^C$ and
$ICP(A_i)$ are computed.

Example 1: Consider the BBA $m(\cdot)$ defined over the
FoD $\Theta = \{\theta_1, \theta_2, \theta_3\}$. The mass
assignments of the focal elements $A_1 = \{\theta_1\}$,
$A_2 = \{\theta_2\}$, $A_3 = \{\theta_2, \theta_3\}$ and
$A_4 = \{\theta_1, \theta_2, \theta_3\}$ are as follows.

$m(A_1) = 0.5$, $m(A_2) = 0.28$,
$m(A_3) = 0.17$, $m(A_4) = 0.05$

1) We calculate the distance between any two focal elements
and find the compatible focal element for each focal element.

$\delta_\cap(A_1, A_2) = 0.78$, $\delta_\cap(A_1, A_3) = 0.84$,
$\delta_\cap(A_1, A_4) = 0.1$, $\delta_\cap(A_2, A_3) = 0.17$,
$\delta_\cap(A_2, A_4) = 0.1$, $\delta_\cap(A_3, A_4) = 0.05$

$A_1^C = A_2^C = A_3^C = A_4$, $A_4^C = A_3$

2) We compute $M_i^C$ for each focal element.

$M_1^C = M_2^C = \emptyset$,
$M_3^C = \{A_4\}$, $M_4^C = \{A_1, A_2, A_3\}$

3) We compute $ICP(A_i)$ for each focal element.

$ICP(A_1) = ICP(A_2) = \infty$

$ICP(A_3) = \dfrac{m(A_3)}{|M_3^C|} = \dfrac{0.17}{1} = 0.17$

$ICP(A_4) = \dfrac{m(A_4)}{|M_4^C|} = \dfrac{0.05}{3} = 0.0167$

So, $A_4 = \{\theta_1, \theta_2, \theta_3\}$ should be removed
first when approximating the original BBA $m(\cdot)$.
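The computations of Example 1 can be checked with the following Python sketch of Eqs. (4) and (12)-(14). The function names `delta_cap`, `compatible` and `icp` are hypothetical, $\theta_i$ is written as the integer $i$, and ties in the minimum of Eq. (12) are broken by dictionary order, which is an assumption (no tie occurs in this example).

```python
import math

def delta_cap(m, A, B):
    """Distance of Eq. (4) between two focal elements of the BBA m."""
    return m[A] * len(A) + m[B] * len(B) - (m[A] + m[B]) * len(A & B)

def compatible(m, A):
    """Compatible focal element of A, Eq. (12)."""
    return min((B for B in m if B != A), key=lambda B: delta_cap(m, A, B))

def icp(m):
    """Degree of incompatibility of every focal element, Eqs. (13)-(14)."""
    partner = {A: compatible(m, A) for A in m}
    scores = {}
    for A in m:
        replacers = [B for B in m if B != A and partner[B] == A]   # M_i^C
        scores[A] = m[A] / len(replacers) if replacers else math.inf
    return scores

# BBA of Example 1.
A1, A2, A3, A4 = (frozenset(s) for s in ({1}, {2}, {2, 3}, {1, 2, 3}))
example1 = {A1: 0.5, A2: 0.28, A3: 0.17, A4: 0.05}
scores = icp(example1)
first_removed = min(scores, key=scores.get)
```

Here `first_removed` is the focal element with the smallest degree of incompatibility, i.e., the one the method would delete first.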


B. New Mass Assignment Strategy 


Here, we propose a new mass assignment strategy based on the
distance of focal elements. Let $m(\cdot)$ denote the original BBA
with $l$ focal elements and $\hat{m}(\cdot)$ denote the remaining
BBA after removing the focal element $A_r$, where $A'_i$,
$i = 1, 2, \ldots, l-1$, are the focal elements of
$\hat{m}(\cdot)$. Then $\hat{m}(\cdot)$ is obtained by

$$\hat{m}(A'_i) = \begin{cases} m(A'_i) + \dfrac{m(A_r)}{D \cdot \delta_\cap(A'_i, A_r)}, & A'_i \neq \emptyset \\ 0, & A'_i = \emptyset \end{cases} \quad (15)$$

where

$$D = \sum_{i=1}^{l-1} \frac{1}{\delta_\cap(A'_i, A_r)} \quad (16)$$

The proof that $\hat{m}(\cdot)$ is a true normalized BBA is given
in the Appendix.

From Eqs. (15) and (16), we can see that the mass of each
removed focal element $A_r$ is redistributed to the remaining
focal elements $A'_i$ according to their distances to $A_r$. The
smaller the distance, the more mass is committed to $A'_i$. Based
on the compatibility of the focal elements and the new mass
assignment strategy, we propose a novel BBA approximation
approach described in the next subsection.


C. New BBA Approximation Algorithm 


Let $m(\cdot)$ denote the original BBA with $l$ focal elements.
In the approximation, we want to keep $k$ ($k < l$) focal
elements and remove the other focal elements one by one
iteratively. The detailed steps of this new BBA approximation
method are as follows.

• Step 1: Calculate $ICP(A_i)$ for each remaining focal
  element;

• Step 2: Sort all the focal elements in descending order
  of their incompatibility degree to obtain the sorted list
  of focal elements;

• Step 3: Remove the last focal element $A_r$ of the sorted
  list, and redistribute its mass value to the focal elements
  above it in the sorted list to generate an approximated BBA
  $\hat{m}$ according to our new mass assignment strategy.
  Reduce the number of focal elements by one, i.e.,
  $l \leftarrow l - 1$;

• Step 4: Assign $m = \hat{m}$. If more than $k$ focal
  elements remain, go to Step 1; otherwise output $m$ as the
  final approximated BBA.

The whole procedure is illustrated in Fig. 1.
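Steps 1 to 4, together with the mass assignment of Eqs. (15)-(16), can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, sorting is replaced by taking the minimum ICP directly, and ties in Eq. (12) are broken by dictionary order (an assumption).

```python
import math

def delta_cap(m, A, B):
    """Distance of Eq. (4) between two focal elements of the BBA m."""
    return m[A] * len(A) + m[B] * len(B) - (m[A] + m[B]) * len(A & B)

def icp_scores(m):
    """Degree of incompatibility, Eqs. (12)-(14), for every focal element."""
    partner = {A: min((B for B in m if B != A),
                      key=lambda B: delta_cap(m, A, B)) for A in m}
    scores = {}
    for A in m:
        n_replacers = sum(1 for B in m if B != A and partner[B] == A)
        scores[A] = m[A] / n_replacers if n_replacers else math.inf
    return scores

def redistribute(m, removed):
    """Mass assignment of Eqs. (15)-(16) after deleting `removed`."""
    rest = [A for A in m if A != removed]
    # delta_cap is strictly positive for two distinct focal elements.
    inv = {A: 1.0 / delta_cap(m, A, removed) for A in rest}
    D = sum(inv.values())
    return {A: m[A] + m[removed] * inv[A] / D for A in rest}

def icp_approximate(m, k):
    """Remove focal elements one by one until only k (k >= 1) remain."""
    m = dict(m)
    while len(m) > k:
        scores = icp_scores(m)
        victim = min(scores, key=scores.get)    # smallest ICP is removed first
        m = redistribute(m, victim)
    return m

# BBA of Table I (theta_i is written as the integer i), reduced by one element.
TABLE_I = {
    frozenset({1}): 0.13, frozenset({2, 3, 4, 5}): 0.06,
    frozenset({4, 5}): 0.3, frozenset({3, 5}): 0.15,
    frozenset({1, 2}): 0.14, frozenset({2, 4, 5}): 0.12,
    frozenset({2, 5}): 0.1,
}
after_first_removal = icp_approximate(TABLE_I, k=6)
```

Note that exact ties in $\delta_\cap$ occur whenever a focal element is nested in two others in a particular way (for instance, $A_6$ of Table I is equally distant from $A_3$ and $A_7$ after the first removal), so the outcome of later iterations can depend on how ties are broken and on floating-point rounding.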

Fig. 1. Scheme of the new BBA approximation.

Here we provide an illustrative example to show how our
approximation method works and we compare it with other
methods.

Example 2: Consider the BBA $m(\cdot)$ defined over the
FoD $\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4, \theta_5\}$
listed in Table I.

TABLE I
FOCAL ELEMENTS AND MASS VALUES OF $m(\cdot)$.

Focal Elements | Mass Values
$A_1 = \{\theta_1\}$ | 0.13
$A_2 = \{\theta_2, \theta_3, \theta_4, \theta_5\}$ | 0.06
$A_3 = \{\theta_4, \theta_5\}$ | 0.3
$A_4 = \{\theta_3, \theta_5\}$ | 0.15
$A_5 = \{\theta_1, \theta_2\}$ | 0.14
$A_6 = \{\theta_2, \theta_4, \theta_5\}$ | 0.12
$A_7 = \{\theta_2, \theta_5\}$ | 0.1

1) $k-l-x$ approximation. Here $k$ and $l$ are set to 5 and $x$
is set to 0.2. The focal elements
$A_2 = \{\theta_2, \theta_3, \theta_4, \theta_5\}$ and
$A_7 = \{\theta_2, \theta_5\}$ are removed without violating the
constraints in $k-l-x$. The remaining total mass value is
$1 - 0.06 - 0.1 = 0.84$. Then, the mass values of all the
remaining focal elements are divided by 0.84 to accomplish the
normalization. The approximated BBA $\hat{m}^{klx}(\cdot)$ is
listed in Table II, where $A'_i$, $i = 1, 2, 3, 4, 5$, are the
focal elements of $\hat{m}^{klx}(\cdot)$.

TABLE II
$\hat{m}^{klx}(\cdot)$ OBTAINED USING $k-l-x$.

Focal Elements | Mass Values
$A'_1 = \{\theta_1\}$ | 0.1548
$A'_2 = \{\theta_4, \theta_5\}$ | 0.357
$A'_3 = \{\theta_3, \theta_5\}$ | 0.1786
$A'_4 = \{\theta_1, \theta_2\}$ | 0.1667
$A'_5 = \{\theta_2, \theta_4, \theta_5\}$ | 0.1429


2) Summarization approximation. Here $k$ is set to 5. According
to the summarization method, the focal elements
$A_2 = \{\theta_2, \theta_3, \theta_4, \theta_5\}$,
$A_7 = \{\theta_2, \theta_5\}$ and
$A_6 = \{\theta_2, \theta_4, \theta_5\}$ are removed and their
union set $\{\theta_2, \theta_3, \theta_4, \theta_5\}$ is generated
as a new focal element (which already existed) with mass value
$m(A_2) + m(A_7) + m(A_6) = 0.28$. The approximated BBA
$\hat{m}^{Sum}(\cdot)$ is listed in Table III.

TABLE III
$\hat{m}^{Sum}(\cdot)$ OBTAINED USING SUMMARIZATION.

Focal Elements | Mass Values
$A'_1 = \{\theta_1\}$ | 0.13
$A'_2 = \{\theta_2, \theta_3, \theta_4, \theta_5\}$ | 0.28
$A'_3 = \{\theta_4, \theta_5\}$ | 0.3
$A'_4 = \{\theta_3, \theta_5\}$ | 0.15
$A'_5 = \{\theta_1, \theta_2\}$ | 0.14

3) D1 approximation. Here $k$ is set to 5. We obtain that
$A_3, A_4, A_5, A_1$ belong to $M$, and $A_6, A_7, A_2$ belong to
$M^-$. For $A_6$ and $A_2$, there are no supersets of them in $M$,
i.e., $M_A = \emptyset$, and the set $M'_A$ cannot be constructed
either, i.e., $M'_A = \emptyset$. So the mass values of $A_6$ and
$A_2$ are assigned to the total set $\Theta$. For $A_7$, we can
construct the set $M'_{A_7} = \{A_3, A_4, A_5\}$. The parameters
ratio and number are calculated to be 1 and 3. Therefore,
$m(A_7)/3 = 0.0333$ is added to the mass value of each of $A_3$,
$A_4$ and $A_5$. The approximated BBA $\hat{m}^{D1}(\cdot)$ is
listed in Table IV.

TABLE IV
$\hat{m}^{D1}(\cdot)$ OBTAINED USING D1.

Focal Elements | Mass Values
$A'_1 = \{\theta_1\}$ | 0.13
$A'_2 = \{\theta_4, \theta_5\}$ | 0.3334
$A'_3 = \{\theta_3, \theta_5\}$ | 0.1833
$A'_4 = \{\theta_1, \theta_2\}$ | 0.1733
$A'_5 = \Theta$ | 0.18

4) Rank-level fusion approximation. Here $k$ and $l$ are set
to 5 and $x$ is 0.2. The parameter $\alpha$ is set to 0.5. At the
first iteration, we calculate the comprehensive vector
$r_f = [r_f(A_1), r_f(A_2), \ldots, r_f(A_7)] = [5.5, 1, 5, 4.5, 4, 2.5, 2.5]$.
Then we remove $A_2 = \{\theta_2, \theta_3, \theta_4, \theta_5\}$
first and normalize the remaining focal elements. At the second
iteration, we obtain the comprehensive vector
$r_f = [r_f(A_1), r_f(A_3), r_f(A_4), r_f(A_5), r_f(A_6), r_f(A_7)] = [4.5, 4, 3.5, 3, 1.5, 1.5]$.
Then, we remove $A_6 = \{\theta_2, \theta_4, \theta_5\}$ (or $A_7$)
and normalize the remaining focal elements to obtain the final
approximated BBA $\hat{m}^{Rank}(\cdot)$ listed in Table V.

TABLE V
$\hat{m}^{Rank}(\cdot)$ OBTAINED USING RANK-LEVEL FUSION.

Focal Elements | Mass Values
$A'_1 = \{\theta_1\}$ | 0.1585
$A'_2 = \{\theta_4, \theta_5\}$ | 0.3659
$A'_3 = \{\theta_3, \theta_5\}$ | 0.1829
$A'_4 = \{\theta_1, \theta_2\}$ | 0.1707
$A'_5 = \{\theta_2, \theta_5\}$ | 0.122

5) Correlation coefficient approximation. Here $k$ is set to 2,
i.e., we have to remove two focal elements. The correlation
coefficients between the remaining BBA $\hat{m}_i(\cdot)$,
$i = 1, 2, \ldots, 7$, and the original BBA $m(\cdot)$ are 0.9805,
0.9981, 0.9274, 0.9778, 0.9842, 0.9946 and 0.9927. We sort all the
focal elements in ascending order of the correlation coefficient
and remove the two bottom focal elements
$A_2 = \{\theta_2, \theta_3, \theta_4, \theta_5\}$ and
$A_6 = \{\theta_2, \theta_4, \theta_5\}$ from the original BBA.
Then, we redistribute the removed mass to the remaining focal
elements to obtain the final approximated BBA
$\hat{m}^{CC}(\cdot)$ listed in Table VI.

TABLE VI
$\hat{m}^{CC}(\cdot)$ OBTAINED USING CORRELATION COEFFICIENT.

Focal Elements | Mass Values
$A'_1 = \{\theta_1\}$ | 0.13
$A'_2 = \{\theta_4, \theta_5\}$ | 0.3718
$A'_3 = \{\theta_3, \theta_5\}$ | 0.1839
$A'_4 = \{\theta_1, \theta_2\}$ | 0.1677
$A'_5 = \{\theta_2, \theta_5\}$ | 0.1466


6) Iterative approximation based on distance of evidence.
Here $k$ is set to 2, i.e., we have to remove two focal
elements. At the first iteration, Jousselme's distances between
the remaining BBA $\hat{m}_i(\cdot)$, $i = 1, 2, \ldots, 7$, and the
original BBA $m(\cdot)$ are 0.1053, 0.0315, 0.1932, 0.1049, 0.105,
0.05981 and 0.05982. We remove
$A_2 = \{\theta_2, \theta_3, \theta_4, \theta_5\}$ first. Then, we
normalize the remaining focal elements and assign
$m = \hat{m}_2$ to execute the next iteration. At the second
iteration, Jousselme's distances between the remaining BBA
$\hat{m}_i(\cdot)$, $i = 1, 3, 4, 5, 6, 7$, and $m(\cdot)$ are
0.1113, 0.2101, 0.114, 0.1118, 0.0644 and 0.0663. So we remove
$A_6 = \{\theta_2, \theta_4, \theta_5\}$ and normalize the
remaining focal elements to obtain the final approximated BBA
listed in Table VII.

TABLE VII
APPROXIMATED BBA OBTAINED USING DISTANCE OF EVIDENCE.

Focal Elements | Mass Values
$A'_1 = \{\theta_1\}$ | 0.1585
$A'_2 = \{\theta_4, \theta_5\}$ | 0.3659
$A'_3 = \{\theta_3, \theta_5\}$ | 0.1829
$A'_4 = \{\theta_1, \theta_2\}$ | 0.1707
$A'_5 = \{\theta_2, \theta_5\}$ | 0.122

7) ICP method (our approximation method). The desired number
of remaining focal elements is set to $k = 5$ and we obtain the
final approximated BBA in two iterations as follows.

• The first iteration: We first calculate $ICP(A_i)$,
  $i = 1, 2, \ldots, 7$, and sort all the focal elements in
  descending order of their $ICP(A_i)$ values. The result of the
  first iteration is listed in Table VIII. Because $ICP(A_2)$ is
  the smallest, the focal element
  $A_2 = \{\theta_2, \theta_3, \theta_4, \theta_5\}$ is removed
  first. We then redistribute the mass of $A_2$ to the remaining
  focal elements to proceed to the next iteration.

• The second iteration: We recalculate $ICP(A_i)$,
  $i = 1, 3, 4, 5, 6, 7$, and sort all the remaining focal
  elements. The result of the second iteration is listed in
  Table VIII. Because $ICP(A_7)$ is the smallest value, the focal
  element $A_7 = \{\theta_2, \theta_5\}$ is removed at this
  iteration. Now the number of remaining focal elements is five,
  and we redistribute the mass of $A_7$ to the remaining focal
  elements to obtain the final approximated BBA
  $\hat{m}^{ICP}(\cdot)$ listed in Table IX.

TABLE VIII
THE RESULTS OF TWO ITERATIONS USING ICP.

The First Iteration
Focal Elements | Mass Values | $|M_i^C|$ | $ICP(A_i)$
$A_3 = \{\theta_4, \theta_5\}$ | 0.3 | $M_i^C = \emptyset$ | $\infty$
$A_4 = \{\theta_3, \theta_5\}$ | 0.15 | $M_i^C = \emptyset$ | $\infty$
$A_7 = \{\theta_2, \theta_5\}$ | 0.1 | $M_i^C = \emptyset$ | $\infty$
$A_5 = \{\theta_1, \theta_2\}$ | 0.14 | 1 | 0.14
$A_1 = \{\theta_1\}$ | 0.13 | 1 | 0.13
$A_6 = \{\theta_2, \theta_4, \theta_5\}$ | 0.12 | 1 | 0.12
$A_2 = \{\theta_2, \theta_3, \theta_4, \theta_5\}$ | 0.06 | 4 | 0.015

The Second Iteration
Focal Elements | Mass Values | $|M_i^C|$ | $ICP(A_i)$
$A_3 = \{\theta_4, \theta_5\}$ | 0.3105 | $M_i^C = \emptyset$ | $\infty$
$A_4 = \{\theta_3, \theta_5\}$ | 0.1605 | $M_i^C = \emptyset$ | $\infty$
$A_5 = \{\theta_1, \theta_2\}$ | 0.1439 | 1 | 0.1439
$A_1 = \{\theta_1\}$ | 0.1334 | 1 | 0.1334
$A_6 = \{\theta_2, \theta_4, \theta_5\}$ | 0.1411 | 2 | 0.0705
$A_7 = \{\theta_2, \theta_5\}$ | 0.1106 | 2 | 0.0553

TABLE IX
$\hat{m}^{ICP}(\cdot)$ OBTAINED USING ICP.

Focal Elements | Mass Values
$A'_1 = \{\theta_1\}$ | 0.1491
$A'_2 = \{\theta_4, \theta_5\}$ | 0.3237
$A'_3 = \{\theta_3, \theta_5\}$ | 0.181
$A'_4 = \{\theta_1, \theta_2\}$ | 0.1658
$A'_5 = \{\theta_2, \theta_4, \theta_5\}$ | 0.1804

V. EXPERIMENTS AND ANALYSIS 


In this section, we compare all the aforementioned BBA 
approximation methods to demonstrate the effectiveness and 
interest of our method in terms of three Measures of Perfor- 
mance (MoP): 1) closeness, 2) computational efficiency, and 
3) decision-making. 


A. MoP of Closeness and Computational Efficiency 


The smaller the distance between the approximated BBA and the
original BBA, the less information is lost, which is preferred.
We use the $d_{BI}^E$ distance [20] to describe the degree of
closeness between two pieces of evidence, which is defined as

$$d_{BI}^E(m_1, m_2) = \sqrt{N_c \cdot \sum_{i=1}^{2^n - 1} \left[d^I\big(BI_1(A_i), BI_2(A_i)\big)\right]^2} \quad (17)$$

Here $N_c = 1/2^{n-1}$ is the normalization factor. $BI_1(A_i)$
and $BI_2(A_i)$ are the belief intervals of $A_i$ for $m_1(\cdot)$
and $m_2(\cdot)$, which are denoted by $[Bel_1(A_i), Pl_1(A_i)]$
and $[Bel_2(A_i), Pl_2(A_i)]$.


TABLE X
ALGORITHM 1: RANDOM GENERATION OF BBA.

Input: $\Theta$: Frame of Discernment;
       $N_{max}$: Maximum number of focal elements
Output: $m(\cdot)$: BBA
Generate $\mathcal{P}(\Theta)$, which is the power set of $\Theta$;
Generate a random permutation of $\mathcal{P}(\Theta) \rightarrow \mathcal{R}(\Theta)$;
Generate an integer between 1 and $N_{max} \rightarrow k$;
FOREACH of the first $k$ elements of $\mathcal{R}(\Theta)$ do
    Generate a value within $[0,1] \rightarrow m_i$, $i = 1, 2, \ldots, k$;
END
Normalize the vector $m = [m_1, m_2, \ldots, m_k] \rightarrow m'$;

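The strict distance between interval numbers $[a_1, b_1]$ and
$[a_2, b_2]$ ($b_i \geq a_i$, $i = 1, 2$) is defined by

$$d^I([a_1, b_1], [a_2, b_2]) = \sqrt{\left[\frac{a_1 + b_1}{2} - \frac{a_2 + b_2}{2}\right]^2 + \frac{1}{3}\left[\frac{b_1 - a_1}{2} - \frac{b_2 - a_2}{2}\right]^2} \quad (18)$$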


Our comparative analysis is based on a Monte Carlo simulation
using $M = 200$ random runs. The cardinality of the FoD is
$|\Theta| = 5$. In the $j$-th simulation run, a BBA $m^j(\cdot)$
is randomly generated according to Algorithm 1 [21] of Table X.
The number of remaining focal elements for all the approaches
is varied from 2 to 30, and the different approximation results
$\hat{m}_i^j(\cdot)$ are then obtained using the different methods,
where $i$ denotes the $i$-th approximation approach. We record
the computational time of the original BBA combination
$m^j(\cdot) \oplus m^j(\cdot)$ with Dempster's rule and the
computational time of Dempster's rule for each approximated BBA,
$\hat{m}_i^j(\cdot) \oplus \hat{m}_i^j(\cdot)$. The average (over
200 runs) computational times of the original and approximated
combinations are shown in Fig. 2. The average (over 200 runs)
distances between the original BBAs and the approximated BBAs
obtained using the different approaches, for each number of
remaining focal elements, are shown in Fig. 3.

As we can see in Fig.2, all the BBA approximation ap- 
proaches permit to reduce the computational time with respect 
to the original computational time due to the removal of 
focal elements. Besides, from Fig.3 we observe that, the 
approximated BBAs using our new proposed approach are 
globally closer to the original one when compared with other 
approaches, which represents the least loss of information. 
Note that when the number of remaining focal elements is 
small, there are no data points for the curve of k — 1 — «x 
and rank-level fusion methods because they can not remove 
a certain number of focal elements like other methods due 
to the constraint that the removed masses are no greater than 
x= 0.2. 




B. MoP of Decision-making 


In this work we use the DSmP transformation [18] to make the final decision by selecting the $\theta_i$ with the maximum $DSmP_\epsilon(\theta_i)$ value. The $DSmP_\epsilon(\theta_i)$ probability of any element $\theta_i$, $i = 1, 2, \ldots, |\Theta|$, of the FoD $\Theta$ can be obtained by

\[ DSmP_\epsilon(\theta_i) = m(\theta_i) + (m(\theta_i) + \epsilon) \sum_{\substack{X \in 2^\Theta \\ X \supset \theta_i \\ |X| \geq 2}} \frac{m(X)}{\sum_{\substack{Y \in 2^\Theta \\ Y \subset X \\ |Y| = 1}} m(Y) + \epsilon \cdot |X|} \tag{19} \]

where $\epsilon \geq 0$ is a tuning parameter.

Fig. 3. Closeness comparisons.
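The DSmP-based decision step can be sketched as follows. The names `dsmp` and `decide` and the dictionary representation of the BBA are assumptions of this illustration; the sketch also assumes $\epsilon > 0$, so that the denominator never vanishes.

```python
def dsmp(m, frame, eps=0.001):
    """DSmP probability of each singleton of the frame (sketch of Eq. (19))."""
    probs = {}
    for theta in frame:
        m_theta = m.get(frozenset([theta]), 0.0)
        transfer = 0.0
        for X, mass in m.items():
            # Only non-singleton focal elements containing theta contribute.
            if len(X) >= 2 and theta in X:
                singleton_mass = sum(m.get(frozenset([y]), 0.0) for y in X)
                transfer += mass / (singleton_mass + eps * len(X))
        probs[theta] = m_theta + (m_theta + eps) * transfer
    return probs


def decide(m, frame, eps=0.001):
    """Select the element of the frame with the maximum DSmP value."""
    probs = dsmp(m, frame, eps)
    return max(probs, key=probs.get)
```

Because each non-singleton mass $m(X)$ is shared among the elements of $X$ in proportion to $m(\theta_i) + \epsilon$, the values returned by `dsmp` sum to one whenever the input masses do.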

In our simulations, all the approximation approaches are compared in terms of the accuracy of decision-making. The cardinality of the FoD is $|\Theta| = 5$ and the parameter $\epsilon$ has been set to 0.001. Firstly, 1000 BBAs are randomly generated according to Algorithm 1 [21] of Table X. Then, the DSmP transformation is used to make the final decision for each original BBA. After that, 1000 approximated BBAs are generated and 1000 decisions are made for each approximation method. Finally, the accuracy of decision-making is computed for each method, and the results for different numbers of remaining focal elements are shown in Fig. 4.

As we can see in Fig. 4, although the ICP method is not the best, it presents a stable and good performance, especially when the number of remaining focal elements is small, which indicates a smaller loss of information from our standpoint. It should be noted that, when the number of remaining focal elements is small, there are no data points for the curves of the $k$-$l$-$x$ and rank-level fusion methods owing to the constraint mentioned before.

Fig. 4. Accuracy of decision-making comparisons.


VI. CONCLUSION 


As the cardinality of the FoD increases, evidence combination incurs a large computational cost. In this paper, a novel BBA approximation approach based on focal element compatibility is proposed, together with a new mass assignment strategy. This new method offers a good balance between the computational time and the loss of information. Simulations and comparative analyses show the interest and efficiency of our new method. In future work, we will consider other BBA approximation approaches based on the removal of focal elements to solve the bottleneck of BBA combination for different rules of combination.


APPENDIX 


The proof that the BBA obtained by the new mass assignment strategy is a true normalized BBA is as follows.

Proof:

1) $m(\emptyset) = 0$.

2) $\delta_n(A_i, A_r) \geq 0$ for any focal element $A_i \neq \emptyset$:

\[ \delta_n(A_i, A_r) = m(A_i)|A_i| + m(A_r)|A_r| - [m(A_i) + m(A_r)]\,|A_i \cap A_r| \geq m(A_i)|A_i| + m(A_r)|A_r| - [m(A_i) + m(A_r)] \min\{|A_i|, |A_r|\} \]

Suppose that $\min\{|A_i|, |A_r|\} = |A_r|$. Then

\[ \delta_n(A_i, A_r) \geq m(A_i)|A_i| + m(A_r)|A_r| - [m(A_i) + m(A_r)]\,|A_r| = m(A_i)(|A_i| - |A_r|) \geq 0 \]


a / / m(Ar) 
md =>o may ° Dina) 


l-1 l-1 
7) r_, MAr) 1 
ke) aes 2. (Anan) 


l-1

Abstract—Data association is one of the main tasks to achieve in perception applications. Its aim is to match the sensor detections to the known objects. To address this issue, recent research focuses on the evidential approach using belief functions, which are interpreted as an extension of the probabilistic model for reasoning about uncertainty. The data fusion process begins by quantifying sensor data by belief masses. Thereafter, these masses are combined in order to provide more accurate information. Finally, a probabilistic approximation of these combined masses is made in order to decide on the associations. Several probabilistic transformations have been proposed in the literature. However, to the best of our knowledge, these transformations have been evaluated only on simulated examples. For this reason, the objective of this paper is to benchmark the most interesting probabilistic transformations on real data in order to evaluate their performances for the autonomous vehicle perception problem.


Keywords: Data Association, Evidential Theory, Belief Func- 
tions, Probabilistic Transformation. 


I. INTRODUCTION 


Multiple Target Tracking (MTT) is important in percep- 
tion applications (autonomous vehicle, surveillance, etc.). The 
MTT system is usually based on two main steps: data associ- 
ation and tracking. The first step associates detected objects in 
the perceived scene, called targets, to known objects charac- 
terized by their predicted tracks. The second step estimates the 
track states over time typically thanks to Kalman Filters [1], or 
improved state estimation techniques (like particle filters, etc). 
Nevertheless, bad associations yield wrong track estimates and thus lead to false perception results.

The data association problem is usually solved within Bayesian theory [1], [2]. Several methods have been proposed, such as the Global Nearest Neighbor (GNN) method, the Probabilistic Data Association Filter (PDAF), and the Multiple Hypothesis Tracking (MHT) [3]-[5]. However, Bayesian theory does not efficiently manage data imperfection due to the lack of knowledge we can have about sensor quality, reliability, etc. To circumvent this drawback, the Evidential theory [6], [7] appears as an interesting approach because of its ability to model and deal with epistemic uncertainty. It provides a theoretical framework to manage ignorance and data imperfection.

Several evidential data association approaches have been proposed [8]-[11] in the framework of belief functions. Rombaut [11] uses the Evidential theory to measure the confidence of the association between perceived and known obstacles. To efficiently manage the appearance and disappearance of objects, Gruyer and Cherfaoui [12] propose the bi-directional data association. The first direction concerns the target-to-track pairings, which provides a good way to manage the appearance of new tracks. The second direction concerns the track-to-target pairings and thus manages the disappearance of tracks. This approach has been extended by Mercier et al. [10] to track vehicles by using a global optimization to make assignment decisions. To reduce the complexity for real-time applications, a local optimization has been used [8], [13]. For all these methods, the data fusion process begins by defining belief masses from sensor information and prior knowledge. These masses represent the belief and ignorance on the assignment hypotheses. Thereafter, the masses are combined in order to provide more complete information about the considered problem. Finally, to make a decision, the belief masses are classically approximated by a probability measure thanks to a chosen probabilistic transformation.


For data association applications, the most widely used probabilistic transformation (i.e. approximation) is the pignistic transformation [8], [10], [13], [14]. This transformation is based on a simple mapping process from the belief domain to the probability domain. However, several published works criticize the pignistic transformation and propose generalized and/or alternative transformations [16]-[21]. To our knowledge, the proposed transformations have been evaluated by their authors only on simulated examples. The main objective of this paper is to compare these transformations on real data in order to determine which one is best suited for assignment problems.


The rest of the paper is structured as follows. Section II recalls the basics of belief functions and their uses in data association problems. In Section III, the most appealing probabilistic transformations are presented; they are compared on the well-known KITTI public database in Section IV. Finally, Section V concludes the paper.

II. BELIEF FUNCTIONS FOR DATA ASSOCIATION


To select “best” associations, the data fusion process consists of four steps: modeling, estimation, combination and decision-making. This section presents their definitions and principles.


A. Basic Fundamentals 


The Belief Functions (BF) were introduced by Shafer [7] based on Dempster's research [6]. They offer a theoretical framework for reasoning about uncertainty. Let us consider a problem where we have an exhaustive list of hypotheses ($H_j$) which are mutually exclusive. They define a so-called frame of discernment $\Theta$:

\[ \Theta = \bigcup_{j=1}^{k} \{H_j\} \quad \text{with} \quad H_i \cap H_j = \emptyset \text{ for } i \neq j \tag{1} \]

The power set $2^\Theta$ is the set of all subsets of $\Theta$, that is:

\[ 2^\Theta = \{\emptyset, H_1, \ldots, H_k, \ldots, \{H_1, H_2, H_3\}, \ldots, \Theta\} \tag{2} \]

The proposition $A = \{H_1, H_2, H_3\}$ represents the disjunction meaning that either $H_1$ or $H_2$ or $H_3$ can be the solution to the problem under concern. In other words, $A$ represents a partial ignorance if $A$ is the disjunction of several elements of $\Theta$. The union of all hypotheses $\Theta$ represents the total ignorance and $\emptyset$ is the empty set that represents the impossible solution (usually interpreted as the conflicting information).

The truthfulness of each proposition $A \in 2^\Theta$ issued from source $j$ is modeled by a basic belief assignment (bba):

\[ m_j^\Theta : 2^\Theta \to [0,1], \quad \sum_{A \in 2^\Theta} m_j^\Theta(A) = 1 \tag{3} \]
AE2° 


Thereafter, the different bbas ($m_j^\Theta$) are combined, which provides a global knowledge of the considered problem. Several rules of combination have been proposed [22]; the conjunctive operator is widely used in many of the rules proposed in the literature for the combination of sources of evidence. For instance, Shafer [7] proposed Dempster's rule of combination below, which is nothing but the normalized version of the conjunctive rule [23]:

\[ m_{DS}^\Theta(A) = \frac{1}{1 - K} \sum_{A_1 \cap \ldots \cap A_p = A} \prod_{j=1}^{p} m_j^\Theta(A_j), \qquad m_{DS}^\Theta(\emptyset) = 0, \tag{4} \]

where $K$ is a normalization coefficient:

\[ K = \sum_{A_1 \cap \ldots \cap A_p = \emptyset} \prod_{j=1}^{p} m_j^\Theta(A_j). \tag{5} \]

Finally, in order to make decisions in $\Theta$, a probabilistic approximation of the combined bba ($m_{DS}^\Theta(A)$) is usually done. The lower and the upper bounds of the unknown probability $P(A)$ are defined by the belief $Bel(A)$ and the plausibility $Pl(A)$ functions given respectively by:

\[ Bel(A) = \sum_{B \subseteq A} m_{DS}^\Theta(B), \qquad Pl(A) = \sum_{B \cap A \neq \emptyset} m_{DS}^\Theta(B) \tag{6} \]


B. Belief Modeling 


The data association problem can be analyzed from two points of view: target-to-track and track-to-target association. Consequently, two frames of discernment are defined: $\Theta_{i,.}$ and $\Theta_{.,j}$, $i = 1, \ldots, n$, with $n$ the number of targets, and $j = 1, \ldots, m$, with $m$ the number of tracks:

\[ \Theta_{i,.} = \{Y_{(i,1)}, Y_{(i,2)}, \ldots, Y_{(i,m)}, Y_{(i,*)}\}, \qquad \Theta_{.,j} = \{X_{(1,j)}, X_{(2,j)}, \ldots, X_{(n,j)}, X_{(*,j)}\} \tag{7} \]

where $\Theta_{i,.}$ is composed of the $m$ possible target($i$)-to-track($j$) associations denoted $Y_{(i,j)}$. The hypothesis of appearance is represented by $Y_{(i,*)}$ ¹. $\Theta_{.,j}$ contains the $n$ possible track($j$)-to-target($i$) associations denoted $X_{(i,j)}$, and $X_{(*,j)}$ is the track disappearance.

C. Basic Belief Assignment 


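For target-to-track assignment, three bba's are used to answer the question “Is target $X_i$ associated with track $Y_j$?”:

• $m_j^{\Theta_{i,.}}(Y_{(i,j)})$: belief in “$X_i$ is associated with $Y_j$”,

• $m_j^{\Theta_{i,.}}(\bar{Y}_{(i,j)})$: belief in “$X_i$ is not associated with $Y_j$” ²,

• $m_j^{\Theta_{i,.}}(\Theta_{i,.})$: the degree of ignorance.

The recent benchmark [24] on a large real dataset shows that the most suited model is the non-antagonist model [11], [25], which is defined as follows:

\[ m_j^{\Theta_{i,.}}(Y_{(i,j)}) = \begin{cases} 0, & I_{i,j} \in [0, \tau] \\ \Phi_1(I_{i,j}), & I_{i,j} \in [\tau, 1] \end{cases} \tag{8} \]

\[ m_j^{\Theta_{i,.}}(\bar{Y}_{(i,j)}) = \begin{cases} \Phi_2(I_{i,j}), & I_{i,j} \in [0, \tau] \\ 0, & I_{i,j} \in [\tau, 1] \end{cases} \tag{9} \]

\[ m_j^{\Theta_{i,.}}(\Theta_{i,.}) = 1 - m_j^{\Theta_{i,.}}(Y_{(i,j)}) - m_j^{\Theta_{i,.}}(\bar{Y}_{(i,j)}) \tag{10} \]

where $0 < \tau < 1$ represents the impartiality of the association process and $I_{i,j} \in [0,1]$ is an index of similarity between $X_i$ and $Y_j$. $\Phi_1(.)$ and $\Phi_2(.)$ are two cosine functions defined by: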


©, (Ji,;) a 
$2(1i,5) = 


1—cos(r Bigot | 


11 
1+ cos(n4)] : a 


N/Q wl 


where $0 < \alpha < 1$ is the reliability factor of the data source. In the same manner, belief masses are generated for the track-to-target assignment.

¹ $Y_{(i,*)}$ refers to the fact that no track is assigned to the target($i$).
² $\bar{Y}_{(i,j)}$ defines the complementary hypothesis of $Y_{(i,j)}$: $\bar{Y}_{(i,j)} = \{Y_{(i,1)}, \ldots, Y_{(i,j-1)}, Y_{(i,j+1)}, \ldots, Y_{(i,m)}, Y_{(i,*)}\}$.

Table I
PROBABILITIES OF TARGET-TO-TRACK ASSOCIATIONS

$P_{i,.}(.)$ | $Y_1$ | ... | $Y_m$ | $Y_*$
$X_1$ | $P_{1,.}(Y_{(1,1)})$ | ... | $P_{1,.}(Y_{(1,m)})$ | $P_{1,.}(Y_{(1,*)})$
$X_2$ | $P_{2,.}(Y_{(2,1)})$ | ... | $P_{2,.}(Y_{(2,m)})$ | $P_{2,.}(Y_{(2,*)})$
...
$X_n$ | $P_{n,.}(Y_{(n,1)})$ | ... | $P_{n,.}(Y_{(n,m)})$ | $P_{n,.}(Y_{(n,*)})$


D. Belief Combination 


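Based on Dempster's rule (4), the combined masses $m^{\Theta_{i,.}}$ (and $m^{\Theta_{.,j}}$) over $2^{\Theta_{i,.}}$ (and $2^{\Theta_{.,j}}$) can be computed as follows [26]:

\[ \begin{aligned}
m^{\Theta_{i,.}}(Y_{(i,j)}) &= K \cdot m_j^{\Theta_{i,.}}(Y_{(i,j)}) \prod_{\substack{a=1 \\ a \neq j}}^{m} \alpha_{(i,a)} \\
m^{\Theta_{i,.}}(\{Y_{(i,j)}, \ldots, Y_{(i,l)}, Y_{(i,*)}\}) &= K \cdot \gamma_{(i,(j,\ldots,l))} \prod_{\substack{a=1 \\ a \neq j, \ldots, a \neq l}}^{m} \beta_{(i,a)} \\
m^{\Theta_{i,.}}(Y_{(i,*)}) &= K \prod_{a=1}^{m} \beta_{(i,a)} \\
m^{\Theta_{i,.}}(\Theta_{i,.}) &= K \prod_{a=1}^{m} m_a^{\Theta_{i,.}}(\Theta_{i,.})
\end{aligned} \tag{12} \]

with:

\[ \begin{aligned}
\alpha_{(i,a)} &= 1 - m_a^{\Theta_{i,.}}(Y_{(i,a)}) \\
\beta_{(i,a)} &= m_a^{\Theta_{i,.}}(\bar{Y}_{(i,a)}) \\
\gamma_{(i,(j,\ldots,l))} &= m_j^{\Theta_{i,.}}(\Theta_{i,.}) \cdots m_l^{\Theta_{i,.}}(\Theta_{i,.}) \\
K &= \left[ \prod_{a=1}^{m} \alpha_{(i,a)} + \sum_{a=1}^{m} m_a^{\Theta_{i,.}}(Y_{(i,a)}) \prod_{\substack{b=1 \\ b \neq a}}^{m} \alpha_{(i,b)} \right]^{-1}
\end{aligned} \]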


E. Decision-Making 


Finally, the probability matrix $P_{i,.}$ ($P_{.,j}$) is obtained by using a probabilistic transformation. Table I presents the $P_{i,.}$ matrix, where each line defines the association probabilities of the target $X_i$ with all tracks $Y_j$. $P_{i,.}(Y_{(i,*)})$ represents the appearance probability of $X_i$.

The association decisions are made by using a global or a local optimization strategy. The Joint Pignistic Probability (JPP) [10] selects the associations that maximize the probability product. However, this global optimization is time-consuming and can select doubtful local associations. To cope with these drawbacks, local optimizations have been proposed, such as the Local Pignistic Probability (LPP). Readers interested in the benchmark of these algorithms can refer to [14], [15].


III. PROBABILISTIC TRANSFORMATIONS 


The generalized formula of the probabilistic transformation can be defined as follows:

\[ P_{i,.}(Y_{(i,j)}) = m^{\Theta_{i,.}}(Y_{(i,j)}) + \sum_{\substack{A \in 2^{\Theta_{i,.}} \\ Y_{(i,j)} \subset A}} T(Y_{(i,j)}, A) \cdot m^{\Theta_{i,.}}(A), \tag{13} \]

where $A$ represents the partial/global ignorance about the association of target $X_i$ and $T(Y_{(i,j)}, A)$ represents the rate of the ignorance mass $m^{\Theta_{i,.}}(A)$ which is transferred to the singleton $Y_{(i,j)}$.

Several probabilistic transformations have been proposed in the literature. In this section, only the most interesting ones are presented.


A. Pignistic Probability 


The pignistic transformation, denoted by BetP and proposed by Smets [27], [28], is still widely used for evidential data association applications [8], [10], [25], [29]. This transformation redistributes equitably the mass of ignorance on singletons as follows:

\[ T_{BetP_{i,.}}(Y_{(i,j)}, A) = \frac{1}{|A|}, \tag{14} \]

where $|A|$ represents the cardinality of the subset $A$. However, the pignistic transformation (14) ignores the bbas of singletons, which can be considered as a crude commitment. BetP is easy to implement because it has a low complexity due to its simple redistribution process.
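The generalized formula (13) together with the pignistic rate (14) can be sketched as follows. The names `probabilistic_transform` and `betp_rate`, and the dictionary representation of the combined bba, are assumptions of this illustration and not part of the benchmarked system.

```python
def probabilistic_transform(m, singletons, rate):
    """Generic transformation: singleton mass plus transferred ignorance.

    m          : dict {frozenset: mass}, the combined bba
    singletons : iterable of the elements of the frame
    rate       : function (y, A, m) -> share of m(A) transferred to y
    """
    probs = {}
    for y in singletons:
        p = m.get(frozenset([y]), 0.0)
        for A, mass in m.items():
            # Only ignorance sets (|A| >= 2) that contain y transfer mass.
            if len(A) >= 2 and y in A:
                p += rate(y, A, m) * mass
        probs[y] = p
    return probs


def betp_rate(y, A, m):
    """Pignistic rate: the ignorance mass is split equally, 1 / |A|."""
    return 1.0 / len(A)
```

With `m = {a: 0.5, {a,b}: 0.3, {a,b,c}: 0.2}`, the call `probabilistic_transform(m, "abc", betp_rate)` gives $0.5 + 0.3/2 + 0.2/3$ for `a`. Other transformations only need a different `rate` function.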


B. Dezert-Smarandache Probability 


Besides the cardinality, the Dezert-Smarandache Probability (DSmP) transformation [18] considers the values of masses when transferring ignorance on singletons:

\[ T_{DSmP_{i,.}}(Y_{(i,j)}, A) = \frac{m^{\Theta_{i,.}}(Y_{(i,j)}) + \epsilon}{\sum_{Y_{(i,k)} \subset A} m^{\Theta_{i,.}}(Y_{(i,k)}) + \epsilon \cdot |A|} \tag{15} \]

The value of the tuning parameter $\epsilon \geq 0$ is used to adjust the effect of the focal element's cardinality in the proportional redistribution, and to make DSmP defined and computable when encountering zero masses. Typically, one takes $\epsilon = 0.001$. The smaller $\epsilon$, the better the approximation of the probability measure we get [18]. DSmP in general yields a higher Probabilistic Information Content (PIC) [30] than BetP because it uses more information than BetP for its establishment. The PIC indicates the level of the available knowledge to make a correct decision. $PIC = 0$ indicates that no knowledge exists to take a correct decision.
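The PIC itself is not written out in this section. A minimal sketch is given below, assuming the usual definition from the cited literature, $PIC(P) = 1 + \frac{1}{\log_2 N} \sum_{i=1}^{N} P_i \log_2 P_i$, i.e. one minus the normalized Shannon entropy; the function name `pic` is hypothetical.

```python
import math


def pic(probs):
    """Probabilistic Information Content of a discrete distribution.

    Assumed definition: PIC = 1 - H(P) / log2(N), where H is the Shannon
    entropy in bits and N >= 2 is the number of hypotheses.
    """
    n = len(probs)
    if n < 2:
        raise ValueError("PIC needs at least two hypotheses")
    # Terms with zero probability contribute nothing to the entropy.
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log2(n)
```

Under this definition, a uniform distribution gives a PIC of 0 (no knowledge to support a decision), and a distribution concentrated on one hypothesis gives a PIC of 1.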


C. MultiScale Probability 


The Multiscale Probability ($MulP$) transformation [19] highlights the proportion of each hypothesis in the frame of discernment by using a difference function between belief and plausibility:

\[ T_{MulP_{i,.}}(Y_{(i,j)}, A) = \frac{\left(Pl^{\Theta_{i,.}}(Y_{(i,j)}) - Bel^{\Theta_{i,.}}(Y_{(i,j)})\right)^q}{\sum_{Y_{(i,k)} \subset A} \left(Pl^{\Theta_{i,.}}(Y_{(i,k)}) - Bel^{\Theta_{i,.}}(Y_{(i,k)})\right)^q}, \tag{16} \]

where $q \geq 0$ is a factor used to amend the proportion of the difference ($Pl(\cdot) - Bel(\cdot)$). However, $T_{MulP_{i,.}}$ is not defined ($\frac{0}{0}$) when $m(\cdot)$ is a Bayesian mass ($Pl(\cdot) = Bel(\cdot)$).
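The MulP rate (16) can be sketched as below. The function names and the bba representation are assumptions of this illustration. The sketch only visits focal elements, so for a Bayesian bba it simply returns the singleton masses and never evaluates the undefined ratio; this is a choice of the sketch, not something specified in the text.

```python
def bel(m, A):
    """Belief of A: total mass of the focal elements included in A."""
    return sum(v for B, v in m.items() if B <= A)


def pl(m, A):
    """Plausibility of A: total mass of the focal elements intersecting A."""
    return sum(v for B, v in m.items() if B & A)


def mulp(m, singletons, q=5):
    """MulP probabilities; q = 0 reduces to the pignistic transformation."""
    singletons = list(singletons)
    # (Pl - Bel)^q for every singleton of the frame.
    width = {y: (pl(m, frozenset([y])) - bel(m, frozenset([y]))) ** q
             for y in singletons}
    probs = {}
    for y in singletons:
        p = m.get(frozenset([y]), 0.0)
        for A, mass in m.items():
            if len(A) >= 2 and y in A:
                denom = sum(width[z] for z in A)
                p += width[y] / denom * mass
        probs[y] = p
    return probs
```

The default `q=5` mirrors the value used later in the experiments.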


D. Sudano’s Probabilities 


Sudano proposes several alternatives to BetP, such as the Proportional Plausibility ($PrPl$) and the Proportional Belief ($PrBel$) transformations [18], [21]. These redistribute the ignorance mass according to the normalized plausibility and belief functions, respectively:

\[ T_{PrPl_{i,.}}(Y_{(i,j)}, A) = \frac{Pl^{\Theta_{i,.}}(Y_{(i,j)})}{\sum_{Y_{(i,k)} \subset A} Pl^{\Theta_{i,.}}(Y_{(i,k)})} \tag{17} \]

\[ T_{PrBel_{i,.}}(Y_{(i,j)}, A) = \frac{Bel^{\Theta_{i,.}}(Y_{(i,j)})}{\sum_{Y_{(i,k)} \subset A} Bel^{\Theta_{i,.}}(Y_{(i,k)})} \tag{18} \]
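Both proportional rates (17) and (18) share the same structure, which the sketch below makes explicit. The helper `proportional_transform` and the other names are assumptions of this illustration; raising an error when the normalizing sum is zero is also a choice of the sketch.

```python
def bel(m, A):
    """Belief of A: total mass of the focal elements included in A."""
    return sum(v for B, v in m.items() if B <= A)


def pl(m, A):
    """Plausibility of A: total mass of the focal elements intersecting A."""
    return sum(v for B, v in m.items() if B & A)


def proportional_transform(m, singletons, weight):
    """Share each ignorance mass m(A) in proportion to weight(singleton)."""
    singletons = list(singletons)
    w = {y: weight(m, frozenset([y])) for y in singletons}
    probs = {}
    for y in singletons:
        p = m.get(frozenset([y]), 0.0)
        for A, mass in m.items():
            if len(A) >= 2 and y in A:
                denom = sum(w[z] for z in A)
                if denom == 0:
                    raise ValueError("zero normalizing sum: rate undefined")
                p += w[y] / denom * mass
        probs[y] = p
    return probs


def prpl(m, singletons):
    """Proportional plausibility transformation."""
    return proportional_transform(m, singletons, pl)


def prbel(m, singletons):
    """Proportional belief transformation."""
    return proportional_transform(m, singletons, bel)
```

With `m = {a: 0.5, {a,b}: 0.3, {a,b,c}: 0.2}`, `prbel` transfers all the ignorance to `a`, the only singleton with a non-zero belief, whereas `prpl` still leaves some probability on `b` and `c`.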


E. Pan’s Probabilities 


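Other proportional transformations have been proposed in [20]. These transformations assume that the bbas are proportional to a function $S(\cdot)$ which is based on the belief and the plausibility:

\[ T_{PrBP_{i,.}}(Y_{(i,j)}, A) = \frac{S(i,j)}{\sum_{Y_{(i,k)} \subset A} S(i,k)}, \tag{19} \]

where different definitions of $S$ have been proposed:

\[ \begin{aligned}
PrBP1_{i,.} &: \; S(i,j) = Pl^{\Theta_{i,.}}(Y_{(i,j)}) \cdot Bel^{\Theta_{i,.}}(Y_{(i,j)}) \\
PrBP2_{i,.} &: \; S(i,j) = Bel^{\Theta_{i,.}}(Y_{(i,j)}) \cdot \left(1 - Pl^{\Theta_{i,.}}(Y_{(i,j)})\right)^{-1} \\
PrBP3_{i,.} &: \; S(i,j) = Pl^{\Theta_{i,.}}(Y_{(i,j)}) \cdot \left(1 - Bel^{\Theta_{i,.}}(Y_{(i,j)})\right)^{-1}
\end{aligned} \tag{20} \]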
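The three $S$ functions of (20) can be sketched as follows. All names are assumptions of this illustration. Note that $PrBP2$ and $PrBP3$ divide by $1 - Pl$ and $1 - Bel$, so the sketch raises Python's `ZeroDivisionError` when a singleton has a plausibility (resp. belief) equal to one.

```python
def bel(m, A):
    """Belief of A: total mass of the focal elements included in A."""
    return sum(v for B, v in m.items() if B <= A)


def pl(m, A):
    """Plausibility of A: total mass of the focal elements intersecting A."""
    return sum(v for B, v in m.items() if B & A)


# The three candidate S functions, evaluated on a singleton set s.
S_FUNCTIONS = {
    "PrBP1": lambda m, s: pl(m, s) * bel(m, s),
    "PrBP2": lambda m, s: bel(m, s) / (1.0 - pl(m, s)),
    "PrBP3": lambda m, s: pl(m, s) / (1.0 - bel(m, s)),
}


def prbp(m, singletons, variant="PrBP1"):
    """Share each ignorance mass m(A) in proportion to S(singleton)."""
    singletons = list(singletons)
    s = {y: S_FUNCTIONS[variant](m, frozenset([y])) for y in singletons}
    probs = {}
    for y in singletons:
        p = m.get(frozenset([y]), 0.0)
        for A, mass in m.items():
            if len(A) >= 2 and y in A:
                p += s[y] / sum(s[z] for z in A) * mass
        probs[y] = p
    return probs
```

For `m = {a: 0.5, b: 0.2, {a,b}: 0.3}`, $PrBP1$ uses $S(a) = 0.8 \cdot 0.5 = 0.4$ and $S(b) = 0.5 \cdot 0.2 = 0.1$, so `a` receives $0.5 + 0.3 \cdot 0.8 = 0.74$.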


IV. RESULTS 


This section presents a benchmark of the probabilistic transformations in the framework of the object association system for autonomous vehicles. The aim is to assign detected objects in the scene (targets) to known ones (tracks). The transformations have been evaluated on real data.

The KITTI dataset³ provides 21 sequences recorded from cameras mounted on a moving vehicle on urban roads [31]. To our knowledge, no comparison of probabilistic transformations has been done on real data where more than 30000 associations have been observed. These associations cover different road scenarios as shown in Fig. 1. For this work, detections are defined only by 2D bounding boxes in the image plane as presented in Fig. 1.


³ http://www.cvlibs.net/datasets/kitti/eval_tracking.php


[Figure 2 (image not reproduced); label: detected object = target]


Figure 2. Illustration of the distances $d^{right}_{i,j}$ and $d^{left}_{i,j}$ [24].


A. Experimental Setting 

The assignment information is based on the distance between objects in the image plane. For that, the distance $d_{i,j}$ is defined as follows:

\[ d_{i,j} = \frac{1}{2}\left(d^{right}_{i,j} + d^{left}_{i,j}\right), \tag{21} \]

where $d^{right}_{i,j}$ (resp. $d^{left}_{i,j}$) is the Euclidean distance between the bottom-right (resp. top-left) corners of the bounding boxes of target $X_i$ (detected object) and track $Y_j$ (known object) as presented in Fig. 2.

The parameters of the bba model (11) are $\alpha = 0.9$ and $\tau = 0.5$. The index of similarity is defined as follows:

\[ I_{i,j} = \begin{cases} 1 - \dfrac{d_{i,j}}{D}, & \text{if } d_{i,j} < D \\ 0, & \text{otherwise,} \end{cases} \tag{22} \]

where $D$ is the limit distance for association, which is determined heuristically, e.g. $D = 210$ in this work.

The tuning parameters are $\epsilon = 0.001$ and $q = 5$ for the DSmP and MulP transformations, respectively. The LPP algorithm has been used as the optimization strategy in the decision-making step.


B. Comparison of probabilistic transformations 


All discussed transformations are characterized by an equiv- 
alent complexity except the pignistic transformation. BetP is 
computed directly from combined masses which leads to a 
lower computational time. 

To compare the performance of the probabilistic transformations presented previously, the object association system is evaluated by the True Associations Rate (TAR):

\[ TAR = \frac{\sum_t \text{True Association}_t}{\sum_t \text{Ground Truth}_t}, \tag{23} \]

where $t$ is the frame index.

Table II compares the association outcomes of the system based on different probabilistic transformations. Only target-to-track association results are presented in Table II due to the lack of space. However, similar comments and conclusions hold for the track-to-target association results. The penultimate row of Table II shows the weighted average of the TAR value over all sequences, which is given by:

\[ TAR_{avg} = \sum_{i=0}^{20} w_i \, TAR_i, \tag{24} \]

where $TAR_i$ is the TAR value of the $i$-th sequence, and where the weight $w_i$ is $w_i = n_i / \sum_{k=0}^{20} n_k$, with $n_i$ being the number of associations of the $i$-th sequence. For instance, $TAR_{avg} = 0.9852$ (or 98.52%) for the BetP transformation. The last row of Table II represents the weighted standard deviation ($\sigma_w$) of the association scores, defined as follows:

\[ \sigma_w = \sqrt{\sum_{i=0}^{20} w_i \left(TAR_i - TAR_{avg}\right)^2} \tag{25} \]
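The weighted average and weighted standard deviation of the per-sequence TAR values can be computed as in the following sketch. The function name and the input lists are hypothetical; the numbers in the usage example are illustrative and are not taken from Table II.

```python
import math


def weighted_tar_stats(tar, n_assoc):
    """Return (weighted average, weighted standard deviation) of TAR values.

    tar     : list of per-sequence TAR values
    n_assoc : list of the number of associations of each sequence
    """
    total = sum(n_assoc)
    # Weight of each sequence: its share of all associations.
    w = [n / total for n in n_assoc]
    avg = sum(wi * ti for wi, ti in zip(w, tar))
    std = math.sqrt(sum(wi * (ti - avg) ** 2 for wi, ti in zip(w, tar)))
    return avg, std
```

For two illustrative sequences with TAR values 0.9 and 1.0 and with 1 and 3 associations, the weights are 0.25 and 0.75, so the weighted average is 0.975 and the weighted standard deviation is about 0.0433.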

The obtained results show that $PrBel$, $PrBP1$, and $PrBP2$ provide the worst mean association scores (< 97.40%), with the largest standard deviation (1.36%) for $PrBP2$. This can be explained by the fact that these transformations are based on the $Bel$ function, which is a pessimistic measurement. The rest of the transformations provide rates of correct association (i.e. scores) > 98.40%, which represents a gain of +1%. The best mean score, ≈ 98.50%, is given by the $BetP$, $PrPl$, and $MulP$ transformations. Based only on the mean score criterion, $BetP$ seems more interesting because it provides better scores on 15 of the 21 sequences, as illustrated in Fig. 3. In addition, $BetP$ is based on a very simple process for transferring uncertainty, which makes $BetP$ a good choice for real-time applications. However, this apparent advantage of $BetP$ needs to be seen in relative terms because $BetP$ also generates a quite large standard deviation of 1.38%, which clearly indicates that $BetP$ is not very precise. $PrPl$ and $MulP$ are also characterized by a relatively high standard deviation (1.22% and 1.39%). On the other hand, the lowest standard deviation, 1.05%, is given by the $DSmP$ transformation with a good association score = 97.85%. This transformation performs well in terms of the PIC criterion, which helps to make correct decisions [18]. Consequently, $DSmP$ is an interesting alternative to $BetP$ for the data association process in autonomous vehicle perception systems.


[Figure 3 (bar chart not reproduced): one pair of bars per probabilistic transformation; legend: Best scores, Worst scores; horizontal axis: Number of worst/best scores, from 0 to 15.]


Figure 3. The number of worst/best scores obtained by each probabilistic 
transformation on 21 sequences; e.g. PrBel provides three worst scores 
(sequences 0, 10, and 17) and only one best score on sequence 12. 


V. CONCLUSION 


An evaluation of several probabilistic transformations for 
evidential data association has been presented in this paper. 
These transformations approximate the belief masses by a 
probability measure in order to make association decisions. 
The widely used probabilistic approximation is the pignistic 
transformation. However, several published studies criticize 
the choice of this method of approximation and propose 
generalized transformations. 

We compared the performances of these probabilistic transformations on real data in order to determine which one is best suited for assignment problems in the context of autonomous vehicle navigation. The results obtained on the well-known KITTI dataset show that the pignistic transformation provides one of the best scores. However, it yields a quite large standard deviation, contrary to the DSmP transformation, which provides the lowest standard deviation. In addition, DSmP yields an association score close to that given by BetP. Consequently, DSmP can be a good alternative to BetP for the autonomous vehicle perception problem, at the cost of a bit more computational power than BetP.

Abstract— Uncertainty is an important dimension to consider when evaluating the quality of information. In the real world, information tends to be uncertain, vague, and imprecise, leading to different types of uncertainty, such as randomness, ambiguity, and imprecision. Methods that quantify uncertainty help to quantify information quality. This paper presents a general measure of uncertainty framed into the fuzzy evidence theory, named GM, which quantifies in an aggregate way the three basic types of uncertainty: non-specificity, fuzziness, and discord, considered within the framework of Generalized Information Theory (GIT). Monte-Carlo simulations are used to study the behavior of GM with respect to these uncertainty types. Results show that the total uncertainty GM behaves properly as we increase and decrease the various types of uncertainty.


Keywords—fuzzy evidence theory, uncertainty measures, 
fuzziness, non-specificity, discord, ambiguity, imprecision, fuzzy 
randomness 


I. INTRODUCTION 


Uncertainty is of primary importance in evaluating information quality. In the field of Generalized Information Theory (GIT), Klir and Yan [1] defined three basic types of uncertainty: non-specificity, fuzziness, and discord (or randomness). These are considered the three main types of uncertainty, covering all of its aspects. Indeed, fuzziness is due to the non-crisp boundaries of a set. Non-specificity is due to the numerous elements of a set. Discord appears when different occurrences are possible. The typology presented in Fig. 1 is an extension of Klir and Yan's typology that was originally initiated in Liu [2].


The typology exhibits three types of uncertainty, resulting from the combination of each pair of basic types of uncertainty. Imprecision is introduced as a general concept for both fuzziness and non-specificity. Ambiguity is a combination of non-specificity and discord. For the third combination, the concept of discord used here is equivalent to that of randomness used by Pal et al. [3], but clearly differs from the inconsistency concept introduced by Smets [4] and from the conflict concept in Dempster's sense [5]. No term has been proposed so far for combining fuzziness and discord. We therefore propose here fuzzy randomness to designate the total uncertainty in fuzzy-probability theory. Thus, three general terms are used to designate the combinations of (1) non-specificity and discord, i.e. ambiguity, (2) non-specificity and fuzziness, i.e. imprecision, and (3) fuzziness and discord, i.e. fuzzy randomness. The term “total uncertainty” is kept for the combination of the three basic kinds of uncertainty.


[Fig. 1 (diagram not reproduced): circular diagram with the labels Non-specificity, Fuzziness, Discord (Randomness), Imprecision, Ambiguity, Fuzzy randomness, and Total uncertainty; additional labels: Vagueness, Roughness, Undiscernibility.]


Fig. 1. Circular typology of uncertainty 


This paper proposes to pursue the analysis of the properties and behaviors of a general measure of uncertainty named GM, framed into the fuzzy evidence theory [6], that was originally defined in Liu [2]. The analysis has been further developed by Jousselme et al. [7], which resulted in a new measure of ambiguity, AM. Finally, Burkov et al. [8] presented preliminary results from an empirical study about the behavior of GM and AM.

The paper is organized as follows: Section II introduces the fuzzy evidence theory that will be used in the following sections. It also presents existing measures of uncertainty as well as the proposed measures, namely the imprecision measure and the GM measure, which quantifies all kinds of uncertainty. In Section III, a consistency analysis is performed on the mentioned measures, to illustrate the consistent behavior of GM under the three main types of uncertainty: discord, non-specificity, and fuzziness. The data used for the experiments are provided by a fuzzy evidence number generator. We present our conclusions in Section IV.


II. UNCERTAINTY IN FUZZY EVIDENCE THEORY 
A. Basic concepts 


Definition 1. Let $X = \{x_1, x_2, \ldots, x_N\}$, classically referred to as the universe of discourse in probability theory and the frame of discernment in evidence theory, be a frame of discernment (FoD). It is a finite discrete set with $N$ exclusive and exhaustive hypotheses. It defines the working space for the application as it contains all the propositions for which information sources can provide evidence. We denote by:

$P(X) = 2^X$ the power set of $X$ (all possible subsets of $X$),

$\tilde{P}(X) = [0,1]^X$ the set of fuzzy parts of $X$,

$A$ and $B$ two crisp sets from $P(X)$,

$\tilde{A}$ and $\tilde{B}$ two fuzzy sets from $\tilde{P}(X)$ with fuzzy membership degrees $\mu_{\tilde{A}}(x_i)$, $\forall x_i \in \tilde{A}$, and $\mu_{\tilde{B}}(x_i)$, $\forall x_i \in \tilde{B}$.


B. Fuzzy evidence theory 
1) Fuzzy sets theory 


Definition 2. A fuzzy set $\tilde{A}$ [9] is a generalization of a classical set allowing each of its elements to have a degree of membership in the set. The membership function is defined on the FoD $X$ by:

\[ \mu_{\tilde{A}} : X \to [0,1], \quad x \mapsto \mu_{\tilde{A}}(x) \tag{1} \]

where $\mu_{\tilde{A}}(x)$ is the membership degree of the element $x$ in the fuzzy set $\tilde{A}$. A crisp set is a special case of a fuzzy set where $\mu_A(x) = 1$ if $x \in A$ and $\mu_A(x) = 0$ if $x \notin A$.
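As a small illustration of Definition 2, a fuzzy set on a finite FoD can be stored as a mapping from elements to membership degrees. The representation and the function names below are assumptions of this sketch.

```python
def membership(fuzzy_set, x):
    """Membership degree of x; elements not listed have degree 0."""
    return fuzzy_set.get(x, 0.0)


def crisp_to_fuzzy(crisp_set, universe):
    """Embed a crisp set as a fuzzy set with degrees in {0, 1}."""
    return {x: 1.0 if x in crisp_set else 0.0 for x in universe}


def is_crisp(fuzzy_set):
    """A fuzzy set is crisp when every membership degree is 0 or 1."""
    return all(mu in (0.0, 1.0) for mu in fuzzy_set.values())
```

For instance, on the universe `["x1", "x2", "x3"]`, the crisp set `{"x1", "x3"}` becomes `{"x1": 1.0, "x2": 0.0, "x3": 1.0}`, while a set such as `{"x1": 0.7, "x2": 0.2}` is fuzzy but not crisp.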


2) Evidence theory 


The evidence theory started with Glenn Shafer [10]. He formalized the field of belief functions based on the work of Arthur Dempster on upper and lower bounds of probability [5]. For this reason, it is also called Dempster-Shafer theory (DST). DST is often defined as an extension of probability theory. Indeed, a probability distribution is a belief function whose focal elements are singletons. The advantage of DST is that it provides important tools to handle both random and epistemic uncertainty. It is based on two dual non-additive measures, i.e. the belief measure and the plausibility measure, and it assigns a mass to every subset of the FoD. In the following, we define the different notions.


Definition 3. A mass assignment $m$ is a mapping function from $P(X)$ to $[0,1]$. It satisfies the following conditions:

a) $m(\emptyset) = 0$, (2)
b) $\sum_{A \in P(X)} m(A) = 1$ (3)

The value $m(A)$ expresses the degree of support of the evidential claim that the true alternative is in the set $A$, but not in any specific subset. Any additional evidence supporting the claim that the true alternative is in a subset of $A$, say $B \subset A$, must be expressed by another nonzero value $m(B)$. Condition (b), given by equation (3), is called the normalization condition of DST. A subset $A$ is called a focal element if $m(A) > 0$. $F = \{A \in P(X) \mid m(A) > 0\}$ is the set of focal elements.


The belief function can then be deduced from $m$ as:

\[ Bel(A) = \sum_{B \mid B \subseteq A} m(B) \tag{4} \]

The plausibility function can also be deduced from $m$ as:

\[ Pl(A) = \sum_{B \mid B \cap A \neq \emptyset} m(B) \tag{5} \]

The pignistic probability $BetP_m$, called as such by Smets [40], corresponds to a classical probabilistic transformation of a belief function:

\[ BetP_m(A) = \sum_{B \subseteq X} m(B) \frac{|A \cap B|}{|B|}, \quad \forall A \subseteq X \tag{6} \]

where $|A|$ is the cardinality of $A$. If $A$ reduces to a singleton $\{x\}$: $BetP_m(x) = \sum_{x \in B} \frac{m(B)}{|B|}$.
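Equations (4) to (6) translate directly into code. In the sketch below, a mass assignment is a dictionary from `frozenset` focal elements to masses; this representation and the function names are assumptions of the illustration.

```python
def bel(m, A):
    """Belief: total mass of the focal elements included in A."""
    return sum(v for B, v in m.items() if B <= A)


def pl(m, A):
    """Plausibility: total mass of the focal elements intersecting A."""
    return sum(v for B, v in m.items() if B & A)


def betp(m, A):
    """Pignistic probability of a subset A of the FoD."""
    # Each focal element B gives A the share |A & B| / |B| of its mass.
    return sum(v * len(A & B) / len(B) for B, v in m.items())
```

For `m = {a: 0.5, {a,b}: 0.3, {a,b,c}: 0.2}` and $A = \{a, b\}$, this gives $Bel(A) = 0.8$, $Pl(A) = 1$, and $BetP_m(A) = 0.8 + 0.2 \cdot 2/3 \approx 0.933$, which lies between the belief and the plausibility as expected.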
Definition 4. A piece of evidence is a piece of information that
supports different hypotheses with different probabilities. It can
contain a variety of uncertainties due to the diversity of the
information sources.


Definition 5. A body of evidence (BoE), which is also called
basic probability assignment (BPA) or basic belief assignment
(BBA), is defined as the focal sets and their corresponding
masses:

$$BoE = \{\langle A_i, m(A_i) \rangle : A_i \in F,\ m(A_i) > 0\}_{i=1,\dots,f} \tag{7}$$

where f = |F| is the cardinality of F, also called the number
of focal elements.


In DST, given two pieces of information represented by two
different bodies of evidence $m_1$ and $m_2$, Dempster's
combination rule is defined as follows:

$$m(C) = \frac{\sum_{A \cap B = C} m_1(A)\, m_2(B)}{1 - \sum_{A \cap B = \emptyset} m_1(A)\, m_2(B)} \tag{8}$$


3) Fuzzy evidence theory


DST is a powerful and flexible mathematical tool for
handling uncertainty, imprecision, and incomplete
information. Even though it appropriately represents
non-specificity and discord, there exist some types of
uncertainty that it cannot represent, for instance fuzziness.
Fuzzy evidence theory [6] was built to solve this problem: it
combines the concepts of DST with fuzzy sets in order to
represent the three types of uncertainty within one framework
(Fig. 2).


Definition 6. A fuzzy mass assignment m is a mapping
from P(X) to [0,1] that satisfies the following conditions [12]:


a) The set of focal elements $F = \{A \in P(X),\ m(A) > 0\}$ is
finite,

b) $m(\emptyset) = 0$,

c) $\sum_{A \in F} m(A) = 1$.


A fuzzy subset A is called a focal element if m(A) > 0.

Fig. 2. Extended Klir et al.’s typology, adopted from [11]


Definition 7. In the framework of fuzzy evidence theory, a
piece of evidence is characterized by a portion of weight called
the mass function m. The generation of the mass, taking
uncertainties into account, is an essential step in generating a
fuzzy evidence number. A fuzzy evidence number is taken to be
a fuzzy subset to which a mass is assigned.


Definition 8. A fuzzy body of evidence (FBoE) [8], known as a
fuzzy basic probability assignment (FBPA) or fuzzy basic
belief assignment (FBBA) [13-15], is defined as the fuzzy focal
sets and their corresponding mass functions:

$$FBoE = \{\langle A_i, m(A_i), \mu_{A_i} \rangle : A_i \in P(X),\ m(A_i) > 0\}_{i=1,\dots,f} \tag{9}$$

where f = |F| is the number of focal elements.
C. Uncertainty measures 


For Harmanec [16], “measuring uncertainty or information 
means assigning a number or value from some ordinal scale to 
a given model of an epistemic state”. 


1) Non-specificity 


Non-specificity exists when numerous alternatives exist in
a given set A. In a fuzzy set, it is necessarily linked to fuzziness
since it is based on the fuzzy cardinality. It also exists in
evidence theory, since the BPAs are defined over the crisp
subsets of X. The Hartley measure, proposed in [15], quantifies
non-specificity as:
$$H(A) = \log_2 |A| \tag{10}$$
where |A| is the cardinality of A.


A natural generalization of the Hartley measure of
non-specificity to the fuzzy-set interpretation of possibility
theory was developed by Higashi and Klir [17] under the name
U-uncertainty:

$$U(A) = \int_0^1 \log_2 |{}^{\alpha}A| \, d\alpha \tag{11}$$

Dubois and Prade [18] proposed another generalization of
the non-specificity measure in Dempster-Shafer theory:

$$N(m) = \sum_{A \in P(X)} m(A) \log_2 |A| \tag{12}$$

Many other authors have proposed further generalizations of
the Hartley measure, such as Abellan and Moral [19] and Klir
and Yuan [20].
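The three non-specificity measures of Eqs. (10) to (12) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: the fuzzy set is finite and normal (its largest membership degree is 1), the symbol ${}^{\alpha}A$ is read as the α-cut {x : μ(x) ≥ α}, and all function names and numerical values are hypothetical.

```python
import math


def hartley(a):
    """Hartley measure of a crisp set (Eq. 10)."""
    return math.log2(len(a))


def nonspecificity(m):
    """Dubois-Prade non-specificity of a BPA given as {frozenset: mass} (Eq. 12)."""
    return sum(v * math.log2(len(a)) for a, v in m.items())


def u_uncertainty(mu):
    """U-uncertainty of a finite normal fuzzy set given as {element: membership} (Eq. 11).

    The alpha-cut only changes at the distinct membership values, so the
    integral over alpha reduces to a finite sum over those levels.
    """
    levels = sorted({v for v in mu.values() if v > 0})
    total, previous = 0.0, 0.0
    for alpha in levels:
        cut_size = sum(1 for v in mu.values() if v >= alpha)
        total += (alpha - previous) * math.log2(cut_size)
        previous = alpha
    return total


m = {frozenset({"x1"}): 0.5, frozenset({"x1", "x2"}): 0.3,
     frozenset({"x1", "x2", "x3"}): 0.2}
print(hartley({"x1", "x2", "x3", "x4"}))
print(nonspecificity(m))
print(u_uncertainty({"x1": 1.0, "x2": 0.5, "x3": 0.5}))
```

On a crisp set (all memberships equal to 1) the U-uncertainty coincides with the Hartley measure, which is the sense in which it generalizes it.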


2) Discord 


Discord expresses the fact that conflicting alternatives are
considered as potentially occurring; it is commonly represented
by an additional measure. The basic measure of discord was
established by Shannon in probability theory [21]:

$$S(p) = -\sum_{x \in X} p(x) \log_2 p(x) \tag{13}$$

where p is a probability distribution on X.


This measure has been used as the starting point for many 
theories to quantify uncertainty in situations where the 
probabilistic representation is inadequate. 


3) Fuzziness 


Fuzziness is the type of uncertainty represented by fuzzy 
sets theory, which is clearly distinct from discord. Two main 
approaches exist for measuring fuzziness, namely either 
“entropy-like” measures when the membership function is 
related to a probability distribution [22], [23], [24], [25], or 
“non-specificity-like” measures when an extension to the 
classical measure of cardinality is involved [26], [27]. 


The measure of fuzziness proposed by De Luca and Termini [25],
called the entropy of a fuzzy set, is given by the following
equation:

$$F_{DTE}(A) = -K \sum_{x \in A} \left[\mu(x) \log_2 \mu(x) + (1 - \mu(x)) \log_2 (1 - \mu(x))\right] \tag{14}$$

where K is a normalizing constant and A is the fuzzy set.
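A short Python sketch of the De Luca and Termini entropy may help to see how it behaves. The function name, the default K = 1 and the example membership values are assumptions made for illustration; the usual convention 0 · log 0 = 0 is applied.

```python
import math


def fuzzy_entropy(mu, k=1.0):
    """De Luca-Termini entropy of a finite fuzzy set given as {element: membership}."""
    def term(u):
        # Crisp memberships (0 or 1) carry no fuzziness: 0 * log2(0) is taken as 0.
        if u in (0.0, 1.0):
            return 0.0
        return u * math.log2(u) + (1.0 - u) * math.log2(1.0 - u)

    return -k * sum(term(u) for u in mu.values())


print(fuzzy_entropy({"x1": 1.0, "x2": 0.0}))   # crisp set
print(fuzzy_entropy({"x1": 0.5, "x2": 0.5}))   # maximally fuzzy set
print(fuzzy_entropy({"x1": 0.9, "x2": 0.2}))
```

The measure vanishes on crisp sets and is largest when every membership degree equals 0.5, where each element contributes K bits.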


A wide literature survey of the different measures of 
fuzziness is presented in [28], where 15 measures are reviewed. 


4) Ambiguity 


Following Klir and Yuan [1], ambiguity is the sum of
non-specificity and discord, that is, the total uncertainty in a
BPA. Other terms, such as aggregate uncertainty or general
uncertainty, are also used to designate this type of
uncertainty. A measure of ambiguity, AM, was proposed by
Jousselme et al. [4]:

$$AM(m) = -\sum_{x \in X} BetP_m(x) \log_2 BetP_m(x) \tag{15}$$

where $BetP_m(A) = \sum_{B \subseteq X} m(B)\,\frac{|A \cap B|}{|B|}$, $\forall A \subseteq X$.

Shahpari and Seyedin presented a modified version of AM,
named Modified Ambiguity Measure (MAM) [29], to resolve
the issues raised in [30] about the subadditivity property of AM.
Abellan and Bossé [31, 32] have analyzed the drawbacks of
these measures defined around the pignistic transformation and
belief intervals.
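Since AM is the Shannon entropy of the pignistic probability, it can be sketched directly from Eq. (15). The code below is a minimal illustration; the function names and the example BPAs are hypothetical.

```python
import math


def pignistic(m):
    """Pignistic probability of every singleton for a BPA given as {frozenset: mass}."""
    betp = {}
    for focal, mass in m.items():
        for x in focal:
            betp[x] = betp.get(x, 0.0) + mass / len(focal)
    return betp


def ambiguity(m):
    """Ambiguity measure AM: Shannon entropy of the pignistic probability (Eq. 15)."""
    return -sum(p * math.log2(p) for p in pignistic(m).values() if p > 0)


vacuous = {frozenset({"a", "b", "c", "d"}): 1.0}   # total ignorance on 4 elements
certain = {frozenset({"a"}): 1.0}                  # no uncertainty at all
print(ambiguity(vacuous), ambiguity(certain))
```

The vacuous BPA on four elements yields 2 bits, the same value as the Hartley measure of the frame, while a BPA focused on one singleton yields zero.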


5) Imprecision 


Imprecision is the total uncertainty of a fuzzy set, accounting
for both fuzziness and non-specificity. We name this measure
IM, standing for imprecision measure. It was briefly discussed
in Liu [2] and by the authors of [7]. IM is composed of two
parts, as in Eq. (16): 1) the non-specificity, generally
quantified by the Hartley measure H, and 2) the fuzziness,
quantified by $F_{DTE}$.
Na


IM(A) = a 


[Fore(A) + U(A)] 
where $F_{DTE}(A)$ is the entropy of the fuzzy set A, U(A) is the
U-uncertainty, representing the Hartley measure of the fuzzy
set A, and N is the number of hypotheses in the frame of
discernment.


(16) 


6) Fuzzy randomness 


Fuzzy randomness is the term proposed to designate the
total uncertainty of a fuzzy probability measure, since it
contains both fuzziness and randomness. Randomness concerns
the prediction of the result, i.e., the event that will occur, while
fuzziness is related to the interpretation of the result as 1 or 0.
Fuzzy randomness can be modeled by a fuzzy random
(stochastic) variable, which is a mathematical description of a
fuzzy stochastic phenomenon. The first measure of uncertainty
in fuzzy probability theory was proposed by Zadeh [33] as the
weighted entropy. De Luca and Termini [25] proposed a
measure of uncertainty in the fuzzy probability theory
framework that can represent fuzzy randomness with the
following formulation:

$$H_E(B) = -\sum_{x \in X} \Big( p(x) \log_2 p(x) + p(x) \big[\mu(x) \log_2 \mu(x) + (1 - \mu(x)) \log_2 (1 - \mu(x))\big] \Big) \tag{17}$$

where p is a fuzzy probability distribution on X and $\mu(x)$ is
the membership degree of each element x of X.
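The structure of Eq. (17), a Shannon term plus a probability-weighted fuzziness term, can be made concrete with a small Python sketch. The function name, the argument layout (two dictionaries keyed by element) and the numerical values are assumptions made for this illustration.

```python
import math


def fuzzy_random_entropy(p, mu):
    """Entropy of a fuzzy probability distribution: randomness plus weighted fuzziness.

    p  : {element: probability}, summing to 1
    mu : {element: membership degree in [0, 1]}
    """
    total = 0.0
    for x, px in p.items():
        if px == 0:
            continue  # convention 0 * log2(0) = 0
        u = mu[x]
        fuzz = 0.0 if u in (0.0, 1.0) else u * math.log2(u) + (1 - u) * math.log2(1 - u)
        total += px * math.log2(px) + px * fuzz
    return -total


p = {"x1": 0.5, "x2": 0.5}
print(fuzzy_random_entropy(p, {"x1": 1.0, "x2": 1.0}))  # randomness only
print(fuzzy_random_entropy(p, {"x1": 0.5, "x2": 0.5}))  # randomness and fuzziness
```

With crisp memberships the value reduces to the Shannon entropy of p; fuzziness can only add to it.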


D. Total uncertainty measure 


To evaluate the performance of fusion systems, it is
necessary to evaluate the information that they process, i.e., to
quantify the different types of uncertainty related to this
information. GM is a measure proposed by Liu [2] in the
framework of fuzzy evidence theory to aggregate the different
types of uncertainty.

Definition 10. For a given $FBoE = \{A_i, m(A_i), \mu_{A_i}(x)\}$, the
formulation of GM is:


GM(FBoE) = — 8) 
—YVirex [BetP (x) log, BetP(x) + BetP(x) log, BetP(x)] 
Where : ; 
BetP(x,) = ae (19) 
nee (20) 


aa Yxresy Hao) 


Here $S_i$ is the support of $A_i$, $S_i = \{x \in X,\ \mu_{A_i}(x) > 0\}$,
and f is the number of focal elements in the FBoE.


E. Basic Scenarios to study the behavior of GM 


This section illustrates the behavior of GM, which is an
aggregate measure including all kinds of uncertainty. We vary
these kinds of uncertainty using three basic operations on a
fuzzy BPA:

1) Defuzzification: this operation gives more precision
to the information. It transforms a fuzzy BPA into a
crisp one. Applied to a fuzzy set, it gives a crisp set;
applied to a fuzzy probability distribution, it gives a
classical probability distribution.

2) Specification: this operation removes the ambiguity
part. It transforms a fuzzy BPA into a fuzzy
probability distribution. Applied to a fuzzy set,
specification gives a nonspecific fuzzy set, while
applied to a crisp set, it gives a singleton.

3) Accordance: this operation reduces randomness. It
transforms a fuzzy BPA into a fuzzy set. Applied to a
fuzzy probability distribution, accordance gives a
nonspecific fuzzy set, while applied to a classical
probability distribution, it gives a singleton.


Fig. 3 Schemes to vary the amount and type of uncertainty to study the
GM behavior


There are six ways to vary the amount and type of uncertainty
of a fuzzy BPA, corresponding to the different orderings of
these three operations. Hence, we can observe the behavior of
GM under these variations.


F. GM behavior 


Each time one of the operations mentioned above is
applied to an FBPA, GM must behave accordingly. For
instance, if we reduce a type of uncertainty, GM must
decrease. Fig. 3 shows the six ways to vary the uncertainty and
the associated measures generated along each path; that is,
there are six different ways to make GM decrease. As we
progress along a path, GM corresponds to the measure
quantifying the remaining types of uncertainty, such as AM,
IM, $H_E$, $F_{DTE}$, S or H.

For example, in Fig. 3(a), the defuzzification operation is
first applied to the initial fuzzy BPA, which leads to a crisp
BPA. Hence, the framework of fuzzy evidence theory turns into
the evidence theory framework, and the uncertainty is then
quantified by AM. Then the crisp BPA, which includes both
discord and non-specificity, is accorded, leading to a single
classical set, whose uncertainty is the Hartley measure H.
Finally, this classical set is specified, leading to a singleton
and thus to null uncertainty. The other five schemes of Fig. 3
are read in the same way.


II. A FUZZY EVIDENCE NUMBER GENERATOR 


We need simulations to study the behavior of GM. To this
end, fuzzy bodies of evidence (FBPAs) have to be generated to
implement the schemes depicted in Fig. 3 for the computation
of GM. First, we have to generate, on the frame of
discernment X, a collection $F = \{A_1, \dots, A_i, \dots, A_f\}$ of
fuzzy numbers of evidence, which are fuzzy subsets. These sets
define the focal elements of an FBPA. To each focal element, a
mass is associated. We generate the mass function as in
Burkov [8], as presented in Algorithms 1 and 2:


Algorithm 1
1. input: F, the set of size f;
2. rest ← 1;
3. for i ← 1 to f − 1
4.    do generate an exponentially distributed random value y;
5.       m(A_i) ← P(Y < y) · rest;
6.       rest ← rest − m(A_i);
7. m(A_f) ← rest;
8. return {m(A_i)}, i = 1, ..., f
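Algorithm 1 can be sketched in Python as follows. This is a minimal reading of the pseudocode, not the authors' implementation: P(Y < y) is taken to be the cumulative distribution function of the exponential law evaluated at the drawn value y, and the function name, the `rate` parameter and the `rng` argument are assumptions introduced for the example.

```python
import math
import random


def generate_masses(f, rate=1.0, rng=random):
    """Draw f masses that sum to 1 by repeatedly splitting the remaining mass."""
    rest = 1.0
    masses = []
    for _ in range(f - 1):
        y = rng.expovariate(rate)              # exponentially distributed value
        share = 1.0 - math.exp(-rate * y)      # P(Y < y), the exponential CDF at y
        masses.append(share * rest)
        rest -= masses[-1]
    masses.append(rest)                        # the last focal element takes what is left
    return masses


masses = generate_masses(5, rng=random.Random(0))
print(masses, sum(masses))
```

Because each step assigns a fraction of the remaining mass, the result always satisfies the normalization condition, up to floating-point rounding, without any rescaling.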
Algorithm 2
1. input: X, the frame of discernment;
2. type = 'trapezoidal';
3. fun_l = 'quadratic', fun_r = 'quadratic';
4. select four random (uniformly distributed) points A, B,
   C and D such that inf X ≤ A ≤ B ≤ C ≤ D ≤ sup X;
5. define f_l according to fun_l and the values of A, B;
6. define f_r according to fun_r and the values of C, D;
7. define Ã as (A, f_l, B, C, f_r, D);
8. return Ã
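A possible Python reading of Algorithm 2 is sketched below. The source only states that the left and right side functions are 'quadratic'; the specific forms ((x − A)/(B − A))² and ((D − x)/(D − C))² used here are assumptions, as are the function name and the representation of the frame by its bounds `lo` and `hi`.

```python
import random


def random_fuzzy_number(lo, hi, rng=random):
    """Return the four breakpoints and the membership function of a random
    trapezoidal fuzzy number on [lo, hi] with quadratic sides."""
    a, b, c, d = sorted(rng.uniform(lo, hi) for _ in range(4))

    def membership(x):
        if x < a or x > d:
            return 0.0                           # outside the support
        if b <= x <= c:
            return 1.0                           # core of the trapezoid
        if x < b:
            return ((x - a) / (b - a)) ** 2      # rising side (assumed quadratic form)
        return ((d - x) / (d - c)) ** 2          # falling side (assumed quadratic form)

    return (a, b, c, d), membership


(a, b, c, d), mu = random_fuzzy_number(0.0, 10.0, rng=random.Random(1))
print((a, b, c, d), [round(mu(x), 3) for x in (a, (a + b) / 2, b, c, (c + d) / 2, d)])
```

Evaluating `mu` on the N elements of a discrete frame gives one row of membership degrees for a focal element.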


IV. EVALUATION OF GM 


The generated FBPA is represented by two components:

A vector m that contains the masses of all focal elements,
A matrix FB that contains the membership degrees of the
elements in each focal element of the FBPA.

A focal element is a fuzzy subset of the FoD with a non-zero
mass value.

$$FB = \begin{bmatrix} \mu_{11} & \cdots & \mu_{1N} \\ \mu_{21} & \cdots & \mu_{2N} \\ \vdots & & \vdots \\ \mu_{f1} & \cdots & \mu_{fN} \end{bmatrix} \quad \text{and} \quad m = [m_1 \ \dots \ m_i \ \dots \ m_f]^T$$

where f is the number of focal elements and N is the number of
elements in the FoD.

An initial fuzzy BPA, say $FBPA_0$, is first randomly
generated using a uniform distribution between 0 and 1 for the
matrix FB and the vector m. The latter must moreover satisfy
the normalization condition of DST for a BPA. Then, the
uncertainty in $FBPA_0$ is successively reduced using the three
basic operations described above.


Fig. 4 Proposed approach to compute GM (flowchart labels: "Calculate the
uncertainty of FBPA (step 2)", "6 ways of reducing uncertainty", test
"GM = 0" with yes/no branches)

Technically, to perform defuzzification, we change step by
step the degrees of membership in FB: those greater than 0.5
rise to 1 and those lower than 0.5 decrease to 0. To perform
accordance, we change step by step the values in m: the
maximum rises to 1 and all the other values decrease to 0.
Specification is performed by randomly pruning elements of
FB (i.e., setting them to 0). After each basic operation, GM is
computed using the corresponding consistent expression (see
Fig. 4).
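The three operations on the pair (FB, m) can be sketched in Python as follows. The step size, the treatment of a membership exactly equal to 0.5, the renormalization used in the accordance step and the rule that protects the last nonzero entry of a row are all assumptions introduced for this illustration; the text does not specify them.

```python
import random


def defuzzify_step(fb, step=0.1):
    """Memberships above 0.5 move toward 1, those below 0.5 toward 0
    (a value of exactly 0.5 is left unchanged in this sketch)."""
    def move(u):
        if u > 0.5:
            return min(1.0, u + step)
        if u < 0.5:
            return max(0.0, u - step)
        return u
    return [[move(u) for u in row] for row in fb]


def accordance_step(m, step=0.1):
    """Every mass except the largest decreases toward 0; the largest takes
    the remainder so that the masses still sum to 1."""
    top = max(range(len(m)), key=lambda i: m[i])
    out = [max(0.0, v - step) for v in m]
    out[top] = 1.0 - (sum(out) - out[top])
    return out


def specification_step(fb, rng=random):
    """Set one randomly chosen nonzero membership to 0. Rows with a single
    nonzero entry are skipped so that no focal element becomes empty."""
    out = [row[:] for row in fb]
    cells = [(i, j) for i, row in enumerate(out)
             if sum(1 for u in row if u > 0) > 1
             for j, u in enumerate(row) if u > 0]
    if cells:
        i, j = rng.choice(cells)
        out[i][j] = 0.0
    return out


# Hypothetical FBPA with f = 2 focal elements over N = 3 elements.
fb = [[0.9, 0.6, 0.2], [0.3, 1.0, 0.7]]
m = [0.7, 0.3]
for _ in range(10):
    fb = defuzzify_step(fb)
    m = accordance_step(m)
print(fb, m)
```

Repeating the steps drives FB to a crisp 0/1 matrix and m to a vector with a single mass equal to 1, after which specification prunes each row down to one element.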

Fig. 5 shows the result of a Monte Carlo simulation of 1000
runs, in which the uncertainty of 1000 randomly generated
fuzzy BPAs is reduced in six different ways:


1) Defuzzification, Accordance, Specification in Fig. 5(a); 
2) Defuzzification, Specification, Accordance in Fig. 5(b); 
3) Accordance, Defuzzification, Specification in Fig. 5(c); 
4) Accordance, Specification, Defuzzification in Fig. 5(d); 
5) Specification, Defuzzification, Accordance in Fig. 5(e); 
6) Specification, Accordance, Defuzzification in Fig. 5(f). 
Fig. 5 GM computation following 6 combinations of operations:
defuzzification (D), accordance (A) and specification (S).


The simulation results of Fig. 5 show that GM behaves as
expected with respect to variations in the amount and type of
uncertainty: GM decreases monotonically when we decrease
the amount of uncertainty and increases as we increase
uncertainty. The method to generate FBPAs is being validated
so that it results in a simulation tool that is appropriate to
evaluate and select approaches, such as fusion rules, that are
used to reduce uncertainty.

V. CONCLUSION 


The problem of measuring uncertainty within the general
framework of fuzzy evidence theory has been discussed. This
paper contributes a simulation tool to support studies on
the three main types of uncertainty and their associated
measures: non-specificity, fuzziness and discord. The tool,
which is centered on the simulation of FBPAs, can be used to
evaluate the performance of systems and techniques whose
objective is to reduce uncertainty. For instance, it can be
used to evaluate and select fusion rules that are framed in a
general theory of uncertainty such as fuzzy evidence theory.
Abstract—The theory of evidence has been widely used in
many applications. This theory generalizes the probability
distribution and offers a mathematical representation for
two types of uncertainty-based information: discord and
non-specificity. Several measures have already been developed
to quantify these two types of uncertainty. They have been called
total uncertainty measures since they quantify both types of
uncertainty. The generalized Hartley measure and the maximum
entropy have been the only measures so far that satisfy a list of
properties very desirable for practical applications. Recently, two
new measures of non-specificity and total uncertainty based on
belief intervals have been proposed. These two measures do not
satisfy the properties of additivity, superadditivity and
subadditivity in the theory of evidence. The present critique is
about these shortcomings and provides a more complete analysis
of those uncertainty measures with respect to a list of desired
properties. An ill-characterized measure may lead to selecting
an inappropriate rule for decision-making in the processing
chain from data to information to decisions.


Keywords: Imprecise probabilities, theory of evidence, uncer- 
tainty measures, non-specificity, additivity, subadditivity. 


I. INTRODUCTION 


The main goal of information is to enable adequate deci- 
sions and actions. Uncertainty and information are two sides 
of the same coin. Uncertainty-based information is a major 
dimension of information quality that is paramount to decision 
quality. The representation of uncertainty is a crucial issue 
in applications in many areas of science and engineering that 
support the transformation of information along the processing 
chain: data to information to knowledge to decisions and 
actions. 

Classical set theory and probability theory (PT) have been 
regarded as reference frameworks for centuries to represent 
uncertainty. However, these two frameworks cannot easily 
represent all types of uncertainty. Numerous other theories 
have been developed by expanding the conceptual frameworks 
on which those classical theories are based. 

Amongst them, the theory of evidence (TE), also known as 
Dempster-Shafer’s theory [8], [32], has been presented as an 
important extension of the classical probability theory (PT). In 
TE, the available information is represented via a new concept 
called basic probability assignment (BPA), which is introduced 
to generalize the probability distribution concept of PT. This 
theory has been widely used in several areas and applications 
such as in [34], [35], [25], [41], [13], [44], [19]. 
Characterizing information is a crucial step in the development
of applications framed within evidential theory as well
as within any other uncertainty representation framework. The 
development of measures of uncertainty is part of that step that 
impacts on the subsequent calculus processes such as updating 
(conditioning), aggregation, combination, and decision rules. 
If a property of a measure is ill-defined or misinterpreted 
then it could mislead the rest of the processing chain. As an 
example, a non-additive measure that is characterized as being 
additive may be combined with an improper rule [36]. The 
main motivation behind the critique presented in this paper is 
to insist upon the need to best characterize the information and 
its imperfections upstream of the processing chain to prevent 
misinterpretation at decision level. 

Uncertainty measures within TE have been exploited
successfully in several applications, notably in pattern
classification [2], [5], [27], [28], [29], to name a few.

In TE, more types of uncertainty can be represented by a 
BPA than in PT [23]. Yager [38] makes the distinction between 
two types of uncertainty called: discord (or randomness or 
conflict) and non-specificity. The first one has been related to 
entropy and the second to imprecision. 

The majority of measures presented in the literature have
Shannon’s entropy [33] as a starting point, because Shannon’s
entropy satisfies a large set of properties in PT. For example,
the maximum entropy measure in [15] satisfies a list of
properties in TE similar to the one Shannon’s entropy satisfies
in PT. Hence, it has been considered as the best established
measure in TE quantifying jointly both types of uncertainty [4],
[5], [6]. Measures of this type are called total uncertainty
measures (TU). Several other measures have been recently
proposed to quantify, jointly or separately, both types of
uncertainty [31], [9], [7], [39], [40], [43] within the TE
framework.

Non-specificity is associated with cases where the information
is focused on sets with cardinality greater than one. A
non-specificity measure is then based on the way to quantify
imprecision in a BPA. This type of uncertainty does not
appear in PT and can be considered as a major difference
between PT and other theories that claim to generalize it. The
majority of these theories are based on imprecise probabilities
[42], [21].

A measure of non-specificity must satisfy a list of properties
[12], [23]. So far, only the so-called generalized Hartley measure
satisfies all the required properties for a non-specificity
measure in TE [21]. However, there is no study in the literature
that establishes the relative importance of that list of properties
and its exhaustivity for the use of that measure in practical
applications.

Very recently, Yang et al. [39] have presented a new measure
of non-specificity in TE. This measure is based on the belief
intervals that a BPA represents about single elements of a
finite set. The authors make simplifications of that measure to
facilitate its use in applications. In addition, the authors claim
that the new measure satisfies the following list of properties:
range, monotonicity, symmetry, additivity and subadditivity.

By means of a counter-example, this paper shows that
the properties of additivity and subadditivity are not always
satisfied by the proposed measure. In their mathematical
derivations, the authors of [39] have not taken into account
some considerations related to the cardinality of the focal sets.
The contribution offered by our critique here is to provide a
more complete analysis of the listed properties and to point
out where the shortcomings are.

Another contribution of this paper is to complete the analysis
of the Total Uncertainty (TU) measure presented by Yang
and Han [40] based on belief intervals. This measure uses
the distance between interval numbers, presented in [18],
[37], applied to the belief intervals of a piece of evidence. This
TU measure satisfies some properties in the list, but the
additivity and subadditivity properties have not been discussed.
This paper completes the discussion by showing that TU
satisfies neither the additivity nor the subadditivity property.

In the theory of evidence (TE), the complexity of a problem
can be reduced by using the principle of decomposition. This
is achieved through projections of a piece of evidence. However,
measures based on this decomposition can exhibit contradictory
behavior: sometimes an increase in information and sometimes
a decrease. This behavior makes no sense when the same
functional projection has been used.

The paper is organized as follows. Section 2 reviews briefly 
the necessary background about theory of evidence, uncer- 
tainty measures and the definitions of some of their desired 
properties. Section 3 presents an analysis of the additivity 
and subadditivity of the new TU measures based on belief 
intervals. Finally, section 4 presents conclusions. 


II. BRIEF BACKGROUND
A. The theory of evidence 


The theory of evidence (TE) [8], [32], is a type of mathemat- 
ical theory based on imprecise probabilities (see Walley [42]). 
Its principal characteristics and concepts can be described as 
follows. 

Let X be a finite set, considered as a set of possible
situations, |X| = n, $\wp(X)$ the power set of X and x any
element in X. Evidential theory uses the concept of a basic
probability assignment (BPA), also called mass assignment. A
BPA is a mapping $m : \wp(X) \to [0,1]$ such that $m(\emptyset) = 0$
and $\sum_{A \subseteq X} m(A) = 1$. A set A such that m(A) > 0 is
called a focal element$^1$ of m.

Let X, Y be finite sets. Consider the product space of
possible situations X × Y and m a BPA on X × Y. The
marginal BPA on X, $m^{\downarrow X}$ (and similarly on Y,
$m^{\downarrow Y}$), is defined as follows:$^2$

$$m^{\downarrow X}(A) = \sum_{R \mid A = R_X} m(R), \quad \forall A \subseteq X, \tag{1}$$

where $R_X$ is the set projection of R on X.
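The marginalization of Eq. (1) can be illustrated with a short Python sketch. Focal elements of the joint BPA are represented as frozensets of (x, y) pairs; the function name, the `axis` argument and the example values are hypothetical choices for this illustration.

```python
def marginal(m, axis):
    """Marginal BPA of a joint BPA on X x Y.

    m    : {frozenset of (x, y) pairs: mass}
    axis : 0 to project on X, 1 to project on Y
    Masses of focal elements that share the same projection are added up.
    """
    out = {}
    for focal, mass in m.items():
        projection = frozenset(pair[axis] for pair in focal)
        out[projection] = out.get(projection, 0.0) + mass
    return out


# Hypothetical joint BPA on {x1, x2} x {y1, y2}.
m = {
    frozenset({("x1", "y1"), ("x2", "y2")}): 0.6,
    frozenset({("x1", "y1")}): 0.3,
    frozenset({("x1", "y2")}): 0.1,
}
print(marginal(m, 0))
print(marginal(m, 1))
```

In this example the two focal elements {(x1, y1)} and {(x1, y2)} both project onto {x1}, so their masses are summed in the marginal on X.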
There are two functions associated with each BPA, a belief
function, Bel, and a plausibility function, Pl:

$$Bel(A) = \sum_{B \subseteq A} m(B); \qquad Pl(A) = \sum_{A \cap B \neq \emptyset} m(B).$$

They can be seen as the belief bounds of A (lower and upper
belief of the set A, respectively).

We note that belief and plausibility are interrelated for all
$A \in \wp(X)$ by $Pl(A) = 1 - Bel(A^c)$, where $A^c$ denotes the
set complement of A. Furthermore, $Bel(A) \leq Pl(A)$. Hence,
the interval [Bel(A), Pl(A)] is called the belief interval for
the set A.


B. Measures of uncertainty in the theory of evidence (TE) 


Shannon [33] presented a measure of entropy in probability
theory (PT), defined as follows:

$$S(p) = -\sum_{x \in X} p(x) \log_2 (p(x)), \tag{2}$$

where $p = (p(x))_{x \in X}$ is a probability distribution on X, a
finite set, and p(x) is the probability of the value x. Here,
$\log_2$ is normally used to measure the value in bits. S(p)
measures the only type of uncertainty that can be represented
in probability theory, and it satisfies a list of desirable
properties [33], [23].

There exist two types of uncertainty in evidence theory 
(Yager [38]): “one associated with cases where the information 
is focused on sets with empty intersections; and one associated 
with cases where the information is focused on sets with 
cardinality greater than one”. The first concept is known as
discord (also as randomness or conflict); and the second one
is known as non-specificity.

A significant effort has been allocated to quantify the degree 
of discord in evidence theory [23]. In this paper, the discussion 
will be focussed on measures of non-specificity and total 
uncertainty within the framework of TE. 

Dubois and Prade [10] have introduced in TE a function
based on the Hartley measure [14], which was defined in
classical set theory. It represents a measure of non-specificity
associated with a BPA and it is expressed as follows:

$$I(m) = \sum_{A \subseteq X} m(A) \log_2 (|A|). \tag{3}$$

I(m) attains its minimum value (zero) when m is a probability
distribution. Its maximum value, $\log_2(|X|)$, is obtained
for a BPA m with m(X) = 1 and m(A) = 0, $\forall A \subset X$.
I satisfies a large set of desired properties [10], [23]. In the
literature, to our knowledge, there is no other measure of
non-specificity that satisfies a similar list of properties.

$^1$ The focal elements can be noted as $A \in \wp(X)$ or $A \subseteq X$, with
m(A) > 0. The empty set is never considered here because always $m(\emptyset) = 0$.

$^2$ The expression of the marginal BPAs from Eq. (21) in the paper of Yang
et al. [39] has an erratum: the set “R” must be “S”.

Several composed measures have appeared within the theory
of evidence to jointly quantify both parts of uncertainty.
They are called total uncertainty measures (TU) in TE. The
most established one seems to be the measure proposed by
Harmanec and Klir [15], [16]: $S^*(m)$, equal to the maximum
of the entropy (upper entropy) of the probability distributions
satisfying $Bel(A) \leq \sum_{x \in A} p(x) \leq Pl(A)$, $\forall A \subseteq X$. This
measure quantifies discord and non-specificity but it does
not exhibit the respective separate parts. Some years later,
Abellan, Klir and Moral [4] proposed upper entropy as
an aggregate measure framed in more general theories than
TE, coherently separating discord and non-specificity. A similar
splitting can be done within TE:


$$S^*(m) = S_*(m) + (S^* - S_*)(m), \tag{4}$$

where $S^*(m)$ represents maximum entropy and $S_*(m)$
represents minimum entropy on the credal set$^3$ $K_m$ associated
with a BPA m, which can be defined in the following way:

$$K_m = \{p \mid Bel(A) \leq p(A),\ \forall A \in \wp(X)\} \tag{5}$$

Here $S_*(m)$ measures the discord part and $(S^* - S_*)(m)$
the non-specificity part of the BPA m [3], [6].

$^3$ A closed and convex set of probability distributions [4].


C. Additivity and subadditivity of non-specificity measures in
the theory of evidence

Klir and Wierman [23] define a list of desired properties for
an uncertainty measure in TE. The additivity and subadditivity
properties belong to that list. For a measure of uncertainty
(MU), they can be defined as follows [7]:

Additivity: “Let m be a BPA on the space X × Y, $m^{\downarrow X}$
and $m^{\downarrow Y}$ its marginal BPAs on X and Y respectively,
such that these marginals are not interactive
($m(A \times B) = m^{\downarrow X}(A)\, m^{\downarrow Y}(B)$, with $A \subseteq X$, $B \subseteq Y$ and
m(C) = 0 if $C \neq A \times B$). Then MU satisfies the additivity
property iff it satisfies the equality:

$$MU(m) = MU(m^{\downarrow X}) + MU(m^{\downarrow Y})\text{”}. \tag{6}$$

The same property can be expressed in a reverse manner.
If we build a BPA m on X × Y from two independent
BPAs on X and Y, $m_X$ and $m_Y$ respectively
($m = m_X \cdot m_Y$), the total amount of information should be
preserved. In this case the marginals of m on X and Y are
$m_X$ and $m_Y$ respectively (the marginals are not interactive).

Subadditivity: “Let m be a BPA on the space X × Y,
$m^{\downarrow X}$ and $m^{\downarrow Y}$ its marginal BPAs on X and Y
respectively. Then MU satisfies the subadditivity property iff it
satisfies the inequality:

$$MU(m) \leq MU(m^{\downarrow X}) + MU(m^{\downarrow Y})\text{”}. \tag{7}$$

The expression indicates that the amount of information
must not be increased through a disaggregation process,
to respect the principle of information conservation.
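Both properties can be checked numerically for the generalized Hartley measure I of Eq. (3). The sketch below is illustrative only: the helper names, the representation of focal elements as frozensets and the two example BPAs are assumptions, and a numerical check on two examples is of course not a proof.

```python
import math
from itertools import product


def nonspecificity(m):
    """Generalized Hartley measure I(m) of a BPA given as {frozenset: mass}."""
    return sum(v * math.log2(len(a)) for a, v in m.items())


def product_bpa(mx, my):
    """Joint BPA with non-interactive marginals: m(A x B) = mx(A) * my(B)."""
    return {frozenset(product(a, b)): va * vb
            for a, va in mx.items() for b, vb in my.items()}


def marginal(m, axis):
    """Marginal BPA of a joint BPA whose focal elements are sets of (x, y) pairs."""
    out = {}
    for focal, mass in m.items():
        proj = frozenset(pair[axis] for pair in focal)
        out[proj] = out.get(proj, 0.0) + mass
    return out


# Additivity: joint BPA built from two independent (hypothetical) BPAs.
m_x = {frozenset({"x1"}): 0.4, frozenset({"x1", "x2", "x3"}): 0.6}
m_y = {frozenset({"y1", "y2"}): 1.0}
joint = product_bpa(m_x, m_y)
additive_gap = nonspecificity(joint) - (nonspecificity(m_x) + nonspecificity(m_y))

# Subadditivity: joint BPA with interacting components.
dependent = {frozenset({("x1", "y1"), ("x2", "y2")}): 1.0}
lhs = nonspecificity(dependent)
rhs = nonspecificity(marginal(dependent, 0)) + nonspecificity(marginal(dependent, 1))
print(additive_gap, lhs, rhs)
```

In the independent case the gap is zero because |A × B| = |A| · |B|, so the logarithm splits into a sum; in the dependent case the joint value (1 bit) stays below the sum of the marginal values (2 bits).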


Additivity and subadditivity represent important properties
that a measure of non-specificity must satisfy in TE [12].

To further analyze the meaning of these properties, let us
examine the following practical example.


Example 1: Three riders and two horses are participating 
agents of an obstacle race. The competition is that each rider 
with each horse performs a circuit of obstacles in the shortest 
possible time. The final score depends on the time taken and 
the number of overturned obstacles. There are 3 different 
competitions as follows: (i) the best binomial set (rider + 
horse); (ii) the best rider; and (iii) the best horse. One can bet 
money on each type of competition, but the greatest reward 
is for type (i) since there are more alternatives and it is more 
difficult to win. The reward for (i) can be considered as an 
aggregation value of the rewards for (ii) and (iii). 


Two cases are analyzed. Case “D” where there is a priori 
knowledge about riders and horses that can provide an 
advantage to win the race. Pairing knowledge between riders 
and horses does exist (no independence between rider and 
horse). The second case denoted as “I” represents the case 
where riders do not possess any knowledge about any horse. 


In Case D, experts can assign a numerical value to a
binomial set (rider + horse) that would be the winning one,
given: 1) the circuit done with different types of obstacles,
and 2) the experts’ a priori knowledge obtained from past
competitions. In Case I, there are horse experts and rider
experts. There is no cross-knowledge (pairing knowledge)
and experts are consulted separately on horses and on riders.


For both cases, a set of possible riders is defined as
$RI = \{r_1, r_2, r_3\}$; a set of horses as $H = \{h_1, h_2\}$; and a
set of binomials as $B = \{b_{ij} \mid i = 1,2,3;\ j = 1,2\}$, where $b_{ij}$
expresses the binomial $r_i + h_j$ and $B = RI \times H$.


All experts use belief functions to express their knowledge 
on riders, horses and binomial sets. 


These two cases, Case D and Case I, are analyzed below 
with respect to the additivity and subadditivity properties. 


- Case I. The BPA $m_{RI}$ is given by an expert on riders
while the BPA $m_H$ is from an expert on horses. The two
pieces of quantified knowledge are independent of each other,
i.e., if we build $m = m_{RI} \cdot m_H$, the marginals are not
interactive since $m_{RI} = m^{\downarrow RI}$ and $m_H = m^{\downarrow H}$.

The information obtained from m, built from the
marginal BPAs, is used to bet on the binomial sets.
The product space creates more alternatives, so the
uncertainty must be greater (or at least not less) than
with each marginal taken separately. The question raised
here is the following: is it coherent that the sum of
information from the original independent BPAs be
less or greater than that of the joint BPA via an
information-based uncertainty measure?


Let us then examine the following situation: (a) a bet on 
the binomial sets OR (b) bets on riders and horses 
separately. Here, (a) and (b) are exclusive. Rewards for 
(a) and (b) are equivalent according to the previous 
definitions. Common sense says we should choose 
(a) or (b) with the same level of credibility as that 
obtained from the available information. This is what the 
definition of the additivity property means. A joint case 
is being built from independent sources, so as a result 
there will be neither more nor less information: the 
available amount of information must be preserved. 
Consequently, the additivity property is an essential 
property for a measure of information (uncertainty) in TE. 


- Case D. An expert expresses his knowledge via the BPA 
$m$ on $B = RI \times H$, and we want to make the three 
possible bets. To this end, we use: 1) the marginal BPAs 
$m^{\downarrow RI}$ and $m^{\downarrow H}$, and 2) an uncertainty measure $UM$. Let us 
then examine the following statement: 


$$UM(m) > UM(m^{\downarrow RI}) + UM(m^{\downarrow H})$$


The expression above would mean that via the measure 
$UM$, one can gain information with a simple mathematical 
procedure on BPAs. Moreover, if we build 
the BPA⁴ $m' = m^{\downarrow RI} \cdot m^{\downarrow H}$ on $B$, it could result 
in providing more information via $UM$ than obtained 
from the original expert on binomial sets. That would mean 
improving the original information source. Based upon 
the above additivity property, to be coherent, we can have 
the following situation: 


$$UM(m) > UM(m^{\downarrow RI}) + UM(m^{\downarrow H}) = UM(m')$$


In this case, we would have more information on the 
binomial sets than that obtained from the original 
expert (!). The subadditivity property indicates, in accordance 
with our common sense, that this is not coherent. The 
amount of information must not be increased merely by 
calculating the marginal BPAs. 


⁴ $m \neq m'$ might be possible. 


III. MEASURES OF UNCERTAINTY BASED ON BELIEF 
INTERVALS 


A. A measure of non-specificity based on belief intervals 


Very recently, Yang et al. [39] have presented a new 
non-specificity measure in TE based on belief intervals. This 
measure takes into account the maximum difference of belief 
of each possible state of a finite set $X$. If we consider a 
BPA $m$ on a finite set $X$ with states $\{x_1, \cdots, x_n\}$, the 
measure is defined using the values $Pl(\{x_i\}) - Bel(\{x_i\})$, 
$i \in \{1, \cdots, n\}$. The measure is the average of those values 
on the belief intervals, and can be expressed as follows: 


$$NE^{BI}(m) = \frac{1}{n} \sum_{i=1}^{n} \bigl(Pl(\{x_i\}) - Bel(\{x_i\})\bigr). \qquad (8)$$


This definition makes sense in relation to the non-specificity 
concept and its coherence. Non-specificity is focused on the 
degree of imprecision of a BPA. Then, it is related to the values 
of the belief intervals used in the definition of the measure. 

Yang et al. [39] have shown that $NE^{BI}$ can be reduced⁵ to 
the following expression: 


$$NE^{BI}(m) = \sum_{A \subseteq X,\ |A|>1} \frac{|A|}{n}\, m(A) \qquad (9)$$
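
The reduction from Eq. (8) to Eq. (9) can be sketched in one step. The derivation below is our own reading, using only the standard definitions of $Bel$ and $Pl$ on singletons; it is not copied from [39].

```latex
\begin{align*}
Pl(\{x_i\}) - Bel(\{x_i\})
  &= \sum_{A \ni x_i} m(A) \;-\; m(\{x_i\})
   = \sum_{A \ni x_i,\ |A|>1} m(A), \\
\frac{1}{n}\sum_{i=1}^{n}\bigl(Pl(\{x_i\}) - Bel(\{x_i\})\bigr)
  &= \frac{1}{n}\sum_{A \subseteq X,\ |A|>1} |A|\, m(A),
\end{align*}
```

since each focal set $A$ with $|A|>1$ is counted once for every one of its $|A|$ elements.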


1) Properties: The measure of Yang et al. [39] has been 
shown to satisfy the properties of range, monotonicity and 
symmetry. These three properties will not be discussed here. 
In addition, Yang et al. have claimed that their measure satisfies 
a multiplicativity and a submultiplicativity property. These 
properties are equivalent to the additivity and subadditivity 
properties⁶, as remarked in Yang et al. [39]. They 
can be described in a similar way as for additivity 
and subadditivity. 


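Multiplicativity: “Let $m$ be a BPA on the space $X \times Y$, 
$m^{\downarrow X}$ and $m^{\downarrow Y}$ its marginal BPAs on $X$ and $Y$ 
respectively such that these marginals are not interactive 
($m(A \times B) = m^{\downarrow X}(A)\, m^{\downarrow Y}(B)$, with $A \subseteq X$, $B \subseteq Y$ 
and $m(C) = 0$ if $C \neq A \times B$). Then $NE^{BI}$ satisfies the 
equality: 


$$NE^{BI}(m) = NE^{BI}(m^{\downarrow X}) \cdot NE^{BI}(m^{\downarrow Y}) \qquad (10)$$”. 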


Submultiplicativity: “Let $m$ be a BPA on the space 
$X \times Y$, $m^{\downarrow X}$ and $m^{\downarrow Y}$ its marginal BPAs on $X$ and 
$Y$ respectively. Then $NE^{BI}$ satisfies the inequality: 


$$NE^{BI}(m) \leq NE^{BI}(m^{\downarrow X}) \cdot NE^{BI}(m^{\downarrow Y}) \qquad (11)$$”. 


⁵ The expression of the summations in Eq. (15) of the paper of Yang 
et al. [39] is somewhat confusing. For example, to express the value of 
$Pl(\{\theta_i\})$, the sum $\sum_{j>i} m(\{\theta_i, \theta_j\})$ is used; and for $Pl(\{\theta_n\})$ one could 
interpret that the summation does not include any term, because there is no 
$j > n$. One should use $\sum_{j=1,\ j \neq i}^{n} m(\{\theta_i, \theta_j\})$ instead, which represents the 
correct form. 

⁶ They represent their counterpart definitions [12]. Additivity and 
subadditivity are used when the measure has a range of $[0, \log(n)]$, whereas 
multiplicativity and submultiplicativity are used with a range of $[0,1]$; but they 
represent the same concept. 


At this point, we would like to cite Yang et al. [39] about 
the importance of these two properties: 

“Note that the physical meaning of submultiplicativity is 
essential in the conservation of information, i.e., the amount 
of uncertainty in a joint BPA is no greater than the total 
amount of uncertainty of its corresponding marginal BPAs. 
The equation holds if and only if the corresponding marginal 
BPAs are independent, i.e., there is not correlated part. If 
two marginal BPAs are dependent, then the double counting 
uncertainty amount should be removed, therefore, the total 
amount of uncertainty in the joint BPA is larger than the total 
amount in marginal BPAs.” 


Let us then consider the following example, based on 
Example 1: 

Example 2: Using the case expressed in Example 1, let 
$RI \times H$ be the product space of the sets $RI = \{r_1, r_2, r_3\}$ 
and $H = \{h_1, h_2\}$, and $m^1$ and $m^2$ the following BPAs on 
$RI$ and $H$ expressed by two experts on riders and horses, 
respectively: 


$m^1(\{r_1, r_2\}) = 0.45$, $m^1(\{r_3\}) = 0.45$, $m^1(RI) = 0.1$; 
$m^2(H) = 1$. 


Hence, the BPA $m = m^1 \times m^2$ on $RI \times H$ has the following 
masses: 


$m(\{b_{11}, b_{12}, b_{21}, b_{22}\}) = 0.45$, $m(\{b_{31}, b_{32}\}) = 0.45$, 
$m(RI \times H) = 0.1$, 
where we note $b_{ij} = (r_i, h_j)$. Then $m^1 = m^{\downarrow RI}$ and $m^2 = 
m^{\downarrow H}$, and they are not interactive by definition. 
The values of uncertainty via the $NE^{BI}$ measure are the 
following ones: 


$NE^{BI}(m) = 0.3 + 0.15 + 0.1 = 0.55$; 
$NE^{BI}(m^1) \cdot NE^{BI}(m^2) = (0.3 + 0.1) \cdot 1 = 0.4$. 


Then $NE^{BI}(m) \neq NE^{BI}(m^1) \cdot NE^{BI}(m^2)$, and the 
multiplicativity property is not satisfied by $NE^{BI}$. 
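
The numbers of Example 2 can be reproduced with a few lines of Python. This is a minimal sketch under our own conventions: BPAs are dictionaries mapping `frozenset` focal sets to masses, and the helper names `ne_bi` and `joint` are illustrative, not taken from [39].

```python
from itertools import product

def ne_bi(bpa, frame):
    """Average width of the singleton belief intervals, as in Eq. (8)."""
    total = 0.0
    for x in frame:
        bel = sum(v for a, v in bpa.items() if a == frozenset([x]))
        pl = sum(v for a, v in bpa.items() if x in a)
        total += pl - bel
    return total / len(frame)

def joint(m1, m2):
    """Product BPA: mass m1(A) * m2(B) is put on the focal set A x B."""
    out = {}
    for (a, va), (b, vb) in product(m1.items(), m2.items()):
        key = frozenset(product(a, b))
        out[key] = out.get(key, 0.0) + va * vb
    return out

RI = ["r1", "r2", "r3"]
H = ["h1", "h2"]
m1 = {frozenset(["r1", "r2"]): 0.45, frozenset(["r3"]): 0.45, frozenset(RI): 0.1}
m2 = {frozenset(H): 1.0}
m = joint(m1, m2)
B = list(product(RI, H))  # the six binomials (r_i, h_j)

print(ne_bi(m, B))                   # value for the joint BPA
print(ne_bi(m1, RI) * ne_bi(m2, H))  # product of the marginal values
```

Up to floating-point rounding, the two printed values agree with the 0.55 and 0.4 of Example 2, so the equality required by multiplicativity fails.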


The above Example 2 can also be used to prove that $NE^{BI}$ 
does not satisfy the submultiplicativity property. Let us use the 
same BPA $m$ on $RI \times H$; its marginal BPAs are $m^1$ 
and $m^2$ on $RI$ and $H$, respectively. We have that: 


$$NE^{BI}(m) > NE^{BI}(m^{\downarrow RI}) \cdot NE^{BI}(m^{\downarrow H}),$$


and this implies that the submultiplicativity property is not 
satisfied by $NE^{BI}$. 


The results above show that the new measure, contrary to 
what has been said in Yang et al. [39], does not satisfy the 
multiplicativity and submultiplicativity properties. The main 
error in the mathematical proofs of Yang et al. [39] lies in the 
cardinality of the sets. For example, if $A \subseteq RI$ and $B \subseteq H$, 
then $A \times B \subseteq RI \times H$ and it is possible to find focal sets of 
each BPA such that $|A \times B| > 1$ while $|A| = 1$ and $|B| > 1$, as 
happens in Example 2. 

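To analyze the proof of multiplicativity, we apply 
the values of Example 2 to the penultimate step in 
Equation (23) of Yang et al. [39]. We can observe the following 
situation⁷ (detailed calculations of Example 2): 


$$NS^{BI}(m) = \ldots = \sum_{A,B;\ |A \times B|>1} \frac{|A||B|}{n_X n_Y}\, m_X(A)\, m_Y(B) = \left(0.45 \cdot \frac{2 \cdot 2}{3 \cdot 2}\right) + \left(0.45 \cdot \frac{1 \cdot 2}{3 \cdot 2}\right) + \left(0.1 \cdot \frac{3 \cdot 2}{3 \cdot 2}\right) = 0.55;$$


$$NS^{BI}(m^1) = \sum_{A;\ |A|>1} \frac{|A|}{n_X}\, m_X(A) = 0.45 \cdot \frac{2}{3} + 0.1 \cdot \frac{3}{3} = 0.4;$$


$$NS^{BI}(m^2) = \sum_{B;\ |B|>1} \frac{|B|}{n_Y}\, m_Y(B) = 1.$$


It is easy to see that 


$$\sum_{A,B;\ |A \times B|>1} \frac{|A||B|}{n_X n_Y}\, m_X(A)\, m_Y(B) \neq \left(\sum_{A;\ |A|>1} \frac{|A|}{n_X}\, m_X(A)\right) \cdot \left(\sum_{B;\ |B|>1} \frac{|B|}{n_Y}\, m_Y(B)\right).$$


Hence, the penultimate step in Eq. (23) of Yang et al. [39] is 
shown to be incorrect. 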

Focal elements with cardinal 1 in the set X, i.e. sets that 
do not produce any imprecision, can be components of sets 
in the product space X x Y that produce imprecision. This 
case has not been considered in the proof of multiplicativity 
in Yang et al. [39]. 


Now, let us look at the proof of the submultiplicativity 
property, where again there is a problem with the last step. 
That last step expresses the following equality: 


$$NS^{BI}(m^{\downarrow X} \times m^{\downarrow Y}) = \sum_{R \subseteq X \times Y;\ |R_X|>1,\ |R_Y|>1} \frac{|R|}{n_X n_Y}\, m(R)$$


But this equality is not always correct. Again, if we consider 
the values of Example 2, the left term of the above equality 
is $NS^{BI}(m)$ and contains the following addend: 


$$NS^{BI}(m) = \cdots + m(\{b_{31}, b_{32}\}) \cdot \frac{2}{6} + \cdots$$


Now, considering $R = \{b_{31}, b_{32}\}$, we have that $R_{RI} = \{r_3\}$ 
and $|R_{RI}| = 1$. Hence, that addend cannot be in the right 
part of the equality. In this specific case, the equality is shown 
to be incorrect. 


⁷ Here, $X$ is $RI$ and $Y$ is $H$, to follow the notation of [39]. 


That could lead us to think that this measure always 
produces an uncertainty decrease when using decomposition 
with the marginals in TE. That might not be good behaviour 
for this measure, although it would at least give it some coherence. The 
property associated with that situation is known as 
supermultiplicativity.⁸ With the above notation, it can be defined, for 
a $UM$ measure, as: 


$$UM(m) \geq UM(m^{\downarrow X}) \cdot UM(m^{\downarrow Y}), \qquad (12)$$


but with the example below we show that $NE^{BI}$ does not 
satisfy that property. 

Example 3: 

Let $X \times Y$ be the product space of the sets $X = \{x_1, x_2, x_3\}$ 
and $Y = \{y_1, y_2\}$, and $m$ the following BPA on $X \times Y$: 


$m(\{z_{11}, z_{12}, z_{21}, z_{22}, z_{31}\}) = 0.4$, $m(\{z_{21}, z_{22}, z_{31}, z_{32}\}) = 0.4$, 
$m(X \times Y) = 0.2$, 


where we note $z_{ij} = (x_i, y_j)$. 
Now, the marginals on $X$ and $Y$ have the following values: 


$m^{\downarrow X}(\{x_2, x_3\}) = 0.4$, $m^{\downarrow X}(X) = 0.6$; 
$m^{\downarrow Y}(Y) = 1$. 


The values of uncertainty via the $NE^{BI}$ measure are the 
following ones: 

$NE^{BI}(m) = 0.8$; 
$NE^{BI}(m^{\downarrow X}) \cdot NE^{BI}(m^{\downarrow Y}) = 0.866 \cdot 1 = 0.866$. 


Hence, $NE^{BI}(m) < NE^{BI}(m^{\downarrow X}) \cdot NE^{BI}(m^{\downarrow Y})$, and the 
supermultiplicativity property is not satisfied by $NE^{BI}$. 
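
Example 3 can be checked in the same way, this time starting from the joint BPA and projecting it. The sketch below is illustrative only: the helpers `ne_bi_card` (the cardinality form of Eq. (9)) and `marginal` are our own names, and focal sets are encoded as `frozenset`s of `(x, y)` pairs.

```python
def ne_bi_card(bpa, n):
    """Eq. (9): sum of m(A) * |A| / n over focal sets with |A| > 1."""
    return sum(v * len(a) / n for a, v in bpa.items() if len(a) > 1)

def marginal(bpa, axis):
    """Project every focal set of a joint BPA onto coordinate `axis` (0 or 1)."""
    out = {}
    for a, v in bpa.items():
        key = frozenset(pair[axis] for pair in a)
        out[key] = out.get(key, 0.0) + v
    return out

def z(i, j):
    """Element z_ij = (x_i, y_j) of the product space."""
    return (f"x{i}", f"y{j}")

full = frozenset(z(i, j) for i in (1, 2, 3) for j in (1, 2))
m = {
    frozenset([z(1, 1), z(1, 2), z(2, 1), z(2, 2), z(3, 1)]): 0.4,
    frozenset([z(2, 1), z(2, 2), z(3, 1), z(3, 2)]): 0.4,
    full: 0.2,
}
mx, my = marginal(m, 0), marginal(m, 1)

joint_value = ne_bi_card(m, 6)
product_value = ne_bi_card(mx, 3) * ne_bi_card(my, 2)
print(joint_value, product_value)
```

The projection puts mass 0.4 on $\{x_2, x_3\}$ and 0.6 on $X$, as stated in the example, and the joint value is smaller than the product of the marginal values.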

Examples 2 and 3 represent a very contradictory situation 
for the new measure of Yang et al. [39]. If we have a 
complex set of information represented within TE that can be 
decomposed using the marginals to make projections on two 
less complex sets, then in some situations the information 
available can be decreased, and in others it can be 
increased. This is a very undesirable behaviour for such a 
measure. 


B. A measure of total uncertainty based on belief intervals 


Let $m$ be a BPA on a finite set $X$ with $n$ elements 
$\{x_1, \cdots, x_n\}$. Each element has the belief interval 
$[Bel(\{x_i\}), Pl(\{x_i\})]$, $i \in \{1, \cdots, n\}$, that we simplify as 
$[Bel_i, Pl_i]$. Yang and Han [40] have recently proposed the 
following measure of total uncertainty in TE: 


$$TU^I(m) = 1 - \frac{1}{n} \cdot \sqrt{3} \cdot \sum_{i=1}^{n} d^I([Bel_i, Pl_i]; [0, 1]) \qquad (13)$$


where $d^I$ is a distance function between intervals obtained from 
[18], [37], and it has the following expression: 


$$d^I([a_1, b_1]; [a_2, b_2]) = \sqrt{\left[\frac{a_1 + b_1}{2} - \frac{a_2 + b_2}{2}\right]^2 + \frac{1}{3}\left[\frac{b_1 - a_1}{2} - \frac{b_2 - a_2}{2}\right]^2} \qquad (14)$$


⁸ It is equivalent to the super-additivity property [11]. 


1) Properties: Yang and Han [40] prove that the $TU^I$ measure 
has some desirable properties for a TU measure in TE: range, 
monotonicity, probability consistency and set consistency. 
Unfortunately, they do not show whether this TU measure 
satisfies the additivity and subadditivity properties or, 
since the range of $TU^I$ is $[0,1]$, the equivalent properties of 
multiplicativity and submultiplicativity. These two properties 
are checked below. 


The following Example 4 shows that the $TU^I$ measure does not 
satisfy the multiplicativity property. 

Example 4: Let $X \times Y$ be the product space of the sets 
$X = \{x_1, x_2, x_3\}$ and $Y = \{y_1, y_2\}$. Let the following BPAs 
$m^1$ and $m^2$ be on $X$ and $Y$ respectively: 

$m^1(\{x_1, x_2\}) = 0.5$, $m^1(\{x_3\}) = 0.5$; 
$m^2(\{y_1\}) = 0.5$, $m^2(Y) = 0.5$. 


We build the BPA $m = m^1 \times m^2$ on $X \times Y$ (the marginals 
of $m$ are not interactive). Then, $m$ has the following values, 
where again we note $z_{ij} = (x_i, y_j)$: 


$m(\{z_{11}, z_{21}\}) = 0.25$, $m(\{z_{11}, z_{12}, z_{21}, z_{22}\}) = 0.25$, 
$m(\{z_{31}\}) = 0.25$, $m(\{z_{31}, z_{32}\}) = 0.25$. 


Now, the belief intervals of the $z_{ij}$ are the following 
ones: 


$\{[0, 0.5]; [0, 0.25]; [0, 0.5]; [0, 0.25]; [0.25, 0.5]; [0, 0.25]\}$ 


and we obtain: 
$TU^I(m) = 0.386$ 


Similarly, for $m^1$ and $m^2$ we have the following belief 
intervals, respectively: 


$\{[0, 0.5]; [0, 0.5]; [0.5, 0.5]\}$, 
$\{[0.5, 1]; [0, 0.5]\}$, 


and $TU^I$ values: $TU^I(m^1) = 0.5$ and $TU^I(m^2) = 0.5$. 
Hence, we have that 


$$0.386 = TU^I(m) \neq TU^I(m^1) \cdot TU^I(m^2) = 0.5 \cdot 0.5 = 0.25$$


and the multiplicativity property is not satisfied. 
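
The values of Example 4 follow directly from Eqs. (13) and (14). The short Python sketch below takes the belief intervals listed in the example as input; the function names `d_interval` and `tu_i` are our own, chosen only for illustration.

```python
from math import sqrt

def d_interval(a1, b1, a2, b2):
    """Distance between the intervals [a1, b1] and [a2, b2], as in Eq. (14)."""
    mid = (a1 + b1) / 2 - (a2 + b2) / 2   # difference of midpoints
    rad = (b1 - a1) / 2 - (b2 - a2) / 2   # difference of half-widths
    return sqrt(mid ** 2 + rad ** 2 / 3)

def tu_i(intervals):
    """Eq. (13): 1 - (sqrt(3) / n) * sum of distances to [0, 1]."""
    n = len(intervals)
    return 1 - sqrt(3) / n * sum(d_interval(bel, pl, 0, 1) for bel, pl in intervals)

# Belief intervals [Bel_i, Pl_i] copied from Example 4.
joint_iv = [(0, 0.5), (0, 0.25), (0, 0.5), (0, 0.25), (0.25, 0.5), (0, 0.25)]
m1_iv = [(0, 0.5), (0, 0.5), (0.5, 0.5)]
m2_iv = [(0.5, 1), (0, 0.5)]

print(tu_i(joint_iv))
print(tu_i(m1_iv) * tu_i(m2_iv))
```

The joint value comes out close to 0.386 while the product of the marginal values is 0.25, in agreement with the example.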


To prove that $TU^I$ also does not satisfy the submultiplicativity 
property we only need Example 4. Considering the 
BPA $m$ on $X \times Y$, its marginal BPAs are $m^1$ 
and $m^2$ on $X$ and $Y$, respectively, so that: 


$$0.386 = TU^I(m) > TU^I(m^{\downarrow X}) \cdot TU^I(m^{\downarrow Y}) = 0.5 \cdot 0.5 = 0.25$$
implying that $TU^I$ does not satisfy the submultiplicativity 
property. 


As happens with the $NE^{BI}$ measure, $TU^I$ also does not 
satisfy the supermultiplicativity property, expressed in Eq. 
(12). To prove it, we only need to consider Example 5. 

Example 5: 

Let $X \times Y$ be the product space of the sets $X = \{x_1, x_2, x_3\}$ 
and $Y = \{y_1, y_2\}$, and $m$ the following BPA on $X \times Y$: 

$m(\{z_{11}, z_{12}, z_{21}\}) = 0.6$, $m(\{z_{31}\}) = 0.1$, 
$m(X \times Y) = 0.3$, 


where we note $z_{ij} = (x_i, y_j)$. 
Now the marginals on $X$ and $Y$ have the following values: 


$m^{\downarrow X}(\{x_1, x_2\}) = 0.6$, $m^{\downarrow X}(\{x_3\}) = 0.1$, $m^{\downarrow X}(X) = 0.3$; 
$m^{\downarrow Y}(\{y_1\}) = 0.1$, $m^{\downarrow Y}(Y) = 0.9$. 


This evidence produces the following sets of belief intervals 
on $X \times Y$, $X$ and $Y$ respectively: 


$\{[0, 0.9]; [0, 0.9]; [0, 0.9]; [0, 0.3]; [0.1, 0.4]; [0, 0.3]\}$, 
$\{[0, 0.9]; [0, 0.9]; [0.1, 0.4]\}$, 
$\{[0.1, 1]; [0, 0.9]\}$. 


The values of uncertainty via the $TU^I$ measure are the following 
ones: 


$TU^I(m) = 0.624$; 
$TU^I(m^{\downarrow X}) \cdot TU^I(m^{\downarrow Y}) = 0.748 \cdot 0.9 = 0.673$. 


Now $TU^I(m) < TU^I(m^{\downarrow X}) \cdot TU^I(m^{\downarrow Y})$, and the 
supermultiplicativity property is not satisfied by $TU^I$. 
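
Example 5 can also be recomputed from the BPAs themselves rather than from the listed intervals. In this sketch the belief intervals are derived from the masses, and the distance of Eq. (14) to $[0, 1]$ is written inline; the dictionary encoding and the function names are assumptions made for illustration.

```python
from math import sqrt

def belief_intervals(bpa, frame):
    """Return [(Bel({x}), Pl({x}))] for every element x of the frame."""
    out = []
    for x in frame:
        bel = sum(v for a, v in bpa.items() if a == frozenset([x]))
        pl = sum(v for a, v in bpa.items() if x in a)
        out.append((bel, pl))
    return out

def tu_i(bpa, frame):
    """Eq. (13), with the distance of Eq. (14) taken to the interval [0, 1]."""
    dist = 0.0
    for bel, pl in belief_intervals(bpa, frame):
        mid = (bel + pl) / 2 - 0.5
        rad = (pl - bel) / 2 - 0.5
        dist += sqrt(mid ** 2 + rad ** 2 / 3)
    return 1 - sqrt(3) * dist / len(frame)

X = ["x1", "x2", "x3"]
Y = ["y1", "y2"]
XY = [(x, y) for x in X for y in Y]
m = {
    frozenset([("x1", "y1"), ("x1", "y2"), ("x2", "y1")]): 0.6,
    frozenset([("x3", "y1")]): 0.1,
    frozenset(XY): 0.3,
}
mx = {frozenset(["x1", "x2"]): 0.6, frozenset(["x3"]): 0.1, frozenset(X): 0.3}
my = {frozenset(["y1"]): 0.1, frozenset(Y): 0.9}

print(tu_i(m, XY), tu_i(mx, X) * tu_i(my, Y))
```

The joint value is close to 0.624 and the product of the marginal values is close to 0.673, so the inequality of Eq. (12) is reversed here.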


We see that the new total uncertainty measure in TE, $TU^I$, 
has undesirable behaviour similar to that of the $NE^{BI}$ measure. 
In some situations the use of the projections produces an 
increase of information, and in others a decrease. This 
behaviour is incoherent. It could negatively impact a 
subsequent decision-making process (not analyzed in this 
paper). 


IV. CONCLUSIONS 


In this paper, we have analyzed the properties of additivity 
and subadditivity of new measures of uncertainty in TE based 
on belief intervals. The definitions of these measures make 
sense and they satisfy a list of interesting and important 
properties, but they have shortcomings: they do not satisfy the 
properties of additivity and subadditivity. These two properties 
belong to a list of required properties for such types of 
measures in TE. It has also been shown that these new 
measures present incoherent results when a decomposition is 
done using the same functional projections: in some situations 
that decomposition presents an increase in information, but in 
others it presents a decrease. The importance 
of those shortcomings in real-life applications cannot be 
appreciated without more analysis and extensive empirical 
studies. However, considering the results presented in this 
paper, more work is required to adjust those measures or 
to develop new ones that possess all the required properties for 
exploitation along the complete processing chain from data to 
decisions and actions. 
Abstract: In this paper we examine many existing measures 
of uncertainty (MoU) of basic belief assignments proposed in 
the literature related to the theory of belief functions. Some 
measures capture only a particular aspect of the uncertainty, 
while others propose a total measure of uncertainty to characterize 
the information quality of a source of information. We discuss 
the effectiveness of these measures with respect to four main 
desiderata that we consider essential for the definition 
of a satisfactory MoU (i.e. an effective entropy of a basic belief 
assignment). 


Keywords: Measure of Uncertainty, MoU, belief functions, 
Shannon entropy. 


I. INTRODUCTION 


In the classical framework of belief functions, a source of 
evidence expresses its belief on the possible solutions of a 
given problem defined with respect to a chosen (finite) frame 
of discernment (FoD) $\Theta$. This belief is usually characterized 
by a basic belief assignment (BBA), also referred to as a belief 
mass and denoted by $m(\cdot)$. One of the major concerns related to 
belief functions is how to measure/quantify the uncertainty 
encompassed by a source of evidence and inherent to any 
BBA. This problem is challenging and of crucial importance 
because an effective solution would make it possible to characterize 
any BBA well, to make fair comparisons of sources of evidence, 
to compare fusion results in terms of uncertainty reduction, to 
achieve a BBA complexity reduction by new approximation 
methods, etc. 

In this paper we make a state-of-the-art survey of most 
of the existing MoUs available in the literature, and point out 
their theoretical drawbacks to warn the reader about their 
misuse and irrelevance in applications. This work justifies 
the need for better, effective MoUs to make a step forward 
in the understanding and characterization of uncertainty in 
the belief functions framework. There exist several survey 
papers covering different proposals for measures of uncertainty; 
among them we must cite, in chronological order, [1]-[14], 
and more recently [15], [16]. These papers, however, do not 
consider the effectiveness of MoUs as we propose in this paper. 

In the sequel, we suppose the reader is familiar with 
classical (i.e. Shannon) information theory [17]-[22], and 
especially with the Shannon entropy measure, and with the theory 
of belief functions introduced by Shafer in [23]. Some of 
these basics are recalled in the appendix for convenience and to 
recall the classical notations. 

This paper is organized as follows. In section II we present 
and justify the four essential desiderata that a MoU should 
satisfy in order to be considered as effective. In section III 
we examine many existing MoUs proposed in the literature 
over 40 years, and check whether or not they pass the effectiveness 
test. For those that pass the test, we examine in 
detail in section IV whether they are sufficiently well justified to 
be considered as serious candidates for an effective MoU to 
be used in applications. Section V concludes this survey and 
gives some perspectives for future research. 


II. DESIDERATA FOR AN EFFECTIVE MOU 


Our analysis of many existing works on Measures of 
Uncertainty (MoU) of belief functions reveals that most 
MoUs suffer from serious problems, and we explain why in 
the next section. Here we introduce several essential 
desiderata that a satisfactory MoU, denoted by $U(m)$, should 
satisfy. Some of these desiderata have already been identified 
in the past by researchers working towards axiomatic 
approaches to MoUs, for instance by Klir [8] and Abellán 
[12], [13], [15]. Here we keep only the four desiderata that 
we consider really important and indispensable, and we 
justify our choice. We also explain why 
we consider the other desiderata not fundamental, and why we 
decide to discard them. The four essential and indispensable 
desiderata we consider for a satisfactory MoU are 
mathematically expressed as follows: 


• Desideratum D1: (zero min value of $U(m)$) 


$$U(m) = 0 \qquad (1)$$


if the BBA $m$ defined on the power set $2^{\Theta}$ of the frame 
of discernment $\Theta$ is focused on a singleton, that is if 
$m(X) = 1$ for some $X$ of $2^{\Theta}$ with $|X| = 1$. 


Justification of D1: This desideratum is very natural 
and intuitive because any particular BBA for which 
$m(X) = 1$ with $|X| = 1$ characterizes the certainty of a 
singleton $X$, which is one of the most specific elements of $2^{\Theta}$. 
There is no uncertainty about the choice of this element 
$X$ characterized by $m(X) = 1$ since this element $X$ 
(a smallest information granule) does not include any 
smaller element. So, the measure of uncertainty must 
be minimal, and it can always be arbitrarily set to zero, 
which reflects such a case of non-uncertainty well. 


• Desideratum D2: (increasing of MoU of vacuous BBA) 


$$U(m_v^{\Theta}) < U(m_v^{\Theta'}), \quad \text{if } |\Theta| < |\Theta'|, \qquad (2)$$


where $m_v^{\Theta}$ and $m_v^{\Theta'}$ are the vacuous BBAs defined 
respectively on the frames of discernment (FoDs) $\Theta$ and 
$\Theta'$ of cardinalities $|\Theta|$ and $|\Theta'|$. 


Justification of D2: This desideratum stipulates that the 
measure of uncertainty of a totally ignorant source of 
evidence represented by the vacuous BBA must increase 
with the cardinality of the frame of discernment. This 
desideratum makes perfect sense because the totally ignorant 
source of evidence on $\Theta = \{\theta_1, \ldots, \theta_N\}$ for which 
$m_v^{\Theta}(\Theta) = 1$ means that one knows absolutely nothing 
about only $N$ elements, whereas the totally ignorant source 
of evidence on $\Theta' = \{\theta_1, \ldots, \theta_N, \theta_{N+1}, \ldots, \theta_{N'}\}$ for 
which $m_v^{\Theta'}(\Theta') = 1$ means that one knows absolutely 
nothing about more elements because $N' > N$. This 
clearly indicates that $m_v^{\Theta'}$ must in fact be considered as 
more ignorant than $m_v^{\Theta}$, and the condition (2) reflects this 
necessity. 


• Desideratum D3: (compatibility with Shannon entropy) 


$$U(m) = -\sum_{X \in \Theta} m(X) \log(m(X)) \qquad (3)$$

if the BBA $m(\cdot)$ is a Bayesian BBA defined on the FoD 
$\Theta$. We recall that any Bayesian BBA commits zero belief 
mass to all elements of the power set of $\Theta$ whose 
cardinality is greater than one [23]. 


Justification of D3: This desideratum D3 also seems very 
natural because Shannon entropy is the most well-known 
(and justified [20], [24]-[27]) measure used so far to 
quantify the uncertainty (i.e. the randomness, or variability, 
also called conflict by some authors) of a probability 
mass function (pmf). Because any Bayesian BBA induces 
belief and plausibility functions that coincide with a 
probability measure, one must have a total coherence of 
$U(m)$ with Shannon entropy when the BBA is Bayesian 
if one admits, as we do here, that Shannon entropy is 
an effective measure of the uncertainty (or randomness) of 
a pmf. Under the acceptance of Shannon entropy as the 
MoU for pmfs, the desideratum D3 makes perfect sense. 
Of course, this desideratum D3 could be disputed (and 
possibly rejected) if one can cast doubt (based on 
very strong justification) on the use of Shannon entropy as the 
MoU for pmfs. For alternatives to Shannon entropy, see 
the non-exhaustive list given in [28]-[30], 
and the discussions in [9], [31]-[33] for instance. 

• Desideratum D4: (unicity of max value of $U(m)$) 


$$\forall m \neq m_v, \quad U(m) < U(m_v), \qquad (4)$$


where $m$ is any BBA different from the vacuous BBA $m_v$ 
defined with respect to the same FoD. 


Justification of D4: This fourth desideratum is very 
important and it also makes perfect sense because the 
totally ignorant source of evidence is characterized by the 
vacuous BBA $m_v(\cdot)$, and no source of evidence can be 
more uncertain than the totally ignorant source, so the 
unique maximum value of $U(m)$ must be obtained for 
$U(m_v)$. As will be shown next, many existing MoUs 
fail to satisfy this important and essential desideratum. 


Effectiveness of a measure of uncertainty: A measure of 
uncertainty $U(m)$ is said to be effective if and only if it satisfies 
desiderata D1, D2, D3, and D4 and if it is strongly 
justified. Any MoU that fails to satisfy at least one of these 
desiderata is said to be non-effective, and in this case it cannot be 
considered seriously as a satisfactory measure of uncertainty 
for characterizing a basic belief assignment of a source of 
evidence. Consequently, all non-effective MoUs should be 
discarded in all applications that require some MoU 
evaluation. 
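
To make the effectiveness test concrete, the sketch below checks D1, D2 and D3 numerically for one candidate, the measure $U(m) = \sum_X m(X) \log(|X|)$ of Table I. The choice of candidate, the test BBAs and all function names are our own assumptions for illustration; D4 quantifies over all BBAs and cannot be established by a finite check like this one.

```python
from math import log, isclose

def nonspecificity(bpa):
    """Candidate MoU: sum of m(X) * log|X| over the focal sets."""
    return sum(v * log(len(a)) for a, v in bpa.items())

def shannon(bpa):
    """Shannon entropy of the masses (natural log), used as the D3 reference."""
    return -sum(v * log(v) for v in bpa.values() if v > 0)

frame = ["a", "b", "c"]
certain = {frozenset(["a"]): 1.0}              # all mass on one singleton
vac3 = {frozenset(frame): 1.0}                 # vacuous BBA on 3 elements
vac4 = {frozenset(frame + ["d"]): 1.0}         # vacuous BBA on 4 elements
bayes = {frozenset([x]): 1 / 3 for x in frame} # uniform Bayesian BBA

d1 = nonspecificity(certain) == 0
d2 = nonspecificity(vac3) < nonspecificity(vac4)
d3 = isclose(nonspecificity(bayes), shannon(bayes))
print(d1, d2, d3)
```

For this candidate D1 and D2 hold on the test BBAs, but D3 fails because the measure vanishes on every Bayesian BBA, so it would be declared non-effective.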


Remark 1: It is worth noting that we do not specify a priori 
what the range of an effective MoU should be, contrary 
to some axiomatic attempts made by different authors as 
reported, for instance, in [15], [34], [35]. We consider that 
the range must not be chosen a priori. The maximum 
range must result from the mathematical definition of the 
effective MoU. We only require the satisfaction of the desideratum 
D4, which is much more general, natural and essential. 


Remark 2: We deliberately do not include the subadditivity 
desideratum in our list of desiderata for the search of an 
effective MoU in the belief function framework because this 
desideratum appears in general (i.e. for non-Bayesian 
non-vacuous BBAs) to be incompatible with the essential desideratum 
D4, and thus it is illusory and vain to ask for a sub-additive 
MoU for non-Bayesian non-vacuous BBAs. We recall that the 
subadditivity condition is defined by $U(m^{\Theta \times \Theta'}) \leq U(m^{\downarrow \Theta}) + 
U(m^{\downarrow \Theta'})$ for any joint BBA defined on the cartesian product 
$\Theta \times \Theta'$ of FoDs $\Theta$ and $\Theta'$, where $m^{\downarrow \Theta}$ is the marginal (i.e. 
projection) of $m^{\Theta \times \Theta'}(\cdot)$ on the power set $2^{\Theta}$, and $m^{\downarrow \Theta'}$ is 
the marginal (i.e. projection, see [36], [37] for definition) of 
$m^{\Theta \times \Theta'}(\cdot)$ on the power set $2^{\Theta'}$. This impossibility comes 
from the fact that there exist in general $2^{|\Theta \times \Theta'|} - 2^{|\Theta|} \cdot 2^{|\Theta'|} > 0$ 
elements of the power set $2^{\Theta \times \Theta'}$ (including some disjunctions 
of elements of $\Theta \times \Theta'$) whose mass of belief cannot be 
obtained from the masses of elements of $2^{\Theta}$ and of $2^{\Theta'}$, and 
which contribute to the uncertainty measure of the joint BBA 
$m^{\Theta \times \Theta'}$. Indeed, if $|\Theta| = N$ and $|\Theta'| = N'$ the cartesian 
product space $\Theta \times \Theta'$ has $N \cdot N'$ elements and its power 
set $2^{\Theta \times \Theta'}$ has $2^{N \cdot N'}$ elements, which is always bigger than 
the cartesian product space of power sets $2^{\Theta} \times 2^{\Theta'}$ because 
$2^{N} \cdot 2^{N'} (= 2^{N+N'}) < 2^{N \cdot N'}$ as soon as $N > 2$ and $N' > 2$. 
It is also worth mentioning that most elements of $2^{\Theta} \times 2^{\Theta'}$ 
do not have the same structure as the elements of the power 
set $2^{\Theta \times \Theta'}$. This means that we cannot recover the joint BBA 
$m^{\Theta \times \Theta'}$ from the product, or combination, of its marginals $m^{\downarrow \Theta}$ 
and $m^{\downarrow \Theta'}$ in general, except if the joint BBA is totally vacuous or 
if the joint BBA is Bayesian and equal to the product 
of two so-called non-interacting (or independent) probability 
measures [8]. To be clearer, consider two FoDs $\Theta$ and $\Theta'$ 
with $|\Theta| = 2$ and $|\Theta'| = 3$. Hence the cartesian product space 
$\Theta \times \Theta'$ has $2 \cdot 3 = 6$ elements¹, and its power set $2^{\Theta \times \Theta'}$ 
has $2^6 = 64$ elements (couples, and unions of couples). If 
we consider the vacuous BBA $m_v^{\Theta \times \Theta'}$ on $2^{\Theta \times \Theta'}$ defined by 
$m_v^{\Theta \times \Theta'}(\Theta \times \Theta') = 1$, then its projection on $\Theta$ is the vacuous 
BBA $m_v^{\Theta}(\Theta) = 1$ defined on the FoD $\Theta = \{\theta_1, \theta_2\}$ having 
only two elements, and its projection on $\Theta'$ is the vacuous 
BBA $m_v^{\Theta'}(\Theta') = 1$ defined on the FoD $\Theta' = \{\theta'_1, \theta'_2, \theta'_3\}$ 
having only three elements. Why should the MoU of $m_v^{\Theta \times \Theta'}$ (i.e. the 
fully ignorant source) related to the 6 elements of $\Theta \times \Theta'$ 
be less than (or equal to) the sum of the MoU of $m_v^{\Theta}$, related to only 
the two elements of $\Theta$, and the MoU of $m_v^{\Theta'}$, related only to 
the three elements of $\Theta'$? To amplify this point, if we consider 
$|\Theta| = 5$ and $|\Theta'| = 8$ then $|\Theta \times \Theta'| = 40$. Why should the MoU of 
the vacuous BBA $m_v^{\Theta \times \Theta'}$ related to the 40 elements of $\Theta \times \Theta'$ 
be less than (or equal to) the sum of the MoU of the vacuous BBA 
$m_v^{\Theta}$ related to only the 5 elements of $\Theta$ and the MoU of the 
vacuous BBA $m_v^{\Theta'}$ related only to the 8 elements of $\Theta'$? We 
do not see any solid theoretical reason, nor any intuitive reason, 
for justifying and requiring the subadditivity desideratum in 
the general framework of belief functions, nor for listing it as a 
property to satisfy in general, as done in [15]. Unlike Vejnarová 
and Klir [38] (p.28) and many other authors, we do not 
consider that a meaningful measure of uncertainty of a basic 
belief assignment must satisfy the subadditivity property. The 
proposal of adding the desiderata of subadditivity, additivity, 
and monotonicity in the search for a MoU of belief functions was 
explored and defended by Klir in [2] at the end of the 1980s. 
It is however worth mentioning that if a MoU satisfies the 
desideratum D3, its subadditivity for Bayesian BBAs 
is always guaranteed because Shannon entropy is 
subadditive [8], [20]. 
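
The counting argument of Remark 2 is easy to check numerically. The helper below is a minimal illustrative sketch (the function name is ours); it compares the size of the power set of the product space with the size of the product of the two power sets.

```python
def power_set_sizes(n, n_prime):
    """Return (|2^(Theta x Theta')|, |2^Theta x 2^Theta'|, their difference)."""
    joint = 2 ** (n * n_prime)
    marginals = 2 ** n * 2 ** n_prime
    return joint, marginals, joint - marginals

print(power_set_sizes(2, 3))   # the |Theta| = 2, |Theta'| = 3 case of the text
print(power_set_sizes(5, 8))   # the |Theta| = 5, |Theta'| = 8 case
print(power_set_sizes(2, 2))   # boundary case where both counts coincide
```

For the two cases discussed in the text, the joint power set is strictly larger, and the gap grows very quickly with the sizes of the frames.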


III. EXISTING MEASURES OF UNCERTAINTY 


In this section we analyze most of the existing measures of 
uncertainty available in the open literature related to belief 
functions. We verify whether or not these measures pass the 
effectiveness test. We say that a MoU fails the effectiveness test if 
at least one of the desiderata D1, D2, D3 or D4 is not satisfied 
by the MoU under test. Where necessary, we explain what the 
problem is with this MoU and give a counter-example 
for it. 

Tables I and II show the formulas of all the MoUs 
analyzed in this work. Some existing MoUs capture only 
some aspects of uncertainty² and have specific names given 
by their authors (e.g. conflict, dissonance, discord, strife, etc.) 
listed in the third column of these tables³. For convenience, 
the MoUs have been indexed and listed by the year of their 
publication in Tables I and II. We have also included in 
Tables I and II the names of the authors of the MoUs, the names of 
the MoUs when they exist (and possibly new names if needed 
for clarity), and the formulas of the MoUs. For convenience, 
we have used the natural log in the mathematical expressions 
of MoUs for the homogeneity of the presentation. Some 
authors prefer $\log_2$ instead, but this preference does not really 
matter because the values of an expression will differ only 
by the constant multiplicative factor $1/\log(2)$, and the 
unit will just change from nats to bits. 


¹ Each element is a couple of the form $(\theta_i, \theta'_j)$, $i = 1, 2$ and $j = 1, 2, 3$. 
² Referred to as entropy-like uncertainty, nonspecificity (or imprecision), and 
fuzziness, which is uniquely connected with fuzzy sets [10]. 


Table III indicates whether or not each MoU satisfies the 
desiderata D1, D2, D3 and D4, and thus whether or not it passes the 
effectiveness test. Most of the results listed in Table III 
are easy to verify directly from the mathematical definition 
of each MoU of Tables I and II, and are left as exercises 
for the reader. Some results of Table III however, especially 
those related to the failure of the D4 desideratum, may appear 
less obvious to verify, and that is why we give some 
numerical counter-examples for them in Tables IV and V 
for convenience⁴. These counter-examples have been obtained 
from Monte-Carlo simulation of randomly generated BBAs 
for testing the desiderata. Many more counter-examples 
can be found by Monte-Carlo simulation, but 
only one is sufficient to prove the failure of a MoU 
for a desideratum, especially for D4. Extra justifications about 
the violation of desiderata by some MoUs are presented next. 


The MoU defined by $-\sum_{X \subseteq \Theta} m(X) \log(m(X))$ does not 
satisfy the D2 desideratum because its value for the vacuous BBA 
$m_v^{\Theta}$ is 0 whatever the size of the FoD $\Theta$. Consequently, its 
value for any $m \neq m_v$ is greater than or equal to its value for 
$m_v$, hence the D4 desideratum is violated. 
That is why this MoU cannot be recommended as an 
effective measure of uncertainty. 
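
The failure described above is easy to reproduce. In this minimal sketch the name `shannon_like`, the helper `vacuous` and the example BBA are our own illustrative choices; BBAs are dictionaries mapping `frozenset` focal sets to masses.

```python
from math import log

def shannon_like(bpa):
    """-sum of m(X) * log(m(X)) over the focal sets (natural log)."""
    return -sum(v * log(v) for v in bpa.values() if v > 0)

def vacuous(frame):
    """Vacuous BBA: all the mass on the whole frame."""
    return {frozenset(frame): 1.0}

small = vacuous(["a", "b"])
large = vacuous(["a", "b", "c", "d"])
m = {frozenset(["a"]): 0.5, frozenset(["a", "b"]): 0.5}  # a non-vacuous BBA

print(shannon_like(small), shannon_like(large), shannon_like(m))
```

Both vacuous BBAs get the value zero regardless of the frame size (D2 fails), while the non-vacuous BBA `m` gets a strictly larger value (D4 fails).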


The total uncertainty measure $T(m)$ does not satisfy the D4 desideratum 
because we can have $m \neq m_v$ such that $T(m) = T(m_v)$, as 
shown in the counter-example given in [52] (p.165). See also 
our simpler counter-example given in Table IV. 


The strife measure $S(m)$ does not satisfy the 
D2 desideratum because one can easily verify that one 
always⁵ has $S(m_v^{\Theta}) = S(m_v^{\Theta'}) = 0$ when $|\Theta| \neq |\Theta'|$. The strife 
does not satisfy D4 either because if $m$ is the uniform Bayesian 
BBA on a (non-empty) FoD $\Theta$, one has $S(m) = \log(|\Theta|)$, which 
is greater than zero, proving that $S(m)$ violates D4. 


The $\mathrm{MoU}_{1992b}(m) = NS(m)$ does not satisfy the D4 
desideratum because we can have $m \neq m_v$ such that $NS(m) = 
NS(m_v)$, as shown in the counter-example of Table IV, 


³ The names and notations are not always homogeneous from one author to 
another; for instance, U-uncertainty is also called nonspecificity and denoted 
by $N(m)$ in [39]-[41]. 

⁴ The numerical values have been truncated to their 3rd digit. 

⁵ It is worth noting that Klir's statement, at the bottom of page 86 of [8], 
saying (using our notation) $S(m_v) = \log(|\Theta|)$ is clearly wrong. 
Table I
LIST OF EXISTING MOUS FOR THE PERIOD 1980-2000.

Author(s) & Ref. | Name
Lamata et al. [37] | lower entropy
Lamata et al. [37] | upper entropy
Klir et al. [52] | discord
Klir et al. [52] | total uncertainty
Klir et al. [38] | strife
Pal et al. [6] | average total uncertainty
Maeda et al. [55] | Maeda extended entropy
Harmanec et al. [39] | amount of uncertainty
Klir [58] | Shannon-like measure

[The other rows of this table are illegible in the source.]


where⁶ $U(m) = \log(2)$ and $S(m) = \log(3) - \log(2)$, so
that $NS(m) = U(m) + S(m) = \log(3)$, and we have
$U(m_v) = \log(3)$ and $S(m_v) = 0$, yielding $NS(m_v) = \log(3)$,
and hence proving $NS(m) = NS(m_v)$.
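
The values quoted above can be checked numerically with the short sketch below. It assumes the standard definitions of the U-uncertainty, $U(m) = \sum_X m(X)\log(|X|)$, and of the strife, $S(m) = -\sum_X m(X)\log(\sum_Y m(Y)|X \cap Y|/|X|)$, and it assumes that the BBA of the counter-example puts a mass 1/3 on each pair of elements of a 3-element FoD, a choice that reproduces the quoted values. The labels `"t1"`, `"t2"`, `"t3"` are stand-ins for the elements of the FoD.

```python
import math

def u_uncertainty(m):
    """U-uncertainty in nats: sum of m(X) log|X| over the focal elements."""
    return sum(v * math.log(len(x)) for x, v in m.items())

def strife(m):
    """Strife in nats: -sum_X m(X) log( sum_Y m(Y) |X & Y| / |X| )."""
    return -sum(
        v * math.log(sum(w * len(x & y) / len(x) for y, w in m.items()))
        for x, v in m.items() if v > 0
    )

def ns(m):
    """NS(m) = U(m) + S(m)."""
    return u_uncertainty(m) + strife(m)

# Assumed counter-example: mass 1/3 on each pair of a 3-element FoD.
m = {
    frozenset({"t1", "t2"}): 1 / 3,
    frozenset({"t1", "t3"}): 1 / 3,
    frozenset({"t2", "t3"}): 1 / 3,
}
m_v = {frozenset({"t1", "t2", "t3"}): 1.0}  # vacuous BBA

ns_m, ns_mv = ns(m), ns(m_v)
```

Both `ns_m` and `ns_mv` evaluate to log(3) up to rounding, which is the equality used to show that NS(m) fails D4.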

The $\mathrm{MoU}_{1994}(m) = AU(m)$ proposed by Harmanec and
Klir [39], [40] is nothing but the maximal Shannon entropy
value obtained over all the pmfs $P(\cdot)$ compatible with the
$Bel(\cdot)$ and $Pl(\cdot)$ functions of the BBA $m(\cdot)$, i.e. such that for all
$X \subseteq \Theta$, $Bel(X) \leq \sum_{\theta_i \in X} P(\theta_i) \leq Pl(X)$. More precisely,

$$P^*(\cdot) = \arg\max_{\text{pmf } P(\cdot)} \; -\sum_{\theta_i \in \Theta} P(\theta_i)\log(P(\theta_i))$$

This max-entropy pmf $P^*(\cdot)$ is obtained by solving a nonlinear
optimization problem, see [86]-[88]. It is clear that this


⁶ The easy verification from the $U(m)$ and $S(m)$ formulas is left to the reader.
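Mathematical expression

$C(m) = -\sum_{X \subseteq \Theta} m(X)\log(Bel(X))$
$E(m) = -\sum_{X \subseteq \Theta} m(X)\log(Pl(X))$
$N(m) = 1 - \sum_{X \subseteq \Theta} m(X)/|X|$
$U(m) = \sum_{X \subseteq \Theta} m(X)\log(|X|)$
$E_m(m) = -\sum_{X \subseteq \Theta} m(X)\log(m(X))$
$D(m) = -\sum_{X \subseteq \Theta} m(X)\log\big(\sum_{Y \subseteq \Theta} m(Y)\frac{|X \cap Y|}{|Y|}\big)$
$T(m) = U(m) + D(m)$
$S(m) = -\sum_{X \subseteq \Theta} m(X)\log\big(\sum_{Y \subseteq \Theta} m(Y)\frac{|X \cap Y|}{|X|}\big)$
$NS(m) = U(m) + S(m)$
$ATU(m) = -\sum_{X \subseteq \Theta} m(X)\log(m(X)) + \sum_{X \subseteq \Theta} m(X)\log(|X|)$
$M(m) = AU(m) + U(m)$
$AU(m) = -\sum_{\theta_i \in \Theta} P^*(\theta_i)\log(P^*(\theta_i))$
$TC(m) = \sum_{X \subseteq \Theta} m(X)\big(\sum_{Y \subseteq \Theta} m(Y)\big[1 - \frac{|X \cap Y|}{|X \cup Y|}\big]\big)$
$H_{bs}(m) = -\sum_{X \subseteq \Theta | m(X) > 0} Pl(X)\log(Bel(X))$
$H_S(m) = -\sum_{\theta_i \in \Theta}\big(\sum_{X \subseteq \Theta | \theta_i \in X}\frac{m(X)}{|X|}\big)\log\big(\sum_{X \subseteq \Theta | \theta_i \in X}\frac{m(X)}{|X|}\big)$

[The expressions of the remaining MoUs of Table I are illegible in the source.]
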

MoU, as well as all other Shannon-like entropy measures
based on different probabilistic approximation techniques⁷ (such as
BetP-entropy, PlPr-entropy, or DSmP-entropy, etc.) of a
(non-Bayesian) BBA $m$ by a Bayesian BBA, fails to satisfy the D4
desideratum. Indeed, the vacuous BBA $m_v$ will always be
approximated by the uniform pmf $P^{unif}(\cdot)$ defined on the FoD
$\Theta$, and there will be no difference between the Shannon-like
entropy value for $m_v$ (the totally ignorant source of evidence)
and the Shannon-like entropy value of the uniform Bayesian
BBA. This explains why $AU(m)$ and all other Shannon-like
entropies violate the D4 desideratum.
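
As an illustration of this argument, the sketch below applies the pignistic transformation $BetP(\theta_i) = \sum_{X \ni \theta_i} m(X)/|X|$ to the vacuous BBA and to the uniform Bayesian BBA on a 3-element FoD, and compares the resulting Shannon entropies. The function names and the labels `"t1"`, `"t2"`, `"t3"` are illustrative assumptions.

```python
import math

def betp(m, frame):
    """Pignistic probability: BetP(t) = sum of m(X)/|X| over the X containing t."""
    return {t: sum(v / len(x) for x, v in m.items() if t in x) for t in frame}

def shannon(p):
    """Shannon entropy in nats with the convention 0 log 0 = 0."""
    return -sum(q * math.log(q) for q in p.values() if q > 0)

frame = ("t1", "t2", "t3")
m_v = {frozenset(frame): 1.0}                     # vacuous BBA
m_unif = {frozenset({t}): 1 / 3 for t in frame}   # uniform Bayesian BBA

h_vacuous = shannon(betp(m_v, frame))
h_uniform = shannon(betp(m_unif, frame))
```

Both entropies equal log(3), so a MoU defined only as the Shannon entropy of BetP cannot distinguish total ignorance from uniform randomness.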


The $\mathrm{MoU}_{1996}(m) = TC(m)$ violates D2 because
$TC(m_v) = 0$ whatever the dimension of the (non-


⁷ $BetP_m$, $DSmP_m$, and $PlPr_m$ are different probabilistic transformations of a
non-Bayesian BBA into a Bayesian one. They have been proposed by different
authors; see [89]-[91] for details.
Table II
LIST OF EXISTING MOUS FOR THE PERIOD 2001-2021.

Measures of Uncertainty | Author(s) & Ref. | Name | Mathematical expression
(label illegible) | Dezert et al. [61], Jousselme et al. [11] | Pignistic entropy or BetP-entropy or Ambiguity measure | $AM(m) = -\sum_{\theta_i \in \Theta} BetP_m(\theta_i)\log(BetP_m(\theta_i))$
$\mathrm{MoU}_{2016b}(m)$ | Yang et al. [63] | total uncertainty (Wasserstein distance) |
$\mathrm{MoU}_{2017}(m)$ | Deng et al. [64] | improved $TU^I(m)$ with Euclidean distance |
$\mathrm{MoU}_{2017b}(m)$ | [65] | improved Deng entropy |
$\mathrm{MoU}_{2017c}(m)$ | Tang et al. [68] | Tang weighted belief entropy |
$\mathrm{MoU}_{2018}(m)$ | [69] | Extended PlPr-entropy | $-\sum_{\theta_i \in \Theta} PlPr_m(\theta_i)\log(PlPr_m(\theta_i)) + U(m)$
$\mathrm{MoU}_{2018b}(m)$ | [70] | q-entropy | $H_q(m) = \sum_{X \subseteq \Theta} (-1)^{|X|} q(X)\log(q(X))$
$\mathrm{MoU}_{2018c}(m)$ | Mambé et al. [72] | Mambé entropy |
$\mathrm{MoU}_{2018d}(m)$ | Pan et al. [73] | Pan 1st entropy |
$\mathrm{MoU}_{2018e}(m)$ | Wang et al. [74] | Wang entropy |
$\mathrm{MoU}_{2019}(m)$ | Li et al. [75] | Li entropy |
$\mathrm{MoU}_{2019b}(m)$ | Cui et al. [76] | Cui entropy |
$\mathrm{MoU}_{2019c}(m)$ | Pan et al. [77] | Pan 2nd entropy |
$\mathrm{MoU}_{2019d}(m)$ | Chen et al. [78] | Chen entropy |
$\mathrm{MoU}_{2019e}(m)$ | Zhao et al. [79] | Zhao entropy |
$\mathrm{MoU}_{2020}(m)$ | Li et al. [80] | Li improved entropy |
$\mathrm{MoU}_{2020c}(m)$ | Chen et al. [82] | Chen improved entropy |
$\mathrm{MoU}_{2020e}(m)$ | Yan et al. [84] | Yan entropy |
$\mathrm{MoU}_{2021}(m)$ | This paper | Extended BetP-entropy | $-\sum_{\theta_i \in \Theta} BetP_m(\theta_i)\log(BetP_m(\theta_i)) + U(m)$
$\mathrm{MoU}_{2021b}(m)$ | This paper | Extended DSmP-entropy | $-\sum_{\theta_i \in \Theta} DSmP_m(\theta_i)\log(DSmP_m(\theta_i)) + U(m)$

[The other rows and mathematical expressions of this table are illegible in the source.]

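Table III
DESIDERATA VERIFICATION, AND EFFECTIVENESS TEST RESULTS.

Legible column legends: D2: $U(m_v^{\Theta}) < U(m_v^{\Theta'})$ if $|\Theta| < |\Theta'|$; D3: $U(m)$ is Shannon entropy for Bayesian BBA.
[Most of the individual D1-D4 marks are illegible in the source; the D1 mark is given where it survives.]

Measures of Uncertainty | D1 | Effectiveness test result
$\mathrm{MoU}_{1981}(m)$ | | failed
$\mathrm{MoU}_{1983}(m)$ | | failed
$\mathrm{MoU}_{1983b}(m)$ | | failed
$\mathrm{MoU}_{1983c}(m) = U(m)$ | | failed
$\mathrm{MoU}_{1984}(m) = E_m(m)$ | | failed
$\mathrm{MoU}_{1987}(m)$ | | failed
$\mathrm{MoU}_{1987b}(m) = d(m)$ | | failed
$\mathrm{MoU}_{1988}(m) = I(m)$ | | failed
$\mathrm{MoU}_{1988b}(m) = L_{ent}(m)$ | | failed
$\mathrm{MoU}_{1988c}(m) = U_{ent}(m)$ | | failed
$\mathrm{MoU}_{1990}(m) = D(m)$ | | failed
$\mathrm{MoU}_{1990b}(m) = T(m)$ | | failed
$\mathrm{MoU}_{1992}(m)$ | | failed
$\mathrm{MoU}_{1992b}(m) = NS(m)$ | | failed
$\mathrm{MoU}_{1993}(m) = ATU(m)$ | | failed
$\mathrm{MoU}_{1993b}(m) = M(m)$ | | okay
$\mathrm{MoU}_{1994}(m) = AU(m)$ | | failed
$\mathrm{MoU}_{1996}(m) = TC(m)$ | | failed
$\mathrm{MoU}_{1997}(m)$ | | failed
$\mathrm{MoU}_{1999}(m)$ | | failed
$\mathrm{MoU}_{2000}(m)$ | | failed
(label illegible) | yes | failed
$\mathrm{MoU}_{2016}(m)$ | yes | failed
$\mathrm{MoU}_{2016b}(m)$ | yes | failed
$\mathrm{MoU}_{2017}(m)$ | yes | failed
$\mathrm{MoU}_{2017b}(m)$ | yes | failed
$\mathrm{MoU}_{2017c}(m)$ | yes | failed
$\mathrm{MoU}_{2018}(m)$ | yes | okay
$\mathrm{MoU}_{2018b}(m)$ | no | failed
$\mathrm{MoU}_{2018c}(m)$ | yes | failed
$\mathrm{MoU}_{2018d}(m)$ | no | failed
$\mathrm{MoU}_{2018e}(m)$ | yes | okay
$\mathrm{MoU}_{2019}(m)$ | yes | failed
$\mathrm{MoU}_{2019b}(m)$ | yes | failed
$\mathrm{MoU}_{2019c}(m)$ | yes | failed
$\mathrm{MoU}_{2019d}(m)$ | yes | failed
$\mathrm{MoU}_{2019e}(m)$ | yes | failed
$\mathrm{MoU}_{2020}(m)$ | no | failed
$\mathrm{MoU}_{2020b}(m)$ | yes | failed
$\mathrm{MoU}_{2020c}(m)$ | no (NaN) | failed
$\mathrm{MoU}_{2020d}(m)$ | yes | failed
$\mathrm{MoU}_{2020e}(m) = H_n(m)$ | yes | failed
$\mathrm{MoU}_{2020f}(m)$ | yes | failed
$\mathrm{MoU}_{2021}(m)$ | yes | okay
$\mathrm{MoU}_{2021b}(m)$ | yes | okay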


Table IV
COUNTER-EXAMPLES FOR SOME MOUS ON $\Theta = \{\theta_1, \theta_2, \theta_3\}$.


a A 
0 0 
0 0 
0 0 
1/3 1/3 


0 


1/3 


1/3 
6, U 82 U 43 5 0 5 
ATO ey = LOE 
Table V
COUNTER-EXAMPLES FOR SOME MOUS ON $\Theta = \{\theta_1, \theta_2, \theta_3\}$.


empty) FoD $\Theta$. It violates D3, because for a Bayesian BBA
one gets $TC(m) = \sum_{\theta_i \in \Theta} P(\theta_i)(1 - P(\theta_i))$ as reported in
[45]. It also violates D4 in general because for a Bayesian
BBA one has $TC(m) > 0$, except in the particular Bayesian
case where the BBA is entirely focused on a singleton
$\theta_i$ (i.e. $m(\theta_i) = 1$). In this particular case we obtain
$TC(m) = TC(m_v) = 0$. So for all Bayesian BBAs $m$ we
always have $TC(m) \geq TC(m_v)$, which clearly violates
the D4 desideratum.


The original formula of $\mathrm{MoU}_{1997}(m) = H_{bs}(m)$ proposed
by Maluf in [57] was actually
$H_{bs}(m) = -\sum_{X \subseteq \Theta} Pl(X)\log(Bel(X))$, which is obviously ill-defined
when $Pl(X) > 0$ and $Bel(X) = 0$ because $\log(0) = -\infty$.
That is why we consider only the focal elements of the
BBA $m$ in the modified formula $H_{bs}(m)$ given in Table
I. For any cardinality of the non-empty FoD $\Theta$ we always have
$H_{bs}(m_v) = 0$ because for the vacuous BBA $m_v$ the only
focal element is $\Theta$, for which $Bel(\Theta) = Pl(\Theta) = 1$, so
that $H_{bs}(m_v) = -Pl(\Theta)\log(Bel(\Theta)) = -1\log(1) = 0$. So
$H_{bs}(m)$ violates D2. This MoU also violates D4 because for a
Bayesian BBA $H_{bs}(m)$ is the same as Shannon entropy, and
Shannon entropy is greater than zero in general.


The $\mathrm{MoU}_{2000}(m) = H_S(m)$ (Shapley entropy) coincides
with Shannon entropy for Bayesian BBAs, and one can easily
verify that $H_S(m_v) = \log(|\Theta|)$, which is also the
maximum value of Shannon entropy, reached for the uniform Bayesian
BBA. Hence $H_S(m_v)$ is not the unique maximum value of the
measure of uncertainty when we use Shapley entropy. It
can also be verified that this maximum value can be obtained
by a non-Bayesian BBA. For instance, if $\Theta = \{\theta_1, \theta_2, \theta_3\}$
and $m(\theta_1 \cup \theta_2) = m(\theta_1 \cup \theta_3) = m(\theta_2 \cup \theta_3) = 1/3$,
then $H_S(m) = \log(3)$, which is the same value as
$H_S(m_v)$. Because the Shapley entropy proposed by Yager
violates the D4 desideratum, we cannot recommend it as an
effective MoU.


The $\mathrm{MoU}_{2016}(m) = E_d(m)$ (Deng entropy) has recently
aroused the interest and enthusiasm of some researchers
because it was highly publicized by Deng during the last five
years [14]. We really wonder about such a strong interest in
this MoU because Deng entropy is obviously not effective,
as proved by our simple counter-example given in Table V.
Abellan has already pointed out the problem of Deng entropy
in [92]. Nevertheless, some researchers still try to use it, publicize
it or improve it, unsuccessfully, as shown in our analysis
summarized in Table III. So Deng entropy is not
recommended for applications, and neither are its generalizations
(modifications or extensions) such as those recently proposed
by the same author (Rényi-Deng (R-D) entropy, Tsallis-Deng
(T-D) entropy, Rényi-Tsallis-Deng (R-T-D) entropy, interval-valued
Deng entropy, fractal-based belief Deng entropy, Deng
entropy for orderable set, etc.), see for instance [93], [94],
which are of no interest because they are non-effective.
We emphasize that even if a MoU reduces to Shannon
entropy (as Deng entropy does) when a BBA is Bayesian, it
can be non-effective and useless if it violates the D4 desideratum.
That is why Deng entropy (and all its recent variants)
is non-effective, like most of the other MoUs reported
in Table III.


The $\mathrm{MoU}_{2018b}(m) = H_q(m)$ (q-entropy) violates D1
because $H_q(m)$ can be negative, so its minimum value is not
zero. For instance, if $\Theta = \{\theta_1, \theta_2, \theta_3\}$ and
$m(\theta_1 \cup \theta_2) = m(\theta_1 \cup \theta_3) = m(\theta_2 \cup \theta_3) = 1/3$, then $H_q(m) \approx -0.2877$. This
MoU also violates D2 because $H_q(m_v) = 0$ whatever the
dimension of the (non-empty) FoD $\Theta$. This MoU reduces
to Shannon entropy for Bayesian BBAs: if $m$ is a Bayesian BBA one
has $q(X) = m(X)$ for all $X \subseteq \Theta$, and the focal elements of $m$
are necessarily singletons $X \subseteq \Theta$ for which $|X| = 1$, so that
$(-1)^{|X|} = -1$, and consequently the mathematical definition
of $H_q(m)$ given in Table II is the same as Shannon entropy. This
MoU violates D4 because for a Bayesian BBA $H_q(m)$ is the
same as Shannon entropy, and Shannon entropy is greater than
zero in general⁸. For instance, if $\Theta = \{\theta_1, \theta_2, \theta_3\}$ and
$m(\theta_1) = m(\theta_2) = m(\theta_3) = 1/3$, then $H_q(m) = \log(|\Theta|) = \log(3) > 0$.
Hence $H_q(m) > H_q(m_v)$.
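
The three q-entropy values used in this argument can be reproduced with the sketch below. It assumes $H_q(m) = \sum_{X \subseteq \Theta, X \neq \emptyset} (-1)^{|X|} q(X)\log(q(X))$ with the commonality $q(X) = \sum_{Y \supseteq X} m(Y)$ and the convention $0\log 0 = 0$; the function names and the labels `"t1"`, `"t2"`, `"t3"` are illustrative assumptions.

```python
import math
from itertools import combinations

def commonality(m, a):
    """q(A): total mass of the focal elements that contain A."""
    return sum(v for y, v in m.items() if a <= y)

def q_entropy(m, frame):
    """H_q(m) in nats, summed over all non-empty subsets of the frame."""
    items = sorted(frame)
    total = 0.0
    for r in range(1, len(items) + 1):
        for c in combinations(items, r):
            q = commonality(m, frozenset(c))
            if q > 0:  # convention 0 log 0 = 0
                total += (-1) ** r * q * math.log(q)
    return total

frame = {"t1", "t2", "t3"}
m_pairs = {frozenset(p): 1 / 3 for p in combinations(sorted(frame), 2)}
m_v = {frozenset(frame): 1.0}                    # vacuous BBA
m_unif = {frozenset({t}): 1 / 3 for t in frame}  # uniform Bayesian BBA

h_pairs = q_entropy(m_pairs, frame)
h_vacuous = q_entropy(m_v, frame)
h_uniform = q_entropy(m_unif, frame)
```

Here `h_pairs` is negative (D1 fails), `h_vacuous` is zero for any frame size (D2 fails), and `h_uniform` equals log(3), which exceeds `h_vacuous` (D4 fails).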


The $\mathrm{MoU}_{2018d}(m) = H_{bel}(m)$ (Pan 1st entropy) violates
D1 because if we consider the simplest case of a FoD with
$\Theta = \{\theta_1, \theta_2\}$, and the specific BBA $m(\theta_1) = 1$, we
have $[Bel(\theta_1), Pl(\theta_1)] = [1, 1]$, $[Bel(\theta_2), Pl(\theta_2)] = [0, 0]$
and $[Bel(\theta_1 \cup \theta_2), Pl(\theta_1 \cup \theta_2)] = [1, 1]$, so we have
$(Bel(\theta_1) + Pl(\theta_1))/2 = 1$, $(Bel(\theta_2) + Pl(\theta_2))/2 = 0$ and
$(Bel(\theta_1 \cup \theta_2) + Pl(\theta_1 \cup \theta_2))/2 = 1$. Hence
$H_{bel}(m) = -1\log(1/(2^1 - 1)) - 0\log(1/(2^1 - 1)) - 1\log(1/(2^2 - 1)) = \log(3) > 0$.
Pan 1st entropy violates D3 (Shannon
entropy consistency) too because if $m$ is the uniform Bayesian
BBA given by $m(\theta_1) = m(\theta_2) = 0.5$, then
$H_{bel}(m) = -0.5\log(0.5) - 0.5\log(0.5) - 1\log(1/3) = \log(2) + \log(3)$,
which is greater than Shannon entropy, which is equal to
$-0.5\log(0.5) - 0.5\log(0.5) = \log(2)$. Pan 1st entropy also violates
D4 because for the vacuous BBA $m_v(\theta_1 \cup \theta_2) = 1$,
one has $[Bel(\theta_1), Pl(\theta_1)] = [0, 1]$, $[Bel(\theta_2), Pl(\theta_2)] = [0, 1]$
and $[Bel(\theta_1 \cup \theta_2), Pl(\theta_1 \cup \theta_2)] = [1, 1]$, and
$(Bel(\theta_1) + Pl(\theta_1))/2 = 0.5$, $(Bel(\theta_2) + Pl(\theta_2))/2 = 0.5$ and
$(Bel(\theta_1 \cup \theta_2) + Pl(\theta_1 \cup \theta_2))/2 = 1$, so that
$H_{bel}(m_v) = -0.5\log(0.5) - 0.5\log(0.5) - 1\log(1/3) = \log(2) + \log(3)$, which is the
same value as for the uniform Bayesian BBA, so $H_{bel}(m_v)$ is
not strictly greater than the other Pan 1st entropy values.


⁸ Except in the case where $m(\theta_i) = 1$ for some $\theta_i \in \Theta$.
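
The three computations above can be replayed with the sketch below. Following the worked example, it assumes that Pan 1st entropy sums, over every non-empty subset $X$ of the FoD, the term $-c(X)\log(c(X)/(2^{|X|} - 1))$ with $c(X) = (Bel(X) + Pl(X))/2$ and the convention $0\log 0 = 0$; the function names and the labels `"t1"`, `"t2"` are illustrative assumptions.

```python
import math
from itertools import combinations

def bel(m, a):
    """Belief of A: mass of the focal elements included in A."""
    return sum(v for x, v in m.items() if x <= a)

def pl(m, a):
    """Plausibility of A: mass of the focal elements intersecting A."""
    return sum(v for x, v in m.items() if x & a)

def pan_first_entropy(m, frame):
    """Assumed form of Pan 1st entropy (nats), summed over all non-empty subsets."""
    items = sorted(frame)
    total = 0.0
    for r in range(1, len(items) + 1):
        for c in combinations(items, r):
            a = frozenset(c)
            mid = (bel(m, a) + pl(m, a)) / 2  # centre of the belief interval
            if mid > 0:
                total -= mid * math.log(mid / (2 ** r - 1))
    return total

frame = {"t1", "t2"}
h_certain = pan_first_entropy({frozenset({"t1"}): 1.0}, frame)
h_uniform = pan_first_entropy({frozenset({"t1"}): 0.5, frozenset({"t2"}): 0.5}, frame)
h_vacuous = pan_first_entropy({frozenset(frame): 1.0}, frame)
```

The results are log(3) for the BBA focused on a singleton, and log(2) + log(3) for both the uniform Bayesian BBA and the vacuous BBA, in agreement with the values derived in the text.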


The formula of $\mathrm{MoU}_{2018e}(m) = SU(m)$ (Wang entropy)
has been kept in its original formulation (with the $\log_2(\cdot)$
function) in Table II, so it is expressed in bits. If one wants
to express $SU(m)$ in nats, one must replace the $\log_2(\cdot)$ function
by the natural logarithm function $\log(\cdot)$, and the second terms
$(Pl(\theta_i) - Bel(\theta_i))/2$ must be multiplied by $\log(2)$ in the
mathematical definition of $SU(m)$.


For the $\mathrm{MoU}_{2019b}(m) = E_{cui}(m)$ (Cui entropy) proposed in
[76], it is clear that the original mathematical definition of this
entropy does not fit with the derivations the authors
had in mind when making the numerical examples in their
paper, because of a mistake in their exponential term. That
is why we have to correct this term by replacing
$\sum_{Y \subseteq \Theta, Y \neq X}$ by $\sum_{Y \subseteq \Theta, Y \neq X \,\&\, m(Y) > 0}$
in the original formula. Cui entropy violates the
D4 desideratum as shown in the example of Table V.


The $\mathrm{MoU}_{2019c}(m) = H_{PQ}(m)$ (Pan 2nd entropy) is not
effective because $H_{PQ}(m_v)$ coincides with $H_{PQ}(m)$ when $m$
is the uniform Bayesian BBA, so it violates the D4 desideratum.


The $\mathrm{MoU}_{2019d}(m) = E_i(m)$ (Chen entropy) is not effective
because one can have $E_i(m) > E_i(m_v)$. For instance,
consider the vacuous BBA $m_v$ on the FoD $\Theta = \{\theta_1, \theta_2, \theta_3\}$,
then $E_i(m_v) = \log(2^{|\Theta|} - 1) = \log(7) = 1.9459$, and if
one considers the uniform Bayesian BBA for which
$m(\theta_1) = m(\theta_2) = m(\theta_3) = 1/3$ one gets
$E_i(m) = -\log(\frac{1}{3} \cdot \frac{1}{3}) = 2\log(3) = 2.1972 > E_i(m_v)$.
So Chen entropy violates the D4 desideratum.


The $\mathrm{MoU}_{2019e}(m) = H_{inter}(m)$ (Zhao entropy) is not
effective because it violates the D4 desideratum. As a simple
counter-example, consider $\Theta = \{\theta_1, \theta_2, \theta_3\}$ with the BBA
$m(\theta_1 \cup \theta_2) = m(\theta_1 \cup \theta_3) = m(\theta_2 \cup \theta_3) = 1/3$, then
$H_{inter}(m) = 4.6291$ nats, whereas for the vacuous BBA
$m_v(\Theta) = 1$ we get $H_{inter}(m_v) = 4.4856$ nats. Clearly
$H_{inter}(m) > H_{inter}(m_v)$, which does not make sense
because the vacuous BBA $m_v$ characterizes the most ignorant
source of evidence.


It is worth mentioning that the numerical examples given
by Li and Cui in their paper are incorrect because they are
inconsistent with their original new entropy formula (12) for
$IQ_{Li}$, see [80]. If we admit that Li's original definition
of entropy is correct, then we get the effectiveness test
results listed for this entropy in Table III, and we conclude
that the $\mathrm{MoU}_{2020}(m) = IQ_{Li}(m)$ (Li improved entropy) is
not effective. If we consider that the numerical examples by
Li and Cui are correct, then we need to modify the exponent
term in Li's original definition (12) of $IQ_{Li}$ as
$\sum_{Y \subseteq \Theta, Y \neq X \,\&\, m(Y) > 0} |X \cap Y|/|\Theta|$. In this case the effectiveness
test result is worse because this modified Li improved entropy
fails to pass the four desiderata, and it is still non-effective.


The $\mathrm{MoU}_{2020b}(m) = U_{exp}(m)$ (Wen entropy) clearly violates
the Shannon entropy compatibility desideratum D3, and for
the vacuous BBA $m_v$ one always has $U_{exp}(m_v) = 1$ whatever
the dimension of the FoD $\Theta$. Therefore Wen entropy does
not verify desideratum D2. We do not know whether $U_{exp}(m)$
satisfies the D4 desideratum, but we ran thousands of
Monte Carlo tests with random BBAs for different sizes of the
FoD $\Theta$, and $U_{exp}(m)$ always passed the D4
test, so we conjecture that Wen entropy satisfies D4. Even if
our conjecture about the satisfaction of D4 by $U_{exp}(m)$ is wrong,
it does not change our conclusion that Wen entropy is not
effective because it fails to verify D2 and D3.


$\mathrm{MoU}_{2020c}(m)$ (Chen improved entropy) is not
mathematically well-defined because when the BBA $m$ has
only one focal element (i.e. $|\mathcal{F}_{\Theta}(m)| = 1$), one has a
division by $|\mathcal{F}_{\Theta}(m)| - 1 = 0$, which yields the NaN (Not a
Number) indeterminate value in Table III. Even if $|\mathcal{F}_{\Theta}(m)| > 1$,
this entropy is not compatible with Shannon entropy for
Bayesian BBAs. So Chen improved entropy is not effective.


$\mathrm{MoU}_{2020d}(m) = Q(m)$ (Qin entropy) violates the D4 desideratum
because Qin entropy takes the same value $\log(|\Theta|)$ for the
vacuous BBA and for the uniform Bayesian BBA.


$\mathrm{MoU}_{2020e}(m) = H_n(m)$ (Yan entropy) is non-effective.
A counter-example for the D4 desideratum is given in Table V,
expressed in nats. To express the values in bits we have of course to
divide our results by $\log(2)$. It is worth noting that in Section
III.B of [84], the numerical results given by Yan and Deng for
$H_n(m_3)$ and $H_n(m_4)$ in their example 5 are wrong.


$\mathrm{MoU}_{2020f}(m)$ (Li-Pan entropy) is also
non-effective. A counter-example for the D4 desideratum is given in
Table IV.


IV. DISCUSSION 


Our analysis of the forty-five measures of uncertainty listed
in Tables I and II, covering 40 years of research in this
field, reveals that almost 89 % of them are non-effective
because they violate at least one of the four essential
desiderata D1, D2, D3 or D4. In our analysis only five
MoUs ($M(m)$ 1993, $H_{JS}(m)$ 2018, $SU(m)$ 2018, $H_{BetP}(m)$
2021, $H_{DSmP}(m)$ 2021) successfully pass the effectiveness
test, as we can observe in Table III. All these
effective MoUs share two basic principles: 1) approximate the
BBA $m$ by a probability measure (i.e. a Bayesian BBA) $P_m$
based on some method and evaluate its Shannon entropy to
estimate the randomness (or conflict) inherent in the BBA,
and 2) add to the Shannon entropy value a term that estimates
the level of ambiguity (or non-specificity) inherent in the
BBA (usually thanks to the Dubois & Prade U-uncertainty). This
general principle is simple and quite intuitive, but it seriously
lacks theoretical justification. We consider that this type
of effective MoU construction is unfortunately conceptually
flawed and not very satisfactory, for the two following reasons.


• 1st reason: These effective MoUs depend heavily on
the choice of the approximation method. This mechanism
appears quite arbitrary, and we do not see
any strong justification for preferring one of them, be it
$P^*$ in the $M(m)$ MoU, $BetP$ in the $H_{BetP}(m)$ MoU,
$DSmP$ in the $H_{DSmP}(m)$ MoU, the mid belief-interval value
$(Bel(\theta_i) + Pl(\theta_i))/2$ in the $SU(m)$ MoU, etc. Worse, a
method of approximation can be totally misleading, as
for instance the Cobb-Shenoy $PlPr_m$ transformation [90],
because the evaluated probabilities can be inconsistent
with the belief interval values. More precisely, one
can have $PlPr_m(\theta_i) \notin [Bel(\theta_i), Pl(\theta_i)]$ with the
Cobb-Shenoy method, which is obviously neither reasonable nor
acceptable. As a simple counter-example for the
Cobb-Shenoy transformation, just consider $\Theta = \{\theta_1, \theta_2, \theta_3\}$
with $m(\theta_1) = 0.2$ and $m(\theta_2 \cup \theta_3) = 0.8$. Then
$[Bel(\theta_1), Pl(\theta_1)] = [0.2, 0.2]$, $[Bel(\theta_2), Pl(\theta_2)] = [0, 0.8]$
and $[Bel(\theta_3), Pl(\theta_3)] = [0, 0.8]$. Applying the
$PlPr_m$ transformation, we get
$PlPr_m(\theta_1) = 0.2/(0.2 + 0.8 + 0.8) \approx 0.111$. Therefore $PlPr_m(\theta_1) < Bel(\theta_1)$,
which shows that $PlPr_m(\theta_1) \notin [Bel(\theta_1), Pl(\theta_1)]$. We
emphasize the fact that if a method of approximation
of a BBA $m$ by a probability measure $P_m$ is chosen,
it must at least be consistent with the belief interval values
generated by the BBA $m$ under concern. Clearly, we
cannot recommend the Cobb-Shenoy $PlPr_m$ transformation
for building an effective MoU based on the aforementioned
principles 1) and 2), as done in the $H_{JS}(m)$ MoU proposed recently
by Jiroušek and Shenoy based on questionable Shafer
semantics and fallacious Dempster's rule arguments.

• 2nd reason: More fundamentally, we do not see any
serious reason that necessitates the arbitrary
approximation of a (non-Bayesian) BBA by a Bayesian
BBA in order to use the Shannon entropy measure, which is the 1st
principle above. Also, why do we need, or request, to
separate the two aspects of uncertainty
(conflict and non-specificity) in an additive manner? This is
conceptually very disputable because the randomness (or
conflict) and the ambiguity (or non-specificity) are actually
interwoven in a subtle way that needs to be explored
in depth for a better understanding of the mechanism
governing the uncertainty, with a better description of the
(probably non-additive) link between them.
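
The Cobb-Shenoy counter-example of the 1st reason can be checked with the sketch below, which assumes the usual definition of the transformation, $PlPr_m(\theta_i) = Pl(\theta_i)/\sum_j Pl(\theta_j)$. The variable names and the labels `"t1"`, `"t2"`, `"t3"` are illustrative assumptions.

```python
frame = ("t1", "t2", "t3")
m = {frozenset({"t1"}): 0.2, frozenset({"t2", "t3"}): 0.8}

# Belief and plausibility of each singleton.
bel = {t: m.get(frozenset({t}), 0.0) for t in frame}
pl = {t: sum(v for x, v in m.items() if t in x) for t in frame}

# Cobb-Shenoy transformation: singleton plausibilities normalized to sum to 1.
total_pl = sum(pl.values())
plpr = {t: pl[t] / total_pl for t in frame}

# True for every element whose probability lies inside [Bel, Pl].
consistent = {t: bel[t] <= plpr[t] <= pl[t] for t in frame}
```

For this BBA, `plpr["t1"]` is 0.2/1.8, i.e. about 0.111, which is below `bel["t1"] = 0.2`, so `consistent["t1"]` is `False`.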


Very recently however, Zhang et al. in [104] proposed
three new innovative effective MoUs that are not based on an arbitrary
approximation of the BBA by a probability, as in the
aforementioned effective MoUs. These measures are denoted by
$H^1(m)$, $H^2(m)$ and $H^3(m)$ and are respectively defined by⁹

$$H^1(m) = -\sum_{X \subseteq \Theta} m(X)\log_2(Pl(X)) + \sum_{X \subseteq \Theta} m(X)\, 2\log_2(|X|) \tag{5}$$

$$H^2(m) = -\sum_{X \subseteq \Theta} m(X)\log_2(Pl(X)) + \sum_{X \subseteq \Theta} m(X)\log_2(2^{|X|} - 1) \tag{6}$$

$$H^3(m) = -\sum_{X \subseteq \Theta} m(X)\log_2(Pl(X)) + \sum_{X \subseteq \Theta, |X| > 1} m(X)|X| \tag{7}$$


These new effective MoUs differ conceptually from the
previous effective MoUs $M(m)$, $H_{JS}(m)$, $SU(m)$, $H_{BetP}(m)$
and $H_{DSmP}(m)$, but the authors fail to capture well the
interwoven link between conflict and non-specificity (or
imprecision). Actually the authors arbitrarily set the range of their
MoU as a simple parameter, taken as $[0, 2\log_2(|\Theta|)]$,
$[0, \log_2(2^{|\Theta|} - 1)]$ or $[0, |\Theta|]$, to define their $H^1(m)$, $H^2(m)$
and $H^3(m)$ measures of uncertainty. This approach is rather
ad hoc and very questionable, and other ranges
could have been chosen instead. The authors do not
identify (or propose) the best MoU to select among $H^1(m)$,
$H^2(m)$ and $H^3(m)$, which is a serious problem for using
them in applications: which one should be chosen? The other
serious problem with this approach is the lack of solid
justification for using the plausibility function in the summation
$-\sum_{X \subseteq \Theta} m(X)\log_2(Pl(X))$. Although effective, these three
new MoUs are ill-justified and heuristically defined,
and to some extent they can be considered as conceptually flawed.
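
To make the three ranges concrete, equations (5)-(7) can be evaluated as in the sketch below. The function names are illustrative assumptions, the labels `"t1"`, `"t2"`, `"t3"` stand for the elements of the FoD, and the sums are restricted to the focal elements since the other terms vanish.

```python
import math

def pl(m, a):
    """Plausibility of A: mass of the focal elements intersecting A."""
    return sum(v for x, v in m.items() if x & a)

def zhang_measures(m):
    """Return (H1, H2, H3) of equations (5)-(7), in bits."""
    conflict = -sum(v * math.log2(pl(m, x)) for x, v in m.items() if v > 0)
    h1 = conflict + sum(v * 2 * math.log2(len(x)) for x, v in m.items())
    h2 = conflict + sum(v * math.log2(2 ** len(x) - 1) for x, v in m.items())
    h3 = conflict + sum(v * len(x) for x, v in m.items() if len(x) > 1)
    return h1, h2, h3

frame = ("t1", "t2", "t3")
vacuous = zhang_measures({frozenset(frame): 1.0})
uniform = zhang_measures({frozenset({t}): 1 / 3 for t in frame})
```

For the vacuous BBA the three measures reach the upper bounds of their respective ranges, namely 2 log2(3), log2(7) and 3, whereas for the uniform Bayesian BBA all three reduce to the Shannon entropy log2(3).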


V. CONCLUSION 


In this paper we have shown that most of the existing
measures of uncertainty proposed during the last forty years
are actually non-effective, and we consider that the effective
ones are conceptually defective. We emphasize the fact that
in this jungle of non-effective measures, many have
bloomed like mushrooms since 2016, following the publication of
Deng's paper and its high publicity. Most of the papers published
since 2016 do not pay attention to the four essential properties
that an effective MoU must satisfy, which is a serious problem.
We regret this state of affairs, and we hope that this paper has
clearly pointed out this concern, and also that it will help
to reduce the proliferation of useless publications about
non-effective MoUs. We encourage the future authors working on
new MoUs to verify the effectiveness of their MoU, as done
recently by Zhang et al. in [104]. We agree with the vision of Abellan, Mantas
and Bossé that an (effective) MoU should not be too
complicated to calculate (with a direct, simple and explicit mathematical
formula), must obviously incorporate the two aspects of
uncertainty (in a subtle and efficient interwoven manner), and
must be sensitive to changes of evidence. Recently, we have
developed in [105] a conceptually better effective measure of
uncertainty for basic belief assignments, which is not based on the
additive decomposition of conflict and non-specificity and which,
we hope, will attract the attention of all readers interested in
this topic for their own applications.


⁹ We correct here the definition of $H^2(m)$, which is mathematically badly
formulated in [104].


APPENDIX 
Shannon entropy 


Consider a random variable represented by a probability
mass function (pmf) $P_N = (p_1, p_2, \ldots, p_N)$, where $p_i = P(\theta_i)$
is the probability of the $i$-th state $\theta_i$ (i.e. outcome) of
$\Theta = \{\theta_1, \theta_2, \ldots, \theta_N\}$. Shannon was interested in
communication systems where the various events were the carriers of
coded messages, and he proposed (and justified) his entropy
measure as an appropriate measure of average uncertainty (or
measure of randomness) of a random variable [17], [18], [21],
[22]. In classical information theory, the entropy of a
random variable is the average level of surprisal, or uncertainty,
inherent in the variable's possible outcomes [95]. It is worth
noting that Shannon's theory does not concern the semantic
aspects of the content of a message [46], [96], [97], but only
its transmission through communication systems. The Shannon
entropy formula is defined by¹⁰

$$H(P_N) \triangleq -\sum_{i=1}^{|\Theta|} P(\theta_i)\log(P(\theta_i)) \tag{8}$$

By convention, we take $P(\theta_i)\log(P(\theta_i)) = 0$ if $P(\theta_i) = 0$,
which is easily justified by continuity since $x\log(x) \to 0$ as
$x \to 0$. Adding terms of zero probability does not change
the entropy. In (8) we use the natural logarithm (i.e. base
$e$ logarithm), and in this case the Shannon entropy value is
expressed in nats. We can also use the base 2 logarithm
($\log_2$) function instead of the natural logarithm, and if so
the Shannon entropy value is expressed in bits. In this
case, the entropy is the number of bits required on average
to describe the random variable, or equivalently the minimum
expected number of binary questions required to determine the
value of the random variable.
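
Formula (8) and the convention for zero probabilities translate directly into a few lines of code. The sketch below is only an illustration; the function name `shannon_entropy` and its `base` parameter are assumptions made for this example.

```python
import math

def shannon_entropy(pmf, base=math.e):
    """Shannon entropy of a pmf; base e gives nats, base 2 gives bits.

    Terms with zero probability are skipped (convention 0 log 0 = 0).
    """
    return -sum(p * math.log(p, base) for p in pmf if p > 0)

h_nats = shannon_entropy([0.25, 0.25, 0.25, 0.25])          # uniform pmf, in nats
h_bits = shannon_entropy([0.25, 0.25, 0.25, 0.25], base=2)  # same pmf, in bits
h_sure = shannon_entropy([1.0, 0.0, 0.0])                   # sure event
h_padded = shannon_entropy([0.5, 0.5, 0.0], base=2)         # zero term changes nothing
```

Dividing a value in nats by log(2) gives the same value in bits, which is the conversion used elsewhere in this paper.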

Shannon entropy can be interpreted as a generalization
of Hartley entropy (1928) [98], [99], which is recovered when presuming
equally probable states (i.e. the uniform pmf $P_N^{unif}$ for
which $P(\theta_i) = 1/N$ for $i = 1, 2, \ldots, N$), hence getting
$H(P_N^{unif}) = \log(|\Theta|) = \log(N)$. Note that if we have a
uniform pmf $P_N^{unif}$ defined on $\Theta$ with $|\Theta| = N$ and another
uniform pmf $P_{N'}^{unif}$ defined on $\Theta'$ with $|\Theta'| = N'$, and if
$|\Theta| < |\Theta'|$, then $H(P_N^{unif}) < H(P_{N'}^{unif})$ because
$\log(|\Theta|) < \log(|\Theta'|)$ since $\log(x)$ is an increasing function. The minimum
value of Shannon entropy is zero, which characterizes a
non-random (or sure) event $\theta_i$ for which $P(\theta_i) = 1$, because
$-\sum_{j=1}^{|\Theta|} P(\theta_j)\log(P(\theta_j)) = -P(\theta_i)\log(P(\theta_i)) = 0$.


¹⁰ The symbol $\triangleq$ means equal by definition.


In fact, Shannon rarely used the term information (or information
content) in his works, and he preferred the term entropy
to describe the scattering of symbols in the communication
system. As reported in [100], in 1961 Shannon explained to
Tribus his choice of naming the measure of uncertainty
entropy, instead of information, as follows: "My greatest concern
was what to call it. I thought of calling it 'information,' but
the word was overly used, so I decided to call it 'uncertainty'.
When I discussed it with John von Neumann, he had a better
idea. Von Neumann told me, 'You should call it entropy,
for two reasons. In the first place your uncertainty function
has been used in statistical mechanics under that name, so it
already has a name. In the second place, and more important,
no one really knows what entropy really is, so in a debate you
will always have the advantage.'" Shannon did not prove that
his entropy formula is the best measure of uncertainty, nor
even that it is a measure of information. He only stated a set
of reasonable criteria [101] to describe a measure that would
serve the requirements of his signal transmission theory, and he
found that the entropy formula meets those criteria. We prefer
to interpret Shannon entropy as a measure of uncertainty (or
randomness) of a pmf, rather than as a measure of information
content [101], because of the multiple possible interpretations and
definitions of information.

The main algebraic properties of the Shannon entropy are,
see [20] p. 30 for details: symmetry, normality¹¹,
expansibility, decisivity, additivity and recursivity. We recall
that the Shannon entropy value $H(P_N)$ is always smaller than
$H(P_N^{unif})$ if $P_N \neq P_N^{unif}$, expressing the fact that the uniform
pmf is the only pmf giving the maximal Shannon entropy
value, and characterizing the maximum of uncertainty (or
randomness), which is called the maximality property. Another
important property of Shannon entropy is its subadditivity
property when considering two (not necessarily independent)
events, see [20] p. 36, which can be formulated by the
following inequality

$$H(P_{N \cdot N'}) \leq H(P_N) + H(P_{N'}) \tag{9}$$

where $P_{N \cdot N'}$ is the joint pmf defined on the cartesian product
space $\Theta \times \Theta' = \{(\theta_i, \theta'_j), i = 1, 2, \ldots, N, j = 1, 2, \ldots, N'\}$.
$P_N$ and $P_{N'}$ are the marginal pmfs (i.e. the projections) of the
joint pmf $P_{N \cdot N'}$ on the spaces (i.e. frames of discernment) $\Theta$
and $\Theta'$ respectively.
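
The subadditivity inequality (9) can be illustrated numerically. In the sketch below the joint pmf is an arbitrary 2 x 2 example chosen for illustration, not taken from the paper, and the helper names are assumptions.

```python
import math

def shannon_entropy(pmf):
    """Shannon entropy in nats with the convention 0 log 0 = 0."""
    return -sum(p * math.log(p) for p in pmf if p > 0)

def marginals(joint):
    """Row and column marginals of a joint pmf given as a list of rows."""
    rows = [sum(r) for r in joint]
    cols = [sum(c) for c in zip(*joint)]
    return rows, cols

# Arbitrary joint pmf with dependent components (illustrative values).
joint = [[0.3, 0.1],
         [0.1, 0.5]]
p_x, p_y = marginals(joint)
h_joint = shannon_entropy([p for row in joint for p in row])
h_sum = shannon_entropy(p_x) + shannon_entropy(p_y)

# Product of the marginals: the independent case, where (9) holds with equality.
independent = [[px * py for py in p_y] for px in p_x]
h_independent = shannon_entropy([p for row in independent for p in row])
```

With these values `h_joint` is strictly smaller than `h_sum`, while `h_independent` equals `h_sum` up to rounding.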


Belief functions 


The belief functions (BF) have been introduced by Shafer
[23] to model epistemic uncertainty and to reason about
uncertainty. We assume that the answer to the problem under
concern belongs to a known finite discrete frame of
discernment (FoD) $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$, with $n > 1$, and
where all elements of $\Theta$ are exhaustive and exclusive. The
set of all subsets of $\Theta$ (including the empty set $\emptyset$ and $\Theta$) is
the power-set of $\Theta$, denoted by $2^{\Theta}$. The number of elements
(i.e. the cardinality) of $2^{\Theta}$ is $2^{|\Theta|}$. A (normal) basic belief
assignment (BBA) associated with a given source of evidence
is a mapping $m(\cdot) : 2^{\Theta} \to [0, 1]$ satisfying $m(\emptyset) = 0$ and
$\sum_{A \in 2^{\Theta}} m(A) = 1$. The number $m(A)$ is called the mass of
$A$ committed by the source of evidence. The subset $A \in 2^{\Theta}$
is called a focal element (FE) of the BBA $m(\cdot)$ if and only if
$m(A) > 0$. The set of all the focal elements of the BBA $m(\cdot)$
is denoted by $\mathcal{F}_{\Theta}(m) = \{X \in 2^{\Theta} | m(X) > 0\}$, or just $\mathcal{F}$ for
short when there is no ambiguity on the FoD $\Theta$
and the BBA $m$ we are using. The core $\mathcal{C}(m)$ of a BBA $m$ is
the union of all its focal elements, i.e. $\mathcal{C}(m) = \bigcup_{X \in \mathcal{F}_{\Theta}(m)} X$.

The belief of $A$, denoted $Bel(A)$, and the plausibility of $A$,
denoted $Pl(A)$, are usually interpreted respectively as lower
and upper bounds of an unknown (subjective) probability
measure $P(A)$. They are respectively defined for any $A \in 2^{\Theta}$
from the BBA $m(\cdot)$ by

$$Bel(A) = \sum_{X \in 2^{\Theta} | X \subseteq A} m(X) \tag{10}$$

and

$$Pl(A) = \sum_{X \in 2^{\Theta} | A \cap X \neq \emptyset} m(X) = 1 - Bel(\bar{A}) \tag{11}$$

where $\bar{A}$ represents the complement of $A$ in $\Theta$, that is
$\bar{A} \triangleq \Theta \setminus \{A\} = \{X | X \in \Theta \text{ and } X \notin A\}$. The symbol $\setminus$ denotes
the set difference operator. Also, the commonality function
$q(\cdot)$, defined for all $A \subseteq \Theta$ by $q(A) = \sum_{Y \subseteq \Theta | A \subseteq Y} m(Y)$, is
involved in some derivations, for instance in the definition
of $\mathrm{MoU}_{1987}(m)$ (cf. Table I). The vacuous BBA (VBBA for
short) representing a totally ignorant source is defined by
$m_v(\Theta) = 1$. In this short presentation, we implicitly work
on the FoD $\Theta$ and so we omit any reference to it in our
previous notations. If we have to work with BBAs defined on
different FoDs, say $\Theta$ and $\Theta'$, then we explicitly indicate
these FoDs in the BBA notations as $m^{\Theta}(\cdot)$ and $m^{\Theta'}(\cdot)$. In
the classical theory of belief functions the combination of
several distinct sources of evidence characterized by their
BBAs defined on the same FoD is done with Dempster's
rule of combination, see [23]. To circumvent the problems
of Dempster's rule (e.g. its dictatorial behavior, its possible
insensitivity to the conflict level, its counter-intuitive results in
high and low conflicting situations, etc.), other rules have been
developed, in particular those based on proportional conflict
redistribution (PCR) principles, see [102], [103].


¹¹ This stipulates that $H(P_2^{unif}) = 1$ using the base 2 logarithm function in (8).
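
The definitions (10) and (11) and the commonality function can be sketched as follows. A BBA is represented here as a dictionary mapping focal elements (frozensets) to masses; this representation, the function names, the labels `"t1"`, `"t2"`, `"t3"` and the numerical masses are illustrative assumptions.

```python
def bel(m, a):
    """Belief (10): mass of the focal elements included in A."""
    return sum(v for x, v in m.items() if x <= a)

def pl(m, a):
    """Plausibility (11): mass of the focal elements intersecting A."""
    return sum(v for x, v in m.items() if x & a)

def commonality(m, a):
    """Commonality q(A): mass of the focal elements containing A."""
    return sum(v for y, v in m.items() if a <= y)

frame = frozenset({"t1", "t2", "t3"})
m = {
    frozenset({"t1"}): 0.2,
    frozenset({"t2", "t3"}): 0.5,
    frame: 0.3,
}
a = frozenset({"t1", "t2"})

bel_a = bel(m, a)                 # only {t1} is included in A
pl_a = pl(m, a)                   # every focal element intersects A
pl_from_bel = 1 - bel(m, frame - a)
q_t2 = commonality(m, frozenset({"t2"}))
```

For this example `bel_a` is 0.2, `pl_a` is 1.0 and agrees with `1 - Bel` of the complement of A, and `q_t2` is 0.8.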
Abstract: This paper presents a new effective measure of
uncertainty (MoU) of basic belief assignments. This new continuous
measure is effective in the sense that it satisfies a small number 
of very natural and essential desiderata. Our new simple math- 
ematical definition of MoU captures well the interwoven link of 
randomness and imprecision inherent to basic belief assignments. 
Its numerical value is easy to calculate. This new effective MoU 
characterizes efficiently any source of evidence used in the belief 
functions framework. Because this MoU coincides with Shannon 
entropy for any Bayesian basic belief assignment, it can be also 
interpreted as an effective generalization of Shannon entropy. We 
also provide several examples to show how this new MoU works. 


Keywords: Measure of Uncertainty, MoU, belief functions, 
Shannon entropy. 


I. INTRODUCTION 


In the classical probabilistic framework of the theory of 
communication developed by Shannon in 1948 [1], [2], the 
measure of uncertainty (MoU), also called entropy, for char- 
acterizing a source of information (from signal transmission 
standpoint) is represented by Shannon entropy. This entropy 
measures the randomness of a probability mass function. 
Shannon entropy has played a very important role in the 
development of modern communication systems during the 
second half of the 20th century, and in signal and image 
coding, data compression, and cryptography [3] until today. 
Shannon theory does not concern the semantic aspects of the 
content of a message but only its transmission. 

Since the 1980s, many research works have tried to extend
Shannon's measure of uncertainty (i.e. entropy) to the belief
functions framework, introduced by Shafer in the mid-1970s [5].
In parallel, other research works have addressed the characterization
of particular aspects of uncertainty related
to the set consistency (or non-specificity) of basic belief
assignments (BBAs). Recently, Jousselme et al. [6] proposed
an interesting attempt at a mathematical unification of existing
MoU formulations. In our recent survey paper [7], we
analyzed in detail 40 years of research on MoUs. Our
analysis of forty-eight MoUs reveals that only very few of
them can be considered effective in the mathematical sense
defined in Section III. Unfortunately, these existing effective
MoUs are conceptually flawed. The main contribution of this
paper is to provide a clear positive answer, with a new well-justified
mathematical solution, to the fundamental challenging
question stated in the conclusion of [7]:

Is there a better conceptual effective measure of uncertainty 
for the basic belief assignments? 

This paper is organized as follows. Section II presents the
basics of belief functions. Section III presents and justifies the
four essential desiderata that a MoU must satisfy in order to be
effective. In Section IV we list the existing effective MoUs
and explain their conceptual flaws. Section V presents
the new effective MoU for BBAs (i.e. generalized Shannon
entropy), with some examples in Section VI. Concluding
remarks and perspectives are given in Section VII.


II. BELIEF FUNCTIONS 


The belief functions (BF) were introduced by Shafer [5] for
modeling epistemic uncertainty, reasoning about uncertainty
and combining distinct sources of evidence. The answer to
the problem under concern is assumed to belong to a known
finite discrete frame of discernment (FoD) $\Theta = \{\theta_1, \ldots, \theta_N\}$
where all elements (i.e. members) of $\Theta$ are exhaustive and
exclusive. The set of all subsets of $\Theta$ (including the empty set
$\emptyset$, and $\Theta$) is the power-set of $\Theta$ denoted by $2^\Theta$. The number
of elements (i.e. the cardinality) of the power-set is $2^{|\Theta|}$. A
(normalized) basic belief assignment (BBA) associated with a
given source of evidence is a mapping $m^\Theta(\cdot) : 2^\Theta \to [0,1]$
such that $m^\Theta(\emptyset) = 0$ and $\sum_{X \in 2^\Theta} m^\Theta(X) = 1$. A BBA $m^\Theta(\cdot)$
characterizes a source of evidence related to a FoD $\Theta$. For
notation shorthand, we can omit the superscript $\Theta$ in the $m^\Theta(\cdot)$
notation if there is no ambiguity on the FoD we work with.
The quantity $m(X)$ is called the mass of belief of $X$. $X \in 2^\Theta$
is called a focal element (FE) of $m(\cdot)$ if $m(X) > 0$. The
set of all focal elements of $m(\cdot)$ is denoted¹ by $\mathcal{F}_\Theta(m) \triangleq
\{X \in 2^\Theta | m(X) > 0\}$. The belief and the plausibility of $X$
are respectively defined for any $X \in 2^\Theta$ by [5]

$$Bel(X) = \sum_{Y \in 2^\Theta | Y \subseteq X} m(Y) \qquad (1)$$

$$Pl(X) = \sum_{Y \in 2^\Theta | X \cap Y \neq \emptyset} m(Y) = 1 - Bel(\bar{X}) \qquad (2)$$

where $\bar{X} \triangleq \Theta \setminus \{X\}$ is the complement of $X$ in $\Theta$.

¹ $\triangleq$ means equal by definition.

One always has $0 \le Bel(X) \le Pl(X) \le 1$, see [5]. For
$X = \emptyset$, $Bel(\emptyset) = Pl(\emptyset) = 0$, and for $X = \Theta$ one has
$Bel(\Theta) = Pl(\Theta) = 1$. $Bel(X)$ and $Pl(X)$ are often interpreted
as the lower and upper bounds of the unknown probability
$P(X)$ of $X$, that is $Bel(X) \le P(X) \le Pl(X)$.
To quantify the uncertainty (i.e. the imprecision) of
$P(X) \in [Bel(X), Pl(X)]$, we use $u(X) \in [0,1]$ defined by

$$u(X) \triangleq Pl(X) - Bel(X) \qquad (3)$$

The quantity $u(X) = 0$ if $Bel(X) = Pl(X)$, which means that
$P(X)$ is known precisely, and one has $P(X) = Bel(X) =
Pl(X)$. One has $u(\emptyset) = 0$ because $Bel(\emptyset) = Pl(\emptyset) = 0$,
and one has $u(\Theta) = 0$ because $Bel(\Theta) = Pl(\Theta) = 1$. If
all focal elements of $m(\cdot)$ are singletons of $2^\Theta$, the
BBA $m(\cdot)$ is a Bayesian BBA because $\forall X \in 2^\Theta$ one has
$Bel(X) = Pl(X) = P(X)$ and $u(X) = 0$. Hence the belief
and plausibility of $X$ coincide with a probability measure
$P(X)$ defined on the FoD $\Theta$. The vacuous BBA characterizing
a totally ignorant source of evidence is defined by $m_v(X) = 1$
for $X = \Theta$, and $m_v(X) = 0$ for all $X \in 2^\Theta$ different from $\Theta$.
This very particular BBA plays a major role in the establishment
of a new effective measure of uncertainty for BBAs.
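
Definitions (1)-(3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a BBA is stored as a dictionary mapping focal elements (frozensets of element labels) to masses, and the function names `bel`, `pl`, `imprecision` as well as the example masses are hypothetical choices, not notation from the paper.

```python
def bel(m, x):
    """Bel(X): total mass of the focal elements included in X, eq. (1)."""
    return sum(mass for y, mass in m.items() if y <= x)

def pl(m, x):
    """Pl(X): total mass of the focal elements intersecting X, eq. (2)."""
    return sum(mass for y, mass in m.items() if y & x)

def imprecision(m, x):
    """u(X) = Pl(X) - Bel(X), eq. (3)."""
    return pl(m, x) - bel(m, x)

# Illustrative BBA on the frame {a, b, c}; the masses are hypothetical.
m = {
    frozenset("a"): 0.5,
    frozenset("ab"): 0.3,
    frozenset("abc"): 0.2,
}
x = frozenset("ab")
print(bel(m, x), pl(m, x), imprecision(m, x))
```

Storing only the focal elements keeps the representation sparse: any subset absent from the dictionary implicitly has zero mass.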


III. ESSENTIAL DESIDERATA FOR A MOU 


Before defining our new effective measure of uncertainty,
denoted by $U(m)$, for any basic belief assignment $m(\cdot)$ related
to a (non-empty) FoD $\Theta$, we present the four essential and very
natural desiderata that an effective MoU must satisfy [7].

Desideratum D1: For any non-empty frame of discernment
$\Theta$ and for any BBA $m(\cdot)$ focused on a singleton $X$ of $2^\Theta$ one
must have

$$U(m) = 0 \qquad (4)$$

Justification of D1: this desideratum is natural and intuitive
because any particular BBA for which a singleton $X$ has
$m(X) = 1$ characterizes certainty: there
is no uncertainty about the choice of this element since it does
not include any smaller element. So, in this case $U(m)$
must be zero.

Desideratum D2: The measure of uncertainty of a totally
ignorant source of evidence must increase with the cardinality
of the frame of discernment. That is

$$U(m_v^{\Theta}) < U(m_v^{\Theta'}), \quad \text{if } |\Theta| < |\Theta'|. \qquad (5)$$

Justification of D2: this second desideratum makes perfect
sense because the totally ignorant source of evidence on
$\Theta = \{\theta_1, \ldots, \theta_N\}$, for which $m_v^{\Theta}(\Theta) = 1$, knows absolutely
nothing about only $N$ elements, whereas the totally ignorant
source of evidence on $\Theta' = \{\theta_1, \ldots, \theta_N, \theta_{N+1}, \ldots, \theta_{N'}\}$ with
$m_v^{\Theta'}(\Theta') = 1$ knows absolutely nothing about more elements
because $N' > N$. This clearly indicates that $m_v^{\Theta'}$ must in
fact be more ignorant than $m_v^{\Theta}$.

Desideratum D3: The measure of uncertainty $U(m)$ must
coincide with Shannon entropy [1]-[3] if the BBA $m(\cdot)$ is a
Bayesian BBA. This desideratum is mathematically expressed
for any Bayesian BBA $m(\cdot)$ defined on the FoD $\Theta$ by the
condition²

$$U(m) = -\sum_{X \in \Theta} m(X) \log(m(X)) \qquad (6)$$

Justification of D3: this third desideratum is also very natural 
because Shannon entropy is the most well-known and justified 
[9] measure used to characterize the uncertainty (the random- 
ness, or variability) of a probability mass function. Because 
any Bayesian BBA induces belief and plausibility functions 
that coincide with a probability measure, one must have a 
coherence of U(m) with Shannon entropy when the BBA is 
Bayesian. 


Desideratum D4: For any non-vacuous BBA $m(\cdot)$ and for
the vacuous BBA $m_v(\cdot)$ defined with respect to the same FoD
one must have

$$U(m) < U(m_v) \qquad (7)$$

Justification of D4: this last desideratum is also very important
and makes perfect sense because the totally ignorant
source is always characterized by the vacuous BBA $m_v(\cdot)$,
and obviously no source of evidence can be more uncertain
than the totally ignorant source.


Effectiveness of a measure of uncertainty: A measure of
uncertainty (MoU) is said to be effective if and only if it satisfies
the four essential desiderata D1, D2, D3, and D4.


Any MoU that fails to satisfy at least one of these four
desiderata is said to be non-effective, and in this case it cannot
be seriously considered a good measure of uncertainty
for characterizing a basic belief assignment of a source of
evidence. Consequently, a non-effective MoU should not be
used in applications involving MoU.


As justified in [7], we deliberately do not include the sub-additivity
desideratum in the list of our desiderata for the
search of an effective MoU in the belief function framework
because this desideratum does not make sense when working
with general (i.e. non-Bayesian) BBAs, and it is incompatible
with the essential desideratum D4. We recall that the sub-additivity
condition is defined by $U(m^{\Theta \times \Theta'}) \le U(m^{\downarrow\Theta}) +
U(m^{\downarrow\Theta'})$ for any joint BBA defined on the cartesian product
$\Theta \times \Theta'$ of FoDs $\Theta$ and $\Theta'$, where $m^{\downarrow\Theta}$ is the marginal (i.e.
projection) of $m^{\Theta \times \Theta'}(\cdot)$ on the power-set $2^{\Theta}$, and $m^{\downarrow\Theta'}$ is the
marginal (i.e. projection, see [10], [11]) of $m^{\Theta \times \Theta'}(\cdot)$ on the
power-set $2^{\Theta'}$. To justify our choice, just consider a simple
example with $|\Theta| = 5$ and $|\Theta'| = 8$, which means that the
cartesian product space $\Theta \times \Theta'$ has $|\Theta \times \Theta'| = 40$ elements.
Why should the MoU of the vacuous BBA $m_v^{\Theta \times \Theta'}$ related to the 40
elements of $\Theta \times \Theta'$ be less than (or equal to) the sum
of the MoU of the vacuous BBA $m_v^{\Theta}$ related to only the 5 elements of
$\Theta$ and the MoU of the vacuous BBA $m_v^{\Theta'}$ related only to
the 8 elements of $\Theta'$? We do not see any solid theoretical
reason, nor intuitive reason, for justifying and requiring the
subadditivity desideratum in the general framework of belief
functions, and for selecting it as an axiom to satisfy in general as
done in [12]. Unlike Vejnarova and Klir [15] (p. 28)
(and some authors following them), we do not consider that
a meaningful (or effective) measure of uncertainty of basic
belief assignments must satisfy the sub-additivity desideratum
in general.

² Shannon entropy [1] is given here in nats, and we take $0\log(0) = 0$
because $\lim_{x \to 0^+} x \log(x) = 0$, which is proved using L'Hôpital's rule [4].


IV. EXISTING EFFECTIVE MOUS 


Before presenting our new effective MoU (or generalized
entropy) in the next section, we must briefly discuss the
few existing effective measures of uncertainty proposed in
the literature. As shown in [7], most³ existing MoUs are
actually non-effective, and only eight MoUs can be considered
effective in the mathematical sense defined in the previous
section. Most effective MoUs share two basic principles:
1) approximate the BBA $m$ by a probability measure (i.e. a
Bayesian BBA) $P_m$ based on some method of approximation
and evaluate its Shannon entropy to estimate the randomness
(or conflict) inherent to the BBA, and 2) add a term to Shannon
entropy that characterizes the level of ambiguity (or non-specificity)
inherent to the BBA (usually thanks to Dubois
& Prade U-uncertainty [16]). For instance, in [7] the BetP
and DSmP transformations are used, in [17] the Cobb-Shenoy
transformation [18] is used, and in [19] the authors suggest
using⁴ the Bayesian BBA compatible with the belief intervals drawn
from $m(\cdot)$ that maximizes Shannon entropy. This general
two-step principle is rather simple and quite intuitive but it
seriously lacks theoretical justification. We consider that
this type of effective MoU construction is conceptually flawed
and not very satisfactory for two main reasons:


Reason 1: these effective MoUs highly depend on the
method of approximation, whose choice is quite arbitrary.
Worse, a method of approximating a BBA $m(\cdot)$ by a
Bayesian BBA can be totally misleading, as for instance the Cobb-Shenoy
$PlPr_m$ transformation [18], because for this transformation
the evaluation of probabilities can be inconsistent
with belief interval values. More precisely, one can have
$PlPr_m(\theta_i) \notin [Bel(\theta_i), Pl(\theta_i)]$ with the Cobb-Shenoy method,
which is obviously not reasonable, nor acceptable at all, see the
discussion in [7] with an example. We emphasize the fact that
if a method of approximation of a BBA $m$ by a probability
measure $P_m$ is chosen, it must be at least consistent with the
belief interval values generated by the BBA $m$ under concern.
Clearly, we cannot recommend the Cobb-Shenoy transformation
for building an effective MoU based on the aforementioned principles
1) and 2) as proposed recently by Jiroušek and Shenoy
in [17].


Reason 2: In fact, there is no solid reason or evidence that
makes it necessary to approximate any (non-Bayesian) BBA by
a Bayesian BBA (for using Shannon entropy measure) in
the construction of MoU. Also, there is no reason why
we need (or request) to make the distinction between the two
aspects of uncertainty (conflict and non-specificity), and to
consider them as additively separable. This is conceptually
very disputable because the randomness (or conflict) and
ambiguity (or non-specificity) are actually interwoven through
the mass values of the focal elements of the BBA and their
belief intervals.

³ Forty-eight MoUs have been analyzed in [7].
⁴ Found using a complicated optimization method, see [20], [21] for details.


Very recently, Zhang et al. [22] proposed three new
effective MoUs not directly based on the aforementioned two-step
principle, and that is why they have attracted
our attention. These MoUs are denoted by $H^1(m)$, $H^2(m)$
and $H^3(m)$ and they are respectively defined by⁵

$$H^1(m) = -\sum_{X \subseteq \Theta} m(X) \log_2(Pl(X)) + \sum_{X \subseteq \Theta} m(X)\, 2 \log_2(|X|)$$

$$H^2(m) = -\sum_{X \subseteq \Theta} m(X) \log_2(Pl(X)) + \sum_{X \subseteq \Theta} m(X) \log_2(2^{|X|} - 1)$$

$$H^3(m) = -\sum_{X \subseteq \Theta} m(X) \log_2(Pl(X)) + \sum_{X \subseteq \Theta} m(X)\, |X|$$

Unfortunately, Zhang et al. fail to capture well the interwoven
link between conflict and non-specificity (or imprecision).
Actually the authors set the range of their MoU arbitrarily as
a simple parameter, taken as $[0, 2\log_2(|\Theta|)]$,
$[0, \log_2(2^{|\Theta|} - 1)]$ or $[0, |\Theta|]$, to define their $H^1(m)$, $H^2(m)$
and $H^3(m)$ measures of uncertainty. Zhang's approach is very
questionable, and actually other ranges could have been chosen
instead. Moreover, Zhang et al. do not identify (nor propose)
the best MoU to use among $H^1(m)$, $H^2(m)$ and $H^3(m)$.
The other serious problem with Zhang's approach is its lack
of solid justification for using the plausibility function in the
summation $-\sum_{X \subseteq \Theta} m(X) \log_2(Pl(X))$. Although effective
in the mathematical sense defined in Section III, Zhang's
new MoUs are ill-justified and they can also be considered
conceptually flawed. That is why we present a conceptually
better effective measure of uncertainty for BBAs in the
next section.


V. A NEW EFFECTIVE MEASURE OF UNCERTAINTY 
A. Mathematical definition 


The new effective measure of uncertainty we propose is
given by the following simple formula

$$U(m) = \sum_{X \in 2^\Theta} s(X) \qquad (8)$$

with

$$s(X) \triangleq -(1 - u(X))\, m(X) \log(m(X)) + u(X)(1 - m(X)) \qquad (9)$$

where $s(X)$ is the uncertainty contribution of $X$ in the MoU $U(m)$.


We call $s(X)$ the entropiece of $X$. Because $u(X) \in [0,1]$
and $m(X) \in [0,1]$ one has $s(X) \ge 0$, and $U(m) \ge 0$. The
entropiece $s(X)$ takes into account the belief mass $m(X)$, and
the uncertainty (or imprecision) $u(X) = Pl(X) - Bel(X)$
about the unknown probability of $X$ in a subtle interwoven
manner. The cardinality of $X$ enters indirectly (i.e. not explicitly)
in the derivations of $Bel(X)$ and $Pl(X)$, and thus
in the calculation of $u(X)$ and in the entropiece $s(X)$. The
quantity $-(1 - u(X)) \log(m(X)) = (1 - u(X)) \log(1/m(X))$
entering in $s(X)$ in (9) is the surprisal [8] $\log(1/m(X))$ of
$X$ discounted by the confidence $(1 - u(X))$ one has about the
precision of $P(X)$. The term $-m(X)(1 - u(X)) \log(m(X))$
is the weighted discounted surprisal of $X$. The second term
$u(X)(1 - m(X))$ corresponds to the imprecision of $P(X)$
discounted by $(1 - m(X))$ because the greater $m(X)$, the
less one should take into account the imprecision $u(X)$ in
the MoU. As we will prove next, this new very simple MoU
$U(m)$ satisfies the four essential desiderata, and thus it is
effective and conceptually well justified, and it presents several
advantages over the existing effective MoUs discussed in Section IV.

⁵ We have corrected here the definition of $H^2(m)$, which is mathematically
ill-formulated in [22].

Because for $X = \emptyset$ one has $m(\emptyset) = 0$ and $u(\emptyset) = 0$, the
entropiece of the empty set $\emptyset$ is $s(\emptyset) = 0$. Hence the expression
of $U(m)$ can be written equivalently as

$$U(m) = \sum_{X \in 2^\Theta} s(X) = \sum_{X \in 2^\Theta | X \neq \emptyset} s(X) \qquad (10)$$

It is worth noting that for any BBA focused on $X \neq \emptyset$
with $m(X) = 1$, we have $m(X) = Bel(X) = Pl(X) = 1$,
and thus $u(X) = 0$. In this case, the entropiece of $X$ is⁶

$$s(X) = -(1 - u(X))\, m(X) \log(m(X)) + u(X)(1 - m(X)) = -(1 - 0) \cdot 1 \cdot \log(1) + 0 \cdot (1 - 1) = 0$$

In particular, if $m(\Theta) = 1$ (which corresponds to the vacuous
BBA) we have the entropiece $s(\Theta) = 0$.

$U(m)$ is expressed in nats because we use the natural logarithm,
which makes derivations simpler, especially for
some proofs in the sequel. $U(m)$ can be expressed in bits by
dividing the $U(m)$ value in nats by $\log(2) = 0.69314718\ldots$
This measure of uncertainty $U(m)$ is a continuous function of
its basic belief mass arguments because it is a summation of
continuous functions.
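
As a minimal sketch of formulas (9) and (10), one possible Python transcription is given below. It assumes that a BBA is stored as a dictionary mapping focal elements (frozensets) to masses; the names `nonempty_subsets`, `entropiece` and `mou` are illustrative and do not come from the paper.

```python
from itertools import combinations
from math import log

def nonempty_subsets(frame):
    """All non-empty subsets of the frame, as frozensets."""
    elems = sorted(frame)
    return [frozenset(c) for r in range(1, len(elems) + 1)
            for c in combinations(elems, r)]

def entropiece(m, x):
    """s(X) from eq. (9), in nats, with the convention 0 log 0 = 0."""
    bel = sum(mass for y, mass in m.items() if y <= x)   # eq. (1)
    pl = sum(mass for y, mass in m.items() if y & x)     # eq. (2)
    u = pl - bel                                         # eq. (3)
    mx = m.get(x, 0.0)
    surprisal = -mx * log(mx) if mx > 0 else 0.0
    return (1 - u) * surprisal + u * (1 - mx)

def mou(m, frame):
    """U(m) from eq. (10): sum of s(X) over the non-empty subsets of the frame."""
    return sum(entropiece(m, x) for x in nonempty_subsets(frame))

frame = frozenset("abc")
focused = {frozenset("a"): 1.0}   # BBA focused on a singleton
vacuous = {frame: 1.0}            # vacuous BBA
print(mou(focused, frame), mou(vacuous, frame))
```

The enumeration visits all $2^{|\Theta|} - 1$ non-empty subsets, so this direct transcription is only practical for small frames. Dividing the result by `log(2)` gives the value in bits.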


B. Entropy of the vacuous BBA 


Consider the FoD $\Theta$ of cardinality $|\Theta| = N$ greater than
zero, and the vacuous BBA $m_v$ defined on this FoD, for
which $m_v(\Theta) = 1$ and $m_v(X) = 0$ for any $X \neq \Theta$ in $2^\Theta$.
For this vacuous BBA one always has $Bel(\Theta) = Pl(\Theta) = 1$
and thus $u(\Theta) = Pl(\Theta) - Bel(\Theta) = 0$, and one has also
$u(\emptyset) = 0$. For all elements $X \neq \Theta$ with $X \in 2^\Theta \setminus \{\emptyset\}$ one
has also necessarily $Bel(X) = 0$, $Pl(X) = 1$ and thus
$u(X) = Pl(X) - Bel(X) = 1$. Consequently, the expression
(10) with the BBA $m_v$ becomes⁷

$$
\begin{aligned}
U(m_v) = {} & -\sum_{X \in 2^\Theta | X \neq \emptyset} (1 - u(X))\, m_v(X) \log(m_v(X)) + \sum_{X \in 2^\Theta | X \neq \emptyset} u(X)(1 - m_v(X)) \\
= {} & -(1 - u(\Theta))\, m_v(\Theta) \log(m_v(\Theta)) - \sum_{X \in 2^\Theta | (X \neq \emptyset) \wedge (X \neq \Theta)} (1 - u(X))\, m_v(X) \log(m_v(X)) \\
& + u(\Theta)(1 - m_v(\Theta)) + \sum_{X \in 2^\Theta | (X \neq \emptyset) \wedge (X \neq \Theta)} u(X)(1 - m_v(X))
\end{aligned}
$$

In this expression of $U(m_v)$ we have⁸

$$-(1 - u(\Theta))\, m_v(\Theta) \log(m_v(\Theta)) = -(1 - 0) \cdot 1 \cdot \log(1) = 0$$

$$-\sum_{X \in 2^\Theta | (X \neq \emptyset) \wedge (X \neq \Theta)} (1 - u(X))\, m_v(X) \log(m_v(X)) = 0$$

$$u(\Theta)(1 - m_v(\Theta)) = 0 \cdot (1 - 1) = 0$$

$$\sum_{X \in 2^\Theta | (X \neq \emptyset) \wedge (X \neq \Theta)} u(X)(1 - m_v(X)) = 2^N - 2$$

Therefore, for the vacuous BBA $m_v$ defined
on a FoD of size $N > 0$, we finally obtain the following MoU value

$$U(m_v) = 2^N - 2 \qquad (11)$$

The entropy $U(m_v)$ makes perfect sense because for
the vacuous BBA $m_v(\cdot)$ there is no information about
the conflicts between the elements of the FoD. One has
$u(\emptyset) = 0$ because $[Bel(\emptyset), Pl(\emptyset)] = [0,0]$, $u(\Theta) = 0$ because
$[Bel(\Theta), Pl(\Theta)] = [1,1]$, and for all $X \in 2^\Theta \setminus \{\emptyset, \Theta\}$ one
has $u(X) = 1$ because $[Bel(X), Pl(X)] = [0,1]$. Hence, the
sum of all imprecisions of $P(X)$ for all $X \in 2^\Theta$ is exactly
equal to $2^N - 2$ when $|\Theta| = N$. In the degenerate case where
$|\Theta| = N = 1$, one has $U(m_v) = 2^1 - 2 = 0$, which indicates
that there is absolutely no uncertainty in this very particular
case. This result also makes perfect sense. For a non-degenerate
FoD (i.e. when $|\Theta| > 1$) one always has $U(m_v) > \log(N)$,
which means that the vacuous BBA representing the totally
ignorant source of evidence has an entropy greater than the
maximum of Shannon entropy $\log(N)$ obtained with the
uniform probability mass function distributed on $\Theta$. This is
an expected result because no BBA other than the vacuous BBA
can represent total ignorance.

⁶ because $\log(1) = 0$.
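
The value (11) and its comparison with $\log(N)$ can be checked numerically for small frames. The following sketch sums the terms of (10) one by one for the vacuous BBA; the function name `mou_vacuous` and the range of tested sizes are illustrative assumptions.

```python
from itertools import combinations
from math import log

def mou_vacuous(n):
    """U(m_v) on a frame of n elements, summed term by term from eq. (10)."""
    total = 0.0
    for r in range(1, n + 1):
        for _ in combinations(range(n), r):
            # Vacuous BBA: [Bel(X), Pl(X)] = [0, 1] unless X is the whole frame,
            # where [Bel, Pl] = [1, 1] and the mass equals 1.
            u = 0.0 if r == n else 1.0
            mx = 1.0 if r == n else 0.0
            surprisal = -mx * log(mx) if mx > 0 else 0.0
            total += (1 - u) * surprisal + u * (1 - mx)
    return total

for n in range(1, 7):
    print(n, mou_vacuous(n), 2**n - 2, log(n))
```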


C. Effectiveness of U(m) 


In this subsection we establish the effectiveness of our new 
generalized entropy U(m) defined in (8). For this, we prove 
the following four lemmas. 


Lemma 1: U(m) satisfies the desideratum D1. 


Proof: Consider first the very special case where
$\Theta$ includes only one element $\theta$, that is $\Theta = \{\theta\}$ and
$|\Theta| = 1$. In this case there exists only one possible
BBA over $2^\Theta = \{\emptyset, \Theta\}$, defined by $m(\emptyset) = 0$ and
$m(\theta) = 1$. Hence $Bel(\theta) = Pl(\theta) = 1$, $u(\theta) = 0$, and
$s(\theta) = -(1 - u(\theta))\, m(\theta) \log(m(\theta)) + u(\theta)(1 - m(\theta)) = 0$.
Therefore $U(m) = s(\emptyset) + s(\theta) = 0$. More generally (i.e.
when $|\Theta| > 1$), if $X$ is a singleton of $2^\Theta$ (i.e. $|X| = 1$) and
if $m(X) = 1$ then $Bel(X) = Pl(X) = 1$ and $u(X) = 0$.
For the elements $Y$ of $2^\Theta \setminus \{\emptyset\}$ containing $X$ one has
also $Bel(Y) = Pl(Y) = 1$ and therefore $u(Y) = 0$. For all
elements $Y$ of $2^\Theta \setminus \{\emptyset\}$ not containing $X$ one always has
$Bel(Y) = Pl(Y) = 0$ and therefore $u(Y) = 0$. In summary,
one has: 1) $m(X) = 1$, $u(X) = 0$, $s(X) = 0$, 2) $m(Y) = 0$,
$u(Y) = 0$, $s(Y) = 0$ for all $Y \neq X$, $Y \in 2^\Theta \setminus \{\emptyset\}$, and
3) $s(\emptyset) = 0$. Applying formula (8) (or (10)) we obtain
$U(m) = 0$, which completes the proof of Lemma 1.

⁷ The notation $a \wedge b$ means that the conditions $a$ and $b$ are both satisfied.
⁸ For $X \neq \Theta$, $m_v(X) = 0$ and $m_v(X) \log(m_v(X)) = 0 \log(0) = 0$.


Lemma 2: U(m) satisfies the desideratum D2. 


Proof: Consider two FoDs $\Theta$ and $\Theta'$ with $|\Theta| = N$ and
$|\Theta'| = N'$ greater than zero, and suppose $N < N'$. For
the vacuous BBA $m_v^{\Theta}$ defined on the FoD $\Theta$, one has
$U(m_v^{\Theta}) = 2^N - 2$. Similarly, for the vacuous BBA $m_v^{\Theta'}$ defined
on the FoD $\Theta'$, one has $U(m_v^{\Theta'}) = 2^{N'} - 2$. Because
the exponential function is an increasing function, one
always has $2^N < 2^{N'}$ and also $2^N - 2 < 2^{N'} - 2$. Therefore
$U(m_v^{\Theta}) < U(m_v^{\Theta'})$ when $|\Theta| < |\Theta'|$, which completes the
proof of Lemma 2.


Lemma 3: U(m) satisfies the desideratum D3. 


Proof: When the BBA $m$ is Bayesian, its focal elements are
only singletons of $2^\Theta$ and $Bel(X) = Pl(X)$ for all $X \in 2^\Theta$.
Hence $u(X) = 0$ for all $X \in 2^\Theta$. Thus, in the expression (9)
of $s(X)$ one always has $-(1 - u(X))\, m(X) \log(m(X)) =
-m(X) \log(m(X))$ and $u(X)(1 - m(X)) = 0 \cdot (1 - m(X)) =
0$, so that $s(X) = -m(X) \log(m(X))$. Therefore $U(m) =
\sum_{X \in 2^\Theta} s(X) = -\sum_{X \in 2^\Theta} m(X) \log(m(X))$. Because the
masses of all non-singleton elements of $2^\Theta$ are zero, we
finally obtain $U(m) = -\sum_{X \in 2^\Theta | |X| = 1} m(X) \log(m(X)) =
-\sum_{X \in \Theta} m(X) \log(m(X))$, and this is Shannon entropy. This
completes the proof of Lemma 3.
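
Lemma 3 can be illustrated numerically. The sketch below assumes a dictionary representation of BBAs (frozenset focal elements mapped to masses); the helper names `mou` and `shannon` and the probability values are hypothetical.

```python
from itertools import combinations
from math import log

def mou(m, frame):
    """U(m) = sum of the entropieces s(X), eqs. (9)-(10), in nats."""
    elems = sorted(frame)
    total = 0.0
    for r in range(1, len(elems) + 1):
        for c in combinations(elems, r):
            x = frozenset(c)
            bel = sum(v for y, v in m.items() if y <= x)
            pl = sum(v for y, v in m.items() if y & x)
            u, mx = pl - bel, m.get(x, 0.0)
            surprisal = -mx * log(mx) if mx > 0 else 0.0   # 0 log 0 = 0
            total += (1 - u) * surprisal + u * (1 - mx)
    return total

def shannon(probs):
    """Shannon entropy in nats of a probability mass function."""
    return -sum(p * log(p) for p in probs if p > 0)

frame = frozenset("abc")
bayesian = {frozenset("a"): 0.5, frozenset("b"): 0.3, frozenset("c"): 0.2}
print(mou(bayesian, frame), shannon(bayesian.values()))
```

Both printed values agree up to floating-point rounding, since every $u(X)$ vanishes for a Bayesian BBA.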


Lemma 4: U(m) satisfies the desideratum D4. 


Proof: see the appendix. 


Theorem: U(m) is an effective measure of uncertainty of a 
basic belief assignment. 


Proof: Because U(m) satisfies all desiderata D1, D2, D3, and 
D4 as proved in lemmas 1-4, the measure of uncertainty U(m) 
defined in (8) is effective. 


D. Remarks about U(m) 


Remark 1: It is worth noting that we have not specified
a priori what the range of an effective MoU should be, contrary
to some axiomatic attempts made by different authors
as reported, for instance, in [12]. We consider that
the range must not be chosen a priori. The maximum range
must result from the mathematical definition of the effective MoU. We
only request the satisfaction of the desideratum D4, which is
much more general, natural and essential.

Remark 2: The choice of the desideratum D3 (compatibility 
with Shannon probabilistic entropy) could be disputed be- 
cause other entropy definitions and generalizations exist in 
the probabilistic framework (as those defined by Rényi [13], 
Tsallis [14], etc). We think however that Shannon entropy is 
still the most used and preferred one for engineers working 
in information fusion. The measure of uncertainty U(m) 
presented in this paper could be (hopefully) generalized by 
replacing the desideratum D3 by another one using another 
choice of generalized entropy definition, which would obvi- 
ously necessitate a modification of the definition of U(m). 
This theoretical question has not yet been explored, and is left 
for future research. 

Remark 3: It can be proved⁹ that $U(m)$ satisfies the monotonicity
property. More precisely, if $m_Y$ and $m_Z$ are two
distinct BBAs defined on the same FoD $\Theta$ and respectively
focused on $Y$ and on $Z$ in $2^\Theta$, then one always has
$U(m_Y) < U(m_Z)$ if $|Y| < |Z|$. As a special case, one has
$U(m_Y) < U(m_Z)$ if $Y \subset Z$.
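
The monotonicity property of Remark 3 can be observed on a three-element frame with BBAs focused on sets of increasing cardinality. This is only a numerical illustration, not a proof; the element labels and the helper name `mou` are assumptions.

```python
from itertools import combinations
from math import log

def mou(m, frame):
    """U(m) = sum of the entropieces s(X), eqs. (9)-(10), in nats."""
    elems = sorted(frame)
    total = 0.0
    for r in range(1, len(elems) + 1):
        for c in combinations(elems, r):
            x = frozenset(c)
            bel = sum(v for y, v in m.items() if y <= x)
            pl = sum(v for y, v in m.items() if y & x)
            u, mx = pl - bel, m.get(x, 0.0)
            surprisal = -mx * log(mx) if mx > 0 else 0.0   # 0 log 0 = 0
            total += (1 - u) * surprisal + u * (1 - mx)
    return total

frame = frozenset("abc")
for focus in ("a", "ab", "abc"):
    # BBA focused on the set `focus`, i.e. m(focus) = 1.
    print(focus, mou({frozenset(focus): 1.0}, frame))
```

For a BBA focused on $Y$, each subset that intersects $Y$ without containing it contributes exactly 1, and all other subsets contribute 0, which is why the values grow with $|Y|$.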

Remark 4: Consider a BBA $m^{\Theta}$ defined on a FoD $\Theta$. Its
zero-extension $m^{\Theta'}$ on a FoD $\Theta'$ including $\Theta$ (i.e. $\Theta \subset \Theta'$)
is defined by $m^{\Theta'}(X) = 0$ for all $X \in 2^{\Theta'}$ not included in
$2^{\Theta}$, and $m^{\Theta'}(X) = m^{\Theta}(X)$ for all $X \in 2^{\Theta}$. It means that
$[Bel(\theta_i), Pl(\theta_i)] = [0, 0]$ for all $\theta_i \in \Theta' \setminus \Theta$. Under this condition,
one always has $U(m^{\Theta}) \le U(m^{\Theta'})$ because $u^{\Theta'}(X) > 0$
if $X \cap Y \neq \emptyset$ for some $Y \in 2^{\Theta}$. Hence there exists at least
an extra term $s^{\Theta'}(X) > 0$ entering in the $U(m^{\Theta'})$ calculation
(w.r.t. $U(m^{\Theta})$) if $m^{\Theta} \neq m^{\Theta'}$. Therefore, the extendability
property of Shannon entropy for probability measures must
be extended as $U(m^{\Theta}) \le U(m^{\Theta'})$ for (non-Bayesian) basic
belief assignments. The equality $U(m^{\Theta}) = U(m^{\Theta'})$ holds
if $m^{\Theta}$ is a Bayesian BBA because $U(m^{\Theta})$ coincides with
Shannon entropy in this case.
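
The effect of a zero-extension can be illustrated with a small example. In the sketch below, a BBA is a dictionary of focal elements, so evaluating the same dictionary on a larger frame implements the zero-extension implicitly (all added subsets have zero mass). The masses and the helper name `mou` are hypothetical.

```python
from itertools import combinations
from math import log

def mou(m, frame):
    """U(m) = sum of the entropieces s(X), eqs. (9)-(10), in nats."""
    elems = sorted(frame)
    total = 0.0
    for r in range(1, len(elems) + 1):
        for c in combinations(elems, r):
            x = frozenset(c)
            bel = sum(v for y, v in m.items() if y <= x)
            pl = sum(v for y, v in m.items() if y & x)
            u, mx = pl - bel, m.get(x, 0.0)
            surprisal = -mx * log(mx) if mx > 0 else 0.0   # 0 log 0 = 0
            total += (1 - u) * surprisal + u * (1 - mx)
    return total

small, large = frozenset("ab"), frozenset("abc")
non_bayesian = {frozenset("a"): 0.5, frozenset("ab"): 0.5}
bayesian = {frozenset("a"): 0.6, frozenset("b"): 0.4}

print(mou(non_bayesian, small), mou(non_bayesian, large))  # strictly larger on the bigger frame
print(mou(bayesian, small), mou(bayesian, large))          # unchanged for a Bayesian BBA
```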


VI. EXAMPLES 


In this section we give several simple numerical examples
of the value of the measure of uncertainty $U(m)$ expressed in
nats. The examples are given in Table I and they correspond
to different BBAs $m_i$ ($i = 1, 2, \ldots, 6$), and to the vacuous
BBA $m_v$ defined on a FoD $\Theta$. For $|\Theta| = 2$, we have only
one possible union/disjunction $\theta_1 \cup \theta_2$ in $2^\Theta$, which makes
the examples too simple and not very interesting. Because for
$|\Theta| \ge 4$ we have at least $2^4 = 16$ elements of $2^\Theta$ to list, and due to
paper length restrictions, we just give here some examples for
$|\Theta| = 3$ with $\Theta = \{\theta_1, \theta_2, \theta_3\}$.

The numerical values of $U(m)$ have been truncated to their
third decimal. $m_1$ and $m_2$ are Bayesian BBAs, and $m_2$ is the
uniform Bayesian BBA. Hence we have $U(m_2) = \log(|\Theta|) =
\log(3) = 1.098$, which is the maximum of Shannon entropy for
this FoD. The BBAs $m_3$, ..., $m_6$ and $m_v$ are non-Bayesian
BBAs, and $U(m_v) = 2^3 - 2 = 6$ is the maximum value of
the new proposed generalized entropy.

⁹ Sketch of proof: prove that $U(m_Y) = 2^{|\Theta|} - 1 - |\{X \in 2^\Theta | Y \subseteq X\}| - |\{X \in 2^\Theta | X \cap Y = \emptyset\}|$
and $U(m_Z) = 2^{|\Theta|} - 1 - |\{X \in 2^\Theta | Z \subseteq X\}| - |\{X \in 2^\Theta | X \cap Z = \emptyset\}|$,
and compare $U(m_Y)$ and $U(m_Z)$ when $|Y| < |Z|$ to complete the proof.


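[Table body not recoverable from the source text; the surviving last column corresponds to the vacuous BBA $m_v$, with $m_v(X) = 0$ for all $X \neq \Theta$ and $m_v(\Theta) = 1$.]
Table I
EXAMPLES FOR $U(m_i)$, $i = 1, 2, \ldots, 6$ AND $U(m_v)$.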


It is worth noting that a non-Bayesian BBA $m$ can have an
entropy value $U(m)$ smaller than the maximum of Shannon
entropy, which is normal and not surprising. For instance, if
we consider $\Theta = \{\theta_1, \theta_2, \theta_3\}$ and the BBA $m(\theta_1) = 0.1$,
$m(\theta_2) = 0.8$ and $m(\theta_1 \cup \theta_2) = 0.1$, we get $U(m) = 0.909$,
which is smaller than $\log(|\Theta|) = \log(3) \approx 1.098$. Therefore,
the condition $U(m) < \log(|\Theta|)$ does not imply that the BBA
$m$ is necessarily a Bayesian BBA, but if $U(m) > \log(|\Theta|)$ we
are sure that $m$ is a non-Bayesian BBA. We recall also that
any BBA focused on a singleton always has zero uncertainty
because Lemma 1 holds.


Abellan and Moral’s example revisited 


We revisit Abellan and Moral's example [23] with the FoD
$\Theta = \{\theta_1, \theta_2, \theta_3\}$ and the BBAs $m(\cdot)$ and $m'(\cdot)$ defined by

$$m(\theta_1) = m(\theta_2) = m(\theta_3) = 0.2, \quad m(\Theta) = 0.4$$

$$m'(\theta_1) = m'(\theta_2) = m'(\theta_3) = 0.161, \quad m'(\theta_2 \cup \theta_3) = 0.317, \quad m'(\Theta) = 0.2$$

Abellan and Moral intuitively think it is reasonable that $m$
should represent more uncertainty than $m'$ as $m$ is completely
symmetrical and $m'$ points to $\theta_2 \cup \theta_3$. We disagree with
this intuition because the authors did not take into account
the changes of mass values between $m$ and $m'$, nor the
imprecisions of all unknown probabilities $P(X)$ generated by
$m$, and the imprecisions of $P'(X)$ generated by $m'$.

If we analyze these two basic belief assignments more carefully,
we get the belief intervals $[Bel(X), Pl(X)]$ based on
$m$, and the belief intervals $[Bel'(X), Pl'(X)]$ based on $m'$,
listed in Table II. Based on the belief interval values listed in
Table II, it is clear that $m'$ in fact generates globally more
uncertainty (imprecisions on probabilities of elements of the
power set of $\Theta$) than $m$ if we compare $u(X)$ and $u'(X)$ values.
If we apply our new effective MoU definition, we obtain
$U(m) = 3.1059$ nats, and $U(m') = 3.3384$ nats. One sees
that $U(m) < U(m')$, which reflects well that $m'$ is actually a
bit more uncertain than $m$, contrary to what one would expect
based on an incorrect intuition. This simple example is very
interesting because it shows clearly how a simplistic intuition
can easily fail.
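
The two values above can be reproduced with a short script. The sketch assumes the masses implied by the belief intervals of Table II (obtained by differencing the Bel values); the labels `t1`, `t2`, `t3` and the helper name `mou` are illustrative.

```python
from itertools import combinations
from math import log

def mou(m, frame):
    """U(m) = sum of the entropieces s(X), eqs. (9)-(10), in nats."""
    elems = sorted(frame)
    total = 0.0
    for r in range(1, len(elems) + 1):
        for c in combinations(elems, r):
            x = frozenset(c)
            bel = sum(v for y, v in m.items() if y <= x)
            pl = sum(v for y, v in m.items() if y & x)
            u, mx = pl - bel, m.get(x, 0.0)
            surprisal = -mx * log(mx) if mx > 0 else 0.0   # 0 log 0 = 0
            total += (1 - u) * surprisal + u * (1 - mx)
    return total

frame = frozenset({"t1", "t2", "t3"})
# Masses assumed from the belief intervals of Table II.
m = {frozenset({"t1"}): 0.2, frozenset({"t2"}): 0.2,
     frozenset({"t3"}): 0.2, frame: 0.4}
m_prime = {frozenset({"t1"}): 0.161, frozenset({"t2"}): 0.161,
           frozenset({"t3"}): 0.161, frozenset({"t2", "t3"}): 0.317,
           frame: 0.2}

print(round(mou(m, frame), 4), round(mou(m_prime, frame), 4))
```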


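X        | [Bel(X), Pl(X)] | u(X) | [Bel'(X), Pl'(X)] | u'(X)
∅        | [0, 0]          | 0    | [0, 0]            | 0
θ1       | [0.2, 0.6]      | 0.4  | [0.161, 0.361]    | 0.2
θ2       | [0.2, 0.6]      | 0.4  | [0.161, 0.678]    | 0.517
θ1 ∪ θ2  | [0.4, 0.8]      | 0.4  | [0.322, 0.839]    | 0.517
θ3       | [0.2, 0.6]      | 0.4  | [0.161, 0.678]    | 0.517
θ1 ∪ θ3  | [0.4, 0.8]      | 0.4  | [0.322, 0.839]    | 0.517
θ2 ∪ θ3  | [0.4, 0.8]      | 0.4  | [0.639, 0.839]    | 0.2
Θ        | [1, 1]          | 0    | [1, 1]            | 0

Table II
BELIEF INTERVALS DRAWN FROM m AND m'.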


Entropic surface for all BBAs $m(\cdot)$ defined on $\Theta = \{\theta_1, \theta_2\}$

Figure 1 shows the entropic surface corresponding to
$U(m)$ when $m(\theta_1) \in [0,1]$, $m(\theta_2) \in [0,1]$ such that $m(\theta_1) +
m(\theta_2) \le 1$, and with $m(\theta_1 \cup \theta_2) = 1 - m(\theta_1) - m(\theta_2)$.


Figure 1. Entropy value $U(m)$ in nats for all $m(\cdot)$ defined on $\Theta = \{\theta_1, \theta_2\}$.


One verifies visually that the $U(m)$ surface is smooth. Its border
in the vertical plane passing through the points $(m(\theta_1) =
1, m(\theta_2) = 0)$ and $(m(\theta_1) = 0, m(\theta_2) = 1)$ corresponds to the
Shannon entropy curve whose maximum value is $\log(2) \approx
0.6931$, which is what we naturally expect. The unique maximum
value of $U(m)$ is for the vacuous BBA $m_v$, and it is
$U(m_v) = 2^2 - 2 = 2$.
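
The surface can be tabulated on a grid without any plotting library. The sketch below uses a closed form obtained by substituting $a = m(\theta_1)$, $b = m(\theta_2)$ and $c = 1 - a - b$ into (9) and (10) for $|\Theta| = 2$, where $u(\theta_1) = u(\theta_2) = c$ and $u(\Theta) = 0$. This closed form, the function names and the grid resolution are assumptions of the example, not formulas stated in the paper.

```python
from math import log

def xlogx(v):
    """v log v with the convention 0 log 0 = 0."""
    return v * log(v) if v > 0 else 0.0

def mou_binary(a, b):
    """U(m) for a two-element frame, with a = m(theta1) and b = m(theta2)."""
    c = max(0.0, 1.0 - a - b)               # m(theta1 U theta2), clamped against rounding
    s1 = -(1 - c) * xlogx(a) + c * (1 - a)  # entropiece of theta1
    s2 = -(1 - c) * xlogx(b) + c * (1 - b)  # entropiece of theta2
    s12 = -xlogx(c)                         # entropiece of the whole frame
    return s1 + s2 + s12

n = 50  # grid resolution
grid = [(i / n, j / n) for i in range(n + 1) for j in range(n + 1 - i)]
best = max(grid, key=lambda ab: mou_binary(*ab))
print(best, mou_binary(*best))
```

On this grid the maximum is reached at $m(\theta_1) = m(\theta_2) = 0$, i.e. for the vacuous BBA, and the values along the edge $m(\theta_1) + m(\theta_2) = 1$ follow the Shannon entropy curve.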


VII. CONCLUSION 


In this paper we have presented a new effective measure of 
uncertainty for basic belief assignments which is conceptually 
better justified than the few existing effective measures defined 
so far. This new generalized entropy measure verifies all the 
four very natural and essential desiderata, and presents the 
main advantages of simplicity, continuity, monotonicity and 
it also responds to the change of dimension of the frame of 
discernment. It is based on the interwoven link between the 
randomness and the imprecision of unknown probabilities of
all elements of the power set of the frame of discernment
which is inherent to any basic belief assignment. 

This new entropy measure makes a clear distinction be- 
tween the maximum uncertainty of the vacuous BBA, and the 
uncertainties related to all non-vacuous BBAs, in particular 
with respect to Bayesian BBAs. Hence, we have answered 
positively to the challenging question about the existence 
of a better conceptual effective measure of uncertainty for 
BBAs. We hope that this new effective entropy measure will 
arouse the interest of users of belief functions who need an 
effective entropy measure in their own applications. It is worth 
mentioning that a dual of this new measure of entropy can be 
defined to characterize the information content of any BBA, 
as well as the notion of information gain and information 
loss between two BBAs. This will be reported in a future 
publication. 

As a first perspective of this theoretical work, this new 
entropy measure could be useful to develop advanced methods 
for performance evaluation of information fusion techniques, 
and for reasoning under uncertainty using the belief functions. 
As a second perspective, this new entropy could also serve to 
measure the uncertainty of quantitative possibility measures 
in the possibility theory because any quantitative possibility 
measure is a special case of a plausibility function which is 
one-to-one with a consonant belief mass function (i.e. a BBA 
having nested focal elements).

APPENDIX


Proof of Lemma 4 

We first note from the expression (9) of $s(X)$ that we always
have $s(X) = u(X)$ for $X \in 2^\Theta$ and $X \neq \emptyset$ if $m(X) = 0$. We
also have $s(X) = 0$ for $X \in 2^\Theta$ and $X \neq \emptyset$ if $m(X) = 1$.
For $X \in 2^\Theta$ and $X \neq \emptyset$, if $0 < m(X) < 1$ one has

$$
\begin{aligned}
s(X) &= -(1 - u(X))\, m(X) \log(m(X)) + u(X)(1 - m(X)) \\
&= (1 - u(X))\, m(X) \log\left(\frac{1}{m(X)}\right) + u(X)(1 - m(X)) \\
&< (1 - u(X))\, m(X) \left(\frac{1}{m(X)} - 1\right) + u(X)(1 - m(X))
\end{aligned}
$$

This strict inequality comes from the fact that for any real
number $x > 0$ with $x \neq 1$, the strict inequality $\log(x) < x - 1$
holds¹⁰ (see [24], p. 68). Because $(1 - u(X))\, m(X) \left(\frac{1}{m(X)} - 1\right)
+ u(X)(1 - m(X)) = 1 - m(X)$, one finally has the
following inequality

$$s(X) < 1 - m(X) \qquad (12)$$

¹⁰ because $f(x) = x - 1 - \log(x)$ has derivative $f'(x) = 1 - 1/x$, which is negative
for $0 < x < 1$ and positive for $x > 1$, so that $f$ attains its minimum $f(1) = 0$ only at $x = 1$.
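
Inequality (12) can be checked numerically on a grid. The sketch below treats $u(X)$ and $m(X)$ as free parameters and restricts $u(X)$ to $[0, 1 - m(X)]$, which follows from $Bel(X) \ge m(X)$ and $Pl(X) \le 1$. This is a sanity check rather than a proof, and the function names and grid size are illustrative.

```python
from math import log

def entropiece(u, mx):
    """s(X) from eq. (9) for given u(X) and m(X), with 0 log 0 = 0."""
    surprisal = -mx * log(mx) if mx > 0 else 0.0
    return (1 - u) * surprisal + u * (1 - mx)

def max_gap(steps=200):
    """Largest value of s(X) - (1 - m(X)) over the grid; negative if (12) holds."""
    worst = float("-inf")
    for i in range(1, steps):              # 0 < m(X) < 1
        mx = i / steps
        for j in range(steps + 1):         # u(X) ranges over [0, 1 - m(X)]
            u = (1 - mx) * j / steps
            worst = max(worst, entropiece(u, mx) - (1 - mx))
    return worst

print(max_gap())
```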


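To prove that $U(m) < U(m_v)$, we consider all the cases for
the distribution of the belief masses in the BBA $m \neq m_v$, as
follows:


Case 1: $0 < m(X) < 1$ for all $X \neq \emptyset$ of $2^\Theta$.
In this (most general) case we have

$$U(m) = \sum_{X \in 2^\Theta | X \neq \emptyset} s(X) < \sum_{X \in 2^\Theta | X \neq \emptyset} (1 - m(X))$$

The majorant $\sum_{X \in 2^\Theta | X \neq \emptyset} (1 - m(X))$ can be written as

$$\sum_{X \in 2^\Theta | X \neq \emptyset} (1 - m(X)) = \sum_{X \in 2^\Theta | X \neq \emptyset} 1 - \sum_{X \in 2^\Theta | X \neq \emptyset} m(X)$$

Because one has $\sum_{X \in 2^\Theta | X \neq \emptyset} 1 = 2^N - 1$, and
$\sum_{X \in 2^\Theta | X \neq \emptyset} m(X) = 1$, the majorant is given by

$$\sum_{X \in 2^\Theta | X \neq \emptyset} (1 - m(X)) = 2^N - 1 - 1 = 2^N - 2$$

This majorant corresponds exactly to $U(m_v)$, therefore we
have proved that

$$U(m) < U(m_v) \qquad (13)$$

when $0 < m(X) < 1$ for all $X \neq \emptyset$ of $2^\Theta$.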


Case 2: Consider the particular BBA for which $m(X) = 1$ for some $X \neq \emptyset$ and $X \neq \Theta$ in $2^\Theta$.

• If $X$ is a singleton of $2^\Theta$ then $Bel(X) = Pl(X) = 1$ and $u(X) = 0$. For the elements $Y$ of $2^\Theta$ including $X$ one has $Bel(Y) = Pl(Y) = 1$ and thus $u(Y) = 0$. For the elements $Y$ of $2^\Theta$ not including $X$ one always has $Bel(Y) = Pl(Y) = 0$ and thus $u(Y) = 0$. Hence, $m(X) = 1$, $u(X) = 0$, $s(X) = 0$, and also $m(Y) = 0$, $u(Y) = 0$, $s(Y) = 0$ for all $Y \neq X$. Therefore we get $U(m) = 0$, which is smaller than $U(m_v) = 2^N - 2$, i.e. $U(m) < U(m_v)$ in this case.

• If $X$ is not a singleton of $2^\Theta$ and if $m(X) = 1$ then $Bel(X) = Pl(X) = 1$, $u(X) = 0$ and $s(X) = 0$. We have also $s(\Theta) = 0$ because $m(\Theta) = 0$, and we have $u(\Theta) = 0$ because $Bel(\Theta) = Pl(\Theta) = 1$. For all $Y \neq \emptyset$, $Y \neq X$ and $Y \neq \Theta$ such that $X \cap Y = \emptyset$, we always have $u(Y) = 0$ because $Bel(Y) = 0$ and $Pl(Y) = 0$. For all $Y \neq \emptyset$, $Y \neq X$ and $Y \neq \Theta$ such that $X \cap Y \neq \emptyset$, we always have $u(Y) = 1$ because $Bel(Y) = 0$ and $Pl(Y) = m(X) = 1$ because $X$ has a non-empty intersection with $Y$. Consequently, the expression of $U(m)$ can be reformulated as

$$ U(m) = s(\emptyset) + s(X) + s(\Theta) + \sum_{Y \in 2^\Theta \setminus \{\emptyset, X, \Theta\} \mid Y \cap X = \emptyset} s(Y) + \sum_{Y \in 2^\Theta \setminus \{\emptyset, X, \Theta\} \mid Y \cap X \neq \emptyset} s(Y). \tag{14} $$

We have $s(\emptyset) + s(X) + s(\Theta) = 0$ because $s(\emptyset) = 0$, $s(\Theta) = 0$ and $s(X) = 0$ when $m(X) = 1$. For $Y \in 2^\Theta \setminus \{\emptyset, X, \Theta\}$ such that $Y \cap X = \emptyset$, we have $u(Y) = 0$ and $m(Y) = 0$, hence $s(Y) = -(1 - u(Y))\, m(Y) \log(m(Y)) + u(Y)(1 - m(Y)) = -(1 - 0) \cdot 0 \cdot \log(0) + 0 \cdot (1 - 0) = 0$. Consequently

$$ \sum_{Y \in 2^\Theta \setminus \{\emptyset, X, \Theta\} \mid Y \cap X = \emptyset} s(Y) = 0. $$

For $Y \in 2^\Theta \setminus \{\emptyset, X, \Theta\}$ such that $Y \cap X \neq \emptyset$ we have $u(Y) = 1$ and $m(Y) = 0$, hence $s(Y) = -(1 - u(Y))\, m(Y) \log(m(Y)) + u(Y)(1 - m(Y)) = -(1 - 1) \cdot 0 \cdot \log(0) + 1 \cdot (1 - 0) = 1$. Consequently,

$$ \sum_{Y \in 2^\Theta \setminus \{\emptyset, X, \Theta\} \mid Y \cap X \neq \emptyset} s(Y) < 2^N - 2. $$

Therefore, if a BBA is focused on any element $X \neq \Theta$ (singleton, or not), that is if $m(X) = 1$, we have proved that the strict inequality $U(m) < U(m_v)$ always holds.


Case 3: Some elements of $2^\Theta$ have a zero mass value, and the others have strictly positive mass values strictly smaller than 1.

The measure of uncertainty $U(m)$ defined in (10) requires $2^N - 1$ terms $s(X)$ to calculate in general (i.e. when all $X \in 2^\Theta \setminus \{\emptyset\}$ are focal elements of $m$). If some elements $X$ have zero mass value, this measure $U(m)$ can always be decomposed as

$$ U(m) = \sum_{X \in 2^\Theta \mid (X \neq \emptyset) \wedge (m(X) = 0)} s(X) + \sum_{X \in 2^\Theta \mid (X \neq \emptyset) \wedge (0 < m(X) < 1)} s(X). \tag{15} $$

Because one has $s(X) = u(X)$ when $m(X) = 0$, the first summation of (15) is equal to $\sum_{X \in 2^\Theta \mid (X \neq \emptyset) \wedge (m(X) = 0)} u(X)$. Because $u(X) \leq 1$, and $s(X) < 1 - m(X)$ when $0 < m(X) < 1$, one has the following strict inequality

$$ U(m) < \sum_{X \in 2^\Theta \mid (X \neq \emptyset) \wedge (m(X) = 0)} 1 + \sum_{X \in 2^\Theta \mid (X \neq \emptyset) \wedge (0 < m(X) < 1)} (1 - m(X)). $$

We can have at most $2^N - 3$ elements of $2^\Theta \setminus \{\emptyset\}$ having a mass equal to zero because we must have at least $(2^N - 1) - (2^N - 3) = 2$ elements $X_1$ and $X_2$ of $2^\Theta$ for which $0 < m(X_1) < 1$, $0 < m(X_2) < 1$ with $m(X_1) + m(X_2) = 1$. If we assume that there are $1 \leq M \leq 2^N - 3$ elements of $2^\Theta \setminus \{\emptyset\}$ that have zero mass value, then there exist $K = 2^N - 1 - M$ elements $X_1, X_2, \ldots, X_K$ of $2^\Theta \setminus \{\emptyset\}$ for which $0 < m(X_k) < 1$, $k = 1, \ldots, K$, and with $\sum_{k=1}^{K} m(X_k) = 1$. Hence,

$$ U(m) < M + \sum_{k=1}^{(2^N - 1) - M} (1 - m(X_k)), $$

or equivalently,

$$ U(m) < M + (2^N - 1) - M - 1. $$

Hence, $U(m) < 2^N - 2$, and consequently we have $U(m) < U(m_v)$ because $U(m_v) = 2^N - 2$.


In summary, we have examined all possible cases for the distribution of the belief masses, and we have proved that the strict inequality $U(m) < U(m_v)$ is always satisfied. This completes the proof of Lemma 4.

Abstract: In this paper, we present a method to solve analytically the simplest Entropiece Inversion Problem (EIP). This theoretical problem consists in finding a method to calculate a Basic Belief Assignment (BBA) from the knowledge of a given entropiece vector which quantifies effectively the measure of uncertainty of a BBA in the framework of the theory of belief functions. We give an example of the calculation of the EIP solution for a simple EIP case, and we show the difficulty of establishing the explicit general solution of this theoretical problem, which involves the transcendental Lambert functions.


Keywords: belief functions, entropy, measure of uncertainty. 


I. INTRODUCTION 

In this paper, we suppose the reader to be familiar with the theory of Belief Functions (BF) introduced by Shafer in [1], and we do not present in detail the basics of BF. We just recall that a frame of discernment (FoD) $\Theta = \{\theta_1, \theta_2, \ldots, \theta_N\}$ is a finite exhaustive set of $N > 1$ mutually exclusive elements $\theta_i$ ($i = 1, \ldots, N$), and its power set (i.e. the set of all subsets) is denoted by $2^\Theta$. A FoD represents a set of potential solutions of a decision-making problem under consideration. A Basic Belief Assignment (BBA)¹ is a mapping $m : 2^\Theta \to [0, 1]$ with $m(\emptyset) = 0$, and $\sum_{X \in 2^\Theta} m(X) = 1$.

A new effective entropy measure $U(m)$ for any BBA $m(\cdot)$ defined on a FoD $\Theta$ has been defined as follows [2]:

$$ U(m) = \sum_{X \in 2^\Theta} s(X), \tag{1} $$

where $s(X)$ is named the entropiece of $X$, which is defined by

$$ s(X) = -m(X)(1 - u(X)) \log(m(X)) + u(X)(1 - m(X)), \tag{2} $$

with

$$ u(X) = Pl(X) - Bel(X) = \sum_{Y \in 2^\Theta \mid X \cap Y \neq \emptyset} m(Y) - \sum_{Y \in 2^\Theta \mid Y \subseteq X} m(Y). \tag{3} $$

$Pl(X)$ and $Bel(X)$ are respectively the plausibility and the belief of the element $X$ of the power set of $\Theta$, see [1] for details. $u(X)$ quantifies the imprecision of the unknown probability of $X$. The vacuous BBA characterizing the totally ignorant source of evidence is denoted by $m_v$, and it is such that $m_v(\Theta) = 1$ and $m_v(X) = 0$ for any $X \subset \Theta$.

¹ For notational convenience, we denote by $m$ or $m(\cdot)$ any BBA defined implicitly on the FoD $\Theta$, and we also denote it as $m^\Theta$ to explicitly refer to the FoD when necessary.

This measure of uncertainty $U(m)$ (i.e. entropy measure) is effective because it satisfies the following four essential properties [2]:

1) $U(m) = 0$ for any BBA $m(\cdot)$ focused on a singleton $X$ of $2^\Theta$;

2) $U(m_v^{\Theta}) < U(m_v^{\Theta'})$ if $|\Theta| < |\Theta'|$;

3) $U(m) = -\sum_{X \in \Theta} m(X) \log(m(X))$ if $m(\cdot)$ is a Bayesian² BBA. Hence, $U(m)$ reduces to Shannon entropy [7] in this case;

4) $U(m) < U(m_v)$ for any non-vacuous BBA $m(\cdot)$ and for the vacuous BBA $m_v(\cdot)$ defined with respect to the same FoD.


The proof of the first three properties is quite simple, whereas the proof of $U(m) < U(m_v)$ is much more difficult, see [2] for proofs and examples. A detailed analysis of other (non-effective) entropy measures proposed in the literature during the last four decades is given in [3].
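
To make definitions (1)-(3) concrete, here is a minimal Python sketch that evaluates $U(m)$ on a small frame and checks properties 1, 3 and 4 on three illustrative BBAs. The encoding of subsets as bitmasks, the function names (`bel`, `pl`, `entropy`) and the example masses are assumptions made for this illustration only; they do not come from [2].

```python
import math


def bel(m: dict, x: int) -> float:
    """Belief of subset x (bitmask): total mass of the non-empty subsets y included in x."""
    return sum(v for y, v in m.items() if y != 0 and (y & x) == y)


def pl(m: dict, x: int) -> float:
    """Plausibility of subset x: total mass of the subsets y that intersect x."""
    return sum(v for y, v in m.items() if (y & x) != 0)


def entropy(m: dict, n: int) -> float:
    """U(m) from (1)-(3) on a frame with n elements; subsets are the bitmasks 1 .. 2^n - 1."""
    total = 0.0
    for x in range(1, 2 ** n):
        mx = m.get(x, 0.0)
        u = pl(m, x) - bel(m, x)
        m_log_m = mx * math.log(mx) if mx > 0.0 else 0.0  # convention 0 * log(0) = 0
        total += -(1.0 - u) * m_log_m + u * (1.0 - mx)
    return total


n = 3                                              # |Theta| = 3, hypothetical frame
focused = {0b001: 1.0}                             # BBA focused on a singleton
bayesian = {0b001: 0.5, 0b010: 0.3, 0b100: 0.2}    # only singletons are focal
vacuous = {2 ** n - 1: 1.0}                        # all the mass on Theta
generic = {0b001: 0.4, 0b011: 0.3, 0b111: 0.3}     # an arbitrary non-vacuous BBA

shannon = -sum(p * math.log(p) for p in bayesian.values())
print(entropy(focused, n), entropy(bayesian, n), shannon, entropy(vacuous, n))
```

In this sketch the vacuous BBA reaches $2^3 - 2 = 6$ nats, and the generic BBA stays strictly below that value, in line with property 4.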

The entropiece $s(X)$ given by (2) corresponds to the contribution of $X$ to the whole uncertainty measure $U(m)$. The entropiece $s(X)$ involves $m(X)$ and the imprecision $u(X) = Pl(X) - Bel(X)$ about the unknown probability of $X$ in a subtle interwoven manner named epistemic entanglement. The cardinality of $X$ is indirectly taken into account in the derivation of $s(X)$ thanks to $u(X)$, which requires the derivation of the $Pl(X)$ and $Bel(X)$ functions that depend on the cardinality of $X$. Because $u(X) \in [0, 1]$ and $m(X) \in [0, 1]$ one has $s(X) \geq 0$, and $U(m) \geq 0$. The quantity $U(m)$ is expressed in nats because we use the natural logarithm. $U(m)$ can be expressed in bits by dividing its value in nats by $\log(2) = 0.69314718\ldots$. This measure of uncertainty $U(m)$ is a continuous function of its basic belief mass arguments because it is a summation of continuous functions. In formula (2), we always take $m(X) \log(m(X)) = 0$ when $m(X) = 0$ because $\lim_{m(X) \to 0^+} m(X) \log(m(X)) = 0$, which can be proved using L'Hôpital's rule [4]. Note that for any BBA $m$, one always has $s(\emptyset) = 0$ because $m(\emptyset) = 0$ and $u(\emptyset) = Pl(\emptyset) - Bel(\emptyset) = 0 - 0 = 0$. For the vacuous BBA, one has $s(\Theta) = 0$ because $m_v(\Theta) = 1$ and $u(\Theta) = Pl(\Theta) - Bel(\Theta) = 1 - 1 = 0$.

² $m$ is a Bayesian BBA if it has only singletons as focal elements, i.e. $m(\theta_i) > 0$ for some $\theta_i \in \Theta$ and $m(X) = 0$ for all non-singletons $X$ of $2^\Theta$.

As proved in [2], the entropy of the vacuous BBA on the FoD $\Theta$ is equal to

$$ U(m_v) = 2^{|\Theta|} - 2. \tag{4} $$

This maximum entropy value $2^{|\Theta|} - 2$ makes perfect sense because for the vacuous BBA there is no information at all about the conflicts between the elements of the FoD. Actually for all $X \in 2^\Theta \setminus \{\emptyset, \Theta\}$ one has $u(X) = 1$ because $[Bel(X), Pl(X)] = [0, 1]$, and one has $u(\emptyset) = 0$ and $u(\Theta) = 0$. Hence, the sum of all imprecisions of $P(X)$ for all $X \in 2^\Theta$ is exactly equal to $2^{|\Theta|} - 2$, which corresponds to $U(m_v)$ as expected. Moreover, one always has $U(m_v) > \log(|\Theta|)$, which means that the vacuous BBA always has an entropy greater than the maximum of Shannon entropy $\log(|\Theta|)$ obtained with the uniform probability mass function distributed on $\Theta$. As a dual concept of this entropy measure $U(m)$, we have defined in [8] the measure of information content of any BBA by

$$ IC(m) = U(m_v) - U(m) = (2^{|\Theta|} - 2) - \sum_{X \in 2^\Theta} s(X). \tag{5} $$

From the definition (5), one sees that for $m \neq m_v$ one has $IC(m) > 0$ because $U(m) < U(m_v)$, and for $m = m_v$ one has $IC(m_v) = 0$ (i.e. the vacuous BBA carries no information), which is what we naturally expect.

Note that the information content $IC(m^\Theta)$ of a BBA depends not only on the BBA $m(\cdot)$ itself but also on the cardinality of the frame of discernment $\Theta$, because $IC(m)$ requires the knowledge of $|\Theta| = N$ to calculate the maximum entropy value $U(m_v) = 2^{|\Theta|} - 2$ entering in (5). This remark is important to understand that even if two BBAs (defined on different FoDs) focus entirely on the same focal element, their information contents are necessarily different. This means that the information content depends on the context of the problem, i.e. the FoD. The notions of information gain and information loss between two BBAs are also mathematically defined in [8] for readers interested in this topic.

This paper is organized as follows. Section 2 defines the 
general entropiece inversion problem (EIP). Section 3 de- 
scribes the simplest entropiece inversion problem (SEIP). An 
analytical solution of SEIP is proposed and it is applied on a 
simple example also in Section 3. The conclusion is made in 
Section 4. 


II. GENERAL ENTROPIECE INVERSION PROBLEM (EIP) 

The set $\{s(X), X \in 2^\Theta\}$ of the entropiece values $s(X)$ given by (2) can be represented by an entropiece vector $s(m) = [s(X), X \in 2^\Theta]^T$, where any order of the elements $X$ of the power set $2^\Theta$ can be chosen. For simplicity, we suggest using the classical $N$-bits representation if $|\Theta| = N$, with the increasing order (see example in Section 3). The general Entropiece Inversion Problem, or EIP for short, is an interesting theoretical problem which can be easily stated as follows:

Suppose that the entropiece vector $s(m)$ is known (estimated or given). Is it possible to calculate a BBA $m(\cdot)$ corresponding to this entropiece vector $s(m)$, and how?

We would also like to know whether the derivation of $m(\cdot)$ from $s(m)$ provides a unique BBA solution, or not.

This general entropiece inversion problem is a challenging mathematical problem, and we do not know if a general analytical solution of EIP is possible, or not. We leave it as an open mathematical question for future research. However, we present in this paper the analytical solution for the simplest case where the FoD $\Theta$ has only two elements, i.e. when $|\Theta| = N = 2$. Even in this simplest case, the EIP solution is not so easy to calculate, as will be shown in the next section. This is the main contribution of this paper.

The mathematical EIP addressed in this paper is not related (for now) to any problem of the natural world, and it cannot be confirmed experimentally using data from nature because the entropy concept is not directly measurable, but only computable from the estimation of probability $p(\cdot)$ or belief mass functions $m(\cdot)$. So, why do we address this entropiece inversion problem? Because in advanced information fusion systems we can imagine having potential access to this type of information, and it makes sense to assess the underlying BBA provided by a source of evidence in order to eventually modify it in some fusion systems for some aims. We could also imagine adjusting the entropiece values to voluntarily improve (or degrade) $IC(m)$, and generating the proper modified BBA for some tasks. At this early stage of research work it is difficult to anticipate the practical interest of calculating solutions of the general EIP, so for now we only present its mathematical interest.


III. SIMPLEST ENTROPIECE INVERSION PROBLEM (SEIP) 


A. Example 


We consider a FoD $\Theta$ with only two elements, say $\Theta = \{A, B\}$, where $A$ and $B$ are mutually exclusive and exhaustive, and the following BBA

$$ m(A) = 0.5, \quad m(B) = 0.3, \quad m(A \cup B) = 0.2. $$

Because $[Bel(\emptyset), Pl(\emptyset)] = [0, 0]$ one has $u(\emptyset) = 0$. Because $[Bel(A), Pl(A)] = [0.5, 0.7]$, $[Bel(B), Pl(B)] = [0.3, 0.5]$ and $[Bel(\Theta), Pl(\Theta)] = [1, 1]$, one has $u(A) = 0.2$, $u(B) = 0.2$, and $u(\Theta) = 0$. Applying (2), one gets $s(\emptyset) = 0$, $s(A) \approx 0.377258$, $s(B) \approx 0.428953$ and $s(\Theta) \approx 0.321887$. Using the 2-bits representation with increasing ordering³, we encode the elements of the power set as $\emptyset = 00$, $A = 01$, $B = 10$ and $A \cup B = 11$. The entropiece vector is

$$ s(m) = \begin{bmatrix} s(\emptyset) \\ s(A) \\ s(B) \\ s(A \cup B) \end{bmatrix} \approx \begin{bmatrix} 0 \\ 0.3773 \\ 0.4290 \\ 0.3219 \end{bmatrix}. \tag{6} $$

³ Once the binary values are converted into their digit value with the most significant bit on the left (i.e. the least significant bit on the right).

If we use the classical 2-bits (here $|\Theta| = 2$) representation with increasing ordering (as we recommend), the first component of the entropiece vector $s(m)$ is $s(\emptyset)$, which is always equal to zero for any BBA $m$. Hence the first component of $s(m)$ can be dropped (i.e. removed from the vector representation). By summing all the components of the entropiece vector $s(m)$ we obtain the entropy $U(m) \approx 1.128098$ nats of the BBA $m(\cdot)$. Note that the components $s(X)$ (for $X \neq \emptyset$) of the entropiece vector $s(m)$ are not independent because they are linked to each other through the calculation of the $Bel(X)$ and $Pl(X)$ values entering in $u(X)$.


B. Analytical solution of SEIP 


Because we suppose $\Theta = \{A, B\}$, the expressions of the three last components⁴ of the entropiece vector $s(m)$ are given by (2), and we have

$$\begin{aligned}
s(A) &= -m(A)(1 - u(A)) \log(m(A)) + u(A)(1 - m(A)), \\
s(B) &= -m(B)(1 - u(B)) \log(m(B)) + u(B)(1 - m(B)), \\
s(A \cup B) &= -m(A \cup B)(1 - u(A \cup B)) \log(m(A \cup B)) + u(A \cup B)(1 - m(A \cup B)).
\end{aligned}$$

Because $u(A) = Pl(A) - Bel(A) = (m(A) + m(A \cup B)) - m(A) = m(A \cup B)$, $u(B) = Pl(B) - Bel(B) = (m(B) + m(A \cup B)) - m(B) = m(A \cup B)$ and $u(A \cup B) = Pl(A \cup B) - Bel(A \cup B) = 1 - 1 = 0$, one gets the following system of equations to solve

$$ s(A) = -m(A)(1 - m(A \cup B)) \log(m(A)) + m(A \cup B)(1 - m(A)), \tag{7} $$

$$ s(B) = -m(B)(1 - m(A \cup B)) \log(m(B)) + m(A \cup B)(1 - m(B)), \tag{8} $$

$$ s(A \cup B) = -m(A \cup B) \log(m(A \cup B)). \tag{9} $$

The set of equations (7), (8) and (9) is called the EIP transcendental equation system for the case $|\Theta| = 2$.
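
The system (7)-(9) is the forward map from a BBA on $\Theta = \{A, B\}$ to its entropiece vector, and the inversion problem asks to run it backwards. The short Python sketch below implements the forward direction only. The function names `xlogx` and `entropiece_vector` are hypothetical, and the numerical values are those of the example BBA of Section III-A.

```python
import math


def xlogx(p: float) -> float:
    """p * log(p), with the convention 0 * log(0) = 0."""
    return p * math.log(p) if p > 0.0 else 0.0


def entropiece_vector(m_a: float, m_b: float, m_ab: float):
    """Return (s(A), s(B), s(A u B)) from equations (7), (8) and (9)."""
    s_a = -(1.0 - m_ab) * xlogx(m_a) + m_ab * (1.0 - m_a)
    s_b = -(1.0 - m_ab) * xlogx(m_b) + m_ab * (1.0 - m_b)
    s_ab = -xlogx(m_ab)
    return s_a, s_b, s_ab


# Example BBA of Section III-A: m(A) = 0.5, m(B) = 0.3, m(A u B) = 0.2.
s_a, s_b, s_ab = entropiece_vector(0.5, 0.3, 0.2)
total = s_a + s_b + s_ab   # entropy U(m) in nats, since s(empty set) = 0
print(round(s_a, 4), round(s_b, 4), round(s_ab, 4), round(total, 4))
```

Such a forward map is also a convenient way to check any candidate solution of the inversion: a candidate BBA is acceptable only if it reproduces the given entropiece vector.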

The plot of the function $s(A \cup B) = -m(A \cup B) \log(m(A \cup B))$ is given in Figure 1 for convenience. By differentiating the function $-m(A \cup B) \log(m(A \cup B))$ we see that its maximum value is obtained for $m(A \cup B) = 1/e \approx 0.3679$, for which

$$ s(A \cup B) = -\frac{1}{e} \log(1/e) = \frac{1}{e} \log(e) = \frac{1}{e}. $$

Therefore, the numerical value of $s(A \cup B)$ always belongs to the interval $[0, 1/e]$.

⁴ We always omit the 1st component $s(\emptyset)$ of the entropiece vector $s(m)$, which is always equal to zero and not necessary in our analysis.

Figure 1. Plot of $s(A \cup B) = -m(A \cup B) \log(m(A \cup B))$ (in red) with x-axis equals $m(A \cup B) \in [0, 1]$, and y-axis equals $s(A \cup B)$ in nats. The maximum is located at the point (0.3679, 0.3679).


Without loss of generality, we assume $0 < s(A \cup B) \leq 1/e$ because if $s(A \cup B) = 0$ then one deduces directly without ambiguity that either $m(A \cup B) = 1$ (which means that the BBA $m(\cdot)$ is the vacuous BBA) if $s(A) = s(B) = 1$, or $m(A \cup B) = 0$ otherwise. With the assumption $0 < s(A \cup B) \leq 1/e$, the equation (9) is of the general transcendental form

$$ y e^y = a \iff \log(m(A \cup B))\, m(A \cup B) = -s(A \cup B), \tag{10} $$

by considering the known value as $a = -s(A \cup B)$ in $[-\frac{1}{e}, 0)$, and the unknown as $y = \log(m(A \cup B))$.


Unfortunately the solution of the transcendental equation (10) does not have an explicit expression involving simple functions. Actually, the solution of this equation is given by the Lambert W-function, which is a multivalued function (also called the omega function or product logarithm in mathematics) [6]. It can however be calculated⁵ with a good precision by some numerical methods, see [5] for details. The equation $y e^y = a$ admits real solution(s) only if $a \geq -\frac{1}{e}$. For $a \geq 0$, the solution of $y e^y = a$ is $y = W_0(a)$, and for $-\frac{1}{e} \leq a < 0$ there are two possible real values of $W(a)$ (see Figure 1 of [5]), which are denoted respectively $y_1 = W_0(a)$ and $y_2 = W_{-1}(a)$. The principal branch of the Lambert function $W(x)$ satisfying $-1 \leq W(x)$ is denoted $W_0(x)$, and the branch satisfying $W(x) \leq -1$ is denoted $W_{-1}(x)$ by Corless et al. in [5]. In our context, because we have $a \in [-\frac{1}{e}, 0)$, the solutions of $y e^y = a$ are given by

$$ y_1 = W_0(a) = W_0(-s(A \cup B)), $$
$$ y_2 = W_{-1}(a) = W_{-1}(-s(A \cup B)). $$

Hence we get two possible solutions for the value of $m(A \cup B)$, which are

$$ m_1(A \cup B) = e^{y_1} = e^{W_0(-s(A \cup B))}, \tag{11} $$
$$ m_2(A \cup B) = e^{y_2} = e^{W_{-1}(-s(A \cup B))}. \tag{12} $$

Of course, at least one of these solutions is necessarily correct but we do not know which one. So, at this current stage, we must consider⁶ the two solutions $m_1(A \cup B)$ and $m_2(A \cup B)$ for $m(A \cup B)$ as acceptable, and we must continue to solve equations (7) and (8) to determine the mass values $m(A)$ and $m(B)$.

⁵ Lambert's W-function is implemented in Matlab™ as the lambertw function.
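
The two candidates (11)-(12) can also be cross-checked without any Lambert W implementation, because $-m \log(m)$ is increasing on $(0, 1/e]$ and decreasing on $[1/e, 1)$. The Python sketch below uses plain bisection on each of these two intervals. This is a swapped-in numerical technique used only as a check, not the analytical route of the paper, and the function names are hypothetical.

```python
import math


def neg_xlogx(m: float) -> float:
    """-m * log(m), with the convention 0 * log(0) = 0."""
    return -m * math.log(m) if m > 0.0 else 0.0


def bisect(f, lo: float, hi: float, iters: int = 200) -> float:
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0.0) == (f_lo > 0.0):
            lo, f_lo = mid, f_mid      # root lies in the upper half
        else:
            hi = mid                   # root lies in the lower half
    return 0.5 * (lo + hi)


def mass_candidates(s_ab: float):
    """Both roots of -m log(m) = s_ab, assuming 0 < s_ab < 1/e."""
    peak = 1.0 / math.e
    g = lambda m: neg_xlogx(m) - s_ab
    m1 = bisect(g, peak, 1.0)   # decreasing side, corresponds to exp(W_0(-s_ab))
    m2 = bisect(g, 0.0, peak)   # increasing side, corresponds to exp(W_{-1}(-s_ab))
    return m1, m2


# Entropiece s(A u B) of the example BBA, for which m(A u B) = 0.2.
m1, m2 = mass_candidates(-0.2 * math.log(0.2))
print(round(m1, 4), round(m2, 4))
```

Both roots give the same entropiece value, which is why equation (9) alone cannot decide between them.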


Let us first determine $m(A)$ by solving (7). Suppose that the value of $m(A \cup B)$ is known and taken either as $m_1(A \cup B)$, or as $m_2(A \cup B)$. Then we can rearrange the equation (7) as

$$ \left( \log(m(A)) + \frac{m(A \cup B)}{1 - m(A \cup B)} \right) m(A) = -\frac{s(A) - m(A \cup B)}{1 - m(A \cup B)}, $$

which can be rewritten as the general equation of the form

$$ (y + a) e^y = b \tag{13} $$

by taking

$$ y = \log(m(A)), \tag{14} $$
$$ a = \frac{m(A \cup B)}{1 - m(A \cup B)}, \tag{15} $$
$$ b = -\frac{s(A) - m(A \cup B)}{1 - m(A \cup B)}. \tag{16} $$

The solutions of (13) are given by [5]

$$ y = W(b e^a) - a. \tag{17} $$

Once $y$ is calculated by formula (17), and since $y = \log(m(A))$, we obtain the solution for $m(A)$ given by

$$ m(A) = e^y = e^{W(b e^a) - a}. \tag{18} $$

Similarly, the solution for $m(B)$ will be given by

$$ m(B) = e^y = e^{W(b e^a) - a} \tag{19} $$

by solving the equation $(y + a) e^y = b$ with

$$ y = \log(m(B)), \tag{20} $$
$$ a = \frac{m(A \cup B)}{1 - m(A \cup B)}, \tag{21} $$
$$ b = -\frac{s(B) - m(A \cup B)}{1 - m(A \cup B)}. \tag{22} $$

We must however check if there is one solution only, $m(A) = e^{W_0(b e^a) - a}$, or in fact two solutions $m_1(A) = e^{W_0(b e^a) - a}$ and $m_2(A) = e^{W_{-1}(b e^a) - a}$, and similarly for the solution for $m(B)$. This depends on the parameters $a$ and $b$, that is on whether $b e^a$ belongs to the interval $[-1/e, 0)$ (two real solutions) or to $[0, \infty)$ (one real solution).

We illustrate in the next subsection how to calculate the SEIP solution from these analytical formulas for the previous example.

⁶ If the two mass values are admissible, that is if $m_1(A \cup B) \in [0, 1]$ and if $m_2(A \cup B) \in [0, 1]$. If one of them is non-admissible it is eliminated.


C. SEIP solution of the previous example 


We recall that we have for this example $s(\emptyset) = 0$, $s(A) \approx 0.3773$, $s(B) \approx 0.4290$ and $s(\Theta) \approx 0.3219$. If we apply formulas (11)-(12) for this example, we have $a = -s(A \cup B) = -0.3219$ and therefore

$$ y_1 = W_0(-0.3219) = -0.5681, $$
$$ y_2 = W_{-1}(-0.3219) = -1.6094. $$

Hence the two potential solutions for the mass $m(A \cup B)$ are

$$ m_1(A \cup B) = e^{y_1} = 0.5666, $$
$$ m_2(A \cup B) = e^{y_2} = 0.2000. $$

It can be easily verified that

$$ -m_1(A \cup B) \log(m_1(A \cup B)) = 0.3219 = s(A \cup B), $$
$$ -m_2(A \cup B) \log(m_2(A \cup B)) = 0.3219 = s(A \cup B). $$

We see that the second potential solution $m_2(A \cup B) = 0.2000$ is the solution that corresponds to the original mass $m(A \cup B)$ of the BBA of our example.


Now, we examine what would be the values of $m(A)$ and $m(B)$ given respectively by (18) and (19) by taking either $m(A \cup B) = m_1(A \cup B) = 0.5666$ or $m(A \cup B) = m_2(A \cup B) = 0.20$.

• Let's examine the 1st possibility with the potential solution

$$ m(A \cup B) = m_1(A \cup B) = 0.5666. $$

For determining $m(A)$, we have to solve $(y + a) e^y = b$ with the unknown $y = \log(m(A))$ and with

$$ a = \frac{m(A \cup B)}{1 - m(A \cup B)} = \frac{0.5666}{1 - 0.5666} = 1.3073, $$
$$ b = -\frac{s(A) - m(A \cup B)}{1 - m(A \cup B)} = -\frac{0.3773 - 0.5666}{1 - 0.5666} = 0.4369. $$

Hence, $b e^a = 0.4368 \cdot e^{1.3073} = 1.6148$. Applying formula (18), one gets⁷

$$ m_1(A) = e^{W_0(b e^a) - a} = 0.5769, $$
$$ m_2(A) = e^{W_{-1}(b e^a) - a} = -0.0216 + 0.0924i. $$

For determining $m(B)$ we have to solve $(y + a) e^y = b$ with the unknown $y = \log(m(B))$ and with

$$ a = \frac{m(A \cup B)}{1 - m(A \cup B)} = \frac{0.5666}{1 - 0.5666} = 1.3073, $$
$$ b = -\frac{s(B) - m(A \cup B)}{1 - m(A \cup B)} = -\frac{0.4290 - 0.5666}{1 - 0.5666} = 0.3176. $$

Hence, $b e^a = 0.3176 \cdot e^{1.3073} = 1.1739$. Applying formula (19), one gets

$$ m_1(B) = e^{W_0(b e^a) - a} = 0.5065, $$
$$ m_2(B) = e^{W_{-1}(b e^a) - a} = -0.0204 + 0.0657i. $$

⁷ Using the lambertw Matlab™ function.

One sees that there is no effective choice for the values of $m(A)$ and $m(B)$ if we suppose $m(A \cup B) = m_1(A \cup B) = 0.5666$, because if one takes the real-valued solutions $m(A) = m_1(A) = 0.5769$ and $m(B) = m_1(B) = 0.5065$ one would get

$$ m(A) + m(B) + m(A \cup B) = 0.5769 + 0.5065 + 0.5666 = 1.65, $$

which is obviously greater than one. This generates an improper BBA.


• Let's consider the 2nd possibility with the potential solution

$$ m(A \cup B) = m_2(A \cup B) = 0.20. $$

For determining $m(A)$, we have to solve $(y + a) e^y = b$ with the unknown $y = \log(m(A))$ and with

$$ a = \frac{m(A \cup B)}{1 - m(A \cup B)} = \frac{0.20}{1 - 0.20} = 0.25, $$
$$ b = -\frac{s(A) - m(A \cup B)}{1 - m(A \cup B)} = -\frac{0.3773 - 0.20}{1 - 0.20} = -0.2216. $$

Hence, $b e^a = -0.2216 \cdot e^{0.25} = -0.2845$. Applying formula (18), one gets

$$ m_1(A) = e^{W_0(b e^a) - a} = 0.5000, $$
$$ m_2(A) = e^{W_{-1}(b e^a) - a} = 0.1168. $$

For determining $m(B)$ we have to solve $(y + a) e^y = b$ with the unknown $y = \log(m(B))$ and with

$$ a = \frac{m(A \cup B)}{1 - m(A \cup B)} = \frac{0.20}{1 - 0.20} = 0.25, $$
$$ b = -\frac{s(B) - m(A \cup B)}{1 - m(A \cup B)} = -\frac{0.4290 - 0.20}{1 - 0.20} = -0.2862. $$

Hence, $b e^a = -0.2862 \cdot e^{0.25} = -0.3675$. Applying formula (19), one gets

$$ m_1(B) = e^{W_0(b e^a) - a} = 0.3000, $$
$$ m_2(B) = e^{W_{-1}(b e^a) - a} = 0.2730. $$

Based on this 2nd possibility for the potential solution $m(A \cup B) = 0.20$, one sees that the only possible effective choice of mass values $m(A)$ and $m(B)$ is to take $m(A) = m_1(A) = 0.50$ and $m(B) = m_1(B) = 0.30$, which gives the proper sought BBA such that $m(A) + m(B) + m(A \cup B) = 1$, and which exactly corresponds to the original BBA that has been used to generate the entropiece vector $s(m)$ for this example.

In summary, for the case $|\Theta| = 2$ it is always possible to calculate the BBA $m(\cdot)$ from the knowledge of the entropiece vector, and the solution of SEIP is obtained by analytical formulas.
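
The whole procedure of Sections III-B and III-C can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the two real branches of the Lambert W-function are computed here by a simple Newton iteration written for this example (a stand-in for Matlab's lambertw, or for `scipy.special.lambertw` in Python), the case $0 < s(A \cup B) \leq 1/e$ is assumed, and all function names and tolerances are hypothetical.

```python
import math

INV_E = 1.0 / math.e


def lambert_w0(x: float) -> float:
    """Real branch W_0(x) for x > -1/e: Newton's method on w * exp(w) - x."""
    w = math.log1p(x) if x > 0.0 else 0.0     # start at or to the right of the root
    for _ in range(200):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))
        w -= step
        if abs(step) <= 1e-13 * max(1.0, abs(w)):
            break
    return w


def lambert_wm1(x: float) -> float:
    """Real branch W_{-1}(x) for -1/e < x < 0: Newton's method on log(-w) + w - log(-x)."""
    u = -math.log(-x) - 1.0                   # u > 0 on this branch
    w = -1.0 - math.sqrt(2.0 * u) - u         # starting guess, strictly below -1
    for _ in range(200):
        step = (math.log(-w) + w - math.log(-x)) * w / (w + 1.0)
        w -= step
        if abs(step) <= 1e-13 * abs(w):
            break
    return w


def real_w_branches(x: float):
    """Real values of W(x): none below -1/e, two on (-1/e, 0), one for x >= 0."""
    if x < -INV_E - 1e-12:
        return []
    if x < -INV_E + 1e-12:
        return [-1.0]                         # branch point, both branches meet
    if x < 0.0:
        return [lambert_w0(x), lambert_wm1(x)]
    return [lambert_w0(x)]


def solve_seip(s_a: float, s_b: float, s_ab: float, tol: float = 1e-6):
    """BBAs (m(A), m(B), m(A u B)) matching the entropieces, assuming 0 < s_ab <= 1/e."""
    solutions = []
    for w_ab in real_w_branches(-s_ab):
        c = math.exp(w_ab)                    # candidate m(A u B), formulas (11)-(12)
        a = c / (1.0 - c)                     # parameter a of (15) and (21)
        candidates = []
        for s in (s_a, s_b):
            b = -(s - c) / (1.0 - c)          # parameter b of (16) and (22)
            candidates.append([math.exp(w - a) for w in real_w_branches(b * math.exp(a))])
        for m_a in candidates[0]:
            for m_b in candidates[1]:
                if abs(m_a + m_b + c - 1.0) < tol:   # keep proper BBAs only
                    solutions.append((m_a, m_b, c))
    return solutions


# Entropieces of the example BBA m(A) = 0.5, m(B) = 0.3, m(A u B) = 0.2.
s_ab = -0.2 * math.log(0.2)
s_a = -0.5 * 0.8 * math.log(0.5) + 0.2 * 0.5
s_b = -0.3 * 0.8 * math.log(0.3) + 0.2 * 0.7
solutions = solve_seip(s_a, s_b, s_ab)
print(solutions)
```

The sketch keeps only real-valued candidates, so the complex values obtained above from $W_{-1}$ with a positive argument are never generated, and the normalization test plays the role of the "effective choice" made by hand in the example.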


D. Remark 


In the very particular case where $s(A \cup B) = 0$ the equation (9) reduces to

$$ -m(A \cup B) \log(m(A \cup B)) = 0, \tag{23} $$

which has two possible solutions $m(A \cup B) = m_1(A \cup B) = 1$, and $m(A \cup B) = m_2(A \cup B) = 0$.

If $m(A \cup B) = 1$, then it means that necessarily the BBA is the vacuous BBA, and so $m(A) = m(B) = 0$, $u(A) = Pl(A) - Bel(A) = 1$, $u(B) = Pl(B) - Bel(B) = 1$. Therefore⁸

$$\begin{aligned}
s(A) &= -m(A)(1 - u(A)) \log(m(A)) + u(A)(1 - m(A)) \\
&= -m(A)(1 - m(A \cup B)) \log(m(A)) + m(A \cup B)(1 - m(A)) \\
&= 1,
\end{aligned}$$

$$\begin{aligned}
s(B) &= -m(B)(1 - u(B)) \log(m(B)) + u(B)(1 - m(B)) \\
&= -m(B)(1 - m(A \cup B)) \log(m(B)) + m(A \cup B)(1 - m(B)) \\
&= 1.
\end{aligned}$$

So the choice of $m(A \cup B) = m_1(A \cup B) = 1$ is the only possible one if the entropiece vector is $s(m) = [1\ 1\ 0]^T$.

If $s(A) < 1$, or if $s(B) < 1$ (or both) then we must choose $m(A \cup B) = m_2(A \cup B) = 0$, and in this case we have to solve the equations

$$ s(A) = -m(A) \log(m(A)), $$
$$ s(B) = -m(B) \log(m(B)). $$

The possible solutions of equation $s(A) = -m(A) \log(m(A))$ are given by

$$ m_1(A) = e^{W_0(-s(A))}, $$
$$ m_2(A) = e^{W_{-1}(-s(A))}, $$

and the possible solutions of equation $s(B) = -m(B) \log(m(B))$ are given by

$$ m_1(B) = e^{W_0(-s(B))}, \tag{26} $$
$$ m_2(B) = e^{W_{-1}(-s(B))}. \tag{27} $$

⁸ We use the formal notation $\log(0)$ even if $\log(0)$ is $-\infty$ because in our derivations we always have a $0 \log(0)$ product which is equal to zero due to L'Hôpital's rule [4].

In this particular case where $s(A \cup B) = 0$, and $s(A) < 1$ or $s(B) < 1$, we have to select the pair of possible solutions among the four possible choices

$$ (m(A), m(B)) = (m_1(A), m_1(B)), $$
$$ (m(A), m(B)) = (m_1(A), m_2(B)), $$
$$ (m(A), m(B)) = (m_2(A), m_1(B)), $$
$$ (m(A), m(B)) = (m_2(A), m_2(B)). $$

The judicious choice of the pair $(m(A), m(B))$ must satisfy the proper BBA constraint $m(A) + m(B) + m(A \cup B) = 1$, where $m(A \cup B) = 0$ because $s(A \cup B) = 0$ in this particular case.

For instance, consider $\Theta = \{A, B\}$ and the following (Bayesian) BBA

$$ m(A) = 0.6, \quad m(B) = 0.4, \quad m(A \cup B) = 0. $$

The entropiece vector $s(m)$ is

$$ s(m) = \begin{bmatrix} s(A) \\ s(B) \\ s(A \cup B) \end{bmatrix} \approx \begin{bmatrix} 0.3065 \\ 0.3665 \\ 0 \end{bmatrix}. \tag{28} $$

Hence from $s(m)$ we can deduce $m(A \cup B) = 0$, because we cannot consider $m(A \cup B) = 1$ as a valid solution since $s(A) < 1$ and $s(B) < 1$. The possible solutions of equation $s(A) = -m(A) \log(m(A))$ are

$$ m_1(A) = e^{W_0(-s(A))} = e^{W_0(-0.3065)} = 0.6000, $$
$$ m_2(A) = e^{W_{-1}(-s(A))} = e^{W_{-1}(-0.3065)} = 0.1770, $$

and the possible solutions of the equation $s(B) = -m(B) \log(m(B))$ are

$$ m_1(B) = e^{W_0(-s(B))} = e^{W_0(-0.3665)} = 0.4000, $$
$$ m_2(B) = e^{W_{-1}(-s(B))} = e^{W_{-1}(-0.3665)} = 0.3367. $$

One sees that the only effective (or judicious) choice for $m(A)$ and $m(B)$ is to take $m(A) = m_1(A) = 0.60$ and $m(B) = m_1(B) = 0.40$, which coincides with the original Bayesian BBA that has been used to generate the entropiece vector $s(m) = [0.3065, 0.3665, 0]^T$.


IV. CONCLUSION 


In this paper we have introduced for the first time the en- 
tropiece inversion problem (EIP) which consists in calculating 
a basic belief assignment from the knowledge of a given 
entropiece vector which quantifies effectively the measure 
of uncertainty of a BBA in the framework of the theory 
of belief functions. The general analytical solution of this 
mathematical problem is a very challenging open problem 
because it involves transcendental equations. We have shown 
however how it is possible to obtain an analytical solution for 
the simplest EIP involving only two elements in the frame of 
discernment. Even in this simplest case the analytical solution 
of EIP is not easy to obtain because it requires a calculation 
of values of the transcendental Lambert’s functions. Even if 
no general analytical formulas are found for the solution of 
general EIP, it would be interesting to develop numerical 
methods to approximate the general EIP solution, and to 
exploit it in future advanced information fusion systems. 


REFERENCES 

[1] G. Shafer, A mathematical theory of evidence, Princeton University Press, 1976.

[2] J. Dezert, An Effective Measure of Uncertainty of Basic Belief Assignments, in Proc. of Fusion 2022 Int. Conf., ISIF Editor, Linköping, Sweden, July 2022.

[3] J. Dezert, A. Tchamova, On Effectiveness of Measures of Uncertainty of Basic Belief Assignments, Information & Security Journal: An International Journal (ISIJ), Vol. 52, 2022.

[4] R.E. Bradley, S.J. Petrilli, C.E. Sandifer, L'Hôpital's analyse des infiniments petits (An annotated translation with source material by Johann Bernoulli), Birkhäuser, 311 pages, 2015.

[5] R.M. Corless, G.H. Gonnet, D.E.G. Hare, D.J. Jeffrey, D.E. Knuth, On the Lambert W Function, Advances in Computational Mathematics, Vol. 5, pp. 329-359, 1996.

[6] Wikipedia, Lambert W function, https://en.wikipedia.org/wiki/Lambert_W_function. Accessed 1st December 2022.

[7] C.E. Shannon, A mathematical theory of communication, in [9] and in The Bell System Technical Journal, Vol. 27, pp. 379-423 & pp. 623-656, July, 1948.

[8] J. Dezert, A. Tchamova, D. Han, Measure of Information Content of Basic Belief Assignments, in Proc. of Belief 2022 Int. Conf., Paris, France, Oct. 26-28, 2022.

[9] N.J.A. Sloane, A.D. Wyner, Claude Elwood Shannon - Collected Papers, IEEE Press, 924 pages, 1993.




Measure of Information Content 
of Basic Belief Assignments 


Jean Dezertᵃ, Albena Tchamovaᵇ, Deqiang Hanᶜ 

ᵃ The French Aerospace Lab, ONERA, F-91761 Palaiseau, France. 
ᵇ Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, 1113 Sofia, Bulgaria. 
ᶜ Institute of Integrated Automation, Xi'an Jiaotong University, Xi'an, China. 
Emails: jean.dezert@onera.fr, tchamova@bas.bg, deqhan@xjtu.edu.cn 

Originally published as: J. Dezert, A. Tchamova, D. Han, Measure of Information Content of Basic Belief Assignments, in Proc. of Belief 2022 Int. Conf., Paris, France, Oct. 26-28, 2022, and reprinted with permission. 


Abstract—In this paper, we present a measure of Information 
Content (IC) of Basic Belief Assignments (BBAs), and we show 
how it can be easily calculated. This new IC measure is inter- 
preted as the dual of the effective measure of uncertainty (i.e. 
generalized entropy) of BBAs developed recently. 


Keywords: belief functions, information content, generalized 
entropy. 


I. INTRODUCTION 


Information quality (IQ) evaluation is of major importance 
for information processing and for helping the decision- 
making under uncertainty. In [1], the authors introduced the 
Accessibility, Interpretability, Relevance, and Integrity con- 
cepts as main attributes to describe the information quality in 
the context of assurance and belief networks, but unfortunately 
they present only general concepts without explicit formulas to 
evaluate quantitatively these attributes. In several recent books 
devoted to IQ [2]-[5], the authors proposed different models 
and methods of IQ evaluations. Recently in [6], Bouhamed et 
al. proposed a quantitative IQ evaluation using the possibility 
theory framework, which could be extended to the belief 
functions theory framework with further investigations. In 
this latter work, the information quantity component being 
necessary for the IQ evaluation is based on Gini’s entropy 
rather than classical Shannon entropy. From the examination of 
these aforementioned references (and some references therein), 
it is far from obvious to make a clear justified choice among 
all these methods, especially when we model the uncertain 
information by belief functions (BF). What is clear however 
is that several distinct factors (or components) must be taken 
into account in the IQ evaluation mechanism. In this paper we 
focus on one of these components which is the Information 
Content (IC) component that we consider as the very (if not the 
most) essential component for IQ evaluation and indispensable 
for developing an effective IQ evaluation method in future 
research works. 

It is worth noting that we do not directly address the whole IQ evaluation problem in this work, but rather provide a mathematical solution for measuring the IC of any Basic Belief Assignment (BBA) in the belief functions (BF) framework. Our new IC measure is interpreted as the dual of an effective Measure of Uncertainty (MoU) developed recently [7]. We show how to calculate the IC of a BBA, and we also discuss the notion of information gain and information loss in the BF context. In our opinion, we cannot define a measure of Information Content independently of a Measure of Uncertainty (MoU) because they must be strongly related to each other. Actually these measures are two different sides of the same abstract coin. On one side (the uncertainty side), the more uncertainty content we have, the harder the decision or choice is to make, and on the other side (the information side), the more information content we have, the easier and stronger the decision or choice is to make. This very simple and natural basic principle will be clarified mathematically next. So, the measure of information content of a BBA must reflect somehow the easiness and strength in the choice of an element of the frame of discernment drawn from the BBA (i.e. in the decision-making). This paper is organized as follows. After a brief recall of the basics of belief functions in Section II, we recall the effective MoU adopted in this work in Section III. Section IV defines the measure of information content of a BBA and the information granules vector. Section V introduces the notions of information gain and information loss. Conclusions and perspectives appear in the last section.


II. BELIEF FUNCTIONS 


The belief functions (BF) were introduced by Shafer [8] for modeling epistemic uncertainty, reasoning about uncertainty and combining distinct sources of evidence. The answer to the problem under concern is assumed to belong to a known finite discrete frame of discernment (FoD) $\Theta = \{\theta_1,\ldots,\theta_n\}$, where all elements (i.e. members) of $\Theta$ are exhaustive and mutually exclusive. The set of all subsets of $\Theta$ (including the empty set $\emptyset$ and $\Theta$) is the power set of $\Theta$, denoted by $2^\Theta$. The number of elements (i.e. the cardinality) of the power set is $2^{|\Theta|}$. A (normalized) basic belief assignment (BBA) associated with a given source of evidence is a mapping $m^\Theta(\cdot) : 2^\Theta \to [0,1]$ such that $m^\Theta(\emptyset) = 0$ and $\sum_{X \in 2^\Theta} m^\Theta(X) = 1$. A BBA $m^\Theta(\cdot)$ characterizes a source of evidence related to a FoD $\Theta$. As a notational shorthand, we can omit the superscript $\Theta$ in the $m^\Theta(\cdot)$ notation if there is no ambiguity on the FoD we work with¹. The quantity $m(X)$ is called the mass of belief for $X$. The element $X \in 2^\Theta$ is called a focal element (FE) of $m(\cdot)$ if $m(X) > 0$. The set of all focal elements of $m(\cdot)$ is denoted² by $\mathcal{F}_\Theta(m) \triangleq \{X \in 2^\Theta \,|\, m(X) > 0\}$.

The belief and the plausibility of $X$ are defined for any $X \in 2^\Theta$ by [8]

$$Bel(X) = \sum_{Y \in 2^\Theta \,|\, Y \subseteq X} m(Y), \tag{1}$$

$$Pl(X) = \sum_{Y \in 2^\Theta \,|\, X \cap Y \neq \emptyset} m(Y) = 1 - Bel(\bar{X}), \tag{2}$$

where $\bar{X} \triangleq \Theta \setminus \{X\}$ is the complement of $X$ in $\Theta$.

One has always $0 \le Bel(X) \le Pl(X) \le 1$, see [8]. For $X = \emptyset$, $Bel(\emptyset) = 0$ and $Pl(\emptyset) = 0$, and for $X = \Theta$ one has $Bel(\Theta) = 1$ and $Pl(\Theta) = 1$. $Bel(X)$ and $Pl(X)$ are often interpreted as the lower and upper bounds of the unknown probability $P(X)$ of $X$, that is $Bel(X) \le P(X) \le Pl(X)$. To quantify the uncertainty (i.e. the imprecision) of $P(X) \in [Bel(X), Pl(X)]$, we use the notation $u(X) \in [0,1]$ defined by

$$u(X) \triangleq Pl(X) - Bel(X). \tag{3}$$

The quantity $u(X) = 0$ if $Bel(X) = Pl(X)$, which means that $P(X)$ is known precisely, and one has $P(X) = Bel(X) = Pl(X)$. One has $u(\emptyset) = 0$ because $Bel(\emptyset) = Pl(\emptyset) = 0$, and one has $u(\Theta) = 0$ because $Bel(\Theta) = Pl(\Theta) = 1$. If all focal elements of $m(\cdot)$ are singletons of $2^\Theta$, the BBA $m(\cdot)$ is a Bayesian BBA because $\forall X \in 2^\Theta$ one has $Bel(X) = Pl(X) = P(X)$ and $u(X) = 0$. Hence the belief and plausibility of $X$ coincide with a probability measure $P(X)$ defined on the FoD $\Theta$. The vacuous BBA characterizing a totally ignorant source of evidence is defined by $m_v(X) = 1$ for $X = \Theta$, and $m_v(X) = 0$ for all $X \in 2^\Theta$ different from $\Theta$. This particular BBA has played a major role in the establishment of a new effective measure of uncertainty of BBAs defined in [7].
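The definitions (1)-(3) can be sketched in a few lines of Python. This is only an illustrative sketch: the representation of a BBA as a dictionary keyed by `frozenset` focal elements, the function names `bel`, `pl` and `imprecision`, and the element labels `"t1"`, `"t2"` are assumptions made here, not notation from the paper.

```python
def bel(m, x):
    """Bel(X), Eq. (1): total mass of the focal elements included in x."""
    return sum(mass for y, mass in m.items() if y <= x)


def pl(m, x):
    """Pl(X), Eq. (2): total mass of the focal elements intersecting x."""
    return sum(mass for y, mass in m.items() if y & x)


def imprecision(m, x):
    """u(X) = Pl(X) - Bel(X), Eq. (3)."""
    return pl(m, x) - bel(m, x)


# Illustrative BBA on the two-element frame {t1, t2} (labels are hypothetical).
theta = frozenset({"t1", "t2"})
m = {frozenset({"t1"}): 0.5, frozenset({"t2"}): 0.3, theta: 0.2}

for x in (frozenset(), frozenset({"t1"}), frozenset({"t2"}), theta):
    print(sorted(x), bel(m, x), pl(m, x), imprecision(m, x))
```

Only focal elements need to be stored in the dictionary; any subset that is absent is treated as having zero mass.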


III. GENERALIZED ENTROPY OF A BBA 


In [9] we analyzed in detail forty-eight measures of uncertainty (MoU) of BBAs, covering 40 years of research work on this topic. Some of these MoUs capture only a particular aspect of the uncertainty inherent to a BBA (typically, the non-specificity and the conflict). Other MoUs propose a total uncertainty measure to capture jointly several aspects of the uncertainty. Unfortunately, most of these MoUs fail to satisfy four very simple, reasonable and essential desiderata, and so they cannot be considered as really effective and useful. Actually only six MoUs can be considered as effective in the mathematical sense presented next, but unfortunately they appear conceptually defective and disputable, see the discussions in [9]. That is why a better effective measure of uncertainty (MoU), i.e. a generalized entropy of BBAs, has been developed and presented in [7]. The mathematical definition of this new effective entropy is given by

$$U(m) = \sum_{X \in 2^\Theta} s(X), \tag{4}$$

with

$$s(X) \triangleq -m(X)(1 - u(X))\log(m(X)) + u(X)(1 - m(X)). \tag{5}$$

¹ However, we will keep the $m^\Theta(\cdot)$ notation when necessary.
² $\triangleq$ means equal by definition.


In (5), the term $-(1 - u(X))\log(m(X))$ is equal to $(1 - u(X))\log(1/m(X))$, and $\log(1/m(X))$ is called the surprisal³ of $X$. Therefore $(1 - u(X))\log(1/m(X))$ represents the surprisal of $X$ discounted by the confidence factor $(1 - u(X))$ one has in the precision of the probability $P(X)$. The term $u(X)(1 - m(X))$ entering in (5) corresponds to the imprecision $u(X)$ about $P(X)$ discounted by $(1 - m(X))$. The main idea is that the greater $m(X)$ is, the less one should take into account the imprecision $u(X)$ in the MoU. The quantity $s(X)$ is the uncertainty contribution related to the element $X$ (named the entropiece of $X$) in the MoU $U(m)$. This entropiece $s(X)$ involves $m(X)$ and the imprecision $u(X) = Pl(X) - Bel(X)$ about the unknown probability of $X$ in a subtle interwoven manner. The cardinality of $X$ is indirectly taken into account in the derivation of $s(X)$ thanks to $u(X)$, which requires the derivation of the $Pl(X)$ and $Bel(X)$ functions depending on the cardinality of $X$. Because $u(X) \in [0,1]$ and $m(X) \in [0,1]$ one has $s(X) \ge 0$, and $U(m) \ge 0$. The quantity $U(m)$ is expressed in nats because we use the natural logarithm. $U(m)$ can be expressed in bits by dividing the $U(m)$ value in nats by $\log(2) = 0.69314718\ldots$. This measure of uncertainty $U(m)$ is a continuous function in its basic belief mass arguments because it is a summation of continuous functions. In formula (5), we always take $m(X)\log(m(X)) = 0$ when $m(X) = 0$ because $\lim_{m(X) \to 0^+} m(X)\log(m(X)) = 0$, which can be proved using L'Hôpital's rule [11]. Note that for any BBA $m$, one has always $s(\emptyset) = 0$ because $m(\emptyset) = 0$ and $u(\emptyset) = Pl(\emptyset) - Bel(\emptyset) = 0 - 0 = 0$. For the vacuous BBA, one has $s(\Theta) = 0$ because $m_v(\Theta) = 1$ and $u(\Theta) = Pl(\Theta) - Bel(\Theta) = 1 - 1 = 0$.

The set $\{s(X), X \in 2^\Theta\}$ of the entropiece values $s(X)$ can be represented by an entropiece vector $\mathbf{s}(m^\Theta) = [s(X), X \in 2^\Theta]^T$, where any order of the elements $X$ of the power set $2^\Theta$ can be chosen. For simplicity, we suggest using the classical $N$-bits representation (if $|\Theta| = N$) with increasing order; see the next example.

This measure of uncertainty $U(m)$ is effective because it can be proved (see proofs in [7]) that it satisfies the following four essential properties:

1) $U(m) = 0$ for any BBA $m(\cdot)$ focused on a singleton $X$ of $2^\Theta$;

2) $U(m_v^{\Theta}) < U(m_v^{\Theta'})$ if $|\Theta| < |\Theta'|$;

3) $U(m) = -\sum_{X \in \Theta} m(X)\log(m(X))$ if the BBA $m(\cdot)$ is a Bayesian BBA. Hence, $U(m)$ reduces to Shannon entropy [12] in this case;

4) $U(m) < U(m_v)$ for any non-vacuous BBA $m(\cdot)$ and for the vacuous BBA $m_v(\cdot)$ defined with respect to the same FoD.

³ This terminology is not used by Shannon in his original paper, but it was introduced by Tribus in [10] in the probabilistic context, and by analogy we adopt Tribus' terminology also for BBAs.

The proof of the first three properties is quite simple. The proof of the last property is much more difficult. As explained in [7], we do not consider that the sub-additivity property [13] of $U(m)$ is a fundamental desideratum that an effective MoU must satisfy in general. In fact, the sub-additivity desideratum is incompatible with the fourth important property $U(m) < U(m_v)$ above, which stipulates that no non-vacuous BBA can be more uncertain (i.e. more ignorant about the problem under consideration) than the vacuous BBA. Actually, it does not make sense to have the entropy $U(m_v^{\Theta \times \Theta'})$ of the vacuous joint BBA $m_v^{\Theta \times \Theta'}$ defined on the Cartesian product space $\Theta \times \Theta'$ smaller than (or equal to) the sum $U(m_v^{\Theta}) + U(m_v^{\Theta'})$ of the entropies of the vacuous BBAs $m_v^{\Theta}$ and $m_v^{\Theta'}$ defined respectively on $\Theta$ and $\Theta'$. There is no theoretical justification, nor intuitive reason, for this sub-additivity desideratum in the context of non-Bayesian BBAs. Of course for Bayesian BBAs, $U(m)$ is equivalent to Shannon entropy, which is in this case sub-additive.

It can also be proved, see [7] for details, that the entropy of the vacuous BBA $m_v$ related to a FoD $\Theta$ is equal to

$$U(m_v^{\Theta}) = 2^{|\Theta|} - 2. \tag{6}$$

This maximum entropy value $U(m_v)$ makes perfect sense because for this very particular BBA there is no information at all about the conflicts between the elements of the FoD. Actually, for all $X \in 2^\Theta \setminus \{\emptyset, \Theta\}$ one has $u(X) = 1$ because $[Bel(X), Pl(X)] = [0,1]$, and one has $u(\emptyset) = 0$ and $u(\Theta) = 0$. Hence, the sum of all imprecisions of $P(X)$ for all $X \in 2^\Theta$ is exactly equal to $2^{|\Theta|} - 2$, which corresponds to $U(m_v^{\Theta})$ as expected. Moreover, one has always $U(m_v^{\Theta}) > \log(|\Theta|)$, which means that the vacuous BBA always has an entropy greater than the maximum of Shannon entropy $\log(|\Theta|)$ obtained with the uniform probability mass function distributed on $\Theta$.


Example 1 of entropy calculation: consider $\Theta = \{\theta_1, \theta_2\}$ and the BBA $m^\Theta(\theta_1) = 0.5$, $m^\Theta(\theta_2) = 0.3$ and $m^\Theta(\theta_1 \cup \theta_2) = 0.2$; then one has $[Bel(\emptyset), Pl(\emptyset)] = [0, 0]$ and $u(\emptyset) = 0$, $[Bel(\theta_1), Pl(\theta_1)] = [0.5, 0.7]$, $[Bel(\theta_2), Pl(\theta_2)] = [0.3, 0.5]$, and $[Bel(\Theta), Pl(\Theta)] = [1, 1]$. Hence, $u(\theta_1) = 0.2$, $u(\theta_2) = 0.2$ and $u(\Theta) = 0$. Applying (5), one gets $s(\emptyset) = 0$, $s(\theta_1) \approx 0.377258$, $s(\theta_2) \approx 0.428953$ and $s(\Theta) \approx 0.321887$. Using the 2-bits representation with increasing ordering⁴, we encode the elements of the power set as $\emptyset = 00$, $\theta_1 = 01$, $\theta_2 = 10$ and $\theta_1 \cup \theta_2 = 11$. The entropiece vector for this simple example is

$$\mathbf{s}(m^\Theta) = \begin{bmatrix} s(\emptyset) \\ s(\theta_1) \\ s(\theta_2) \\ s(\theta_1 \cup \theta_2) \end{bmatrix} \approx \begin{bmatrix} 0 \\ 0.377258 \\ 0.428953 \\ 0.321887 \end{bmatrix}. \tag{7}$$

⁴ Once the binary values are converted into their digit value with the most significant bit on the left (i.e. the least significant bit on the right).

If we use the classical $N$-bits (here $N = 2$) representation with increasing ordering (as we recommend), the first component of the entropiece vector $\mathbf{s}(m^\Theta)$ will be $s(\emptyset)$, which is always equal to zero for any BBA $m$; hence the first component of $\mathbf{s}(m^\Theta)$ is always zero. By summing all the components of the entropiece vector $\mathbf{s}(m^\Theta)$ we obtain the entropy $U(m^\Theta) \approx 1.128098$ nats of the BBA $m^\Theta(\cdot)$. Note that the components $s(X)$ (for $X \neq \emptyset$) of the entropiece vector $\mathbf{s}(m^\Theta)$ are not independent because they are linked to each other through the calculation of the $Bel(X)$ and $Pl(X)$ values entering in $u(X)$.


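Example 2 of entropy calculation: for the vacuous BBA $m_v^\Theta$, and when using the binary increasing encoding of the elements of $2^\Theta$, the first component $s(\emptyset)$ and the last component $s(\Theta)$ of the entropiece vector $\mathbf{s}(m_v^\Theta)$ will always be equal to zero, and all other components of $\mathbf{s}(m_v^\Theta)$ will be equal to one. For instance, if we consider $\Theta = \{\theta_1, \theta_2\}$ and the vacuous BBA $m_v^\Theta(\theta_1) = 0$, $m_v^\Theta(\theta_2) = 0$ and $m_v^\Theta(\theta_1 \cup \theta_2) = 1$, the corresponding entropiece vector $\mathbf{s}(m_v^\Theta)$ is

$$\mathbf{s}(m_v^\Theta) = \begin{bmatrix} s(\emptyset) \\ s(\theta_1) \\ s(\theta_2) \\ s(\theta_1 \cup \theta_2) \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}. \tag{8}$$

By summing all the components of the entropiece vector $\mathbf{s}(m_v^\Theta)$ we obtain the entropy value $U(m_v^\Theta) = 2$ nats for this vacuous BBA $m_v^\Theta(\cdot)$, which is of course in agreement with the formula (6).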
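The two entropy examples above can be reproduced with a short Python sketch of formulas (4)-(5). The dictionary-of-`frozenset` representation, the helper names (`powerset`, `entropiece`, `entropy`) and the labels `"t1"`, `"t2"` are assumptions introduced for illustration; the subsets are enumerated in the increasing binary order suggested in the text.

```python
import math


def powerset(frame):
    """Subsets of `frame` in increasing binary order (frame[0] is the least significant bit)."""
    return [frozenset(frame[j] for j in range(len(frame)) if (code >> j) & 1)
            for code in range(2 ** len(frame))]


def entropiece(m, x):
    """s(X) of Eq. (5), with the convention m log m = 0 when m = 0."""
    mass = m.get(x, 0.0)
    bel = sum(v for y, v in m.items() if y <= x)   # Eq. (1)
    pl = sum(v for y, v in m.items() if y & x)     # Eq. (2)
    u = pl - bel                                   # Eq. (3)
    log_term = -mass * (1.0 - u) * math.log(mass) if mass > 0.0 else 0.0
    return log_term + u * (1.0 - mass)


def entropy(m, frame):
    """U(m) of Eq. (4), in nats."""
    return sum(entropiece(m, x) for x in powerset(frame))


frame = ["t1", "t2"]
theta = frozenset(frame)
m = {frozenset({"t1"}): 0.5, frozenset({"t2"}): 0.3, theta: 0.2}   # Example 1
m_vacuous = {theta: 1.0}                                           # Example 2

print([round(entropiece(m, x), 6) for x in powerset(frame)])
print(entropy(m, frame), entropy(m_vacuous, frame))
```

Dividing the returned value by `math.log(2)` gives the same entropy expressed in bits.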


IV. INFORMATION CONTENT OF A BBA 


We consider a (non-empty) FoD of cardinality $|\Theta| = N$, and we model our state of knowledge about the problem under consideration by a BBA defined on $2^\Theta$. Without more knowledge than the FoD itself (and its cardinality $N$), we are totally ignorant about the solution of the problem we want to solve, and of course we have no clue for making a decision/choice among the elements of the FoD. The BBA reflecting this totally ignorant situation is the vacuous BBA $m_v(\cdot)$, whose maximal entropy is $U(m_v) = 2^N - 2$. In such a case, we naturally expect that the information content we have⁵ is zero when the uncertainty measure is maximal. In the very opposite case, it is very natural to consider that the information content of a BBA is maximal if the entropy value (the MoU value) of a BBA $m(\cdot)$ is zero, meaning that we make a choice of one element of the FoD without hesitation. Based on these very simple ideas, we propose to define the information content of any BBA $m(\cdot)$ as the dual of the effective measure of uncertainty, more precisely by

$$IC(m^\Theta) \triangleq U(m_v^\Theta) - U(m^\Theta) = (2^{|\Theta|} - 2) - \sum_{X \in 2^\Theta} s(X), \tag{9}$$

where $s(X)$ is the entropiece of the element $X \in 2^\Theta$ given by (5), that is

$$s(X) = -(1 - u(X))\, m^\Theta(X) \log(m^\Theta(X)) + u(X)(1 - m^\Theta(X)),$$

and where $u(X)$ is the level of imprecision of the probability $P(X)$ given by

$$u(X) = Pl^\Theta(X) - Bel^\Theta(X) = \sum_{Y \in 2^\Theta \,|\, X \cap Y \neq \emptyset} m^\Theta(Y) - \sum_{Y \in 2^\Theta \,|\, Y \subseteq X} m^\Theta(Y). \tag{10}$$

From the definition (9), one sees that for $m^\Theta \neq m_v^\Theta$ one has $IC(m^\Theta) > 0$ because $U(m^\Theta) < U(m_v^\Theta)$, and for $m^\Theta = m_v^\Theta$ one has $IC(m^\Theta) = 0$, which is what we naturally expect.

⁵ Aside from the value of $N$, of course.


It is worth mentioning that the information content $IC(m^\Theta)$ of a BBA depends not only on the BBA $m(\cdot)$ itself but also on the cardinality of the frame of discernment⁶ $\Theta$, because $IC(m^\Theta)$ requires the knowledge of $|\Theta|$ to calculate the maximum entropy value $U(m_v^\Theta) = 2^{|\Theta|} - 2$ entering in (9). This remark is very important to understand that even if two BBAs (defined on different FoDs) focus entirely on the same focal element, their information contents are necessarily different. For instance, if we consider the Bayesian BBA with $m^\Theta(\theta_1) = 1$ defined on the FoD $\Theta = \{\theta_1, \theta_2\}$, then

$$IC(m^\Theta) = U(m_v^\Theta) - U(m^\Theta) = (2^{|\Theta|} - 2) - 0 = 2 \text{ (nats)},$$

whereas if we consider the Bayesian BBA with $m^{\Theta'}(\theta_1) = 1$ defined on the larger FoD $\Theta' = \{\theta_1, \theta_2, \theta_3\}$ (for instance), then

$$IC(m^{\Theta'}) = U(m_v^{\Theta'}) - U(m^{\Theta'}) = (2^{|\Theta'|} - 2) - 0 = 6 \text{ (nats)}.$$

So even if the decision $\theta_1$ that we would make based either on $m^\Theta$ or on $m^{\Theta'}$ is the same, these decisions must not be considered as having the same strength, and this is what our information content measure reflects.


From this very simple definition of information content, we can also define the Normalized Information Content (NIC) (if needed later in some applications), denoted by $NIC(m^\Theta)$, by normalizing $IC(m^\Theta)$ with respect to the maximal value of entropy $U(m_v^\Theta)$ as

$$NIC(m^\Theta) \triangleq \frac{U(m_v^\Theta) - U(m^\Theta)}{U(m_v^\Theta)} = \frac{IC(m^\Theta)}{U(m_v^\Theta)}. \tag{11}$$

Hence we will have $NIC(m^\Theta) \in [0,1]$, with $NIC(m^\Theta) = 0$ for $m = m_v$, and $NIC(m^\Theta) = 1$ for $U(m) = 0$, which is obtained when $m(\cdot)$ is entirely focused on a singleton $\theta_i \in \Theta$, that is $m^\Theta(\theta_i) = 1$ for some $i \in \{1, 2, \ldots, |\Theta|\}$.

In fact, the (total) information content $IC(m^\Theta)$ of a BBA is the sum of all the information granules $IG(X|m^\Theta)$ of the elements $X \in 2^\Theta$ carried by a BBA $m^\Theta$, that is

$$IC(m^\Theta) = \sum_{X \in 2^\Theta} IG(X|m^\Theta), \tag{12}$$

where

$$IG(X|m^\Theta) \triangleq \begin{cases} 0, & \text{if } X = \emptyset, \\ -s(X), & \text{if } X = \Theta, \\ 1 - s(X), & \text{otherwise.} \end{cases} \tag{13}$$

We can define the information granules vector⁷ $\mathbf{IG}(m^\Theta) = [IG(X|m^\Theta), X \in 2^\Theta]^T$ by

$$\mathbf{IG}(m^\Theta) \triangleq \mathbf{s}(m_v^\Theta) - \mathbf{s}(m^\Theta). \tag{14}$$

One sees that the (total) information content $IC(m^\Theta)$ of a BBA $m^\Theta$ is just the sum of all the components $IG(X|m^\Theta)$ of the information granules vector $\mathbf{IG}(m^\Theta)$. The information granules vector is interesting and useful because it helps to see the contribution of each element $X$ in the whole measure of the information content $IC(m^\Theta)$ of a BBA $m^\Theta$.

⁶ That is why it is better, we think, to use the notation $IC(m^\Theta)$ instead of $IC(m)$.


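Example 1 (continued): consider $\Theta = \{\theta_1, \theta_2\}$ and the BBA $m^\Theta(\theta_1) = 0.5$, $m^\Theta(\theta_2) = 0.3$ and $m^\Theta(\theta_1 \cup \theta_2) = 0.2$. The information granules vector $\mathbf{IG}(m^\Theta)$ is given by

$$\mathbf{IG}(m^\Theta) = \mathbf{s}(m_v^\Theta) - \mathbf{s}(m^\Theta) \approx \begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \end{bmatrix} - \begin{bmatrix} 0 \\ 0.377258 \\ 0.428953 \\ 0.321887 \end{bmatrix} = \begin{bmatrix} 0 \\ 0.622742 \\ 0.571047 \\ -0.321887 \end{bmatrix}.$$

By summing all the components of the information granules vector $\mathbf{IG}(m^\Theta)$ we obtain the (total) information content $IC(m^\Theta) \approx 0.871902$ nats of the BBA $m^\Theta$, which can of course also be calculated directly as

$$IC(m^\Theta) = U(m_v^\Theta) - U(m^\Theta) \approx 2 - 1.128098 = 0.871902.$$

However, the information granules vector $\mathbf{IG}(m^\Theta)$ is interesting to identify the contribution of each element $X$ in the whole measure of the information content.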
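As a hedged illustration of (9) and (11)-(14), the sketch below computes the information granules vector, the information content and its normalized version for the BBA of Example 1. The helper names and the dictionary representation are assumptions made here; the normalized version assumes $|\Theta| \ge 2$ so that $U(m_v^\Theta) > 0$.

```python
import math


def powerset(frame):
    """Subsets of `frame` in increasing binary order (frame[0] is the least significant bit)."""
    return [frozenset(frame[j] for j in range(len(frame)) if (code >> j) & 1)
            for code in range(2 ** len(frame))]


def entropiece(m, x):
    """s(X) of Eq. (5), with the convention m log m = 0 when m = 0."""
    mass = m.get(x, 0.0)
    bel = sum(v for y, v in m.items() if y <= x)
    pl = sum(v for y, v in m.items() if y & x)
    u = pl - bel
    log_term = -mass * (1.0 - u) * math.log(mass) if mass > 0.0 else 0.0
    return log_term + u * (1.0 - mass)


def information_granules(m, frame):
    """IG vector of Eq. (14): s(m_v) - s(m), component by component."""
    m_vacuous = {frozenset(frame): 1.0}
    return [entropiece(m_vacuous, x) - entropiece(m, x) for x in powerset(frame)]


def information_content(m, frame):
    """IC(m) of Eqs. (9) and (12), in nats."""
    return sum(information_granules(m, frame))


def normalized_information_content(m, frame):
    """NIC(m) of Eq. (11); assumes the frame has at least two elements."""
    return information_content(m, frame) / (2 ** len(frame) - 2)


frame = ["t1", "t2"]
m = {frozenset({"t1"}): 0.5, frozenset({"t2"}): 0.3, frozenset(frame): 0.2}
print([round(g, 6) for g in information_granules(m, frame)])
print(information_content(m, frame), normalized_information_content(m, frame))
```

Applied to a BBA fully committed to one singleton, the same functions return 2 nats on a two-element frame and 6 nats on a three-element frame, matching the dependence on $|\Theta|$ discussed in this section.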


V. INFORMATION GAIN AND INFORMATION LOSS 


Once the IC measure is defined for a BBA, it is rather simple to define the information gain and information loss of a BBA with respect to another one, both defined on the same FoD $\Theta$. Suppose that we have a first BBA $m_1^\Theta$ and a second BBA $m_2^\Theta$; then we can calculate by formula (9) their respective information contents $IC(m_1^\Theta)$ and $IC(m_2^\Theta)$. The difference of information content of $m_2^\Theta$ with respect to $m_1^\Theta$ is defined by⁸

$$\Delta_{IC}(m_2|m_1) \triangleq IC(m_2^\Theta) - IC(m_1^\Theta). \tag{15}$$

If we replace $IC(m_2^\Theta)$ and $IC(m_1^\Theta)$ by their expressions according to (9), it comes

$$\Delta_{IC}(m_2|m_1) = [U(m_v^\Theta) - U(m_2^\Theta)] - [U(m_v^\Theta) - U(m_1^\Theta)] = U(m_1^\Theta) - U(m_2^\Theta). \tag{16}$$

If $\Delta_{IC}(m_2|m_1) = 0$, the BBAs $m_1^\Theta$ and $m_2^\Theta$ have the same measure of information content. So, there is no gain and no loss in information content if one switches from $m_1^\Theta$ to $m_2^\Theta$ or vice versa. $\Delta_{IC}(m_2|m_1) = 0$ does not mean that the decisions based on $m_1^\Theta$ and on $m_2^\Theta$ are the same. It only means that the decision based on $m_1^\Theta$ must be as easy as the decision based on $m_2^\Theta$, i.e. that they have the same informational strength.

If $\Delta_{IC}(m_2|m_1) > 0$, one has $IC(m_2^\Theta) > IC(m_1^\Theta)$, i.e. the BBA $m_2^\Theta$ is more informative than $m_1^\Theta$. In this case we get an information gain if one switches from $m_1^\Theta$ to $m_2^\Theta$, and by duality we get an uncertainty reduction by switching from $m_1^\Theta$ to $m_2^\Theta$. It means that it must be easier to make a decision based on $m_2^\Theta$ rather than on $m_1^\Theta$.

If $\Delta_{IC}(m_2|m_1) < 0$, one has $IC(m_2^\Theta) < IC(m_1^\Theta)$, i.e. the BBA $m_2^\Theta$ is less informative than $m_1^\Theta$. In this case we get an information loss if one switches from $m_1^\Theta$ to $m_2^\Theta$, and by duality we get an uncertainty increase by switching from $m_1^\Theta$ to $m_2^\Theta$. It means that it must be easier to make a decision based on $m_1^\Theta$ rather than on $m_2^\Theta$.

⁷ We suppose for convenience that the elements $X \in 2^\Theta$ are listed in increasing order using the classical $|\Theta|$-bits representation with the least significant bit on the right.

⁸ Similarly, we can define $\Delta_{IC}(m_1|m_2) \triangleq IC(m_1^\Theta) - IC(m_2^\Theta) = -\Delta_{IC}(m_2|m_1)$.


As a simple example, consider $\Theta = \{\theta_1, \theta_2, \theta_3\}$. For the vacuous BBA one has $U(m_v^\Theta) = 2^3 - 2 = 6$ nats. Suppose that at time $k = 1$ one has the BBA $m_1^\Theta(\theta_1 \cup \theta_2) = 0.2$, $m_1^\Theta(\theta_1 \cup \theta_3) = 0.3$, $m_1^\Theta(\theta_1 \cup \theta_2 \cup \theta_3) = 0.5$; then $U(m_1^\Theta) \approx 5.1493$ nats, and $IC(m_1^\Theta) = U(m_v^\Theta) - U(m_1^\Theta) \approx 0.8507$ nats. Suppose that after some information processing (belief revision, or fusion, etc.) we come up with the BBA $m_2^\Theta$ at time $k = 2$ defined by $m_2^\Theta(\theta_1) = 0.2$ and $m_2^\Theta(\theta_1 \cup \theta_3) = 0.8$; then $U(m_2^\Theta) \approx 0.5004$ nats and $IC(m_2^\Theta) = U(m_v^\Theta) - U(m_2^\Theta) \approx 5.4996$ nats. In this case, we get $\Delta_{IC}(m_2|m_1) = 5.4996 - 0.8507 = 4.6489$, which is positive. Hence we get an information gain by switching from $m_1^\Theta$ to $m_2^\Theta$ thanks to the information processing applied.
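The information gain of (15)-(16) can be checked numerically with the sketch below, which reuses formulas (4)-(5). All names are illustrative assumptions. One caveat on the data: under this sketch the stated value $U(m_2^\Theta) \approx 0.5004$ nats is obtained for a Bayesian BBA with masses 0.2 and 0.8 on two singletons (here assumed to be $\theta_1$ and $\theta_3$), whereas placing the 0.8 mass on $\theta_1 \cup \theta_3$, as printed, gives about 3.2829 nats. The Bayesian variant is therefore used to reproduce the stated gain.

```python
import math


def powerset(frame):
    """Subsets of `frame` in increasing binary order (frame[0] is the least significant bit)."""
    return [frozenset(frame[j] for j in range(len(frame)) if (code >> j) & 1)
            for code in range(2 ** len(frame))]


def entropy(m, frame):
    """U(m) of Eqs. (4)-(5), in nats."""
    total = 0.0
    for x in powerset(frame):
        mass = m.get(x, 0.0)
        bel = sum(v for y, v in m.items() if y <= x)
        pl = sum(v for y, v in m.items() if y & x)
        u = pl - bel
        log_term = -mass * (1.0 - u) * math.log(mass) if mass > 0.0 else 0.0
        total += log_term + u * (1.0 - mass)
    return total


def delta_ic(m2, m1, frame):
    """Delta_IC(m2|m1) = IC(m2) - IC(m1) = U(m1) - U(m2), Eqs. (15)-(16)."""
    return entropy(m1, frame) - entropy(m2, frame)


frame = ["t1", "t2", "t3"]
m1 = {frozenset({"t1", "t2"}): 0.2, frozenset({"t1", "t3"}): 0.3, frozenset(frame): 0.5}
# Assumption: Bayesian reading of m2, chosen because it reproduces U(m2) ~ 0.5004.
m2_bayesian = {frozenset({"t1"}): 0.2, frozenset({"t3"}): 0.8}
# The BBA as printed in the text, with the 0.8 mass on t1 U t3.
m2_printed = {frozenset({"t1"}): 0.2, frozenset({"t1", "t3"}): 0.8}

print(entropy(m1, frame), entropy(m2_bayesian, frame), entropy(m2_printed, frame))
print(delta_ic(m2_bayesian, m1, frame))
```

A positive return value of `delta_ic` indicates an information gain when switching from the first BBA to the second, and a negative value an information loss.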


VI. CONCLUSIONS 


In this paper we have introduced a measure of information content (IC) for any basic belief assignment (BBA). This IC measure, based on an effective measure of uncertainty of BBAs, is quite simple to calculate, and it reflects the informational strength of a belief mass function and the ease of making a decision based on it. We have also shown how it is possible to identify the contribution of each focal element of the BBA to this information content measure thanks to the information granules vector. This new IC measure is also interesting because it allows us to quantify the information loss or gain between two BBAs, and thus, as a perspective, we could use it to quantify precisely and compare the performances of information processing using belief functions (like fusion rules, belief conditioning, etc.). We hope that this new theoretical IC measure will open interesting tracks for forthcoming research on reasoning about uncertainty with belief functions.


Abstract—In this erratum we correct a mathematical mistake included in the paper entitled: On the Validity of Dempster's Fusion Rule and its Interpretation as a Generalization of Bayesian Fusion Rule, published in 2014 in [1]. Taking this mathematical correction into account, the Bayesian fusion rule is associative, contrary to what is claimed in the original version of our paper. The comments in our paper remain valid for pages 223 to 238. Corrections in several pages from page 239 to the end of our paper must be made as explained next in this erratum.


Keywords: Bayesian fusion, belief functions. 


ERRATUM 


In [1] page 239, the general formulas¹ #(34)-#(36) are incorrect. The correct formulas are presented in this erratum.

Based on the conditional statistical independence assumption $P(Z_1 \cap Z_2|X) = P(Z_1|X)P(Z_2|X)$, we have

$$P(X|Z_1 \cap Z_2) = \frac{P(Z_1 \cap Z_2|X)P(X)}{P(Z_1 \cap Z_2)} = \frac{P(Z_1|X)P(Z_2|X)P(X)}{P(Z_1 \cap Z_2)} = \frac{\frac{P(X|Z_1)P(Z_1)}{P(X)} \frac{P(X|Z_2)P(Z_2)}{P(X)} P(X)}{\sum_{i=1}^{N} P(X = x_i, Z_1 \cap Z_2)}, \tag{1}$$

which can be written as

$$P(X|Z_1 \cap Z_2) = \frac{P(X|Z_1)P(X|Z_2)/P(X)}{\sum_{i=1}^{N} P(X = x_i|Z_1)P(X = x_i|Z_2)/P(X = x_i)}. \tag{2}$$

The formula (2) corresponds to formula #(24) of our original paper [1]. This formula (2) can be rewritten in a symmetrical form as follows

$$P(X|Z_1 \cap Z_2) = \frac{1}{K'(Z_1, Z_2)} \cdot \frac{P(X|Z_1)}{\sqrt{P(X)}} \cdot \frac{P(X|Z_2)}{\sqrt{P(X)}}, \tag{3}$$

where the normalization constant $K'(Z_1, Z_2)$ is given by:

$$K'(Z_1, Z_2) \triangleq \sum_{i=1}^{N} \frac{P(X = x_i|Z_1)}{\sqrt{P(X = x_i)}} \cdot \frac{P(X = x_i|Z_2)}{\sqrt{P(X = x_i)}}. \tag{4}$$

¹ To avoid confusion with the formula numbers in this erratum, we denote the formula number appearing in the original paper [1] by #(xx), where xx is the number under concern.

The formulas #(24)-#(33) of [1] are correct.

If we generalize the formula (1) for $s > 2$ conditioning terms, we obtain the following expression

$$P(X|Z_1 \cap \ldots \cap Z_s) = \frac{P(Z_1 \cap \ldots \cap Z_s|X)P(X)}{P(Z_1 \cap \ldots \cap Z_s)} = \frac{P(Z_1|X) \ldots P(Z_s|X)P(X)}{P(Z_1 \cap \ldots \cap Z_s)} = \frac{\frac{P(X|Z_1)P(Z_1)}{P(X)} \ldots \frac{P(X|Z_s)P(Z_s)}{P(X)} P(X)}{\sum_{i=1}^{N} P(X = x_i, Z_1 \cap \ldots \cap Z_s)}, \tag{5}$$

which can be written as

$$P(X|Z_1 \cap \ldots \cap Z_s) = \frac{P(X|Z_1) \ldots P(X|Z_s)/P^{s-1}(X)}{\sum_{i=1}^{N} P(X = x_i|Z_1) \ldots P(X = x_i|Z_s)/P^{s-1}(X = x_i)}, \tag{6}$$

or equivalently as

$$P(X|Z_1 \cap \ldots \cap Z_s) = \frac{1}{K(X, Z_1, \ldots, Z_s)} \prod_{k=1}^{s} P(X|Z_k), \tag{7}$$

where the coefficient $K(X, Z_1, \ldots, Z_s)$ is defined by

$$K(X, Z_1, \ldots, Z_s) \triangleq P^{s-1}(X) \sum_{i=1}^{N} \frac{\prod_{k=1}^{s} P(X = x_i|Z_k)}{P^{s-1}(X = x_i)}. \tag{8}$$

The formula #(34) of [1] must be replaced by the formula (8) above.

The symmetrized form of Eq. (6) is:

$$P(X|Z_1 \cap \ldots \cap Z_s) = \frac{1}{K'(Z_1, \ldots, Z_s)} \prod_{k=1}^{s} \frac{P(X|Z_k)}{\sqrt[s]{P^{s-1}(X)}}, \tag{9}$$

with the normalization constant $K'(Z_1, \ldots, Z_s)$ given by:

$$K'(Z_1, \ldots, Z_s) \triangleq \sum_{i=1}^{N} \prod_{k=1}^{s} \frac{P(X = x_i|Z_k)}{\sqrt[s]{P^{s-1}(X = x_i)}}. \tag{10}$$

Hence the incorrect expression #(35) of $P(X|Z_1 \cap \ldots \cap Z_s)$ in [1] must be replaced by the formula (9) above, and the incorrect expression #(36) of $K'(Z_1, \ldots, Z_s)$ must be replaced by the formula (10).


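The agreement $A_s(X)$ of order $s$, the global agreement $GA_s$, and the global conflict $GC_s$ for $s$ sources must also be corrected as follows:

$$A_s(X = x_i) \triangleq \prod_{k=1}^{s} \frac{P(X = x_i|Z_k)}{\sqrt[s]{P^{s-1}(X = x_i)}},$$

$$GA_s \triangleq \sum_{\substack{i_1, \ldots, i_s = 1 \\ i_1 = \ldots = i_s}}^{N} \frac{P(X = x_{i_1}|Z_1)}{\sqrt[s]{P^{s-1}(X = x_{i_1})}} \cdots \frac{P(X = x_{i_s}|Z_s)}{\sqrt[s]{P^{s-1}(X = x_{i_s})}},$$

$$GC_s \triangleq \sum_{i_1, \ldots, i_s = 1}^{N} \frac{P(X = x_{i_1}|Z_1)}{\sqrt[s]{P^{s-1}(X = x_{i_1})}} \cdots \frac{P(X = x_{i_s}|Z_s)}{\sqrt[s]{P^{s-1}(X = x_{i_s})}} - GA_s.$$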

The first consequence of this correction is that the property P1 stated in [1] page 242 must be corrected as (P1): The PMF $P(X)$ is a neutral element of Bayes fusion rule. Remark 2 and formula #(45) on page 242 must be removed.

Remark 3 on page 242 of [1] is incorrect. Indeed, if we take $P(X|Z_k) = P(X)$ for $k = 1, \ldots, s$, then based on the correct formula (9) we actually get

$$Bayes(P(X), P(X), \ldots, P(X); P(X)) = P(X),$$

and this holds for any type of pmf $P(X)$ (i.e. uniform and non-uniform pmf).


The property (P3): The Bayes fusion rule is in general not associative, stated in [1] on page 242, is incorrect and must be corrected as (P3): The Bayes fusion rule is associative.

Proof of the property P3 (associativity of Bayes rule): The expression of $P(X|Z_1 \cap \ldots \cap Z_{s-1})$ is given by formula (9) when using $s - 1$ conditioning terms. Hence we have

$$P(X|Z_1 \cap \ldots \cap Z_{s-1}) = \frac{1}{K'(Z_1, \ldots, Z_{s-1})} \prod_{k=1}^{s-1} \frac{P(X|Z_k)}{\sqrt[s-1]{P^{s-2}(X)}}, \tag{11}$$

with the normalization constant $K'(Z_1, \ldots, Z_{s-1})$ given by

$$K'(Z_1, \ldots, Z_{s-1}) \triangleq \sum_{i=1}^{N} \prod_{k=1}^{s-1} \frac{P(X = x_i|Z_k)}{\sqrt[s-1]{P^{s-2}(X = x_i)}}. \tag{12}$$

To calculate $P(X|(Z_1 \cap \ldots \cap Z_{s-1}) \cap Z_s)$ from $P(X|Z_1 \cap \ldots \cap Z_{s-1})$ and $P(X|Z_s)$, we use Bayes formula with the conditional statistical independence assumption, and we get

$$P(X|(Z_1 \cap \ldots \cap Z_{s-1}) \cap Z_s) = \frac{P(Z_1 \cap \ldots \cap Z_{s-1}|X)P(Z_s|X)P(X)}{\sum_{i=1}^{N} P(Z_1 \cap \ldots \cap Z_{s-1}|X = x_i)P(Z_s|X = x_i)P(X = x_i)}. \tag{13}$$
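Because

$$P(Z_1 \cap \ldots \cap Z_{s-1}|X) = \frac{P(X|Z_1 \cap \ldots \cap Z_{s-1})P(Z_1 \cap \ldots \cap Z_{s-1})}{P(X)}$$

and

$$P(Z_s|X) = \frac{P(X|Z_s)P(Z_s)}{P(X)},$$

the expression of $P(X|(Z_1 \cap \ldots \cap Z_{s-1}) \cap Z_s)$ given by (13) can be rewritten as

$$P(X|(Z_1 \cap \ldots \cap Z_{s-1}) \cap Z_s) = \frac{\frac{P(X|Z_1 \cap \ldots \cap Z_{s-1})}{P(X)} \cdot \frac{P(X|Z_s)}{P(X)} \cdot P(X)}{\sum_{i=1}^{N} \frac{P(X = x_i|Z_1 \cap \ldots \cap Z_{s-1})}{P(X = x_i)} \cdot \frac{P(X = x_i|Z_s)}{P(X = x_i)} \cdot P(X = x_i)}. \tag{14}$$

After the simplification by $P(X)$ in the numerator of (14) and the simplification by $P(X = x_i)$ in the denominator of (14), it comes

$$P(X|(Z_1 \cap \ldots \cap Z_{s-1}) \cap Z_s) = \frac{P(X|Z_1 \cap \ldots \cap Z_{s-1}) \cdot \frac{P(X|Z_s)}{P(X)}}{\sum_{i=1}^{N} P(X = x_i|Z_1 \cap \ldots \cap Z_{s-1}) \cdot \frac{P(X = x_i|Z_s)}{P(X = x_i)}}. \tag{15}$$

Replacing $P(X|Z_1 \cap \ldots \cap Z_{s-1})$ by its expression given in (11), we have

$$P(X|(Z_1 \cap \ldots \cap Z_{s-1}) \cap Z_s) = \frac{\frac{1}{K'(Z_1, \ldots, Z_{s-1})} \left[\prod_{k=1}^{s-1} \frac{P(X|Z_k)}{\sqrt[s-1]{P^{s-2}(X)}}\right] \cdot \frac{P(X|Z_s)}{P(X)}}{\sum_{i=1}^{N} \frac{1}{K'(Z_1, \ldots, Z_{s-1})} \left[\prod_{k=1}^{s-1} \frac{P(X = x_i|Z_k)}{\sqrt[s-1]{P^{s-2}(X = x_i)}}\right] \cdot \frac{P(X = x_i|Z_s)}{P(X = x_i)}}. \tag{16}$$

After simplification by the constant $K'(Z_1, \ldots, Z_{s-1})$ one gets

$$P(X|(Z_1 \cap \ldots \cap Z_{s-1}) \cap Z_s) = \frac{\left[\prod_{k=1}^{s-1} \frac{P(X|Z_k)}{\sqrt[s-1]{P^{s-2}(X)}}\right] \cdot \frac{P(X|Z_s)}{P(X)}}{\sum_{i=1}^{N} \left[\prod_{k=1}^{s-1} \frac{P(X = x_i|Z_k)}{\sqrt[s-1]{P^{s-2}(X = x_i)}}\right] \cdot \frac{P(X = x_i|Z_s)}{P(X = x_i)}} = \frac{\frac{1}{P^{s-1}(X)} \prod_{k=1}^{s} P(X|Z_k)}{\sum_{i=1}^{N} \frac{1}{P^{s-1}(X = x_i)} \prod_{k=1}^{s} P(X = x_i|Z_k)}. \tag{17}$$

The formula (17) can be rewritten in an equivalent symmetrical form as

$$P(X|(Z_1 \cap \ldots \cap Z_{s-1}) \cap Z_s) = \frac{1}{K'(Z_1, \ldots, Z_s)} \prod_{k=1}^{s} \frac{P(X|Z_k)}{\sqrt[s]{P^{s-1}(X)}}, \tag{18}$$

where $K'(Z_1, \ldots, Z_s) = \sum_{i=1}^{N} \prod_{k=1}^{s} \frac{P(X = x_i|Z_k)}{\sqrt[s]{P^{s-1}(X = x_i)}}$.

Therefore, we have proved that the expression of $P(X|(Z_1 \cap \ldots \cap Z_{s-1}) \cap Z_s)$ given by (18) is equal to the expression of $P(X|Z_1 \cap \ldots \cap Z_{s-1} \cap Z_s)$ given by (9). This proves the associativity of Bayes fusion rule, i.e. the validity of the property P3. Note that the equality $P(X|(Z_1 \cap \ldots \cap Z_{s-1}) \cap Z_s) = P(X|Z_1 \cap \ldots \cap Z_{s-1} \cap Z_s)$ does not depend on a particular choice of the intersection of $s - 1$ subsets involved in the conditioning because the intersection operator is associative. Hence the conditioning terms $(Z_1 \cap \ldots \cap Z_{s-1}) \cap Z_s$ and $Z_1 \cap \ldots \cap Z_{s-1} \cap Z_s$ are equal. This implies that the two conditional probabilities must necessarily be equal, which is proved by our previous derivations.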


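With the correct formulas (9)-(10), the numerical application for Example 1 on page 243 of [1] gives

$$P(X = x_1|Z_1 \cap Z_2 \cap Z_3) = \frac{1}{K_{123}} \cdot \frac{0.1}{\sqrt[3]{0.2^2}} \cdot \frac{0.5}{\sqrt[3]{0.2^2}} \cdot \frac{0.6}{\sqrt[3]{0.2^2}} \approx 0.7273,$$

$$P(X = x_2|Z_1 \cap Z_2 \cap Z_3) = \frac{1}{K_{123}} \cdot \frac{0.9}{\sqrt[3]{0.8^2}} \cdot \frac{0.5}{\sqrt[3]{0.8^2}} \cdot \frac{0.4}{\sqrt[3]{0.8^2}} \approx 0.2727,$$

where the normalization constant $K_{123} \triangleq K'(Z_1, Z_2, Z_3)$ is given by (10) for $s = 3$, i.e.

$$K_{123} = \frac{0.1}{\sqrt[3]{0.2^2}} \cdot \frac{0.5}{\sqrt[3]{0.2^2}} \cdot \frac{0.6}{\sqrt[3]{0.2^2}} + \frac{0.9}{\sqrt[3]{0.8^2}} \cdot \frac{0.5}{\sqrt[3]{0.8^2}} \cdot \frac{0.4}{\sqrt[3]{0.8^2}} \approx 1.0313.$$

This corrected result shows that Bayes fusion rule is actually associative because one has

$$P(X|(Z_1 \cap Z_2) \cap Z_3) = P(X|Z_1 \cap Z_2 \cap Z_3),$$
$$P(X|Z_1 \cap (Z_2 \cap Z_3)) = P(X|Z_1 \cap Z_2 \cap Z_3),$$
$$P(X|Z_2 \cap (Z_1 \cap Z_3)) = P(X|Z_1 \cap Z_2 \cap Z_3).$$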

As a consequence, the property (P4) on page 245 of [1], although correct, is not necessary.

On page 250 of [1], the sentence:

Indeed, in Bayes rule one divides each posterior source $m_i(x_j)$ by $\sqrt[s]{m_0(x_j)}$, $i = 1, 2, \ldots, s$, whereas the prior source $m_0(.)$ is combined in a pure conjunctive manner by DS rule with the bba's $m_i(.)$, $i = 1, 2, \ldots, s$, as if $m_0(.)$ was a simple additional source.

must be corrected as:

Indeed, in Bayes rule one divides each posterior source $m_i(x_j)$ by $\sqrt[s]{m_0^{s-1}(x_j)}$, $i = 1, 2, \ldots, s$, whereas the prior source $m_0(.)$ is combined in a pure conjunctive manner by DS rule with the bba's $m_i(.)$, $i = 1, 2, \ldots, s$, as if $m_0(.)$ was a simple additional source.

This erratum also concerns some incorrect formulas appearing in a preliminary version of [1] presented in 2013 in [2].


REFERENCES


[1] J. Dezert, A. Tchamova, On the validity of Dempster's fusion rule and its interpretation as a generalization of Bayesian fusion rule, International Journal of Intelligent Systems, Special Issue: Advances in Intelligent Systems, Vol. 29, Issue 3, pp. 223-252, March 2014.

[2] J. Dezert, A. Tchamova, D. Han, J.-M. Tacnet, Why Dempster's fusion rule is not a generalization of Bayes fusion rule, in Proc. of Fusion 2013 Int. Conf. on Information Fusion (Fusion 2013), Istanbul, Turkey, July 9-12, 2013.




On Inequalities Bounding Imprecision and Nonspecificity Measures of Uncertainty


Jean Dezertᵃ, Albena Tchamovaᵇ

ᵃ The French Aerospace Lab, ONERA - DTIS/MIDL, 91120 Palaiseau, France.
ᵇ Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, 1113 Sofia, Bulgaria.
Emails: jean.dezert@onera.fr, tchamova@bas.bg


Originally published as: J. Dezert, A. Tchamova, On Inequalities Bounding Imprecision and Nonspecificity Measures of Uncertainty, Information & Security Journal, Vol. 52, pp. 37-51, February 2022, and reprinted with permission.


Abstract—In this paper we prove two inequalities about two measures of uncertainty of basic belief assignments, called respectively the Imprecision measure and the U-uncertainty measure, which were introduced by Dubois and Prade in the 1980s. These inequalities have been considered as obvious by these authors, but proving them rigorously requires some effort, as will be shown.


Keywords: belief functions, measure of uncertainty, convex 
combination, inequalities. 


I. INTRODUCTION 


This paper presents two mathematical proofs of inequalities about two measures of uncertainty of basic belief assignments, called respectively the Imprecision and the U-uncertainty (or nonspecificity), which were introduced by Dubois and Prade in [1]-[3]. We recall that a Basic Belief Assignment (BBA) $m$ defined on the power set $2^\Theta$ of the finite frame of discernment (FoD) $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$ is a mapping $m(\cdot) : 2^\Theta \to [0,1]$ such that $m(\emptyset) = 0$ and $\sum_{X \subseteq \Theta} m(X) = 1$. This type of mapping has been introduced by Shafer in [4]. The cardinality of the FoD is $|\Theta| = n$. The measures of imprecision $I(m)$ and of nonspecificity $U(m)$ are respectively defined by

$$I(m) = \sum_{X \subseteq \Theta} m(X)|X| = \sum_{X_i \in 2^\Theta} m(X_i)|X_i|, \tag{1}$$

$$U(m) = \sum_{X \subseteq \Theta} m(X)\log(|X|) = \sum_{X_i \in 2^\Theta} m(X_i)\log(|X_i|), \tag{2}$$

where $X_i$ is the $i$-th element of the power set $2^\Theta$ of the FoD $\Theta$, and $|X_i|$ its cardinality. By convention, and without loss of generality, we take $X_1 = \emptyset$ (the empty set) and $X_{2^n} = \Theta$. The integer index $i$ varies from 1 to $2^n = 2^{|\Theta|}$.

The BBA $m_v$ is the vacuous BBA defined by $m_v(X) = 1$ if $X = \Theta$, and $m_v(X) = 0$ for all elements $X \neq \Theta$ of $2^\Theta$. This vacuous BBA $m_v$ characterizes a fully ignorant source of evidence.

In the next sections we prove that for any BBA $m \neq m_v$ defined on $2^\Theta$ the two following inequalities hold

$$I(m) < I(m_v), \tag{3}$$

and

$$U(m) < U(m_v). \tag{4}$$

We will prove these two inequalities in two ways: 1) by a direct application of the theorem of convex combination (see Theorem 1, and Theorem 2 in the appendix), and 2) by a direct calculation from the mathematical definitions of the $I(m)$ and $U(m)$ measures of uncertainty.
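Before the proofs, the two measures (1)-(2) and the inequalities (3)-(4) can be illustrated numerically. The sketch below is only an illustration: the function names, the dictionary-of-`frozenset` representation and the example BBA on a three-element frame are hypothetical choices, not taken from the paper. Only focal elements (non-empty sets with positive mass) are stored, so $\log(|X|)$ is always defined.

```python
import math


def imprecision_measure(m):
    """I(m) of Eq. (1): mass-weighted average cardinality of the focal elements."""
    return sum(mass * len(x) for x, mass in m.items())


def nonspecificity(m):
    """U(m) of Eq. (2), using the natural logarithm."""
    return sum(mass * math.log(len(x)) for x, mass in m.items() if mass > 0.0)


theta = frozenset({"t1", "t2", "t3"})
m_vacuous = {theta: 1.0}
# Hypothetical non-vacuous BBA on the same frame.
m = {frozenset({"t1"}): 0.5, frozenset({"t1", "t2"}): 0.3, theta: 0.2}

print(imprecision_measure(m), imprecision_measure(m_vacuous))
print(nonspecificity(m), nonspecificity(m_vacuous))
```

For the vacuous BBA the two functions return $n$ and $\log(n)$ respectively, which are the upper bounds appearing in (3) and (4).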

To prove these inequalities, we first recall that a convex combination, denoted by $s_n$, of $n$ values $\{z_i, i = 1, 2, \ldots, n\}$ is a linear combination of the form

$$s_n = \sum_{i=1}^{n} w_i z_i, \tag{5}$$

where $w_i \in [0,1]$ is the weight of the value $z_i$, such that $\sum_{i=1}^{n} w_i = 1$.

In the appendix, we prove the following useful theorem, which will help us to prove the inequalities (3) and (4) in the next sections.

Theorem 1: Let $s_n = \sum_{i=1}^{n} w_i z_i$ be a convex combination of $n$ values $z_1, \ldots, z_n$ with normalized weights $w_1, \ldots, w_n$, where $w_i \in [0,1]$. Then, we have

$$\min\{z_i \in Z\} \le s_n \le \max\{z_i \in Z\}, \tag{6}$$

where $Z \triangleq \{z_i \in \{z_1, z_2, \ldots, z_n\} \,|\, w_i > 0\}$.

Proof of Theorem 1: see appendix.
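The statement of Theorem 1 can be illustrated with a small numerical sketch. The function name `convex_combination_bounds` and the example values are assumptions chosen for illustration; the values mimic cardinalities $|X_i|$ weighted by masses $m(X_i)$, with a zero weight on the first value.

```python
def convex_combination_bounds(values, weights):
    """Return (min over Z, s_n, max over Z), where Z keeps the values with positive weight."""
    if any(w < 0.0 for w in weights) or abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights must be non-negative and sum to one")
    s_n = sum(w * z for w, z in zip(weights, values))          # Eq. (5)
    support = [z for w, z in zip(weights, values) if w > 0.0]  # the set Z of Eq. (6)
    return min(support), s_n, max(support)


# Illustrative data: cardinalities 0, 1, 1, 2 with weights 0, 0.5, 0.3, 0.2.
lower, s_n, upper = convex_combination_bounds([0, 1, 1, 2], [0.0, 0.5, 0.3, 0.2])
print(lower, s_n, upper)
```

Values with zero weight are excluded from the bounds, which is why the value 0 attached to the zero weight does not lower the minimum here.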


II. PROOF THAT I(m) < I(m,) IFm 4 my 
A. First Proof : using the Theorem of convex combination 


The proof of inequality (3) is a direct application of the 
Theorem | when working with 2” = 2!°! values! z; = |X;| 
and weights w; = m(X;). We recall that w; = m(X1) = 
m(0) = 0 for any BBA m (by definition of m). Therefore 
one has always at least one weight (i.e. w 1) among all 2” 
weights equals zero, which justifies the use of Theorem 1, 
rather than Theorem 2 of appendix. 


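The imprecision measure $I(m)$ can also be expressed as
$I(m) = \sum_{i=1}^{2^n} m(X_i)|X_i|$ because

$$\sum_{X_i \in 2^\Theta} m(X_i)|X_i| = \sum_{i=1}^{2^n} m(X_i)|X_i|.$$

$^1$We recall that the integer index $i$ spans $\{1, 2, \ldots, 2^n\}$.

Based on Theorem 1, we have

$$\min\{|X_i| \in \{|X_1|, |X_2|, \ldots, |X_{2^n}|\} \mid m(X_i) > 0\}
\leq \sum_{i=1}^{2^n} m(X_i)|X_i| \leq
\max\{|X_i| \in \{|X_1|, |X_2|, \ldots, |X_{2^n}|\} \mid m(X_i) > 0\}. \tag{7}$$

The upper bound of inequality (7) never exceeds $|\Theta| = n$. If $m \neq m_v$,
some positive mass is committed to a subset of cardinality lower than $n$, so
the convex combination is strictly lower than $n$; it is equal to $|\Theta| = n$
when $m = m_v$. Therefore, one has

$$\sum_{i=1}^{2^n} m(X_i)|X_i| < |\Theta|, \tag{8}$$

and because $I(m_v) = m_v(\Theta) \cdot |\Theta| = 1 \cdot |\Theta| = |\Theta|$, one sees
that the valid inequality (8) is the same as

$$I(m) < I(m_v), \tag{9}$$

which completes the proof of the inequality (3).

B. Second proof : using direct calculation

First we note that

$$I(m_v) = m_v(\Theta) \cdot |\Theta| = 1 \cdot n = n.$$

Because $m$ is a (normalized) BBA [4] such that $m(\emptyset) = 0$
and $\sum_{X \subseteq \Theta} m(X) = 1$, one has

$$m(\Theta) + \sum_{X \subset \Theta} m(X) = 1,$$

or equivalently

$$m(\Theta) = 1 - \sum_{X \subset \Theta} m(X).$$

Therefore

$$n \cdot m(\Theta) = n \cdot \Big[1 - \sum_{X \subset \Theta} m(X)\Big].$$

The expression of $I(m)$ can be decomposed as

$$\begin{aligned}
I(m) &= \sum_{X \subseteq \Theta} m(X)|X| \\
&= m(\Theta) \cdot |\Theta| + \sum_{X \subset \Theta} m(X)|X| \\
&= n \cdot m(\Theta) + \sum_{X \subset \Theta} m(X)|X| \\
&= n \cdot \Big[1 - \sum_{X \subset \Theta} m(X)\Big] + \sum_{X \subset \Theta} m(X)|X| \\
&= n - \Big[n \sum_{X \subset \Theta} m(X) - \sum_{X \subset \Theta} m(X)|X|\Big].
\end{aligned}$$

To prove that $I(m) < I(m_v)$ is equivalent to prove that

$$n - \Big[n \sum_{X \subset \Theta} m(X) - \sum_{X \subset \Theta} m(X)|X|\Big] < n,$$

or to prove

$$n \sum_{X \subset \Theta} m(X) - \sum_{X \subset \Theta} m(X)|X| > 0. \tag{10}$$

We can express $\sum_{X \subset \Theta} m(X)|X|$ as

$$\sum_{X \subset \Theta} m(X)|X| =
\sum_{\substack{X \subset \Theta \\ |X|=1}} m(X) \cdot 1
+ \sum_{\substack{X \subset \Theta \\ |X|=2}} m(X) \cdot 2
+ \sum_{\substack{X \subset \Theta \\ |X|=3}} m(X) \cdot 3
+ \ldots
+ \sum_{\substack{X \subset \Theta \\ |X|=n-1}} m(X) \cdot (n-1),$$

that is

$$\sum_{X \subset \Theta} m(X)|X| =
\sum_{\substack{X \subset \Theta \\ |X|=1}} m(X)
+ 2 \cdot \sum_{\substack{X \subset \Theta \\ |X|=2}} m(X)
+ 3 \cdot \sum_{\substack{X \subset \Theta \\ |X|=3}} m(X)
+ \ldots
+ (n-1) \cdot \sum_{\substack{X \subset \Theta \\ |X|=n-1}} m(X),$$

which can be rewritten as

$$\begin{aligned}
\sum_{X \subset \Theta} m(X)|X| = {} & \sum_{\substack{X \subset \Theta \\ |X|=1}} m(X) \\
& + \sum_{\substack{X \subset \Theta \\ |X|=2}} m(X) + \sum_{\substack{X \subset \Theta \\ |X|=2}} m(X) \\
& + \sum_{\substack{X \subset \Theta \\ |X|=3}} m(X) + 2 \cdot \sum_{\substack{X \subset \Theta \\ |X|=3}} m(X) \\
& + \ldots \\
& + \sum_{\substack{X \subset \Theta \\ |X|=n-1}} m(X) + (n-2) \cdot \sum_{\substack{X \subset \Theta \\ |X|=n-1}} m(X),
\end{aligned}$$

or equivalently

$$\sum_{X \subset \Theta} m(X)|X| = \sum_{X \subset \Theta} m(X)
+ \sum_{\substack{X \subset \Theta \\ |X|=2}} m(X)
+ 2 \cdot \sum_{\substack{X \subset \Theta \\ |X|=3}} m(X)
+ \ldots
+ (n-2) \cdot \sum_{\substack{X \subset \Theta \\ |X|=n-1}} m(X).$$

Then, for the left hand side of the inequality (10) we obtain
the following expression

$$\begin{aligned}
n \cdot \sum_{X \subset \Theta} m(X) - \sum_{X \subset \Theta} m(X)|X| = {} &
(n-1) \cdot \sum_{\substack{X \subset \Theta \\ |X|=1}} m(X) \\
& + (n-2) \cdot \sum_{\substack{X \subset \Theta \\ |X|=2}} m(X) \\
& + (n-3) \cdot \sum_{\substack{X \subset \Theta \\ |X|=3}} m(X) \\
& + \ldots \\
& + (n-(n-1)) \cdot \sum_{\substack{X \subset \Theta \\ |X|=n-1}} m(X).
\end{aligned}$$

The right hand side of the previous expression is clearly
strictly positive, that is

$$(n-1) \cdot \sum_{\substack{X \subset \Theta \\ |X|=1}} m(X)
+ (n-2) \cdot \sum_{\substack{X \subset \Theta \\ |X|=2}} m(X)
+ (n-3) \cdot \sum_{\substack{X \subset \Theta \\ |X|=3}} m(X)
+ \ldots
+ (n-(n-1)) \cdot \sum_{\substack{X \subset \Theta \\ |X|=n-1}} m(X) > 0,$$

because $n > 1$ (the FoD has more than one hypothesis inside),
and also because there is at least one element $X \subset \Theta$ for
which $m(X) > 0$ when $m \neq m_v$.

Then we obtain

$$n \sum_{X \subset \Theta} m(X) - \sum_{X \subset \Theta} m(X)|X| > 0,$$

which completes our second proof of (3) by a direct calculation.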
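The direct calculation can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: a BBA is stored as a dictionary mapping `frozenset` focal elements to masses, and the frame, the masses and the helper name `imprecision` are hypothetical.

```python
def imprecision(m):
    """I(m) = sum of m(X) * |X| over the focal elements X of the BBA m."""
    return sum(mass * len(X) for X, mass in m.items())


theta = frozenset({"a", "b", "c"})          # frame of discernment, n = 3
n = len(theta)
m = {frozenset({"a"}): 0.5, frozenset({"a", "b"}): 0.25, theta: 0.25}
m_v = {theta: 1.0}                          # vacuous BBA

# Left-hand side of (10): sum over proper subsets of (n - |X|) * m(X)
lhs_10 = sum(mass * (n - len(X)) for X, mass in m.items() if X != theta)

print(imprecision(m), imprecision(m_v), lhs_10)
```

Here `n - lhs_10` coincides with `imprecision(m)`, which mirrors the last line of the decomposition of $I(m)$, and `lhs_10` is strictly positive because the BBA is not vacuous.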


III. PROOF THAT $U(m) < U(m_v)$ IF $m \neq m_v$


A. First Proof : using the Theorem of convex combination 


The proof of inequality $U(m) < U(m_v)$ is similar to the
proof of $I(m) < I(m_v)$ by replacing values $|X_i|$ by $\log(|X_i|)$,
and by taking $m(X_1)\log(|X_1|) = m(\emptyset)\log(|\emptyset|) = 0 \cdot \log(0) = 0$,
which is easily justified by continuity since $x\log(x) \to 0$
as $x \to 0$. More precisely, we can express $U(m)$ as

$$U(m) = m(\emptyset)\log(|\emptyset|) + \sum_{X_i \in 2^\Theta \setminus \{\emptyset\}} m(X_i)\log(|X_i|)
= \sum_{i=1}^{2^n} m(X_i)\log(|X_i|).$$


Based on Theorem 1, we have

$$\min\{\log(|X_i|) \in \{\log(|X_1|), \ldots, \log(|X_{2^n}|)\} \mid m(X_i) > 0\}
\leq \sum_{i=1}^{2^n} m(X_i)\log(|X_i|) \leq
\max\{\log(|X_i|) \in \{\log(|X_1|), \ldots, \log(|X_{2^n}|)\} \mid m(X_i) > 0\}.$$


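Because $\log(\cdot)$ is a continuous increasing function, the
upper bound of the previous inequality never exceeds
$\log(|\Theta|) = \log(n)$, and the convex combination itself is strictly lower
than $\log(n)$ when $m \neq m_v$, since some positive mass is then committed to a
subset of cardinality lower than $n$. Therefore,

$$\sum_{i=1}^{2^n} m(X_i)\log(|X_i|) < \log(|\Theta|), \tag{11}$$

and because $U(m_v) = m_v(\Theta) \cdot \log(|\Theta|) = 1 \cdot \log(|\Theta|) = \log(|\Theta|)$,
one sees that the valid inequality (11) is the same as

$$U(m) < U(m_v), \tag{12}$$

which completes the proof of the inequality (4).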


B. Second proof : using direct calculation 
We prove the inequality $U(m) < U(m_v)$ similarly to our
second proof for $I(m) < I(m_v)$ by replacing values $|X_i|$ by
$\log(|X_i|)$. We note that

$$U(m_v) = m_v(\Theta) \cdot \log(|\Theta|) = 1 \cdot \log(n) = \log(n).$$


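Because $m$ is a (normalized) BBA [4] such that $m(\emptyset) = 0$
and $\sum_{X \subseteq \Theta} m(X) = 1$, one has

$$m(\Theta) + \sum_{X \subset \Theta} m(X) = 1,$$

or equivalently

$$m(\Theta) = 1 - \sum_{X \subset \Theta} m(X).$$

Therefore

$$\log(n) \cdot m(\Theta) = \log(n) \cdot \Big[1 - \sum_{X \subset \Theta} m(X)\Big].$$

The expression of $U(m)$ can be decomposed as

$$\begin{aligned}
U(m) &= \sum_{X \subseteq \Theta} m(X)\log(|X|) \\
&= m(\Theta) \cdot \log(n) + \sum_{X \subset \Theta} m(X)\log(|X|) \\
&= \log(n) \cdot \Big[1 - \sum_{X \subset \Theta} m(X)\Big] + \sum_{X \subset \Theta} m(X)\log(|X|) \\
&= \log(n) - \Big[\log(n) \sum_{X \subset \Theta} m(X) - \sum_{X \subset \Theta} m(X)\log(|X|)\Big].
\end{aligned}$$

To prove that $U(m) < U(m_v)$ is equivalent to prove that

$$\log(n) - \Big[\log(n) \sum_{X \subset \Theta} m(X) - \sum_{X \subset \Theta} m(X)\log(|X|)\Big] < \log(n),$$

or to prove

$$\log(n) \sum_{X \subset \Theta} m(X) - \sum_{X \subset \Theta} m(X)\log(|X|) > 0. \tag{13}$$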


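We can express $\sum_{X \subset \Theta} m(X)\log(|X|)$ as

$$\begin{aligned}
\sum_{X \subset \Theta} m(X)\log(|X|) = {} &
\sum_{\substack{X \subset \Theta \\ |X|=1}} m(X) \cdot \log(1)
+ \sum_{\substack{X \subset \Theta \\ |X|=2}} m(X) \cdot \log(2) \\
& + \sum_{\substack{X \subset \Theta \\ |X|=3}} m(X) \cdot \log(3)
+ \ldots
+ \sum_{\substack{X \subset \Theta \\ |X|=n-1}} m(X) \cdot \log(n-1).
\end{aligned}$$

Then for the left hand side of inequality (13) we obtain:

$$\begin{aligned}
\log(n) \sum_{X \subset \Theta} m(X) - \sum_{X \subset \Theta} m(X)\log(|X|) = {} &
[\log(n) - \log(1)] \cdot \sum_{\substack{X \subset \Theta \\ |X|=1}} m(X) \\
& + [\log(n) - \log(2)] \cdot \sum_{\substack{X \subset \Theta \\ |X|=2}} m(X) \\
& + [\log(n) - \log(3)] \cdot \sum_{\substack{X \subset \Theta \\ |X|=3}} m(X) \\
& + \ldots \\
& + [\log(n) - \log(n-1)] \cdot \sum_{\substack{X \subset \Theta \\ |X|=n-1}} m(X).
\end{aligned}$$

Because $\log(1) = 0$, the equation above can be rewritten as

$$\begin{aligned}
\log(n) \sum_{X \subset \Theta} m(X) - \sum_{X \subset \Theta} m(X)\log(|X|) = {} &
\log(n) \cdot \sum_{\substack{X \subset \Theta \\ |X|=1}} m(X) \\
& + [\log(n) - \log(2)] \cdot \sum_{\substack{X \subset \Theta \\ |X|=2}} m(X) \\
& + [\log(n) - \log(3)] \cdot \sum_{\substack{X \subset \Theta \\ |X|=3}} m(X) \\
& + \ldots \\
& + [\log(n) - \log(n-1)] \cdot \sum_{\substack{X \subset \Theta \\ |X|=n-1}} m(X).
\end{aligned}$$

Because $n > 1$, and because $\log(\cdot)$ is an increasing
function, one always has for $n > 1$, $\log(n) > 0$ and
$[\log(n) - \log(n-1)] > 0$. Because there is at least one
element $X \subset \Theta$ for which $m(X) > 0$ when $m \neq m_v$, we can
conclude that

$$\log(n) \sum_{X \subset \Theta} m(X) - \sum_{X \subset \Theta} m(X)\log(|X|) > 0,$$

which completes our second proof of (4) by a direct calculation.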
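The same kind of numerical check applies to the U-uncertainty. The sketch below assumes the natural logarithm (the text does not fix the base) and stores a BBA as a dictionary from `frozenset` focal elements to masses; the helper name `nonspecificity` and the example masses are hypothetical.

```python
import math


def nonspecificity(m):
    """U(m) = sum of m(X) * log(|X|) over the focal elements X of the BBA m."""
    return sum(mass * math.log(len(X)) for X, mass in m.items() if mass > 0)


theta = frozenset({"a", "b", "c"})
n = len(theta)
m = {frozenset({"a"}): 0.5, frozenset({"a", "b"}): 0.25, theta: 0.25}
m_v = {theta: 1.0}

# Left-hand side of (13): sum over proper subsets of [log(n) - log(|X|)] * m(X)
lhs_13 = sum(mass * (math.log(n) - math.log(len(X)))
             for X, mass in m.items() if X != theta)

print(nonspecificity(m), nonspecificity(m_v), lhs_13)
```

The quantity `lhs_13` equals the gap `nonspecificity(m_v) - nonspecificity(m)`, so its positivity is exactly inequality (4) for this example.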


IV. CONCLUSION 


In this paper we have proved that the imprecision measure
$I(m)$ is always lower than $I(m_v) = |\Theta|$, and that the U-uncertainty
(also known as non-specificity) measure $U(m)$ is always lower
than $U(m_v) = \log(|\Theta|)$ for any non-vacuous BBA $m$. The
proofs presented in this paper have been obtained in two
different ways: by the theorem of convex combination, and by
direct calculation from the mathematical definitions of $I(m)$,
$I(m_v)$, $U(m)$, and $U(m_v)$. We have shown that the use of
the theorem of convex combination provides an elegant and
shorter proof of the inequalities. This theorem will be helpful
to evaluate the lower and upper bounds of any measure of
uncertainty of a BBA that would be based on a convex
combination of mass values (chosen as weighting factors) and
real values committed to each element of the power set of the
frame of discernment.


APPENDIX 


Before proving Theorem 1, we first need to establish the
following theorem.


Theorem 2: Let $s_n = \sum_{i=1}^{n} w_i z_i$ be a convex combination of
$n$ values $z_1, \ldots, z_n$ with strictly positive normalized weights
$w_1, \ldots, w_n$. Then, we have

$$\min\{z_1, \ldots, z_n\} \leq s_n \leq \max\{z_1, \ldots, z_n\}. \tag{14}$$

The proof of Theorem 2 is done by induction.

Proof of Theorem 2:

• For $n = 1$, one has only one value $z_1$ with weight
$w_1 = 1$. Hence $s_1 = w_1 z_1 = z_1$, $\min\{z_1\} = z_1$ and
$\max\{z_1\} = z_1$. Therefore, $\min\{z_1\} = s_1 = \max\{z_1\}$,
which is a special case of the inequality (14). Therefore
the inequality (14) is valid for $n = 1$.

• For $n = 2$, one has two values $\{z_1, z_2\}$ with (strictly)
positive weights $\{w_1, w_2\}$ and $s_2 = w_1 z_1 + w_2 z_2$.

1) If $z_1 = z_2$, then $s_2 = w_1 z_1 + w_2 z_2 =
w_1 z_1 + w_2 z_1 = (w_1 + w_2) z_1 = z_1 = z_2$.
Hence one has $\min\{z_1, z_2\} = z_1 = z_2$ and
$\max\{z_1, z_2\} = z_1 = z_2$. Therefore, one gets
$\min\{z_1, z_2\} = s_2 = \max\{z_1, z_2\}$, which is a
special case of the inequality (14) for $n = 2$.

2) If $z_1 \neq z_2$, then two sub-cases are possible:

a) Case 1: if $z_1 < z_2$, then $\min\{z_1, z_2\} = z_1$ and
$\max\{z_1, z_2\} = z_2$. We have

$$\begin{aligned}
s_2 = w_1 z_1 + w_2 z_2 &= (z_1 - z_1) + w_1 z_1 + w_2 z_2 \\
&= z_1 - (1 - w_1) z_1 + w_2 z_2 \\
&= z_1 - w_2 z_1 + w_2 z_2 \\
&= z_1 + w_2 (z_2 - z_1) > \min\{z_1, z_2\}.
\end{aligned}$$

This last inequality comes from the fact
that $w_2 > 0$, and $z_2 - z_1 > 0$ because
$\min\{z_1, z_2\} = z_1$. So we have proved
$\min\{z_1, z_2\} < s_2$.

Because $w_2 \in (0,1)$ (both weights are strictly positive and sum to one),
we have $w_2 (z_2 - z_1) < z_2 - z_1$, and therefore

$$s_2 = z_1 + w_2 (z_2 - z_1) < z_1 + (z_2 - z_1) = z_2 = \max\{z_1, z_2\}.$$

This shows that $s_2 < \max\{z_1, z_2\}$. Therefore,
we have proved

$$\min\{z_1, z_2\} < s_2 < \max\{z_1, z_2\}.$$

We see that the inequality (14) holds for $n = 2$
for the case 1.

b) Case 2: if $z_2 < z_1$, then $\min\{z_1, z_2\} = z_2$ and
$\max\{z_1, z_2\} = z_1$. We have

$$\begin{aligned}
s_2 = w_1 z_1 + w_2 z_2 &= (z_2 - z_2) + w_1 z_1 + w_2 z_2 \\
&= z_2 - (1 - w_2) z_2 + w_1 z_1 \\
&= z_2 - w_1 z_2 + w_1 z_1 \\
&= z_2 + w_1 (z_1 - z_2) > \min\{z_1, z_2\}.
\end{aligned}$$

This last inequality comes from the fact
that $w_1 > 0$, and $z_1 - z_2 > 0$ because
$\min\{z_1, z_2\} = z_2$. So we have proved
$\min\{z_1, z_2\} < s_2$.

Because $w_1 \in (0,1)$ (both weights are strictly positive and sum to one),
we have $w_1 (z_1 - z_2) < z_1 - z_2$, and therefore

$$s_2 = z_2 + w_1 (z_1 - z_2) < z_2 + z_1 - z_2 = z_1 = \max\{z_1, z_2\}.$$

This shows that $s_2 < \max\{z_1, z_2\}$. Therefore,
we have proved

$$\min\{z_1, z_2\} < s_2 < \max\{z_1, z_2\}.$$

We see that the inequality (14) holds for $n = 2$
for the case 2.

Finally, the inequality (14) is always valid for
$n = 2$ in all cases, i.e. when $z_1 = z_2$, or $z_1 < z_2$,
or $z_2 < z_1$.


• For $n > 2$, we suppose that the inequality (14) holds.
That is

$$\min\{z_1, \ldots, z_n\} \leq s_n \leq \max\{z_1, \ldots, z_n\}. \tag{15}$$

We prove next by induction that this inequality also holds
for $n + 1$.


For the induction with $n + 1$, we have to consider $n + 1$
values $\{z_1, \ldots, z_n, z_{n+1}\}$ and $n + 1$ strictly positive
normalized weights $\{w_1, \ldots, w_n, w_{n+1}\}$, that is $w_i > 0$ for
$i = 1, 2, \ldots, n + 1$ and $\sum_{i=1}^{n+1} w_i = 1$. Because all $w_i > 0$,
one has necessarily $w_{n+1} < 1$. So, we can always express
$s_{n+1}$ as

$$\begin{aligned}
s_{n+1} &= \sum_{i=1}^{n+1} w_i z_i \\
&= w_{n+1} z_{n+1} + \sum_{i=1}^{n} w_i z_i \\
&= w_{n+1} z_{n+1} + \sum_{i=1}^{n} \frac{1 - w_{n+1}}{1 - w_{n+1}} w_i z_i \\
&= w_{n+1} z_{n+1} + (1 - w_{n+1}) \sum_{i=1}^{n} \frac{w_i}{1 - w_{n+1}} z_i \\
&= w_{n+1} z_{n+1} + (1 - w_{n+1}) s_n,
\end{aligned}$$

where

$$s_n = \sum_{i=1}^{n} \frac{w_i}{1 - w_{n+1}} z_i = \sum_{i=1}^{n} v_i z_i. \tag{16}$$

The new weights involved in $s_n$, defined by $v_i = \frac{w_i}{1 - w_{n+1}}$,
are also all strictly positive because $w_i > 0$ and
$1 - w_{n+1} > 0$, and they are also normalized because

$$\sum_{i=1}^{n} v_i = \sum_{i=1}^{n} \frac{w_i}{1 - w_{n+1}}
= \frac{1}{1 - w_{n+1}} \sum_{i=1}^{n} w_i
= \frac{1}{1 - w_{n+1}} (1 - w_{n+1}) = 1,$$

because $\sum_{i=1}^{n+1} w_i = 1$, which implies $\sum_{i=1}^{n} w_i = 1 - w_{n+1}$.


Hence, we observe that $s_n = \sum_{i=1}^{n} v_i z_i$ is also a convex
combination of the $n$ real values $\{z_1, \ldots, z_n\}$ with
normalized and strictly positive weights $v_i$, and therefore
the inequality (14) holds (by assumption).

One sees that the problem of combination of $n + 1$ values
has been reformulated as a convex combination of two
values $z_{n+1}$ and $s_n = \sum_{i=1}^{n} v_i z_i$ with strictly positive
and normalized weights $w_{n+1}$ and $(1 - w_{n+1})$. Because
the inequality (14) is satisfied for the convex combination
of two real values (for $n = 2$), the following inequality
holds

$$\min\{s_n, z_{n+1}\} \leq s_{n+1} \leq \max\{s_n, z_{n+1}\}. \tag{17}$$

Because $\min\{z_1, \ldots, z_n\} \leq s_n \leq \max\{z_1, \ldots, z_n\}$ is
assumed to be true, the inequality (17) can be rewritten
as

$$\min\{\min\{z_1, z_2, \ldots, z_n\}, z_{n+1}\} \leq s_{n+1} \leq
\max\{\max\{z_1, z_2, \ldots, z_n\}, z_{n+1}\}, \tag{18}$$

or equivalently

$$\min\{z_1, z_2, \ldots, z_n, z_{n+1}\} \leq s_{n+1} \leq
\max\{z_1, z_2, \ldots, z_n, z_{n+1}\}. \tag{19}$$

Therefore the inequality (14) is also valid for $n + 1$, which
completes the proof of Theorem 2.
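The induction step can be mirrored by a short recursive evaluation, which peels off the last term and renormalizes the remaining weights as in (16). This is only an illustrative sketch; the function name `fold_convex` and the sample values are hypothetical, and all weights are assumed strictly positive as in Theorem 2.

```python
def fold_convex(values, weights):
    """Evaluate s_n as w_last * z_last + (1 - w_last) * s_{n-1}, recursively."""
    if len(values) == 1:
        return values[0]                      # a single value carries weight 1
    w_last = weights[-1]
    # renormalised weights v_i = w_i / (1 - w_last), which again sum to one
    v = [w / (1.0 - w_last) for w in weights[:-1]]
    return w_last * values[-1] + (1.0 - w_last) * fold_convex(values[:-1], v)


z = [1.0, 4.0, 2.0]
w = [0.2, 0.3, 0.5]
direct = sum(wi * zi for wi, zi in zip(w, z))   # plain convex combination
folded = fold_convex(z, w)                      # two-value combinations only
print(direct, folded)
```

Each recursion level is a convex combination of exactly two values, so the bounds for $n = 2$ are all that is needed at every step.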


We can generalize Theorem 2 to take into
account all the cases where some weights are zero.
For this, the set of $n$ real values denoted by
$\mathcal{Z} = \{z_1, z_2, \ldots, z_n\}$ can always be expressed as
$\mathcal{Z} = Z \cup \bar{Z}$, where $Z \triangleq \{z_i \in \{z_1, z_2, \ldots, z_n\} \mid w_i > 0\}$
and $\bar{Z} \triangleq \{z_j \in \{z_1, z_2, \ldots, z_n\} \mid w_j = 0\}$. The convex
combination $s_n = \sum_{i=1}^{n} w_i z_i$ can be expressed as

$$s_n = \sum_{z_i \in Z} w_i z_i + \sum_{z_j \in \bar{Z}} w_j z_j. \tag{20}$$

Because $w_j = 0$ for any $z_j \in \bar{Z}$, one has $\sum_{z_j \in \bar{Z}} w_j z_j = 0$,
and therefore $s_n = \sum_{z_i \in Z} w_i z_i$, whose bounds are given by
Theorem 2. Hence, $\min\{z_i \in Z\} \leq s_n \leq \max\{z_i \in Z\}$. This
completes the proof of Theorem 1, which is more general than
Theorem 2.


REFERENCES 


[1] D. Dubois, H. Prade, Representation and combination of uncertainty 
with belief functions and possibility measures, Comput. Intell., Vol. 4, 
pp. 244-264, 1988. 

[2] D. Dubois, H. Prade, A note on measures of specificity for fuzzy sets, 
(E)BUSENFAL, Vol. 19, paper # 8, May, 1984. 

[3] D. Dubois, H. Prade, A note on measures of specificity for fuzzy sets, 
Int. J. Gen. Syst., Vol. 10, No. 4, pp. 279-283, 1985. 

[4] G. Shafer, A mathematical theory of evidence, Princeton University 
Press, 1976. 




Novel Moderate Transformation of Fuzzy 
Membership Function into Basic Belief Assignment 


Xiaojing Fan^a, Deqiang Han^a, Jean Dezert^b, Yi Yang^c

^a School of Automation Science and Engineering, Xi’an Jiaotong University, Xi’an 710049, China.
^b The French Aerospace Lab, ONERA, 91120 Palaiseau, France.
^c SKLSVMS, School of Aerospace, Xi’an Jiaotong University, Xi’an 710049, China.

Emails: jjingxiulian@126.com, deqhan@xjtu.edu.cn, jean.dezert@onera.fr, jiafetyy@mail.xjtu.edu.cn

Originally published as: X. Fan, D. Han, J. Dezert, Y. Yang, Novel Moderate Transformation of Fuzzy
Membership Function into Basic Belief Assignment, Chinese Journal of Aeronautics, 2022, and reprinted
with permission.


Abstract: In information fusion, the uncertain information
from different sources might be modeled with different theoretical
frameworks. When one needs to fuse uncertain information
represented by different uncertainty theories, constructing
the transformation between different frameworks is crucial.
Various transformations of a Fuzzy Membership Function (FMF)
into a Basic Belief Assignment (BBA) have been proposed, where
the transformations based on uncertainty maximization and
minimization can determine the BBA without preselecting the
focal elements. However, these two transformations based
on uncertainty optimization emphasize the extreme cases of
uncertainty. To avoid extreme attitudinal bias, a trade-off or
moderate BBA with an uncertainty degree between the minimal
and maximal ones is preferred. In this paper, two moderate
transformations of an FMF into a trade-off BBA are proposed.
One is the weighted average based transformation and the
other is the optimization-based transformation with a weighting
mechanism, where the weighting factor can be user-specified
or determined with some prior information. The rationality
and effectiveness of our transformations are verified through
numerical examples and classification examples.

Keywords: Belief functions, basic belief assignment, fuzzy
membership function, information fusion, moderate transformation.


I. INTRODUCTION 


In multi-source information fusion, the information obtained
from different sources usually has different types of
uncertainty. Various kinds of uncertainty theories have been
proposed, including probability theory, fuzzy set theory [1],
possibility theory [2], rough set theory [3] and the theory of
belief functions [4], etc., for dealing with different types of
uncertain information. According to the type of uncertainty,
the uncertain information from different sources might be
modeled with different theoretical frameworks. Usually, such
uncertain information with different representations cannot
be directly combined or fused. Therefore, transformations
between different frameworks are needed [5], and then one
can fuse the information under the same framework.

Random set theory is regarded as a unified framework
for various frameworks of uncertainty including probability
theory, fuzzy set theory, theory of belief functions, etc. [5].
In particular, to fuse the opinion of an expert represented
by a Fuzzy Membership Function (FMF) and the output of
a sensor expressed by a Basic Belief Assignment (BBA), one
can transform the FMF into a BBA and then combine the two
BBAs. One can also transform the BBA into an FMF and
then combine the two FMFs. In this paper, we focus on the
transformation of an FMF into a BBA. Many transformations
have been proposed [5]-[10], which can be categorized into
two types.

One type of transformation has to preselect the focal elements.
For example, Bi et al. [6] transform an FMF into a BBA
by normalizing the given FMF, where the focal elements are
preselected as singletons. As a result, the obtained BBA has
no compound focal elements. In [5], Florea et al. transform an
FMF into a BBA with the α-cut approach, where the focal elements
are preselected to be “nested in order”. However, a prior
selection of focal elements might lead to information loss.
The other type of transformation obtains a BBA by solving
constrained optimization problems. For example, our previous
work [7] proposed two transformations based on uncertainty
optimization, which avoid the subjective preselection of focal
elements. The difference between the two transformations is
the specific optimization criterion, which is the maximization
and the minimization of uncertainty, respectively. It has been shown in [7] that
both transformations based on uncertainty optimization are
rational and effective. However, these transformations based
on uncertainty optimization seem to be “one-sided” in terms of
the uncertainty degrees, since they only focus on the minimal
or maximal uncertainty. Either the minimal uncertainty or the
maximal uncertainty is an extreme case of uncertainty. If one
only pays attention to one of the extreme cases of uncertainty
in the process of solving the optimization problem, it would
introduce an extreme attitudinal bias on the uncertainty degree,
which might bring counter-intuitive results. If we jointly
consider the two extreme cases of uncertainty, we can obtain a
BBA with a degree of uncertainty between the minimal
and maximal ones. Such a BBA is more “balanced” and
“moderate” than the BBA obtained by pursuing the maximal
or minimal uncertainty. In other words, joint consideration of
the two extremes of uncertainty corresponds to a better moderate
attitude (corresponding to a preferred consensual agreement of
behavior) for a transformation of an FMF into a BBA, and then
we can avoid the extreme attitudinal bias on the uncertainty
degree.

In this paper, we aim to obtain such a trade-off BBA to avoid
“one-sidedness” on the uncertainty degree, which is based on
the two BBAs obtained by the two transformations proposed
in our previous work [7]. To transform an FMF into such a
trade-off or moderate BBA, a weighting factor is used, which
can make the trade-off BBA closer to the BBA obtained with
the uncertainty maximization or closer to the BBA obtained
with the uncertainty minimization. The weighting factor could
be determined by using prior information or be user-specified,
which reflects the objective situation or meets the subjective
preferences of users. We propose two “moderate” (i.e., balanced)
transformations in this paper. One transformation assigns the
weighting factors to the two BBAs obtained by the optimization-based
transformations. Then, the weighted average of these
two BBAs is the trade-off BBA. The other transformation is
based on a constrained minimization problem with a weighting
mechanism. The objective function is constructed from the
weighting factor and two degrees of dissimilarity between the
trade-off BBA (to be determined) and the two BBAs obtained
by solving the uncertainty minimization and maximization.
The constraints are constructed according to the given FMF and
the legitimate conditions of a BBA. Compared with
the transformations based on uncertainty optimization, our
transformations can avoid the extreme attitudinal bias and
allow users to choose the degree of a trade-off BBA according
to their preference.

This paper is an extended version of our previous preliminary
work published in [11]. Based on the preliminary
work, the main extended work and added value in this paper
are as follows. The limitations of available transformations of
an FMF into a BBA are analyzed. Examples are given to
illustrate the loss of information that might be caused by
the preselection of focal elements and the counter-intuitive
results that might be caused by an extreme attitude towards
uncertainty. We use another more rational uncertainty measure,
which is designed without switching frameworks, to construct
the objective function in the transformations proposed in [7].
This is because the uncertainty measure Ambiguity Measure
(AM) used in [7] as the objective function has some disputes
and limitations, which are mentioned in [7]. Furthermore, to
compare our transformations with the others, some numerical
examples and a classification example are provided. Compared
with the average classification accuracy of available transformations,
the moderate transformation has a better classification
performance.

The paper is organized as follows. After a brief introduction 
of the basics for theory of belief functions and some basic 
concepts of the fuzzy set theory in Section II, some traditional 
transformations of an FMF into a BBA are reviewed and 
their limitations are provided in Section III. In Section IV, the 
transformations of an FMF into a trade-off BBA are presented. 
In Section V, our transformations are compared with several 
traditional approaches and related examples are provided. As 
shown, these new transformations can bring better perfor- 
mances than other transformations in a classification example. 
Section VI concludes this paper. 


II. PRELIMINARY 
A. Basics for theory of belief functions 


The theory of belief functions, also called Dempster-Shafer
theory [12], has been applied to information fusion [13]-[15],
decision making [16]-[19], pattern recognition [20]-[22],
etc. It is a powerful framework for uncertainty modeling
and reasoning. The elements of the Frame of Discernment (FOD)
$\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$ are mutually exclusive and exhaustive under
the closed-world assumption. Based on the power set of $\Theta$
($2^\Theta$), a basic belief assignment (BBA, also called a mass
function) is defined as

$$\sum_{A \subseteq \Theta} m(A) = 1, \quad m(\emptyset) = 0, \tag{1}$$

where $A \subseteq \Theta$ is a proposition in the FOD. If $m(A) > 0$, $A$
is called a focal element of $m$.
For all $A \subseteq \Theta$ the belief function $Bel$ and plausibility
function $Pl$ are defined as

$$Bel(A) = \sum_{B \subseteq A} m(B), \tag{2}$$

$$Pl(A) = \sum_{A \cap B \neq \emptyset} m(B). \tag{3}$$

$Bel(A)$ and $Pl(A)$ are the lower and upper bounds of the
probability of a focal element $A$, respectively. The belief
interval of $A$ is represented by $[Bel(A), Pl(A)]$, whose length
is used to describe the imprecision of $A$’s probability.
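Definitions (2) and (3) translate directly into a few lines of code. The sketch below is a minimal illustration, assuming a BBA stored as a dictionary from `frozenset` focal elements to masses; the function names `bel` and `pl` and the example BBA are hypothetical.

```python
def bel(m, A):
    """Bel(A): total mass of the non-empty focal elements included in A."""
    return sum(mass for B, mass in m.items() if B and B <= A)


def pl(m, A):
    """Pl(A): total mass of the focal elements intersecting A."""
    return sum(mass for B, mass in m.items() if B & A)


theta = frozenset({"a", "b", "c"})
m = {frozenset({"a"}): 0.5, frozenset({"a", "b"}): 0.25, theta: 0.25}

A = frozenset({"a", "b"})
print(bel(m, A), pl(m, A))   # belief interval [Bel(A), Pl(A)] of A
```

For every subset the computed values satisfy $Bel(A) \leq Pl(A)$, and the width of the interval shrinks as more mass sits on subsets of $A$.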

Let us consider two independent BBAs $m_1$ and $m_2$ on the
same FOD. Dempster’s rule [4] is defined as

$$m_{Dempster}(A) = \begin{cases}
0, & A = \emptyset, \\
\dfrac{\sum_{B \cap C = A} m_1(B) m_2(C)}{1 - K}, & A \neq \emptyset,
\end{cases} \tag{4}$$

where $K = \sum_{B \cap C = \emptyset} m_1(B) m_2(C)$ denotes the conflict
coefficient between the two BBAs. Dempster’s rule applies only if
$K \neq 1$. When a high conflict between BBAs exists, Dempster’s
rule might bring counter-intuitive results [23]-[25].

Alternative rules [26]-[30] have been proposed.
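A compact sketch of Dempster's rule (4) is given below for illustration. It assumes BBAs stored as dictionaries from `frozenset` focal elements to masses; the function name `dempster` and the two example BBAs are hypothetical.

```python
def dempster(m1, m2):
    """Combine two BBAs with Dempster's rule; return (fused BBA, conflict K)."""
    combined, conflict = {}, 0.0
    for B, mass_b in m1.items():
        for C, mass_c in m2.items():
            A = B & C
            if A:
                combined[A] = combined.get(A, 0.0) + mass_b * mass_c
            else:
                conflict += mass_b * mass_c     # mass falling on the empty set
    if abs(1.0 - conflict) < 1e-12:
        raise ValueError("total conflict (K = 1): Dempster's rule is undefined")
    return {A: v / (1.0 - conflict) for A, v in combined.items()}, conflict


theta = frozenset({"a", "b"})
m1 = {frozenset({"a"}): 0.6, theta: 0.4}
m2 = {frozenset({"b"}): 0.5, theta: 0.5}
fused, K = dempster(m1, m2)
print(K, fused)
```

In this example the conflicting product $m_1(\{a\}) m_2(\{b\}) = 0.3$ is removed and the remaining masses are rescaled by $1 - K = 0.7$.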


B. Uncertainty measure of BBA and distance of evidence 


The uncertainty measure is used for evaluating the degree 
of uncertainty in a BBA. There are two types of uncertainty 
for a BBA in belief functions including the discord and non- 
specificity, which are collectively known as the ambiguity. 
Various kinds of uncertainty measures in the theory of belief 
functions have been proposed [31]-[38]. 

Let us consider the FOD $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$. One of the
total uncertainty measures is the AM [34] defined by

$$AM(m) = -\sum_{\theta \in \Theta} BetP(\theta) \log_2 BetP(\theta), \tag{5}$$

where $BetP(\theta)$ is the pignistic probability of a BBA [39]

$$BetP(\theta) = \sum_{\theta \in A \subseteq \Theta} \frac{m(A)}{|A|}. \tag{6}$$

$|A|$ denotes the cardinality of a focal element $A$. Actually, in
AM, the Shannon entropy of the pignistic probability is used
to quantify the uncertainty. The range of AM is $[0, \log_2(|\Theta|)]$.
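Equations (5) and (6) can be sketched as follows. This is only an illustration: a BBA is assumed to be a dictionary from `frozenset` focal elements to masses, and the names `betp` and `ambiguity_measure` are hypothetical.

```python
import math


def betp(m, frame):
    """Pignistic probability: each focal element shares its mass equally."""
    return {t: sum(mass / len(A) for A, mass in m.items() if t in A)
            for t in frame}


def ambiguity_measure(m, frame):
    """AM(m): Shannon entropy (base 2) of the pignistic probability."""
    return -sum(p * math.log2(p) for p in betp(m, frame).values() if p > 0)


frame = frozenset({"a", "b", "c", "d"})
vacuous = {frame: 1.0}
categorical = {frozenset({"a"}): 1.0}
print(ambiguity_measure(vacuous, frame), ambiguity_measure(categorical, frame))
```

The two test BBAs reach the ends of the stated range: the vacuous BBA gives $\log_2 4 = 2$ and a categorical singleton BBA gives 0.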

The distance-based total uncertainty measure ($TU^I$) is based
on the Wasserstein distance [40] between belief intervals,
which is defined as follows:

$$d_W([a_1, b_1], [a_2, b_2]) = \sqrt{\left[\frac{a_1 + b_1}{2} - \frac{a_2 + b_2}{2}\right]^2
+ \frac{1}{3}\left[\frac{b_1 - a_1}{2} - \frac{b_2 - a_2}{2}\right]^2}. \tag{7}$$

To be more specific, in $TU^I$, $[a_1, b_1]$ is replaced by
$[Bel(\{\theta_i\}), Pl(\{\theta_i\})]$ and $[a_2, b_2]$ is replaced by $[0, 1]$. Then,
$TU^I$ is defined as [36]

$$TU^I(m) = 1 - \frac{\sqrt{3}}{n} \sum_{i=1}^{n} d_W([Bel(\{\theta_i\}), Pl(\{\theta_i\})], [0, 1]), \tag{8}$$

where $\sqrt{3}$ is for normalization. The belief interval $[0, 1]$ is the
most uncertain case for a singleton.

Compared with AM, $TU^I$ is a total uncertainty measure that
is defined directly in the theory of belief functions without
switching frameworks from the probability theory to the theory
of belief functions. The range of $TU^I$ is $[0, 1]$, where $TU^I = 0$
represents the crispest case, i.e., $m(\{\theta_i\}) = 1$, while $TU^I = 1$
represents the most uncertain case, i.e., $m(\Theta) = 1$.
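The two boundary cases quoted above can be reproduced with a short sketch. It assumes the usual closed form of the Wasserstein distance between two intervals (squared difference of midpoints plus one third of the squared difference of half-widths) and the $\sqrt{3}/n$ normalization; the function names `d_w` and `tu_i` are hypothetical.

```python
import math


def d_w(i1, i2):
    """Wasserstein distance between intervals [a1, b1] and [a2, b2] (assumed form)."""
    (a1, b1), (a2, b2) = i1, i2
    mid = (a1 + b1) / 2 - (a2 + b2) / 2        # difference of midpoints
    rad = (b1 - a1) / 2 - (b2 - a2) / 2        # difference of half-widths
    return math.sqrt(mid ** 2 + rad ** 2 / 3)


def tu_i(m, frame):
    """TU^I(m) = 1 - sqrt(3)/n * sum_i d_W([Bel({t_i}), Pl({t_i})], [0, 1])."""
    total = 0.0
    for t in frame:
        bel_t = m.get(frozenset({t}), 0.0)                     # Bel of a singleton
        pl_t = sum(mass for A, mass in m.items() if t in A)    # Pl of a singleton
        total += d_w((bel_t, pl_t), (0.0, 1.0))
    return 1.0 - math.sqrt(3) / len(frame) * total


frame = frozenset({"a", "b", "c"})
print(tu_i({frame: 1.0}, frame), tu_i({frozenset({"a"}): 1.0}, frame))
```

With this form, every singleton interval of the vacuous BBA is $[0, 1]$ (distance 0), while each interval of a categorical singleton BBA lies at distance $1/\sqrt{3}$ from $[0, 1]$, which is why $\sqrt{3}$ normalizes the measure.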

The distance of evidence is a metric for the degree of
dissimilarity between two BBAs, which can describe how “far”
it is from one BBA to the other. The distance of evidence is
crucial for many belief functions related applications.
Jousselme’s distance between two BBAs $m_1$ and $m_2$ defined on
the same FOD is defined as [41]

$$d_J(m_1, m_2) = \sqrt{\frac{1}{2}(m_1 - m_2)^T D (m_1 - m_2)}, \tag{9}$$

where the elements $D(A, B)$ of Jaccard’s matrix $D$ are defined
as $D(A, B) = |A \cap B| / |A \cup B|$ for $A \subseteq \Theta$ and $B \subseteq \Theta$. $1/2$ is
the normalization factor. $d_J$ is a strict distance satisfying
non-negativity, non-degeneracy, symmetry and the triangular inequality
[41]. Various other types of distance measures
have also been proposed [42]-[47].
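A minimal sketch of Jousselme's distance (9) follows. It relies on the fact that the difference vector is zero outside the focal elements of the two BBAs, so only those rows and columns of the Jaccard matrix are needed. BBAs are assumed to be dictionaries from `frozenset` focal elements to masses, and the function name `jousselme` is hypothetical.

```python
import math


def jousselme(m1, m2):
    """Jousselme's distance, computed over the union of the focal elements."""
    focal = [A for A in set(m1) | set(m2) if A]
    diff = [m1.get(A, 0.0) - m2.get(A, 0.0) for A in focal]
    quad = sum(diff[i] * diff[j] * len(A & B) / len(A | B)   # Jaccard weight D(A, B)
               for i, A in enumerate(focal)
               for j, B in enumerate(focal))
    return math.sqrt(max(0.5 * quad, 0.0))    # clamp tiny negative rounding errors


theta = frozenset({"a", "b"})
m_a = {frozenset({"a"}): 1.0}
m_b = {frozenset({"b"}): 1.0}
m_v = {theta: 1.0}
print(jousselme(m_a, m_b), jousselme(m_a, m_v), jousselme(m_a, m_a))
```

Two categorical BBAs on disjoint singletons are at the maximal distance 1, whereas the vacuous BBA is closer to each of them because $\Theta$ overlaps both singletons.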


C. Fuzzy set theory 


Fuzzy set theory [1] can be used to model the information
without a crisp definition or a strict limit (e.g., “the target is
fast”, “the target turns quickly”). A fuzzy set $A_f$ is defined
on a universe of discourse $\Theta$, which is equivalent to the FOD
in belief functions. $A_f$ is represented by a Fuzzy Membership
Function (FMF) $\mu_{A_f}(\theta)$. The value of $\mu_{A_f}(\theta)$ denotes the
degree of membership for $\theta$ in $A_f$: $\mu_{A_f} : \Theta \to [0,1]$; $\theta \mapsto
\mu_{A_f}(\theta) \in [0,1]$. The sum of $\mu_{A_f}$ might be equal to, greater
than or less than 1. For $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$, a fuzzy measure
[48] is defined as

$$D(A_f) = \frac{1}{n} \sum_{i=1}^{n} S(\mu_{A_f}(\theta_i)), \tag{10}$$

where $S(\mu_{A_f}(\theta_i))$ is Shannon’s function defined by

$$S(x) = \begin{cases}
-x \ln x - (1 - x) \ln(1 - x), & 0 < x < 1, \\
0, & x = 0, 1.
\end{cases} \tag{11}$$


In the field of fuzzy set theory, there are many related 
concepts including Z-number [49], intuitionistic fuzzy set 
[50] and Pythagorean fuzzy set [51] which describe fuzzy 
information from different aspects. Z-number [49] is defined 
as an ordered pair of fuzzy numbers containing the reliability 
of uncertain information and can be denoted as Z = (A, B). 
For the definition of a Z-number, A is a possibility restriction 
and B denotes the reliability of the possibility measure of A. 

Intuitionistic Fuzzy Set [50] (IFS) is an extension of the
traditional fuzzy set. For $\theta \in \Theta$, an IFS can be represented
as $A = \{\langle \theta, \mu_A(\theta), \nu_A(\theta) \rangle \mid \theta \in \Theta\}$. $\mu_A(\theta) : \Theta \to [0,1]$ is the
membership function, which denotes the degree to which $\theta$
belongs to $A$. $\nu_A(\theta) : \Theta \to [0,1]$ is the non-membership function,
which denotes the degree to which $\theta$ does not belong to $A$. One has
$0 \leq \mu_A(\theta) + \nu_A(\theta) \leq 1$, and $1 - \mu_A(\theta) - \nu_A(\theta)$ denotes the
degree of hesitation.

Pythagorean Fuzzy Set (PFS) [51] is a non-standard
fuzzy subset, which can be represented as $P =
\{\langle \theta, \mu_P(\theta), \nu_P(\theta) \rangle \mid \theta \in \Theta\}$. It satisfies $0 \leq (\mu_P(\theta))^2 +
(\nu_P(\theta))^2 \leq 1$, where $\mu_P(\theta) : \Theta \to [0,1]$ denotes the
membership function, and $\nu_P(\theta) : \Theta \to [0,1]$ denotes the
non-membership function.


III. TRANSFORMATIONS OF FMF INTO BBA 


Although fuzzy set theory and the theory of belief functions 
are two different theoretical frameworks, there are relation- 
ships between their basic concepts [52]. The relationships are 
between the FMF and the singleton plausibility function or 
singleton belief function, which are as follows. 


A. Relationships between FMF and BBA 


Consider the FOD $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$. The given FMF
is denoted by $\mu = [\mu(\theta_1), \mu(\theta_2), \ldots, \mu(\theta_n)]$. The corresponding
BBA is denoted by $m$. When $\sum_{i=1}^{n} \mu(\theta_i) \geq 1$, the FMF
is equivalent to a singleton plausibility function:

$$Pl(\{\theta_i\}) = \sum_{\{\theta_i\} \cap A \neq \emptyset} m(A) = \mu(\theta_i), \; \forall \{\theta_i\} \subseteq \Theta. \tag{12}$$

When the FMF is equivalent to a singleton plausibility
function, it is also equivalent to a contour function.

When $\sum_{i=1}^{n} \mu(\theta_i) \leq 1$, it is equivalent to a singleton belief
function:

$$Bel(\{\theta_i\}) = m(\{\theta_i\}) = \mu(\theta_i), \; \forall \{\theta_i\} \subseteq \Theta. \tag{13}$$

The detailed proof of the two relationships is given in [52].

When $\sum_{i=1}^{n} \mu(\theta_i) = 1$, the FMF can be equivalent to either
the singleton plausibility or the singleton belief, because when
$\sum_{i=1}^{n} \mu(\theta_i) = 1$, Eq. (12) and Eq. (13) are equivalent. The
proof of their equivalence is in the appendix.

The transformation of an FMF into a BBA is a multi-answer
problem [7]. There exist $2^n - 1$ possible focal elements of the FOD
$\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$, i.e., all subsets except for $\emptyset$. That is to say, at most
$2^n - 1$ unknown variables need to be determined. However,
according to Eq. (12) or Eq. (13), one can obtain $n$ linear
equations about the undetermined BBA. In addition, one has
$\sum_{A \subseteq \Theta} m(A) = 1$, which is one of the legitimate conditions
of a BBA. Then, using the $n + 1$ linear equations to determine
$2^n - 1$ unknown variables is an under-determined problem
when $n > 2$. Thus, the transformation of an FMF into a BBA
is a multi-answer (multi-solution) problem.

To deal with the multi-answer problem, various methods 
have been proposed [5]-[10], which can be categorized into 
two major types: the transformations with preselection of focal 
elements, and transformations based on uncertainty optimiza- 
tion. 


B. Available transformations with preselection of focal ele- 
ments 


1) Normalization based transformation: Bi et al. [6] transform
an FMF into a BBA in the application of text
categorization. They normalize the given FMF to determine
a unique BBA. Let the FOD be $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$. The
given FMF is denoted by $\mu = [\mu(\theta_1), \mu(\theta_2), \ldots, \mu(\theta_n)]$. The
obtained BBA is denoted by $m_{norm}$. In the work of Bi et al.
[6], the BBA is determined as follows:

$$m_{norm}(\{\theta_i\}) = \mu(\theta_i) \Big/ \sum_{j=1}^{n} \mu(\theta_j), \tag{14}$$

where $i = 1, 2, \ldots, n$. In the sequel, this transformation is
represented by “$T_{norm}$”. By using $T_{norm}$, one can obtain a
Bayesian BBA, i.e., the focal elements are only singletons.
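Equation (14) amounts to a one-line normalization. The sketch below is illustrative only: the FMF is assumed to be a dictionary from element names to membership degrees, and the function name `t_norm` and the sample FMF are hypothetical.

```python
def t_norm(mu):
    """Normalization-based transformation: one singleton focal element per theta_i."""
    total = sum(mu.values())
    if total <= 0:
        raise ValueError("the FMF must have at least one positive membership degree")
    return {frozenset({t}): v / total for t, v in mu.items() if v > 0}


mu = {"a": 0.8, "b": 0.4, "c": 0.4}
m_norm = t_norm(mu)
print(m_norm)
```

Every focal element of the result is a singleton, so the obtained BBA is Bayesian and carries no mass on compound sets.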


2) Transformation based on α-cut approach:
Suppose that an FOD is $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$.
$\mu = [\mu(\theta_1), \mu(\theta_2), \ldots, \mu(\theta_n)]$ denotes the given FMF.
Florea et al. [5] transform an FMF into a BBA by
using the α-cut approach, where $\mu$ should be sorted into
ascending order. Here, we represent the sorted values of $\mu$ as
$0 = \alpha_0 < \alpha_1 < \ldots < \alpha_M \leq 1$, where $M \leq |\Theta|$. $\alpha_j$
($j = 1, 2, \ldots, M$) is the value of the FMF. Then the BBA,
denoted by $m_{\alpha\text{-cut}}$, is determined as

$$m_{\alpha\text{-cut}}(A_j) = (\alpha_j - \alpha_{j-1}) / \alpha_M, \tag{15}$$

where $A_j = \{\theta_i \in \Theta \mid \mu(\theta_i) \geq \alpha_j\}$, $i = 1, 2, \ldots, n$ and
$j = 1, 2, \ldots, M$. In the sequel, the above transformation is
represented by “$T_{\alpha\text{-cut}}$”. By using $T_{\alpha\text{-cut}}$, the supposed structure
of focal elements for the obtained BBA is nested in order.
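The α-cut construction (15) can be sketched as below. The FMF is assumed to be a dictionary from element names to membership degrees; the function name `t_alpha_cut` and the sample FMF are hypothetical, and zero membership degrees are skipped so that $\alpha_0 = 0$ is not counted as a level.

```python
def t_alpha_cut(mu):
    """Alpha-cut transformation: nested focal elements, one per distinct level."""
    levels = sorted({v for v in mu.values() if v > 0})   # alpha_1 < ... < alpha_M
    bba, prev = {}, 0.0                                  # prev plays the role of alpha_{j-1}
    for alpha in levels:
        A = frozenset(t for t, v in mu.items() if v >= alpha)
        bba[A] = (alpha - prev) / levels[-1]
        prev = alpha
    return bba


mu = {"a": 1.0, "b": 0.5, "c": 0.25}
m_cut = t_alpha_cut(mu)
for A, mass in m_cut.items():
    print(sorted(A), mass)
```

For this FMF the focal elements are $\{a, b, c\} \supset \{a, b\} \supset \{a\}$, and because $\alpha_M = 1$ the singleton plausibilities of the result coincide with the membership degrees.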


3) Transformation based on assigning mass to a focal
element triplet: Let the FOD be $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$. The
given FMF is denoted by $\mu = [\mu(\theta_1), \mu(\theta_2), \ldots, \mu(\theta_n)]$. In
[10], the authors use a focal element triplet, which is a
structure defined as three focal elements $B_1$, $B_2$ and $B_3$.
$B_1, B_2 \subset \Theta$ are singletons, and $B_3$ is the total set (i.e., $\Theta$).
First, the normalization of the given FMF is calculated using
$m(\{\theta_i\}) = \mu(\theta_i) / \sum_{j=1}^{n} \mu(\theta_j)$ for $i = 1, 2, \ldots, n$. Then, the
BBA represented by $m_{tri}$ can be obtained as follows:

$$m_{tri}(B_1) = m(\{\theta_s\}), \quad m_{tri}(B_2) = m(\{\theta_t\}), \quad
m_{tri}(B_3) = 1 - m(\{\theta_s\}) - m(\{\theta_t\}), \tag{16}$$

where $B_1$, $B_2$ and $B_3$ are defined by

$$B_1 = \theta_s = \arg\max_{\theta_i \in \Theta}(m(\{\theta_i\})), \quad
B_2 = \theta_t = \arg\max_{\theta_i \in \Theta \setminus \{\theta_s\}}(m(\{\theta_i\})), \quad
B_3 = \Theta.$$

This transformation is represented by “$T_{tri}$” in the sequel.
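A sketch of the triplet construction (16) is shown below, assuming an FMF stored as a dictionary with at least three elements; the function name `t_tri` and the sample FMF are hypothetical, and ties between membership degrees are broken by dictionary order, which is a choice of this sketch.

```python
def t_tri(mu):
    """Triplet transformation: two best singletons plus the whole frame."""
    total = sum(mu.values())
    norm = {t: v / total for t, v in mu.items()}          # normalised FMF
    s, t = sorted(norm, key=norm.get, reverse=True)[:2]   # theta_s, then theta_t
    frame = frozenset(mu)
    return {
        frozenset({s}): norm[s],
        frozenset({t}): norm[t],
        frame: 1.0 - norm[s] - norm[t],                   # remaining mass goes to Theta
    }


mu = {"a": 1.0, "b": 0.75, "c": 0.25}
m_tri = t_tri(mu)
print(m_tri)
```

The mass that the normalization step gave to the discarded singletons is transferred to $\Theta$, so the result always has at most three focal elements.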


C. Transformations based on uncertainty optimization 


In our previous work [7], the multi-answer problem is
formulated as a constrained optimization to obtain a unique
BBA without preselecting focal elements. We established two
transformations of an FMF into a BBA based on uncertainty
optimization, where the uncertainty measure AM (see Eq. (5))
is used as the objective function. When an FMF is given,
since $m(\emptyset) = 0$, at most $2^n - 1$ focal elements of the
undetermined BBA need to be assigned a mass. According to the
relationships between the given FMF and the belief or plausibility
function, together with the BBA legitimate conditions, there
are $n + 1$ equations, which are used as the constraints.
When the given FMF is equivalent to a singleton plausibility
function, the corresponding constraint can also use the contour
function.

As analyzed in [7], using AM as the objective function has 
some disputes and limitations because it actually quantifies 
the randomness of the pignistic probability measure approx- 
imating a BBA, so it does not capture all the aspects of 
uncertainty (specially the ambiguity) represented by a BBA. 
AM has also been criticized in [36]. TU! is a total uncertainty 
measure without switching frameworks [36], which is based 
on Wasserstein distance [33] (a strict distance). Therefore, we 
replace AM with TU! (see Eq. (8)) as the objective function of 
the transformations based on uncertainty optimization in this 
paper. 
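To make the objective function concrete, the sketch below evaluates $TU^I$ for a given BBA. Eq. (8) is not reproduced in this section, so the form used here, $TU^I(m) = 1 - \frac{\sqrt{3}}{n}\sum_{i} d^I([Bel(\{\theta_i\}), Pl(\{\theta_i\})], [0,1])$ with $d^I$ the Wasserstein distance between intervals, is an assumption of this sketch; it does reproduce the values $0.5785$ and $0.6532$ reported later for the $m_{min}$ and $m_{max}$ of Example 3. Function names and the BBA encoding are likewise assumptions.

```python
from math import sqrt


def belief_plausibility(bba, element):
    """Singleton belief and plausibility of one element of the FOD."""
    bel = bba.get(frozenset([element]), 0.0)
    pl = sum(mass for focal, mass in bba.items() if element in focal)
    return bel, pl


def interval_distance(first, second):
    """Wasserstein distance between two intervals [a1, b1] and [a2, b2]."""
    (a1, b1), (a2, b2) = first, second
    midpoint_gap = (a1 + b1) / 2 - (a2 + b2) / 2
    radius_gap = (b1 - a1) / 2 - (b2 - a2) / 2
    return sqrt(midpoint_gap ** 2 + radius_gap ** 2 / 3)


def total_uncertainty(bba, n):
    """Assumed form of TU^I: one minus the scaled mean distance to [0, 1]."""
    distances = (
        interval_distance(belief_plausibility(bba, i), (0.0, 1.0))
        for i in range(n)
    )
    return 1.0 - sqrt(3) / n * sum(distances)


# m_min and m_max of Example 3 (FMF [0.9, 0.7, 0.3]); indices 0, 1, 2 stand
# for theta_1, theta_2, theta_3.
m_min = {frozenset([0]): 0.3, frozenset([0, 1]): 0.4,
         frozenset([1, 2]): 0.1, frozenset([0, 1, 2]): 0.2}
m_max = {frozenset([0]): 0.05, frozenset([1]): 0.03, frozenset([0, 1]): 0.62,
         frozenset([2]): 0.07, frozenset([0, 2]): 0.18,
         frozenset([0, 1, 2]): 0.05}
tu_min = total_uncertainty(m_min, 3)
tu_max = total_uncertainty(m_max, 3)
```

Under this form a vacuous BBA has $TU^I = 1$ and a categorical BBA on a singleton has $TU^I = 0$.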

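1) Transformation based on uncertainty minimization:
The objective function of the transformation based on uncertainty
minimization is $TU^I$. The constraints are mainly
based on the relationship defined in Eq. (12) or Eq. (13). Suppose
that the given FMF defined on $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$ is
$\mu = [\mu(\theta_1), \mu(\theta_2), \ldots, \mu(\theta_n)]$. The BBA, denoted by $m_{min}$,
can be obtained as follows:

When $\sum_{i=1}^{n} \mu(\theta_i) > 1$,

$$
\begin{aligned}
m_{min} &= \arg\min_{m} TU^I(m) \\
        &= \arg\min_{m} \; 1 - \frac{\sqrt{3}}{n} \sum_{i=1}^{n} d^I\big([Bel(\{\theta_i\}), Pl(\{\theta_i\})], [0, 1]\big) \\
\text{s.t.} \quad & \sum_{A \cap \{\theta_i\} \neq \emptyset} m(A) = \mu(\theta_i), \ \forall \{\theta_i\} \subseteq \Theta, \\
& \sum_{A \subseteq \Theta} m(A) = 1, \\
& 0 \le m(A) \le 1.
\end{aligned}
\qquad (17)
$$

When $\sum_{i=1}^{n} \mu(\theta_i) < 1$,

$$
\begin{aligned}
m_{min} &= \arg\min_{m} TU^I(m) \\
        &= \arg\min_{m} \; 1 - \frac{\sqrt{3}}{n} \sum_{i=1}^{n} d^I\big([Bel(\{\theta_i\}), Pl(\{\theta_i\})], [0, 1]\big) \\
\text{s.t.} \quad & m(\{\theta_i\}) = \mu(\theta_i), \ \forall \{\theta_i\} \subseteq \Theta, \\
& \sum_{A \subseteq \Theta} m(A) = 1, \\
& 0 \le m(A) \le 1.
\end{aligned}
\qquad (18)
$$

In the sequel, the transformation based on uncertainty
minimization is represented by “$T_{min}$” for convenience. When
$\sum_{i=1}^{n} \mu(\theta_i) = 1$, Eq. (17) and Eq. (18) are equivalent.
Therefore, when $\sum_{i=1}^{n} \mu(\theta_i) = 1$, one can choose either Eq.
(17) or Eq. (18) to do the transformation. The proof of their
equivalence is in the Appendix.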


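2) Transformation based on uncertainty maximization:
The transformation based on uncertainty maximization
has the same objective function as $T_{min}$. The corresponding
constraints are also mainly based on the relationship
between the given FMF and the BBA to determine. For the
FOD $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$, $\mu = [\mu(\theta_1), \mu(\theta_2), \ldots, \mu(\theta_n)]$ is
the given FMF. The obtained BBA is represented by $m_{max}$.
The objective function and corresponding constraints are as
follows:

When $\sum_{i=1}^{n} \mu(\theta_i) > 1$,

$$
\begin{aligned}
m_{max} &= \arg\max_{m} TU^I(m) \\
        &= \arg\max_{m} \; 1 - \frac{\sqrt{3}}{n} \sum_{i=1}^{n} d^I\big([Bel(\{\theta_i\}), Pl(\{\theta_i\})], [0, 1]\big) \\
\text{s.t.} \quad & \sum_{A \cap \{\theta_i\} \neq \emptyset} m(A) = \mu(\theta_i), \ \forall \{\theta_i\} \subseteq \Theta, \\
& \sum_{A \subseteq \Theta} m(A) = 1, \\
& 0 \le m(A) \le 1.
\end{aligned}
\qquad (19)
$$

When $\sum_{i=1}^{n} \mu(\theta_i) < 1$,

$$
\begin{aligned}
m_{max} &= \arg\max_{m} TU^I(m) \\
        &= \arg\max_{m} \; 1 - \frac{\sqrt{3}}{n} \sum_{i=1}^{n} d^I\big([Bel(\{\theta_i\}), Pl(\{\theta_i\})], [0, 1]\big) \\
\text{s.t.} \quad & m(\{\theta_i\}) = \mu(\theta_i), \ \forall \{\theta_i\} \subseteq \Theta, \\
& \sum_{A \subseteq \Theta} m(A) = 1, \\
& 0 \le m(A) \le 1.
\end{aligned}
\qquad (20)
$$

In the sequel, the transformation based on uncertainty
maximization is represented by “$T_{max}$” for convenience. When
$\sum_{i=1}^{n} \mu(\theta_i) = 1$, Eq. (19) and Eq. (20) are equivalent, so one
can choose either of them to do the transformation. The proof of
their equivalence is in the Appendix. Although the available
transformations can transform a given FMF into a BBA, they still
have some limitations and problems, which are described in the
next section.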


D. Limitations of available transformations 


1) Limitations of transformations with preselection of focal
elements: As mentioned previously, the under-determined
problem of transforming an FMF into a BBA is handled either by
preselecting focal elements or by solving an optimization
problem. Compared with solving the optimization
problem ($T_{min}$ and $T_{max}$), preselecting the focal elements
without sufficient evidence might cause a loss of information.
Since the focal elements are preselected, the obtained BBA can
only assign mass to the focal elements specified beforehand
(e.g., as with $T_{norm}$, $T_{\alpha\text{-cut}}$ or $T_{tri}$). The BBA obtained by
using $T_{norm}$ has no compound focal elements. The BBA
obtained by using $T_{\alpha\text{-cut}}$ or $T_{tri}$ has a specific structure of focal
elements for the given FMF. In addition, for different FMFs
(such as in Example 1), the same BBAs might be obtained by
using $T_{norm}$, $T_{\alpha\text{-cut}}$ or $T_{tri}$.


2) Example 1: Let the FOD be $\Theta = \{\theta_1, \theta_2, \theta_3\}$. There are
two FMFs $\mu_1 = [0.3, 0.2, 0.1]$ and $\mu_2 = [0.9, 0.6, 0.3]$. The
BBAs obtained by $T_{norm}$ are as follows:

$\mu_1$: $m_1(\{\theta_1\}) = 1/2$, $m_1(\{\theta_2\}) = 1/3$, $m_1(\{\theta_3\}) = 1/6$,

$\mu_2$: $m_2(\{\theta_1\}) = 1/2$, $m_2(\{\theta_2\}) = 1/3$, $m_2(\{\theta_3\}) = 1/6$.

Using $T_{\alpha\text{-cut}}$, the BBAs obtained from $\mu_1$ and $\mu_2$ are:

$\mu_1$: $m_3(\{\theta_1\}) = 1/3$, $m_3(\{\theta_1, \theta_2\}) = 1/3$, $m_3(\Theta) = 1/3$,

$\mu_2$: $m_4(\{\theta_1\}) = 1/3$, $m_4(\{\theta_1, \theta_2\}) = 1/3$, $m_4(\Theta) = 1/3$.

Using $T_{tri}$, the BBAs obtained from $\mu_1$ and $\mu_2$ are:

$\mu_1$: $m_5(\{\theta_1\}) = 1/2$, $m_5(\{\theta_2\}) = 1/3$, $m_5(\Theta) = 1/6$,

$\mu_2$: $m_6(\{\theta_1\}) = 1/2$, $m_6(\{\theta_2\}) = 1/3$, $m_6(\Theta) = 1/6$.

One can see that $m_1 = m_2$, $m_3 = m_4$ and $m_5 = m_6$.
This is not reasonable: $\mu_1$ and $\mu_2$ are completely different
FMFs and have different uncertainty degrees. Using Eq. (10)
to calculate the degrees of fuzziness of $\mu_1$ and $\mu_2$, one can
verify that $D(\mu_1) = 0.6907 \neq D(\mu_2) = 0.7737$. That is to
say, given two FMFs with different degrees of fuzziness, each of
$T_{norm}$, $T_{\alpha\text{-cut}}$ and $T_{tri}$ maps them to the same BBA.


3) Limitations of transformations based on uncertainty optimization:
The optimization-based transformations take every
possible focal element into consideration when assigning mass,
which avoids preselecting the focal elements and handles the
multi-answer problem by solving an optimization problem. However,
the two transformations based on uncertainty optimization consider
only the minimal and the maximal uncertainty degree, respectively, which
might lead to an extreme attitudinal bias on the uncertainty degree
and bring “one-sided” and counter-intuitive results.

4) Example 2: Suppose that the given FMF
is $\mu_3 = [0.58, 0.5, 0.42]$. The given FMF satisfies
$\sum_{i=1}^{3} \mu_3(\theta_i) = 1.5 > 1$. Using Eq. (19), the BBA obtained by
$T_{max}$ is

$\mu_3$: $m_7(\{\theta_1\}) = 0.09$, $m_7(\{\theta_2\}) = 0.17$,
$m_7(\{\theta_1, \theta_2\}) = 0.32$, $m_7(\{\theta_3\}) = 0.25$,
$m_7(\{\theta_1, \theta_3\}) = 0.16$, $m_7(\Theta) = 0.01$.

One can calculate the pignistic probability of $m_7$ with Eq. (6):
$BetP_7(\theta_1) = 1/3$, $BetP_7(\theta_2) = 1/3$, $BetP_7(\theta_3) = 1/3$.
As can be seen,
$BetP_7(\theta_1) = BetP_7(\theta_2) = BetP_7(\theta_3)$.

However, the values of $\mu_3$ are clearly different, so
this equality is not reasonable. When only the maximal
uncertainty is emphasized, a certain degree of difference might be
ignored.
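The equality of the three pignistic probabilities can be checked numerically. The sketch below uses the standard pignistic transformation $BetP(\theta) = \sum_{A \ni \theta} m(A)/|A|$, assumed here to be what Eq. (6) states, and takes the second listed mass of $m_7$ to be on $\{\theta_2\}$, which is the reading consistent with the plausibility constraints $Pl(\{\theta_i\}) = \mu_3(\theta_i)$. The function name and the BBA encoding are assumptions of this sketch.

```python
def pignistic(bba):
    """Split the mass of every focal element evenly among its elements."""
    betp = {}
    for focal, mass in bba.items():
        share = mass / len(focal)
        for element in focal:
            betp[element] = betp.get(element, 0.0) + share
    return betp


def plausibility(bba, element):
    """Total mass of the focal elements that contain the element."""
    return sum(mass for focal, mass in bba.items() if element in focal)


# m_7 from Example 2; indices 0, 1, 2 stand for theta_1, theta_2, theta_3.
m7 = {
    frozenset([0]): 0.09,
    frozenset([1]): 0.17,
    frozenset([0, 1]): 0.32,
    frozenset([2]): 0.25,
    frozenset([0, 2]): 0.16,
    frozenset([0, 1, 2]): 0.01,
}
betp7 = pignistic(m7)
pl7 = [plausibility(m7, i) for i in range(3)]
```

The singleton plausibilities recover the FMF $[0.58, 0.5, 0.42]$, while the pignistic probabilities collapse to a uniform distribution.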

Suppose that the given FMF is $\mu_4 = [0.5001, 0.5, 0.4999]$.
The BBA obtained by $T_{min}$ is as follows:

$\mu_4$: $m_8(\{\theta_1\}) = 0.5$, $m_8(\{\theta_1, \theta_2\}) = 0.0001$,
$m_8(\{\theta_2, \theta_3\}) = 0.4999$.

The corresponding pignistic probability is $BetP_8(\theta_1) \approx 0.5$,
$BetP_8(\theta_2) \approx 0.25$ and $BetP_8(\theta_3) \approx 0.25$. When only the
minimal uncertainty is emphasized, $BetP_8(\theta_1)$ is overemphasized.

The optimization-based transformations emphasize the extreme
cases of uncertainty and might bring counter-intuitive
results. In fact, a trade-off, i.e., a more “moderate” (or balanced)
BBA, is more natural than a BBA obtained with an
extreme (min or max) strategy. To avoid being “one-sided”,
we propose “moderate” transformations to obtain a
trade-off BBA whose uncertainty lies between the minimal and
maximal uncertainty, as presented in the next section.


IV. MODERATE TRANSFORMATIONS WITH WEIGHTING 
FACTOR 


To obtain a BBA with a trade-off or moderate uncertainty
degree, one can use a weighting factor. Depending on the
preferences or requirements of the user, the value of
the weighting factor can be determined from prior
information or specified directly by the user. We use the
weighting factor to transform an FMF into a trade-off BBA
based on the two BBAs $m_{min}$ and $m_{max}$, where $m_{min}$ is
obtained by $T_{min}$ and $m_{max}$ is obtained by $T_{max}$. Let $\beta$
($0 \le \beta \le 1$) be the weighting factor. The trade-off BBA
satisfies the following conditions:

(1) When $\beta \to 0$, the trade-off BBA becomes closer to $m_{min}$;
(2) When $\beta \to 1$, the trade-off BBA becomes closer to $m_{max}$.

Meanwhile, the singleton belief or singleton plausibility of the
trade-off BBA should be equal to the
given FMF. Here, we propose two transformations of an FMF
into such a trade-off BBA.


A. Weighted average based transformation 


Consider the frame of discernment $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$.
The given FMF is $\mu = [\mu(\theta_1), \mu(\theta_2), \ldots, \mu(\theta_n)]$. First, we
calculate $m_{min}$ and $m_{max}$ by $T_{min}$ and $T_{max}$, respectively.
A trade-off BBA, denoted by $m_{wa}$, is a weighted average of
$m_{min}$ and $m_{max}$ as shown in Eq. (21):

$$m_{wa}(A) = (1 - \beta)\, m_{min}(A) + \beta\, m_{max}(A), \qquad (21)$$

where $A \subseteq \Theta$. In the sequel, the transformation based on the
Weighted Average (WA) is represented by “$T_{wa}$”.

The trade-off BBA obtained by Eq. (21) satisfies the following
legitimate conditions of a BBA:

$$\sum_{A \subseteq \Theta} m_{wa}(A) = 1, \ \text{and} \ 0 \le m_{wa}(A) \le 1, \ \forall A \subseteq \Theta.$$

In addition, the singleton belief or singleton plausibility of
the trade-off BBA obtained by $T_{wa}$ is equal to the given
FMF. When $\sum_{i=1}^{n} \mu(\theta_i) > 1$, both $m_{min}$ and $m_{max}$ satisfy
Eq. (12). Then, we have $Pl_{min}(\{\theta_i\}) = Pl_{max}(\{\theta_i\}) = \mu(\theta_i)$,
$i = 1, 2, \ldots, n$, where $Pl_{min}$ and $Pl_{max}$ are the plausibility
functions of $m_{min}$ and $m_{max}$, respectively. According to Eq.
(21), one can deduce that

$$
\begin{aligned}
Pl_{wa}(\{\theta_i\}) &= \sum_{\{\theta_i\} \cap A \neq \emptyset} m_{wa}(A) \\
&= \sum_{\{\theta_i\} \cap A \neq \emptyset} \big[(1 - \beta)\, m_{min}(A) + \beta\, m_{max}(A)\big] \\
&= (1 - \beta) \sum_{\{\theta_i\} \cap A \neq \emptyset} m_{min}(A) + \beta \sum_{\{\theta_i\} \cap A \neq \emptyset} m_{max}(A) \\
&= (1 - \beta)\, Pl_{min}(\{\theta_i\}) + \beta\, Pl_{max}(\{\theta_i\}) \\
&= (1 - \beta)\, \mu(\theta_i) + \beta\, \mu(\theta_i) = \mu(\theta_i).
\end{aligned}
\qquad (22)
$$

When $\sum_{i=1}^{n} \mu(\theta_i) < 1$, both $m_{min}$ and $m_{max}$ satisfy Eq.
(13). Then, we have $Bel_{min}(\{\theta_i\}) = Bel_{max}(\{\theta_i\}) = \mu(\theta_i)$,
$i = 1, 2, \ldots, n$, where $Bel_{min}$ and $Bel_{max}$ are the belief
functions of $m_{min}$ and $m_{max}$, respectively. According to Eq.
(21), one can deduce that

$$
\begin{aligned}
Bel_{wa}(\{\theta_i\}) &= \sum_{A = \{\theta_i\}} m_{wa}(A) \\
&= \sum_{A = \{\theta_i\}} \big[(1 - \beta)\, m_{min}(A) + \beta\, m_{max}(A)\big] \\
&= (1 - \beta) \sum_{A = \{\theta_i\}} m_{min}(A) + \beta \sum_{A = \{\theta_i\}} m_{max}(A) \\
&= (1 - \beta)\, Bel_{min}(\{\theta_i\}) + \beta\, Bel_{max}(\{\theta_i\}) \\
&= (1 - \beta)\, \mu(\theta_i) + \beta\, \mu(\theta_i) = \mu(\theta_i).
\end{aligned}
\qquad (23)
$$

According to Eq. (21), the trade-off BBA obtained by
using $T_{wa}$ conforms to the conditions stated at the
beginning of Section IV. When $\beta \to 0$, $m_{wa}$ approaches
$m_{min}$. When $\beta \to 1$, $m_{wa}$ approaches $m_{max}$.
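The weighted average of Eq. (21) and the preservation of the singleton plausibility shown in Eq. (22) can be illustrated as follows. The function names and the BBA encoding are assumptions of this sketch; the two input BBAs are the $m_{min}$ and $m_{max}$ given for the FMF $[0.9, 0.7, 0.3]$ in Example 3 of Section IV-C, and $\beta = 0.8$ is the value used there.

```python
def weighted_average(m_min, m_max, beta):
    """Eq. (21): convex combination of two BBAs, focal element by focal element."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    focal_elements = set(m_min) | set(m_max)
    return {
        focal: (1.0 - beta) * m_min.get(focal, 0.0) + beta * m_max.get(focal, 0.0)
        for focal in focal_elements
    }


def plausibility(bba, element):
    """Singleton plausibility: mass of all focal elements containing the element."""
    return sum(mass for focal, mass in bba.items() if element in focal)


# Focal elements of a three-element FOD (0, 1, 2 stand for theta_1..theta_3).
A1, A2, A3 = frozenset([0]), frozenset([1]), frozenset([0, 1])
A4, A5, A6 = frozenset([2]), frozenset([0, 2]), frozenset([1, 2])
A7 = frozenset([0, 1, 2])

m_min = {A1: 0.3, A3: 0.4, A6: 0.1, A7: 0.2}
m_max = {A1: 0.05, A2: 0.03, A3: 0.62, A4: 0.07, A5: 0.18, A7: 0.05}

m_wa = weighted_average(m_min, m_max, beta=0.8)
pl_wa = [plausibility(m_wa, i) for i in range(3)]
```

Because both inputs have singleton plausibilities equal to the FMF, so does every convex combination of them, whatever the value of $\beta$.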

Besides the direct weighting approach, we can also use the
degree of dissimilarity between the trade-off BBA and each of the
BBAs $m_{min}$ and $m_{max}$. The weighting
factor then controls the relationship between these two degrees of
dissimilarity to obtain the trade-off BBA.


B. User-specified optimization based transformation 


We use the distance of evidence between two BBAs in
this transformation. The trade-off BBA to determine is represented
by $m_{uo}$, where the subscript “uo” denotes “user-specified
optimization”. $m_{min}$ is obtained by $T_{min}$, and $m_{max}$
is obtained by $T_{max}$. $d(m_{uo}, m_{min})$ denotes the distance
of evidence between $m_{uo}$ and $m_{min}$, while $d(m_{uo}, m_{max})$
denotes the distance of evidence between $m_{uo}$ and $m_{max}$.
When $d(m_{uo}, m_{max})$ is larger, $m_{uo}$ is closer to $m_{min}$. When
$d(m_{uo}, m_{min})$ is larger, $m_{uo}$ is closer to $m_{max}$.

Then, we can use the weighting factor $\beta$ ($0 < \beta < 1$) to
control how close $m_{uo}$ is to $m_{min}$ or $m_{max}$. The relationship
between the two distances of evidence (i.e., $d(m_{uo}, m_{min})$
and $d(m_{uo}, m_{max})$) is defined based on $\beta$ as follows:

$$\frac{d(m_{uo}, m_{min})}{d(m_{uo}, m_{max})} = \frac{\beta}{1 - \beta}. \qquad (24)$$

Here, $\beta$ can be regarded as the weight of $d(m_{uo}, m_{min})$,
and $1 - \beta$ can be regarded as the weight of $d(m_{uo}, m_{max})$.
Eq. (24) is illustrated in Fig. 1.

[Figure 1: $m_{uo}$ located between $m_{min}$ and $m_{max}$, with $d(m_{uo}, m_{min})$ and $d(m_{uo}, m_{max})$ in the proportion $\beta : (1 - \beta)$.]

Figure 1. Illustration of Eq. (24).

As shown in Fig. 1, when the value of $\beta$ gradually
decreases, $d(m_{uo}, m_{min})$ gradually decreases. Meanwhile,
the value of $d(m_{uo}, m_{max})$ gradually increases. Then $m_{uo}$
becomes similar to $m_{min}$. Conversely, when the value of $\beta$
increases gradually, the value of $d(m_{uo}, m_{min})$ increases and
the value of $d(m_{uo}, m_{max})$ decreases gradually. Then $m_{uo}$
becomes similar to $m_{max}$. However, a BBA that strictly satisfies
Eq. (24) might not always exist. Therefore, we rewrite Eq.
(24) as

$$obj(m_{uo}) = \big[(1 - \beta) \cdot d(m_{uo}, m_{min}) - \beta \cdot d(m_{uo}, m_{max})\big]^2, \qquad (25)$$

where the value range of $\beta$ can be $[0, 1]$.
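The objective of Eq. (25) can be evaluated for any candidate BBA once a distance of evidence is chosen. The sketch below uses Jousselme's distance in its usual form, $d_J(m_1, m_2) = \sqrt{\tfrac{1}{2}(m_1 - m_2)^T D\,(m_1 - m_2)}$ with $D(A, B) = |A \cap B| / |A \cup B|$, which is assumed here to be what Eq. (9) states. Function names and the BBA encoding are assumptions of this sketch, and the sample BBAs are the $m_{min}$ and $m_{max}$ of Example 3 in Section IV-C. The sketch only evaluates the objective; it does not solve the constrained minimization.

```python
from math import sqrt


def jousselme_distance(m1, m2):
    """Jousselme's distance between two BBAs with no mass on the empty set."""
    focal_elements = list(set(m1) | set(m2))
    diff = [m1.get(f, 0.0) - m2.get(f, 0.0) for f in focal_elements]
    quadratic = 0.0
    for a, da in zip(focal_elements, diff):
        for b, db in zip(focal_elements, diff):
            # Jaccard index between the two focal elements.
            quadratic += da * db * len(a & b) / len(a | b)
    # Guard against tiny negative values caused by rounding.
    return sqrt(max(quadratic, 0.0) / 2.0)


def objective(m, m_min, m_max, beta):
    """Eq. (25): squared imbalance between the two weighted distances."""
    gap = ((1.0 - beta) * jousselme_distance(m, m_min)
           - beta * jousselme_distance(m, m_max))
    return gap ** 2


m_min = {frozenset([0]): 0.3, frozenset([0, 1]): 0.4,
         frozenset([1, 2]): 0.1, frozenset([0, 1, 2]): 0.2}
m_max = {frozenset([0]): 0.05, frozenset([1]): 0.03, frozenset([0, 1]): 0.62,
         frozenset([2]): 0.07, frozenset([0, 2]): 0.18,
         frozenset([0, 1, 2]): 0.05}

# At the two ends of the range the objective vanishes at m_min and m_max.
at_min = objective(m_min, m_min, m_max, beta=0.0)
at_max = objective(m_max, m_min, m_max, beta=1.0)
```

For $\beta = 0$ the objective is zero exactly at $m = m_{min}$, and for $\beta = 1$ exactly at $m = m_{max}$, which matches the two limiting conditions required of a trade-off BBA.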

Although Eq. (24) cannot always hold strictly, as long as
the value of $obj(m_{uo})$ is small enough, Eq. (24) holds approximately.
Then, we establish a minimization problem, whose
objective function is Eq. (25), to obtain the trade-off BBA. The
constraints of this minimization problem are mainly based on
the given FMF. Consider an FOD $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$; the
given FMF is represented by $\mu = [\mu(\theta_1), \mu(\theta_2), \ldots, \mu(\theta_n)]$.
Here, we use Jousselme’s distance $d_J$ (see Eq. (9)) to construct
the objective function.

When $\sum_{i=1}^{n} \mu(\theta_i) > 1$,

$$
\begin{aligned}
m_{uo} &= \arg\min_{m} \big[(1 - \beta) \cdot d_J(m, m_{min}) - \beta \cdot d_J(m, m_{max})\big]^2 \\
\text{s.t.} \quad & \sum_{A \cap \{\theta_i\} \neq \emptyset} m(A) = \mu(\theta_i), \ \forall \{\theta_i\} \subseteq \Theta, \\
& \sum_{A \subseteq \Theta} m(A) = 1, \\
& 0 \le m(A) \le 1.
\end{aligned}
\qquad (26)
$$

When $\sum_{i=1}^{n} \mu(\theta_i) < 1$,

$$
\begin{aligned}
m_{uo} &= \arg\min_{m} \big[(1 - \beta) \cdot d_J(m, m_{min}) - \beta \cdot d_J(m, m_{max})\big]^2 \\
\text{s.t.} \quad & m(\{\theta_i\}) = \mu(\theta_i), \ \forall \{\theta_i\} \subseteq \Theta, \\
& \sum_{A \subseteq \Theta} m(A) = 1, \\
& 0 \le m(A) \le 1.
\end{aligned}
\qquad (27)
$$

In the sequel, the user-specified optimization-based transformation
is represented by “$T_{uo}$” for convenience. When
$\sum_{i=1}^{n} \mu(\theta_i) = 1$, Eq. (26) and Eq. (27) are equivalent, so one
can choose either of them to do the transformation. The proof of
their equivalence is in the Appendix.

We find that using different evidence distances to construct
the objective function might transform a given FMF into
different BBAs, but the difference between these BBAs is
relatively small. Therefore, we use Jousselme’s distance, one of
the representative evidence distances, in the objective function
of $T_{uo}$ in this paper.

For convenience, we list all the aforementioned transformations
and their abbreviations in Table I, together with the
symbols of the corresponding BBAs obtained.


Table I
TRANSFORMATIONS AND THEIR ABBREVIATIONS.

Approach | Description of Transformation | BBA
$T_{norm}$ | Normalization based transformation, cf. Eq. (14) | $m_{norm}$
$T_{\alpha\text{-cut}}$ | Transformation based on α-cut, cf. Eq. (15) | $m_{\alpha\text{-cut}}$
$T_{tri}$ | Transformation based on assigning mass to a focal element triplet, cf. Eq. (16) | $m_{tri}$
$T_{min}$ | Transformation based on uncertainty minimization, cf. Eq. (17) or Eq. (18) | $m_{min}$
$T_{max}$ | Transformation based on uncertainty maximization, cf. Eq. (19) or Eq. (20) | $m_{max}$
$T_{wa}$ | Weighted average based transformation, cf. Eq. (21) | $m_{wa}$
$T_{uo}$ | User-specified optimization based transformation, cf. Eq. (26) or Eq. (27) | $m_{uo}$


C. Example 3 for illustration


Here, an illustrative example is provided to show the computation
procedure of our transformations. Let the FOD be
$\Theta = \{\theta_1, \theta_2, \theta_3\}$. The given FMF is $\mu(\theta_1) = 0.9$, $\mu(\theta_2) = 0.7$,
$\mu(\theta_3) = 0.3$. The weighting factor is $\beta$ ($0 \le \beta \le 1$). The
possible focal elements for this example are $A_1 = \{\theta_1\}$,
$A_2 = \{\theta_2\}$, $A_3 = \{\theta_1, \theta_2\}$, $A_4 = \{\theta_3\}$, $A_5 = \{\theta_1, \theta_3\}$,
$A_6 = \{\theta_2, \theta_3\}$, and $A_7 = \Theta$.

First, $m_{min}$ and $m_{max}$ are calculated. Using $T_{min}$ and $T_{max}$,
we have

$m_{min}(A_1) = 0.3$
$m_{min}(A_3) = 0.4$
$m_{min}(A_6) = 0.1$
$m_{min}(A_7) = 0.2$

$m_{max}(A_1) = 0.05$
$m_{max}(A_2) = 0.03$
$m_{max}(A_3) = 0.62$
$m_{max}(A_4) = 0.07$
$m_{max}(A_5) = 0.18$
$m_{max}(A_7) = 0.05$

Second, using $T_{wa}$, $m_{wa}$ can be obtained. For instance, if
we choose $\beta = 0.8$, then we have

$m_{wa}(A_1) = (1 - 0.8) \times 0.3 + 0.8 \times 0.05 = 0.1$
$m_{wa}(A_2) = 0.8 \times 0.03 = 0.024$
$m_{wa}(A_3) = (1 - 0.8) \times 0.4 + 0.8 \times 0.62 = 0.576$
$m_{wa}(A_4) = 0.8 \times 0.07 = 0.056$
$m_{wa}(A_5) = 0.8 \times 0.18 = 0.144$
$m_{wa}(A_6) = (1 - 0.8) \times 0.1 = 0.02$
$m_{wa}(A_7) = (1 - 0.8) \times 0.2 + 0.8 \times 0.05 = 0.08$

Since $\sum_{i=1}^{3} \mu(\theta_i) = 1.9 > 1$, the given FMF is
equivalent to the singleton plausibility. For the user-specified
optimization-based transformation, Eq. (26) is used to transform
the given FMF into a trade-off BBA:

$$
\begin{aligned}
m_{uo} &= \arg\min_{m} \big[(1 - \beta) \cdot d_J(m, m_{min}) - \beta \cdot d_J(m, m_{max})\big]^2 \\
\text{s.t.} \quad & m(A_1) + m(A_3) + m(A_5) + m(A_7) = \mu(\theta_1) = 0.9, \\
& m(A_2) + m(A_3) + m(A_6) + m(A_7) = \mu(\theta_2) = 0.7, \\
& m(A_4) + m(A_5) + m(A_6) + m(A_7) = \mu(\theta_3) = 0.3, \\
& \sum_{i=1}^{7} m(A_i) = 1, \\
& 0 \le m(A_i) \le 1, \ i = 1, 2, \ldots, 7.
\end{aligned}
$$

The trade-off BBA can then be obtained by using $T_{uo}$. Here,
$\beta = 0.8$ and

$m_{uo}(A_1) = 0.1013$
$m_{uo}(A_2) = 0.0390$
$m_{uo}(A_3) = 0.5597$
$m_{uo}(A_4) = 0.0610$
$m_{uo}(A_5) = 0.1377$
$m_{uo}(A_7) = 0.1013$

As expected, when $\beta = 0.8$, both $m_{wa}$ and $m_{uo}$ are closer to
$m_{max}$.


V. EXAMPLES 


We use different approaches to transform an FMF into a 
BBA in this section. Some numerical examples are provided 
to illustrate the difference between different transformations 
(including all the transformations mentioned in Table I). In 
addition, a classification example is provided to compare our 
transformations with other transformations. 


A. Example 4 


This example revisits the two examples in Sections III-D2
and III-D4 with $\Theta = \{\theta_1, \theta_2, \theta_3\}$. As mentioned above,
using transformations with preselection of focal elements
might transform different FMFs (e.g., $\mu_1 = [0.3, 0.2, 0.1]$ and
$\mu_2 = [0.9, 0.6, 0.3]$ in Example 1) into the same BBA. We therefore
examine the results of the transformations based on
uncertainty optimization and of the moderate transformations. Here,
we use $T_{min}$, $T_{max}$, $T_{wa}$ and $T_{uo}$ to transform $\mu_1$ and $\mu_2$
into BBAs, respectively. The corresponding obtained BBAs
are listed in Table II and Table III, respectively. The results of
the moderate transformations (i.e., $T_{wa}$ and $T_{uo}$) are obtained with
$\beta = 0.7$.

According to the results of each transformation listed in
Tables II and III, it can be seen that the BBAs transformed
from $\mu_1$ and $\mu_2$ are different. Their degrees of uncertainty
are different. That is, using optimization-based transformations
avoids transforming different FMFs into the same BBA,
which is a consequence of the preselection of focal elements. Although only the
results of the moderate transformations with $\beta = 0.7$ are listed,
the trade-off BBAs transformed from $\mu_1$ and $\mu_2$ would be
different at any value of $\beta$.

For $\mu_3 = [0.58, 0.5, 0.42]$ and $\mu_4 = [0.5001, 0.5, 0.4999]$ in
Example 2, the results in Section III-D4 show
that using transformations based on uncertainty optimization
might lead to an extreme attitudinal bias on the uncertainty degree
when solving the optimization problem and bring
counter-intuitive results. Can moderate transformations avoid
such a bias?

In this example, we use the moderate transformations to
obtain BBAs from $\mu_3$ and $\mu_4$, respectively. Then, we calculate
the corresponding pignistic probabilities. The trade-off BBAs
transformed from $\mu_3$ and $\mu_4$ are listed in Table IV and Table
V, respectively. Here, $\beta = 0.7$.

Table II
BBAs TRANSFORMED FROM $\mu_1$.


te i = 0. 3, ee ies = 0. 2, te = 0: 12, ie = = 0.1, ariel AQ) = 0.28 
Myo (A1) = 0.3, Muyo (A2) = 0.2, Muo(A3) = 0.12, muo(A4) = 0.1, 


Muo(As) = 0.0335, Muo(Ag) = 0.1849, muo(A7) = 0.1779 


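Table III
BBAs TRANSFORMED FROM $\mu_2$.

Approach | BBA
$T_{max}$ | $m_{max}(A_1) = 0.1$, $m_{max}(A_2) = 0.0364$, $m_{max}(A_3) = 0.5636$, $m_{max}(A_4) = 0.0636$, $m_{max}(A_5) = 0.2364$
$T_{wa}$ | $m_{wa}(A_1) = 0.19$, $m_{wa}(A_2) = 0.0255$, $m_{wa}(A_3) = 0.4845$, $m_{wa}(A_4) = 0.0445$, $m_{wa}(A_5) = 0.1655$, $m_{wa}(A_6) = 0.03$, $m_{wa}(A_7) = 0.06$
$T_{uo}$ | $m_{uo}(A_1) = 0.1922$, $m_{uo}(A_2) = 0.0490$, $m_{uo}(A_3) = 0.4588$, $m_{uo}(A_4) = 0.0506$, $m_{uo}(A_5) = 0.1572$, $m_{uo}(A_6) = 0.0005$, $m_{uo}(A_7) = 0.0917$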




Table IV
BBAs TRANSFORMED FROM $\mu_3$.


Approach 


2 2 3 
wal Aa) = 0.1579, ig Ag) = = 0.0451, MANE = 0.196, Hawa 
Muo(A1) = 0.2283, muo(Az) = 0.2355, muo(A3) = 0.1162, m0(Aa) = 


BetP(01), BetP(@2), BetP(63 
[0.4233, 0.3029, 0.2738] 


[0.3977, 0.3613, 0.2410] 


mMuo(As) = 0.1969, muo(Ag) = 0.1098, muo(Az) = 0.0386 


Table V
BBAs TRANSFORMED FROM $\mu_4$.

Approach | BBA | $BetP(\theta_1)$, $BetP(\theta_2)$, $BetP(\theta_3)$
$T_{wa}$ | …, $m_{wa}(A_4) = 0.0886$, $m_{wa}(A_5) = 0.0729$, $m_{wa}(A_6) = 0.3228$, $m_{wa}(A_7) = 0.0156$ | [0.4167, 0.2917, 0.2916]
$T_{uo}$ | $m_{uo}(A_1) = 0.3584$, $m_{uo}(A_2) = 0.1393$, $m_{uo}(A_3) = 0.0024$, $m_{uo}(A_4) = 0.0741$, $m_{uo}(A_5) = 0.0675$, $m_{uo}(A_6) = 0.2864$, $m_{uo}(A_7) = 0.0719$ | [0.4173, 0.3077, 0.2750]

As can be seen in Table IV, the three values of the pignistic
probability of each trade-off BBA are different, in contrast to
$BetP_7(\theta_1) = BetP_7(\theta_2) = BetP_7(\theta_3)$ in Example 2. The
moderate transformations yield a trade-off or balanced
BBA that expresses even a small difference in the given FMF
(e.g., $\mu_3$) and gives a reasonable result.

According to the results listed in Table V, there is no
overemphasis on $BetP(\theta_1)$ (the pignistic probability in Example 2
is $BetP_8(\theta_1) \approx 0.5$, $BetP_8(\theta_2) \approx 0.25$ and $BetP_8(\theta_3) \approx 0.25$).
One can transform an FMF like $\mu_4$ into a trade-off
BBA by using the moderate transformations to avoid the
overemphasis caused by “one-sidedness” on the uncertainty
degree. Even if the differences between the values of $\mu_4$ are tiny, the values
are different and $\mu_4(\theta_1) > \mu_4(\theta_2) > \mu_4(\theta_3)$. As can
be seen in Table V, the trade-off BBAs obtained by using $T_{wa}$
and $T_{uo}$ both reflect the tiny differences between the values
of $\mu_4$.


B. Example 5


This example revisits the illustrative example
(Example 3) in Section IV-C. In addition to using the moderate
transformations, we also use $T_{norm}$, $T_{\alpha\text{-cut}}$ and $T_{tri}$ to transform
the given FMF $\mu = [0.9, 0.7, 0.3]$ into a BBA. The obtained
BBAs are listed in Table VI. Furthermore, the trade-off BBAs
obtained by using the moderate transformations with $\beta = 0.4$
($\beta = 0.8$ was used in Section IV-C) are listed in the same table.

The corresponding degrees of uncertainty are also listed.
Furthermore, we provide the degrees of uncertainty for the
trade-off BBAs obtained with $\beta = 0.8$ as follows:

$TU^I(m_{min}) = 0.5785$
$TU^I(m_{max}) = 0.6532$
$TU^I(m_{wa}) = 0.6458$
$TU^I(m_{uo}) = 0.6484$

As shown in Table VI, $m_{norm}$ only has singleton focal
elements. The structure of focal elements of $m_{\alpha\text{-cut}}$ is nested,
and that of $m_{tri}$ depends on the ordering of the values of the given
FMF. According to the results in Example 3 and Table VI,
the uncertainty optimization based transformations and the
moderate transformations can consider more focal elements
when assigning mass. Moreover, the focal element structure of the
obtained BBA is not fixed. When the FMF is given,
the transformations with preselection of focal elements can only
yield a BBA with a fixed structure of focal elements.

For the moderate transformations, the degrees of uncertainty
of the trade-off BBAs both lie between the minimal and
maximal degrees of uncertainty (i.e., the degrees of uncertainty
of $m_{min}$ and $m_{max}$). When $\beta = 0.4$, the trade-off BBAs
are closer to $m_{min}$, and their degrees of uncertainty
are closer to the minimal degree of
uncertainty. When $\beta = 0.8$, the trade-off BBAs are closer to
$m_{max}$, and, contrary to the situation when $\beta = 0.4$, their degrees of
uncertainty are closer to the maximal
degree of uncertainty. This indicates
that the obtained trade-off BBAs satisfy the conditions for
the trade-off BBA stated at the beginning of Section IV.
In addition, although $TU^I(m_{norm}) < TU^I(m_{min})$,
$TU^I(m_{\alpha\text{-cut}}) < TU^I(m_{min})$ and $TU^I(m_{tri}) < TU^I(m_{min})$,
the three BBAs $m_{norm}$, $m_{\alpha\text{-cut}}$ and $m_{tri}$ do not satisfy the
relationship in Eq. (12).

C. Example 6 


Let the FOD be $\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4\}$. The given FMF is
$\mu(\theta_1) = 0.4$, $\mu(\theta_2) = 0.3$, $\mu(\theta_3) = 0.2$, and $\mu(\theta_4) = 0.1$.
The weighting factor is $\beta$ ($0 \le \beta \le 1$). The possible focal
elements of the obtained BBAs for this example are denoted as
follows: $A_1 = \{\theta_1\}$, $A_2 = \{\theta_2\}$, $A_3 = \{\theta_1, \theta_2\}$, $A_4 = \{\theta_3\}$,
$A_5 = \{\theta_1, \theta_3\}$, $A_6 = \{\theta_2, \theta_3\}$, $A_7 = \{\theta_1, \theta_2, \theta_3\}$, $A_8 = \{\theta_4\}$,
$A_9 = \{\theta_1, \theta_4\}$, $A_{10} = \{\theta_2, \theta_4\}$, $A_{11} = \{\theta_1, \theta_2, \theta_4\}$,
$A_{12} = \{\theta_3, \theta_4\}$, $A_{13} = \{\theta_1, \theta_3, \theta_4\}$, $A_{14} = \{\theta_2, \theta_3, \theta_4\}$, and
$A_{15} = \Theta$.

Using all the transformations in Table I, one can transform
the given FMF into a BBA. Since $\sum_{i=1}^{4} \mu(\theta_i) = 1$,
the FMF is equivalent to the singleton plausibility or singleton
belief. The proof of this equivalence is in the Appendix. The obtained
BBAs and the corresponding values of the uncertainty degrees
are listed in Table VII. For the moderate transformations, we
only list the corresponding BBAs for $\beta = 0.1$ and $\beta = 0.9$.

In Table VII, $m_{norm}$ only has singleton focal elements.
$m_{\alpha\text{-cut}}$ has four nested focal elements. No matter how many
elements the FOD contains, $m_{tri}$ always has only three focal
elements. For each transformation with preselection of focal
elements, the preselection criterion and the
values of the given FMF determine a
specific structure of focal elements for the corresponding BBA. As can be seen, $m_{min}$,
$m_{max}$, $m_{wa}$ and $m_{uo}$ in Table VII are identical. No matter
what the value of $\beta$ is and no matter which moderate transformation is
used, the trade-off BBAs are identical. This is normal in this
particular example because $m_{min} = m_{max}$. Although
$TU^I(m_{\alpha\text{-cut}}) > TU^I(m_{max})$ and $TU^I(m_{tri}) > TU^I(m_{max})$, the
two BBAs $m_{\alpha\text{-cut}}$ and $m_{tri}$ do not satisfy the relationship in
Eq. (12) or Eq. (13).


D. Example 7 


Let the FOD be $\Theta = \{\theta_1, \theta_2, \theta_3\}$. The given FMF is
$\mu(\theta_1) = 0.3$, $\mu(\theta_2) = 0.3$, and $\mu(\theta_3) = 0.3$. The weighting
factor is $\beta$ ($0 \le \beta \le 1$). The possible focal elements of the
obtained BBAs for this example are denoted as follows:
$A_1 = \{\theta_1\}$, $A_2 = \{\theta_2\}$, $A_3 = \{\theta_1, \theta_2\}$, $A_4 = \{\theta_3\}$,
$A_5 = \{\theta_1, \theta_3\}$, $A_6 = \{\theta_2, \theta_3\}$, and $A_7 = \Theta$.

We use all the transformations in Table I to transform the
given FMF into a BBA. The obtained BBAs and the corresponding
values of uncertainty are listed in Table VIII. Since
$\sum_{i=1}^{3} \mu(\theta_i) = 0.9 < 1$, the given FMF is equivalent to the
singleton belief. For $T_{wa}$ and $T_{uo}$, we only list the corresponding
results for $\beta = 0.2$ and $\beta = 0.6$.

As shown in Table VIII, $m_{norm}$ only has singleton focal
elements and the mass values are the same for all singleton
focal elements. By using $T_{\alpha\text{-cut}}$, $\Theta$ is the only focal element
of the obtained BBA. This is because $\mu(\theta_1) = \mu(\theta_2) =
\mu(\theta_3) = 0.3$. Assigning all the mass to the total set
corresponds to the most uncertain case. For $m_{tri}$, $m_{tri}(A_1) = 0$, but
$m_{tri}(A_2) = m_{tri}(A_4) = 1/3$. Since $A_2 = \{\theta_2\}$ and
$A_4 = \{\theta_3\}$, there is an overemphasis on $\{\theta_2\}$ and $\{\theta_3\}$.
If the given FMF has more than two identical values and
$T_{tri}$ is used to obtain a BBA, the BBA overemphasizes
two singletons when assigning mass. For example, if
$\mu(\theta_1) = \mu(\theta_2) = \mu(\theta_3) = \mu(\theta_4) = \mu(\theta_5)$, two of
these values (e.g., $\mu(\theta_1)$ and $\mu(\theta_2)$) are overemphasized.
This is not reasonable.

More focal elements are considered when assigning mass by using
the transformations based on uncertainty optimization and the
moderate transformations. However, $m_{min}$ overemphasizes $A_3 =
\{\theta_1, \theta_2\}$ and does not assign mass to other compound focal
elements. Using moderate transformations can avoid overemphasizing
a certain focal element. In this example, $m_{wa} = m_{uo}$.
When $\beta = 0.2$, $m_{wa}$ and $m_{uo}$ are closer to $m_{min}$.
The values of $TU^I$ for the corresponding trade-off BBAs are
between the minimal and maximal values of $TU^I$ and closer
to $TU^I(m_{min})$. When $\beta = 0.6$, the trade-off BBAs are closer
to $m_{max}$. The values of the uncertainty degrees are between the
minimal and maximal degrees of uncertainty and closer to
$TU^I(m_{max})$.

In addition, although some of the degrees of uncertainty of
the BBAs obtained by using transformations with preselection
of focal elements are greater (or less) than the maximal degree
of uncertainty (or the minimal degree of uncertainty), the three
BBAs $m_{norm}$, $m_{\alpha\text{-cut}}$ and $m_{tri}$ do not satisfy the relationship
in Eq. (13).


Table VI
OBTAINED BBAs IN EXAMPLE 5.

Approach | BBA
$T_{\alpha\text{-cut}}$ | $m_{\alpha\text{-cut}}(A_1) = 2/9$, $m_{\alpha\text{-cut}}(A_3) = 4/9$, $m_{\alpha\text{-cut}}(A_7) = 1/3$
$T_{tri}$ | $m_{tri}(A_1) = 0.4737$, $m_{tri}(A_2) = 0.3684$, $m_{tri}(A_7) = 0.1579$
$T_{wa}$ with $\beta = 0.4$ | $m_{wa}(A_1) = 0.2$, $m_{wa}(A_2) = 0.012$, $m_{wa}(A_3) = 0.488$, $m_{wa}(A_4) = 0.028$, $m_{wa}(A_5) = 0.072$, $m_{wa}(A_6) = 0.06$, $m_{wa}(A_7) = 0.14$
$T_{uo}$ with $\beta = 0.4$ | $m_{uo}(A_1) = 0.2061$, $m_{uo}(A_2) = 0.0578$, $m_{uo}(A_3) = 0.4361$, $m_{uo}(A_4) = 0.0168$, $m_{uo}(A_5) = 0.0771$, $m_{uo}(A_6) = 0.0255$, $m_{uo}(A_7) = 0.1806$


Table VII
OBTAINED BBAs IN EXAMPLE 6.

Approach | BBA
$T_{norm}$ | $m_{norm}(A_1) = 0.4$, $m_{norm}(A_2) = 0.3$, $m_{norm}(A_4) = 0.2$, $m_{norm}(A_8) = 0.1$
$T_{\alpha\text{-cut}}$ | $m_{\alpha\text{-cut}}(A_1) = 0.25$, $m_{\alpha\text{-cut}}(A_3) = 0.25$, $m_{\alpha\text{-cut}}(A_7) = 0.25$, $m_{\alpha\text{-cut}}(A_{15}) = 0.25$
$T_{tri}$ | $m_{tri}(A_1) = 0.4$, $m_{tri}(A_2) = 0.3$, $m_{tri}(A_{15}) = 0.3$
$T_{min}$ | $m_{min}(A_1) = 0.4$, $m_{min}(A_2) = 0.3$, $m_{min}(A_4) = 0.2$, $m_{min}(A_8) = 0.1$
$T_{max}$ | $m_{max}(A_1) = 0.4$, $m_{max}(A_2) = 0.3$, $m_{max}(A_4) = 0.2$, $m_{max}(A_8) = 0.1$
$T_{wa}$ with $\beta = 0.1$ | $m_{wa}(A_1) = 0.4$, $m_{wa}(A_2) = 0.3$, $m_{wa}(A_4) = 0.2$, $m_{wa}(A_8) = 0.1$
$T_{wa}$ with $\beta = 0.9$ | $m_{wa}(A_1) = 0.4$, $m_{wa}(A_2) = 0.3$, $m_{wa}(A_4) = 0.2$, $m_{wa}(A_8) = 0.1$
$T_{uo}$ with $\beta = 0.1$ | $m_{uo}(A_1) = 0.4$, $m_{uo}(A_2) = 0.3$, $m_{uo}(A_4) = 0.2$, $m_{uo}(A_8) = 0.1$
$T_{uo}$ with $\beta = 0.9$ | $m_{uo}(A_1) = 0.4$, $m_{uo}(A_2) = 0.3$, $m_{uo}(A_4) = 0.2$, $m_{uo}(A_8) = 0.1$


Table VIII
OBTAINED BBAs IN EXAMPLE 7.

Approach | BBA
$T_{min}$ | $m_{min}(A_1) = 0.3$, $m_{min}(A_2) = 0.3$, $m_{min}(A_3) = 0.1$, $m_{min}(A_4) = 0.3$
$T_{max}$ | $m_{max}(A_1) = 0.3$, $m_{max}(A_2) = 0.3$, $m_{max}(A_4) = 0.3$, $m_{max}(A_7) = 0.1$
$T_{wa}$ with $\beta = 0.2$ | $m_{wa}(A_1) = 0.3$, $m_{wa}(A_2) = 0.3$, $m_{wa}(A_3) = 0.08$, $m_{wa}(A_4) = 0.3$, $m_{wa}(A_7) = 0.02$
$T_{wa}$ with $\beta = 0.6$ | $m_{wa}(A_1) = 0.3$, $m_{wa}(A_2) = 0.3$, $m_{wa}(A_3) = 0.04$, $m_{wa}(A_4) = 0.3$, $m_{wa}(A_7) = 0.06$
$T_{uo}$ with $\beta = 0.2$ | $m_{uo}(A_1) = 0.3$, $m_{uo}(A_2) = 0.3$, $m_{uo}(A_3) = 0.08$, $m_{uo}(A_4) = 0.3$, $m_{uo}(A_7) = 0.02$
$T_{uo}$ with $\beta = 0.6$ | $m_{uo}(A_1) = 0.3$, $m_{uo}(A_2) = 0.3$, $m_{uo}(A_3) = 0.04$, $m_{uo}(A_4) = 0.3$, $m_{uo}(A_7) = 0.06$


E. Example 8


To verify the effectiveness of the moderate transformations
of an FMF into a trade-off BBA, we compare the average
accuracy over 300 runs of all the mentioned transformations
in a classification problem. Note that we only
aim to show the impact of different transformations on the
classification results, rather than to improve the classification
accuracy.

We use three datasets from the open UCI database [53]: the
iris dataset, the wheat seeds dataset and the wine dataset. The iris dataset
has 150 samples in 3 classes, each of which has 50
samples. Every sample has 4 features, and the data of the 4 features
are complete for all 150 samples. The wheat seeds dataset has 210
samples in 3 classes, each of which has 70 samples.
Every sample has 7 features, and the data of the 7 features are
complete for all 210 samples. The wine dataset has 178 samples in
3 classes, which have 59, 71 and 48 samples, respectively. Every
sample has 13 features, and the data of the 13 features are complete
for all 178 samples. The transformation process is given below.

In this example, only the iris dataset is used to illustrate the
detailed process of classification, and the four features are
denoted by $f_1$, $f_2$, $f_3$ and $f_4$. We randomly select 70% of the
samples from each class as the training set, and
the test set consists of the remaining samples. In the training set, the
minimal, average and maximal values of each
feature for each class are used as parameters. Then, the FMF
can be defined as follows:

$$
\mu(\theta_j) =
\begin{cases}
\dfrac{x_i - min_j}{ave_j - min_j}, & \text{for } min_j \le x_i \le ave_j, \\[2mm]
\dfrac{x_i - max_j}{ave_j - max_j}, & \text{for } ave_j < x_i \le max_j, \\[2mm]
0, & \text{otherwise,}
\end{cases}
\qquad (28)
$$

where $i = 1, 2, 3, 4$ indexes the feature and $x_i$ is the value of
feature $f_i$ of the sample. For the feature under consideration,
$min_j$ is the minimal value of Class $j$ ($\theta_j$,
$j = 1, 2, 3$), $ave_j$ is the average value of Class $j$, and $max_j$ is
the maximal value of Class $j$.


In Table IX, the corresponding parameters of training set 
are listed. The test sample, randomly selected from the test 
set, is # = [5.6,3, 4.5, 1.5], which belongs to Class 2 (ie. 62). 
According to the test sample and the corresponding parameters 
listed in Table IX, the triangular fuzzy membership functions 
of each feature for each class are shown in Fig. 2. 

According to Eq. (28), the four corresponding FMFs are 
determined as follows: 


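FMF$_1$: $\mu_1(\theta_1) = 0.1446$, $\mu_1(\theta_2) = 0.6171$, $\mu_1(\theta_3) = 0.4344$
FMF$_2$: $\mu_2(\theta_1) = 0.1955$, $\mu_2(\theta_2) = 0.6422$, $\mu_2(\theta_3) = 0.9777$
FMF$_3$: $\mu_3(\theta_1) = 0$, $\mu_3(\theta_2) = 0.7692$, $\mu_3(\theta_3) = 0$
FMF$_4$: $\mu_4(\theta_1) = 0$, $\mu_4(\theta_2) = 0.6481$, $\mu_4(\theta_3) = 0.1502$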
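The membership values above can be re-derived from Eq. (28). The following is a minimal sketch, assuming the triangular form of Eq. (28) and the $(\min, \mathrm{ave}, \max)$ values printed in Table IX for features $f_1$ and $f_4$; the function name `triangular_fmf` is hypothetical and not part of the original work.

```python
def triangular_fmf(x, lo, ave, hi):
    """Triangular membership in the sense of Eq. (28).

    Rises linearly from 0 at `lo` to 1 at `ave`, then falls back to 0 at `hi`.
    """
    if x == ave:
        return 1.0
    if lo <= x < ave:
        return (x - lo) / (ave - lo)
    if ave < x <= hi:
        return (x - hi) / (ave - hi)
    return 0.0


# (min, ave, max) per class, copied from Table IX (Classes 1, 2, 3).
f1_params = [(4.3, 5.0086, 5.7), (4.9, 6.0343, 7.0), (4.9, 6.5114, 7.9)]
f4_params = [(0.1, 0.2571, 0.6), (1.0, 1.3371, 1.8), (1.4, 2.0657, 2.5)]

# Test sample x = [5.6, 3, 4.5, 1.5]: only features f1 and f4 are used here.
fmf1 = [round(triangular_fmf(5.6, *p), 4) for p in f1_params]
fmf4 = [round(triangular_fmf(1.5, *p), 4) for p in f4_params]
print(fmf1)
print(fmf4)
```

Under these assumptions the two lists agree with the values of FMF$_1$ and FMF$_4$ reported above.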


Using all the transformations listed in Table I, one can
transform the above four FMFs into BBAs, respectively. Let
the FOD be $\Theta = \{\theta_1, \theta_2, \theta_3\}$. The possible focal elements
are represented as follows: $A_1 = \{\theta_1\}$, $A_2 = \{\theta_2\}$, $A_3 = \{\theta_1, \theta_2\}$,
$A_4 = \{\theta_3\}$, $A_5 = \{\theta_1, \theta_3\}$, $A_6 = \{\theta_2, \theta_3\}$, and
$A_7 = \Theta$.

Table IX
PARAMETERS OF TRAINING SET.

Feature   Class 1 ($\theta_1$)          Class 2 ($\theta_2$)               Class 3 ($\theta_3$)
          min_1  ave_1   max_1    min_2  ave_2   max_2         min_3  ave_3   max_3
f_1       4.3    5.0086  5.7      4.9    6.0343  7             4.9    6.5114  7.9
f_2       2.9    3.4114  4.4      2.2    2.771   3.4           2.5    3.0117  3.8
f_3       1      1.4571  1.9      3      4       [illegible]   4.5    5.5314  6.9
f_4       0.1    0.2571  0.6      1      1.3371  1.8           1.4    2.0657  2.5

Figure 2. FMFs of four features: (a) FMF of $f_1$ (1st dimension, cm); (b) FMF of $f_2$ (2nd dimension, cm); (c) FMF of $f_3$ (3rd dimension, cm); (d) FMF of $f_4$ (4th dimension, cm). Vertical axis: value of FMF. Legend: Class 1, Class 2, Class 3, Sample.

The corresponding BBAs obtained by using transformations
with preselection of focal elements, transformations based
on uncertainty optimization and moderate transformations
are listed in Table X. We use moderate transformations to
determine trade-off BBAs with $\delta = 0.2$.

When the BBAs of each feature are determined, one can 
combine the corresponding four BBAs of each transformation. 
For convenience and simplicity, Dempster’s rule of combina- 
tion is used. The combined BBAs of each transformation are 
listed in Table XI. 

According to the results of Table XI, the corresponding
pignistic probabilities can be calculated by using Eq. (6).
As we can see in Table XII, the values of $BetP(\theta_2)$ for
all the transformations are the largest, which means that the
classifications are correct.
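The combination and decision steps described above can be sketched as follows. This is an illustrative sketch only: it assumes that Eq. (6) is the standard pignistic transformation, represents a BBA as a dictionary from `frozenset` focal elements to masses, and uses toy masses that are not taken from Tables X to XII.

```python
from itertools import product


def dempster(m1, m2):
    """Dempster's rule of combination for two BBAs {frozenset: mass}."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass sent to the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


def pignistic(m):
    """Pignistic probability: each mass is shared equally among its elements."""
    betp = {}
    for focal, w in m.items():
        for theta in focal:
            betp[theta] = betp.get(theta, 0.0) + w / len(focal)
    return betp


# Toy BBAs on a three-class frame (hypothetical values).
frame = frozenset({"t1", "t2", "t3"})
m_a = {frozenset({"t2"}): 0.6, frame: 0.4}
m_b = {frozenset({"t3"}): 0.5, frame: 0.5}

betp = pignistic(dempster(m_a, m_b))
decision = max(betp, key=betp.get)  # class with the largest BetP
print(betp, decision)
```

Combining four feature-level BBAs amounts to applying `dempster` three times in sequence, since Dempster's rule is associative.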


In order to compare the moderate transformations with the
traditional transformations, we carry out 300-run experiments
on three datasets (the iris dataset, the wheat seeds
dataset and the wine dataset) and report the average accuracy
for each of them. All the transformations in Table I are used and
the transformation process is the same as above. We use
all the features of the iris dataset. For the wheat seeds dataset and the
wine dataset, we randomly select 4 features and 7 features for
classification, respectively. The number of features is reduced
only to simplify the experiment.
Table X
OBTAINED BBAS IN EXAMPLE 8. 


Approach Feature BBA 


enmal At) = = 0.1077, Mnorm(A2) = 0.3538, Pent la= = 0.5385 
Mnorm(A2) = =1 
Zhnom (#3) = = 0.8118, mnorm(Aa) = 0.1882 


6 eut(A7) = 
Mea- = -_ 0. 3431, Meo- a Ae = 0. 4569, Ma-cut(A7) = 0. 2 
Mea- cut(Ag) = 

Ma-cu(A2) = - 7682, Ma-cu(Ag) = 0.2318 


mui(A2) = 0. 3538, He Fie = 0.5385, mui(A7) = 0.1077 
Myi(Az) = 
Mri (A2) = a 8118, mui(Aa) = 0.1882 
Mmin( Aa) = 0.2383, Mmin( 
Mmin(Aa) = 0.3578, Mmin( 
ee = 0.1887, mmin 


( = = 0.0421 
Mmin(Aa) = 0.1502, mmin( 


poe 


5 
6 
6 
+ 


A 
A 
A 
A 


2 1) max 2 ax 2 
™Mmax(A1) = 0. O115, ™Mmax (Ag) = 0. 0069, hee Fn = 0. 1623, ian 
Mmax(Az) = 0.7692, Mmax(As) = 0.2308 
mnnax(Az) = = 0.6481, mnngx(Aa) = = 0.1502, mna(As) = = 0.1739, mmax(A7) = 0.0278 


mwa(A) = = 0.0031, mvya(Az) = 0.0014, mwya(Ag) = 0.0179, muwa(A4) = 0.3187, 
5) = 0.036, mwa(Ag) = 0.4844, mwa(A7) = 0.1385 
) = 0.7692, mwa(A3) = 0.1509, mwa(As) = 0.0348, mwa(Ag) = 0.0337 
) = 0.6481, rmya(Aa) = 0.1502, mya(A5) = 0.0348, mya(Ag) = 0.1613, mua(A7) = 0.0056 


aa : 
rrimo( Ag) = 0.0223, rno(Aa) = 0.3188, muo(As) = 0.0393, ro(Ac) = 0.486, muo(A7) = 0.1339 
Muo (Az) = 0.7692, Muo(A3) = 0.1383, muo(As) = 0.0002, muo(Ag) = 0.0923 
Muo(Ag) = 0.6481, muo(A3) = 0.0421, muo(A4) = 0.1506, muo(Ag) = 0.1592 


Table XI 
COMBINED BBAS IN TABLE X. 


Approach 


Mmin(Az) = 0.9816, mmin(A4) = 0.0176, Mmin(Ag) = 0.0008 
Mmax(A1) = 0.0054, mmax(Az) = 0.8520, mmax(A4) = 0.1426 
mwa(A1) = 0.0006, mwa(A2) = 0.9621, mya(A4) = 0.0363, mwa(Ag) = 0.0001, mwa(A7) = 0.0009 
muo(A1) = 0.0005, muo(Az) = 0.9646, T0(A4) = 0.0328, muo(Ag) = 0.0021 


Table XII
PIGNISTIC PROBABILITIES IN EXAMPLE 8.

Table XIII
AVERAGE CLASSIFICATION ACCURACY (%).


Approach


On each run, 70% of the samples of each class of the iris dataset
and the wheat seeds dataset are used as training samples, and the
remaining samples are used for testing. For the wine dataset, 34 samples
of each class are used as training samples (the smallest class
contains 48 samples, 70% of which is about 34)
and the remaining samples are used for testing. The training samples
are selected randomly. For the moderate transformations, we specify
$\delta = 0.2, 0.5, 0.8$ to obtain the trade-off BBA, respectively. The
average classification accuracies on the three datasets are
listed in Table XIII.

According to the results in Table XIII, there is a gap
between the classification accuracy we obtained and the best
possible classification accuracy for each dataset (e.g., for iris,
the accuracy of other classification approaches
can exceed 95%). Here, we only aim to compare the
impact of different BBA transformations on the classification
performance.

All the results based on optimization-based transformations 
are better than those based on transformations with preselec- 
tion of focal elements, i.e., considering more possible focal 
elements might reduce the loss of information due to the 
preselection of focal elements, thereby improving the clas- 
sification accuracy. Meanwhile, the moderate transformations 
achieve higher classification accuracy than other transforma- 
tions. The moderate transformations do not pursue the minimal 
or maximal degree of uncertainty on the basis of considering 
all possible focal elements, since the extreme attitudinal bias 
on the uncertainty degree might bring counter-intuitive results 
and a moderate (or balanced) BBA without the minimal or 
maximal degree of uncertainty is more natural. 

Besides, we note two cases of samples in the 300-run experiments:

• Case 1: the classification results of the transformations based
on uncertainty optimization are wrong and that of the moderate
transformations is correct.

• Case 2: the classification results of the moderate transformations
are wrong and that of the transformations based on
uncertainty optimization is correct.

In this example, the test sets of the three datasets have 45,
63 and 76 samples, respectively. Here, we count the number
of samples for Case 1 in each run and calculate
the average. The average numbers of samples of Case 1 are
2.2167 (4.93%), 1.8833 (3.00%) and 4.08 (5.37%) for the three
datasets, respectively, i.e., the moderate transformations can
bring better results. We find that the samples belonging to
Case 1 in each dataset contain at least one feature
whose values differ only slightly between classes. Compared
with the transformations based on uncertainty optimization, the
moderate transformations can better represent the uncertainty
contained in the FMFs obtained from the samples.
For example, the samples marked in Fig. 3 are the samples of Case
1 in the iris dataset after the 300-run experiments (repeated samples
are marked only once). In Fig. 3, samples of Class 1 are
marked as red points, samples of Class 2 as blue
solid diamonds, and samples of Class 3 as cyan solid
triangles. Red circles mark the samples of Case 1 of
Class 1, blue diamonds mark the samples of Case
1 of Class 2, and cyan triangles mark the samples of
Case 1 of Class 3.


As we can see in Fig. 3(b), the samples of Class 1 can
be clearly distinguished from the samples of the other two
classes. However, the values of $f_1$ and $f_2$ of Class 1 differ
only slightly from those of the other two classes. Taking a sample
$x = [7, 3.2, 4.7, 1.4]$ of Case 1 of the iris dataset as an example,
this test sample belongs to Class 2. The corresponding FMFs
are:


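FMF$_1$: $\mu_1(\theta_1) = 0$, $\mu_1(\theta_2) = 0$, $\mu_1(\theta_3) = 0.6618$
FMF$_2$: $\mu_2(\theta_1) = 0.8119$, $\mu_2(\theta_2) = 0.3017$, $\mu_2(\theta_3) = 0.7095$
FMF$_3$: $\mu_3(\theta_1) = 0$, $\mu_3(\theta_2) = 0.4389$, $\mu_3(\theta_3) = 0$
FMF$_4$: $\mu_4(\theta_1) = 0$, $\mu_4(\theta_2) = 0.7349$, $\mu_4(\theta_3) = 0.0001$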


According to these four FMFs, we can transform them into 
BBAs, respectively. The results are listed in Table XIV. 


One can combine the corresponding four BBAs of each
transformation by using Dempster's rule of combination and
then obtain the following results by using Eq. (6), yielding


$BetP_{\min}(\theta_1) = 0.2427$
$BetP_{\min}(\theta_2) = 0.3426$
$BetP_{\min}(\theta_3) = 0.4147$
Figure 3. Samples of Case 1 of iris dataset: (a) 1st and 2nd dimensions; (b) 3rd and 4th dimensions. Legend: all samples of Class 1, Class 2 and Class 3; samples of Case 1 of Class 1, Class 2 and Class 3.


Table XIV
OBTAINED BBAS OF A SAMPLE OF CASE 1 IN EXAMPLE 8.


Approach Feature BBA 


0.4077, mmin(A6) = 0.1881, Mmin(A7) = 0.1136 
0.3351, mmin(Ae) = 0.2260 
0.1417, Tmmin(Ag) = 0.1189 


) . 
ine Fee = 0. 2906, Mrmin ( 
mmin(A2) = 0.4389, T™min( 
Honk Sah = 0.7394, Mmin ( 


E 4 7 
d= = 0.0941, mmax(Az) = 0.1329, ae = = 0.0636, 7mex(A4) = 0.0553, mmax(As) = 0.5489, rmax(A7) = 0.1053 
2) = 0.4389, mmax(As) = 0.2194, ™mmax(A7) = 0.3417 


2) = 0.7395, ue ) = 0.2605 


= =f, 0664, mwa 
= 0.0941, mwa 
= 0.1676, mwa 
= 0.0708, mwa 


0.0264, muo( 
0.1440, muo( 
0.0574, muo( 


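$BetP_{\max}(\theta_1) = 0.2372$
$BetP_{\max}(\theta_2) = 0.3512$
$BetP_{\max}(\theta_3) = 0.4116$


$BetP_{wa}(\theta_1) = 0.1927$
$BetP_{wa}(\theta_2) = 0.4161$
$BetP_{wa}(\theta_3) = 0.3912$


$BetP_{uo}(\theta_1) = 0.1282$
$BetP_{uo}(\theta_2) = 0.4643$
$BetP_{uo}(\theta_3) = 0.4075$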


muwa(Aa) = 0.0277, 


mMwa(A6) = 0.1130, mwa(A7) = 0.1708 
3 maa( As) = = 0.0594 


1537, ™muo(Ag) = 0.1907, muo(A7) = 0.0727 
.0973, Muo(Ag) = 0.0032, muo(A7) = 0.1027 


We can see that the classification results are correct except
for the results obtained using $T_{\min}$ and $T_{\max}$.


On the other hand, the average numbers of samples for Case
2 (the moderate transformations yield incorrect results,
while $T_{\min}$ and $T_{\max}$ yield correct results) in the 300-run
experiment are 0.07 (0.16%), 0.2967 (0.47%) and 0.1433 (0.19%)
for the three datasets, respectively. Since these averages are much
smaller than the average numbers of samples for Case 1, the moderate
transformations are better than the transformations based on
uncertainty optimization.
Using moderate transformations can avoid "one-sidedness" in
terms of the uncertainty degrees. Meanwhile, the trade-off
BBA can represent the small differences between the values
of a given FMF. In summary, the moderate transformations
can bring better classification results than
the transformations based on uncertainty optimization in a
statistical sense.
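The percentages quoted for Case 1 and Case 2 can be cross-checked against the test-set sizes. The sketch below is only an arithmetic check; it assumes the usual iris dataset size of 150 samples (50 per class), which is not restated in this section, and all variable names are illustrative.

```python
# Test-set sizes implied by the text: 30% of each class for iris and wheat
# seeds, and all but 34 samples per class for wine.
test_sizes = {
    "iris": 150 - 3 * 35,         # assumed 150 samples, 35 of 50 per class for training
    "wheat seeds": 210 - 3 * 49,  # 49 of 70 per class for training
    "wine": 178 - 3 * 34,         # 34 per class for training
}

# Average numbers of samples per run, copied from the text.
case1 = {"iris": 2.2167, "wheat seeds": 1.8833, "wine": 4.08}
case2 = {"iris": 0.07, "wheat seeds": 0.2967, "wine": 0.1433}


def as_percent(avg_counts):
    """Average count expressed as a percentage of the test-set size."""
    return {k: round(100.0 * avg_counts[k] / test_sizes[k], 2) for k in test_sizes}


case1_pct = as_percent(case1)
case2_pct = as_percent(case2)
# How many times more often Case 1 occurs than Case 2.
ratio = {k: case1[k] / case2[k] for k in test_sizes}
print(test_sizes)
print(case1_pct)
print(case2_pct)
```

With these inputs the recomputed percentages match the quoted ones to within 0.01 percentage point (the wheat seeds value for Case 1 comes out as 2.99 rather than the printed 3.00), and Case 1 occurs several times more often than Case 2 on every dataset.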


VI. CONCLUSIONS 


In this paper, we propose two transformations with a
weighting factor to transform a given FMF into a trade-off or
moderate BBA. The weighting factor can be determined from
prior information or specified by the user, to reflect the objective
situation or to meet subjective preferences of users. Numerical
examples and classification results validate the effectiveness
of the two moderate transformations. Comparing these two
transformations, the computational complexity of $T_{wa}$ is lower,
and $T_{uo}$ can bring a better classification performance. In
practical applications, users can choose $T_{wa}$ or $T_{uo}$ according
to the demands of applications. Note that our transformations
have been evaluated through some numerical and classification
examples, and the design of numerical examples is
usually subjective. In fact, the fields related to belief functions,
including the generation of a BBA, lack objective and
reasonable evaluation criteria. The conclusions obtained from
numerical examples are therefore incomplete. An objective evaluation
criterion can help to obtain better tools or approaches
to deal with belief functions. Therefore, we will focus on
objective evaluation criteria for belief functions in
our future work. As the cardinality of the FOD increases, the
number of possible focal elements in a BBA grows exponentially,
i.e., the number of unknown variables to be determined in
our formulated optimization problem grows exponentially.
This exponential growth of computational complexity is a
limitation of our transformations. In future work, we will
attempt to use simpler and more effective approaches besides
the optimization-based transformations to transform an FMF
into a trade-off BBA.
APPENDIX


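Proof of the equivalence between Eq. (12) and Eq. (13) when
$\sum_{i=1}^{n} \mu(\theta_i) = 1$.


Consider the FOD $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$. The given FMF is
denoted by $\mu = [\mu(\theta_1), \mu(\theta_2), \ldots, \mu(\theta_n)]$. According to Eq.
(12), the FMF is equivalent to a singleton plausibility and then
we have the following $n$ equations:

$$\begin{aligned}
m(\{\theta_1\}) + m(\{\theta_1, \theta_2\}) + \ldots + m(\{\theta_1, \theta_n\}) + m(\{\theta_1, \theta_2, \theta_3\}) + \ldots + m(\{\theta_1, \theta_{n-1}, \theta_n\}) + \ldots + m(\Theta) &= \mu(\theta_1) \\
m(\{\theta_2\}) + m(\{\theta_1, \theta_2\}) + \ldots + m(\{\theta_2, \theta_n\}) + m(\{\theta_1, \theta_2, \theta_3\}) + \ldots + m(\{\theta_2, \theta_{n-1}, \theta_n\}) + \ldots + m(\Theta) &= \mu(\theta_2) \\
&\;\;\vdots \\
m(\{\theta_n\}) + m(\{\theta_1, \theta_n\}) + \ldots + m(\{\theta_{n-1}, \theta_n\}) + m(\{\theta_1, \theta_2, \theta_n\}) + \ldots + m(\{\theta_{n-2}, \theta_{n-1}, \theta_n\}) + \ldots + m(\Theta) &= \mu(\theta_n)
\end{aligned} \tag{A.1}$$

Adding the left and right sides of these $n$ equations gives

$$\begin{aligned}
& m(\{\theta_1\}) + m(\{\theta_2\}) + \ldots + m(\{\theta_n\}) \\
& + 2[m(\{\theta_1, \theta_2\}) + \ldots + m(\{\theta_{n-1}, \theta_n\})] \\
& + 3[m(\{\theta_1, \theta_2, \theta_3\}) + \ldots + m(\{\theta_1, \theta_2, \theta_n\}) + \ldots + m(\{\theta_{n-2}, \theta_{n-1}, \theta_n\})] \\
& + \ldots + n\, m(\Theta) = \mu(\theta_1) + \mu(\theta_2) + \ldots + \mu(\theta_n) = 1
\end{aligned} \tag{A.2}$$

According to $\sum_{A \subseteq \Theta} m(A) = 1$, Eq. (A.2) can be rewritten
as follows:

$$\begin{aligned}
& m(\{\theta_1, \theta_2\}) + \ldots + m(\{\theta_{n-1}, \theta_n\}) \\
& + 2[m(\{\theta_1, \theta_2, \theta_3\}) + \ldots + m(\{\theta_1, \theta_2, \theta_n\}) + \ldots + m(\{\theta_{n-2}, \theta_{n-1}, \theta_n\})] \\
& + \ldots + (n-1)\, m(\Theta) = 0
\end{aligned} \tag{A.3}$$

Because $0 \le m(A) \le 1$ for all $A \subseteq \Theta$, every term on the left side of
Eq. (A.3) is nonnegative. This means that the masses of all the
focal elements on the left side of Eq. (A.3) are 0. Then we
have

$$m(\{\theta_1, \theta_2\}) = \ldots = m(\{\theta_{n-1}, \theta_n\}) = m(\{\theta_1, \theta_2, \theta_3\}) = \ldots = m(\{\theta_{n-2}, \theta_{n-1}, \theta_n\}) = \ldots = m(\Theta) = 0 \tag{A.4}$$

Then, Eq. (A.1) can be rewritten as

$$\begin{aligned}
m(\{\theta_1\}) &= \mu(\theta_1) \\
m(\{\theta_2\}) &= \mu(\theta_2) \\
&\;\;\vdots \\
m(\{\theta_n\}) &= \mu(\theta_n)
\end{aligned} \tag{A.5}$$

which means that the FMF is also equivalent to both the
singleton belief function and singleton plausibility function
when $\sum_{i=1}^{n} \mu(\theta_i) = 1$.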
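The key step of the proof is that the sum of the singleton plausibilities exceeds 1 by exactly $\sum_{A} (|A| - 1)\, m(A)$, which vanishes only for a Bayesian BBA. A minimal numerical illustration follows, with hypothetical masses on a three-element frame and illustrative function names.

```python
def singleton_pl(m, frame):
    """Plausibility of each singleton: total mass of the focal elements containing it."""
    return {t: sum(w for a, w in m.items() if t in a) for t in frame}


def excess(m, frame):
    """Sum of singleton plausibilities minus 1, i.e. the left side of Eq. (A.3)."""
    return sum(singleton_pl(m, frame).values()) - 1.0


frame = ("t1", "t2", "t3")
# Hypothetical BBAs, both normalized.
bayesian = {frozenset({"t1"}): 0.2, frozenset({"t2"}): 0.5, frozenset({"t3"}): 0.3}
non_bayesian = {frozenset({"t1"}): 0.2, frozenset({"t1", "t2"}): 0.5, frozenset(frame): 0.3}

print(excess(bayesian, frame))      # no mass on non-singletons
print(excess(non_bayesian, frame))  # (2 - 1) * 0.5 + (3 - 1) * 0.3
```

The excess is zero for the Bayesian BBA and strictly positive as soon as some mass sits on a non-singleton focal element, which is the content of Eqs. (A.3) and (A.4).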
Abstract—This paper introduces the concept of cross-entropy
and relative entropy of two basic belief assignments. It is based 
on the new entropy measure presented recently. We prove that 
the cross-entropy satisfies a generalized Gibbs-alike inequality 
from which a generalized Kullback-Leibler divergence measure 
can be established in the framework of belief functions. We show 
on a simple illustrating example how these concepts can be used 
for decision-making under uncertainty. 


Keywords: generalized entropy, cross-entropy, relative en- 
tropy, Kullback-Leibler divergence, belief functions 


I. INTRODUCTION 


In Shannon’s theory of communication developed in 1948 
[1], [2], the measure of uncertainty (MoU), also called en- 
tropy, for characterizing a source of information (from signal 
transmission standpoint) is defined by Shannon entropy. This 
entropy measures the randomness of a probability distribution 
P and is usually noted by H(P). Shannon entropy does not 
concern the semantic aspects of the content of a message 
but only its transmission [3]-[5]. H(P) has played a very 
important role in the development of modern communication 
systems and cryptography [6] until today. According to Cover 
and Thomas [7], the cross-entropy denoted by H(P,Q) is 
the average number of bits needed to encode data coming 
from a source with a probability distribution P when we 
use a distribution model Q to define our codebook. Cross-entropy
is commonly used in machine learning as a loss
function [8], and the cross-entropy method is often used in
practice to estimate an unknown true pmf (probability mass
function) based on a test set where Q is the assumed (or
eventually empirical) pmf model. The minimization of the
cross-entropy is related to the principle of the maximization
of the likelihood. That is why cross-entropy plays a major role
in many statistical applications. The relative entropy, often
referred to as the Kullback-Leibler divergence [9], is the difference
between the cross-entropy and Shannon entropy, and so it is
H(P,Q) − H(P). All these aforementioned basic concepts
have been well established (and strongly justified) since the
mid-20th century, and all use the theory of probability as
the fundamental underlying mathematical framework.

In this paper we go beyond the classical probabilistic 
framework because we want to work possibly with epistemic 
uncertainty represented by non-probabilistic models thanks to 
the mathematical framework of belief functions introduced by 
Shafer [10], and in this context the legitimate and important
question is to know if it is possible, or not, to extend the 
concepts of entropy, cross-entropy and relative entropy for 
the belief functions. Concerning the concept of entropy, the 
answer is affirmative and very recently a new generalized 
entropy measure has been proposed in [11] in the framework of 
the theory of belief functions. Concerning the second and third 
theoretical questions about cross-entropy and relative entropy 
concepts, we give new comprehensive and better answers to 
these questions in this paper. This is our new theoretical 
contribution in the field. The concrete meaning of relative 
entropy and cross-entropy measures in the belief functions 
framework is a challenging question because the entropy of 
belief function is merely related to the uncertainty of epistemic 
knowledge rather than of statistical knowledge. No concrete 
meaning of these notions has been firmly established so far. 
This interesting open question is left for future research works. 

To make the material of this paper reasonably self-contained, we
recall the basic classical concepts related to entropy (Shannon
entropy, cross-entropy, and relative entropy) in Section II,
and we present the basics of belief functions [10] in Section
III, followed by the new concept of entropy measure of basic belief
assignment (BBA) [11] in Section IV. After recalling
a very recent definition of cross-entropy of BBAs [12] based
on the non-effective Deng's entropy definition [13], we present
in Section V a new cross-entropy definition based on our
new effective entropy definition. Section VI presents the
concept of relative entropy of BBAs, which can be interpreted
as a generalization of the Kullback-Leibler divergence measure
for belief functions. An example of the use of these concepts
for decision-making under uncertainty is given in Section
VII. Concluding remarks and perspectives are given in
Section VIII.


II. CLASSICAL NOTIONS RELATED TO ENTROPY 
A. Shannon entropy 


Consider a discrete random variable $\theta$ represented by a
probability mass function (pmf) $P_N = (p_1, p_2, \ldots, p_N)$, where
$p_i = P(\theta_i)$ is the probability of the $i$-th state $\theta_i$ (i.e.
outcome) of $\Theta = \{\theta_1, \theta_2, \ldots, \theta_N\}$. Shannon was interested in
communication systems where the various events were the
carriers of coded messages, and he justified his entropy
measure as an appropriate measure of average uncertainty (or
measure of randomness) of a random variable [1], [2], [6],
[7]. The entropy of a random variable is the average level
of surprisal, or uncertainty, inherent in the variable's possible
outcomes [14]. Shannon entropy is defined by¹

$$H(P_N) \triangleq -\sum_{i=1}^{|\Theta|} P(\theta_i) \log(P(\theta_i)). \tag{1}$$

By convention, $P(\theta_i) \log(P(\theta_i)) = 0$ if $P(\theta_i) = 0$. This is
easily justified by continuity because $\lim_{x \to 0^+} x \log x = 0$,
which can be proved using L'Hôpital's rule [15]. Adding terms
of zero probability does not change the entropy value. In (1)
we use the natural logarithm (i.e. base $e$ logarithm) and in this
case the Shannon entropy value is expressed in nats. We
can also use the base 2 logarithm ($\log_2$) function instead of
the natural logarithm, and if so the Shannon entropy value will
be expressed in bits. Shannon entropy can be interpreted as a
generalization of Hartley entropy (1928) [16] when presuming
the pmf of equally probable states (i.e. uniform² pmf $P_N^{\mathrm{unif}}$),
hence getting $H(P_N^{\mathrm{unif}}) = \log(|\Theta|) = \log(N)$. Note that if we
have a uniform pmf $P_N^{\mathrm{unif}}$ defined on $\Theta$ with $|\Theta| = N$ and
another uniform pmf $P_{N'}^{\mathrm{unif}}$ defined on $\Theta'$ with $|\Theta'| = N'$, and
if $|\Theta| < |\Theta'|$ then $H(P_N^{\mathrm{unif}}) < H(P_{N'}^{\mathrm{unif}})$ because $\log(|\Theta|) < \log(|\Theta'|)$
since $\log(x)$ is an increasing function. The minimum
value of Shannon entropy is zero, which characterizes a
non-random (or sure) event $\theta_i$ for which $P(\theta_i) = 1$.

The main algebraic properties of Shannon entropy are,
see [17] (p. 30) for details: the symmetry, the normality³,
expansibility, decisivity, sub-additivity and recursivity. We
recall that Shannon entropy value $H(P_N)$ is always smaller
than $H(P_N^{\mathrm{unif}})$ if $P_N \neq P_N^{\mathrm{unif}}$, expressing the fact that the
uniform pmf is the only pmf giving the maximal Shannon
entropy value, and characterizing the maximum of uncertainty
(or randomness), which is called the maximality property.
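As a quick numerical illustration of Eq. (1) and of the maximality property, the following sketch computes Shannon entropy in nats or bits. The function name and the example pmfs are illustrative choices, not taken from the paper.

```python
import math


def shannon_entropy(pmf, base=math.e):
    """Shannon entropy of Eq. (1); terms with zero probability contribute nothing."""
    return -sum(p * math.log(p, base) for p in pmf if p > 0)


uniform4 = [0.25] * 4            # uniform pmf on 4 states
skewed4 = [0.7, 0.1, 0.1, 0.1]   # a non-uniform pmf on the same 4 states

h_nats = shannon_entropy(uniform4)           # log(4) nats
h_bits = shannon_entropy(uniform4, base=2)   # 2 bits
h_skewed = shannon_entropy(skewed4)          # strictly smaller than log(4)
print(h_nats, h_bits, h_skewed)
```

The uniform pmf attains $\log(N)$, any other pmf on the same states gives a smaller value, and a sure event gives zero.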


B. The cross-entropy 


Consider a finite set of exhaustive events $\Theta = \{\theta_1, \ldots, \theta_n\}$
where the $\theta_i$ are mutually exclusive (i.e. $\theta_i \cap \theta_j = \emptyset$ if $i \neq j$).
Suppose that $P = \{P(\theta_1) = p_1, \ldots, P(\theta_n) = p_n\}$ is a
probability distribution over the set $\Theta$. Then for any other
probability distribution $Q = \{Q(\theta_1) = q_1, \ldots, Q(\theta_n) = q_n\}$
the Gibbs inequality holds [18]

$$-\sum_{i=1}^{n} p_i \log(q_i) \ge -\sum_{i=1}^{n} p_i \log(p_i). \tag{2}$$

The cross-entropy between probability distributions $P$ and $Q$
over the same underlying set of events $\Theta$ is defined by

$$H(P, Q) \triangleq -\sum_{i=1}^{n} p_i \log(q_i) = -\sum_{X \in \Theta} P(X) \log(Q(X)). \tag{3}$$

One can easily verify that $H(P, Q) = H(P)$ when $Q = P$,
i.e. when the probability distribution $Q$ coincides with the
true probability distribution $P$ the cross-entropy value equals
Shannon entropy of $P$. Gibbs inequality is $H(P, Q) \ge H(P)$.

¹ The symbol $\triangleq$ means equal by definition.
² For which $P(\theta_i) = 1/N$ for $i = 1, 2, \ldots, N$.
³ This stipulates that $H(P_2^{\mathrm{unif}}) = 1$ using base 2 logarithm function in (1).
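The Gibbs inequality can be checked numerically with a small sketch of Eq. (3). The two pmfs are arbitrary illustrative values, and returning infinity when $Q$ assigns zero probability to an outcome that $P$ allows is an assumed convention.

```python
import math


def cross_entropy(p, q):
    """Cross-entropy H(P, Q) of Eq. (3), in nats."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf  # assumed convention: -p * log(0) = +infinity
            total -= pi * math.log(qi)
    return total


P = [0.5, 0.3, 0.2]
Q = [0.2, 0.5, 0.3]

h_p = cross_entropy(P, P)   # coincides with Shannon entropy H(P)
h_pq = cross_entropy(P, Q)  # never smaller than H(P) by Gibbs inequality
print(h_p, h_pq)
```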


C. The relative entropy 


The difference between the cross-entropy $H(P, Q)$ and
Shannon entropy $H(P)$ is named the relative entropy or the
Kullback-Leibler (KL) divergence [9], and is often denoted
by⁴

$$D_{KL}(P \,\|\, Q) \triangleq H(P, Q) - H(P) = \sum_{i=1}^{n} p_i \log(p_i / q_i). \tag{4}$$

$D_{KL}(P \,\|\, Q)$ measures how the probability distribution $P$ is
different from a second, reference probability distribution $Q$.
It corresponds to the expectation of the logarithmic difference
between the probability distributions $P$ and $Q$, where the
expectation is taken using the distribution $P$. In general the
relative entropy $D_{KL}(P \,\|\, Q)$ is not symmetric under interchange
of the distributions $P$ and $Q$ and we have $D_{KL}(P \,\|\, Q) \neq D_{KL}(Q \,\|\, P)$.
Therefore, $D_{KL}$ is not strictly a distance even if
it is often abusively called a distance in the literature, even by
Cover in [7]. This relative entropy (i.e. divergence measure)
is important in pattern recognition and neural networks for
classification, as well as in information theory. Kullback
and Leibler also proposed a symmetrized measure in
[9] defined as $D_{KL}(P \,\|\, Q) + D_{KL}(Q \,\|\, P)$. Another renowned
symmetric version of the KL divergence is the Jensen-Shannon
(JS) divergence defined by Lin in [19]

$$D_{JS}(P \,\|\, Q) \triangleq \frac{1}{2} D_{KL}\!\left(P \,\Big\|\, \frac{P+Q}{2}\right) + \frac{1}{2} D_{KL}\!\left(Q \,\Big\|\, \frac{P+Q}{2}\right). \tag{5}$$

The Jensen-Shannon divergence can be interpreted as the
total Kullback-Leibler divergence to the average probability
distribution $(P + Q)/2$. This JS divergence is often used in
practice because its square root is a metric often referred to
as Jensen-Shannon distance [20], that is

$$d_{JS}(P, Q) = \sqrt{D_{JS}(P \,\|\, Q)}. \tag{6}$$

Jensen-Shannon divergence has been applied in different
fields of applications (e.g. bioinformatics, social sciences, fire
experiments, machine learning, deep learning for studying
generative adversarial networks, etc.), see [21].
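The asymmetry of the KL divergence and the symmetry of the JS divergence can be seen on a small example. This is a sketch with illustrative pmfs; it follows Eqs. (4) to (6) and adopts the convention that $p \log(p/0) = \infty$.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence of Eq. (4), in nats."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf  # convention p * log(p / 0) = +infinity
            total += pi * math.log(pi / qi)
    return total


def js(p, q):
    """Jensen-Shannon divergence of Eq. (5)."""
    mid = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)


def js_distance(p, q):
    """Jensen-Shannon distance of Eq. (6); max() guards against tiny negative rounding."""
    return math.sqrt(max(0.0, js(p, q)))


P = [0.5, 0.3, 0.2]
Q = [0.2, 0.5, 0.3]
print(kl(P, Q), kl(Q, P))           # two different values
print(js(P, Q), js(Q, P))           # identical values
print(js_distance(P, Q))
```

Because the mixture $(P+Q)/2$ is positive wherever $P$ or $Q$ is, the JS divergence stays finite (at most $\log 2$ in nats) even when the KL divergence is infinite.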


III. BELIEF FUNCTIONS 


The belief functions (BF) were introduced by Shafer [10] for
modeling epistemic uncertainty, reasoning about uncertainty
and combining distinct sources of evidence. The answer to
the problem under concern is assumed to belong to a known
finite discrete frame of discernment (FoD) $\Theta = \{\theta_1, \ldots, \theta_n\}$
where all elements (i.e. members) of $\Theta$ are exhaustive and
exclusive. The set of all subsets of $\Theta$ (including the empty set
$\emptyset$, and $\Theta$) is the power-set of $\Theta$ denoted by $2^\Theta$. The number
of elements (i.e. the cardinality) of the power-set is $2^{|\Theta|}$. A
(normalized) basic belief assignment (BBA) associated with a
given source of evidence is a mapping $m^\Theta(\cdot) : 2^\Theta \to [0, 1]$
such that⁵ $m^\Theta(\emptyset) = 0$ and $\sum_{X \in 2^\Theta} m^\Theta(X) = 1$. A BBA
$m^\Theta(\cdot)$ characterizes a source of evidence related to a FoD
$\Theta$. For notation shorthand, we can omit the superscript $\Theta$ in the
$m^\Theta(\cdot)$ notation if there is no ambiguity on the FoD we work
with. The quantity $m(X)$ is called the mass of belief of $X$.
$X \in 2^\Theta$ is called a focal element (FE) of $m(\cdot)$ if $m(X) > 0$.
The set of all focal elements of $m(\cdot)$ is denoted by $\mathcal{F}_\Theta(m) = \{X \in 2^\Theta \mid m(X) > 0\}$.
The belief and the plausibility of $X$
are respectively defined for any $X \in 2^\Theta$ by [10]

$$Bel(X) = \sum_{Y \in 2^\Theta \mid Y \subseteq X} m(Y), \tag{7}$$

$$Pl(X) = \sum_{Y \in 2^\Theta \mid X \cap Y \neq \emptyset} m(Y) = 1 - Bel(\bar{X}), \tag{8}$$

where $\bar{X} \triangleq \Theta \setminus \{X\}$ is the complement of $X$ in $\Theta$.

One always has $0 \le Bel(X) \le Pl(X) \le 1$, see [10].
For $X = \emptyset$, $Bel(\emptyset) = Pl(\emptyset) = 0$, and for $X = \Theta$ one has
$Bel(\Theta) = Pl(\Theta) = 1$. $Bel(X)$ and $Pl(X)$ are often
interpreted as the lower and upper bounds of the unknown
probability $P(X)$ of $X$, that is $Bel(X) \le P(X) \le Pl(X)$.
To quantify the uncertainty (i.e. the imprecision) of
$P(X) \in [Bel(X), Pl(X)]$, we use $u(X) \in [0, 1]$ defined by

$$u(X) \triangleq Pl(X) - Bel(X). \tag{9}$$

The quantity $u(X) = 0$ if $Bel(X) = Pl(X)$, which
means that $P(X)$ is known precisely, and one has
$P(X) = Bel(X) = Pl(X)$. One has $u(\emptyset) = 0$ because
$Bel(\emptyset) = Pl(\emptyset) = 0$, and one has $u(\Theta) = 0$ because
$Bel(\Theta) = Pl(\Theta) = 1$. If all focal elements of $m(\cdot)$ are
singletons of $2^\Theta$ the BBA $m(\cdot)$ is a Bayesian BBA because $\forall X \in 2^\Theta$
one has $Bel(X) = Pl(X) = P(X)$ and $u(X) = 0$. Hence
the belief and plausibility of $X$ coincide with a probability
measure $P(X)$ defined on the FoD $\Theta$. The vacuous BBA
characterizing a totally ignorant source of evidence is defined
by $m_v(X) = 1$ for $X = \Theta$, and $m_v(X) = 0$ for all $X \in 2^\Theta$
different from $\Theta$. This very particular BBA plays a major role
in the establishment of a new effective measure of uncertainty
for BBA.

⁴ As in [7] (p. 19), in the formula (4) we use the conventions that
$0 \log(0/0) = 0$, $0 \log(0/q) = 0$, and $p \log(p/0) = \infty$. So, if there is
any $X \in \Theta$ such that $P(X) > 0$ and $Q(X) = 0$, $D_{KL}(P \,\|\, Q) = \infty$.
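Definitions (7) to (9) translate directly into code once a BBA is stored as a dictionary from `frozenset` focal elements to masses. The sketch below uses hypothetical masses on a three-element frame; the function names are illustrative.

```python
def bel(m, x):
    """Belief of Eq. (7): total mass of the non-empty focal elements included in x."""
    return sum(w for y, w in m.items() if y and y <= x)


def pl(m, x):
    """Plausibility of Eq. (8): total mass of the focal elements intersecting x."""
    return sum(w for y, w in m.items() if y & x)


def imprecision(m, x):
    """u(X) of Eq. (9): width of the interval [Bel(X), Pl(X)]."""
    return pl(m, x) - bel(m, x)


frame = frozenset({"a", "b", "c"})
# Hypothetical BBA with three focal elements.
m = {frozenset({"a"}): 0.5, frozenset({"a", "b"}): 0.3, frame: 0.2}

A = frozenset({"a"})
print(bel(m, A), pl(m, A), imprecision(m, A))
# Duality Pl(X) = 1 - Bel(complement of X):
print(pl(m, A), 1.0 - bel(m, frame - A))
```

For this BBA, $Bel(\{a\}) = 0.5$ and $Pl(\{a\}) = 1$, so the unknown probability of $\{a\}$ is only known to lie in $[0.5, 1]$.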


IV. ENTROPY OF BASIC BELIEF ASSIGNMENTS 


In [22] we analyzed in detail forty-five measures of
uncertainty (MoU) of BBAs, covering 40 years of research
works on this topic. Some of these MoUs capture only a
particular aspect of the uncertainty inherent to a BBA (typically,
the non-specificity and the conflict). Other MoUs propose a
total uncertainty measure to capture jointly several aspects of
the uncertainty. Unfortunately, most of these MoUs fail to
satisfy four very simple, reasonable and essential desiderata,
and so they cannot be considered as really effective and
useful. Actually only five MoUs can be considered as effective
in the mathematical sense presented next, but unfortunately
they appear as conceptually defective and disputable, see
discussions in [22]. That is why a better effective measure of
uncertainty (MoU), i.e. a generalized entropy of BBAs, has been
developed and presented in [11]. The mathematical definition
of this new effective entropy is given by

$$U(m) = \sum_{X \in 2^\Theta} s(X), \tag{10}$$

with

$$s(X) \triangleq -m(X)(1 - u(X)) \log(m(X)) + u(X)(1 - m(X)). \tag{11}$$

$s(X)$ is the uncertainty contribution related to $X$, named the
entropiece of $X$. This entropiece $s(X)$ involves $m(X)$ and
the imprecision $u(X) = Pl(X) - Bel(X)$ about the unknown
probability of $X$ in a subtle interwoven manner. Because
$u(X) \in [0, 1]$ and $m(X) \in [0, 1]$ one has $s(X) \ge 0$,
and $U(m) \ge 0$. The quantity $U(m)$ is expressed in nats
because we use the natural logarithm. $U(m)$ can be expressed
in bits by dividing the $U(m)$ value in nats by $\log(2) = 0.69314718\ldots$.
This measure of uncertainty $U(m)$ is a continuous
function in its basic belief mass arguments because it
is a summation of continuous functions. In formula (11), we
always take $m(X) \log(m(X)) = 0$ when $m(X) = 0$ because
$\lim_{m(X) \to 0^+} m(X) \log(m(X)) = 0$. Note that for any BBA
$m$, one has $s(\emptyset) = 0$ because $m(\emptyset) = 0$ and $u(\emptyset) = 0$. For
the vacuous BBA, one has $s(\Theta) = 0$ because $m_v(\Theta) = 1$ and
$u(\Theta) = 0$.

This measure of uncertainty $U(m)$ is effective because it
can be proved (see proofs in [11]) that it satisfies the following
four essential properties:

1) $U(m) = 0$ for any BBA $m(\cdot)$ focused on a singleton $X$
of $2^\Theta$.

2) $U(m_v^{\Theta}) < U(m_v^{\Theta'})$ if $|\Theta| < |\Theta'|$.

3) $U(m) = -\sum_{X \in \Theta} m(X) \log(m(X))$ if the BBA $m(\cdot)$
is a Bayesian BBA. Hence, $U(m)$ reduces to Shannon
entropy [1] in this case.

4) $U(m) < U(m_v)$ for any non-vacuous BBA $m(\cdot)$ and
for the vacuous BBA $m_v(\cdot)$ defined with respect to the
same FoD.

The maximum entropy value is obtained for the vacuous
BBA $m_v$ over a FoD $\Theta$, because $m_v$ characterizes a source
of evidence with a full lack of information. This maximum
entropy value is $U(m_v^\Theta) = 2^{|\Theta|} - 2$ (see derivation in [11])
and it represents the sum of all imprecisions of $P(X)$ for all
$X \in 2^\Theta$. Indeed, when considering the vacuous BBA, one has
$u(X) = 1$ for all $X \in 2^\Theta \setminus \{\emptyset, \Theta\}$ because
$[Bel(X), Pl(X)] = [0, 1]$, and one has $u(\emptyset) = 0$ and
$u(\Theta) = 0$, so the sum
of all imprecisions $u(X)$ about $P(X)$ is equal to $2^{|\Theta|} - 2$. It
is worth mentioning that one always has $U(m_v^\Theta) > \log(|\Theta|)$,
which means that the vacuous BBA always has an entropy
greater than the maximum of Shannon entropy $\log(|\Theta|)$
obtained with the uniform pmf on $\Theta$.

⁵ In Shafer's theory of BFs we work with a closed FoD and the mass of
the empty set must always be equal to zero.
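The generalized entropy of Eqs. (10) and (11) can be evaluated by enumerating the power-set. The following sketch assumes a BBA stored as a dictionary from `frozenset` focal elements to masses; function names are illustrative, and the enumeration is exponential in $|\Theta|$, so it is only meant for small frames.

```python
import math
from itertools import combinations


def powerset(frame):
    """All subsets of the frame, including the empty set and the frame itself."""
    items = sorted(frame)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def generalized_entropy(m, frame):
    """U(m) of Eq. (10): sum of the entropieces s(X) of Eq. (11), in nats."""
    total = 0.0
    for x in powerset(frame):
        mass = m.get(x, 0.0)
        bel = sum(w for y, w in m.items() if y and y <= x)
        pl = sum(w for y, w in m.items() if y & x)
        u = pl - bel
        if mass > 0:  # convention m(X) log m(X) = 0 when m(X) = 0
            total -= mass * (1.0 - u) * math.log(mass)
        total += u * (1.0 - mass)
    return total


frame = frozenset({"a", "b", "c"})
vacuous = {frame: 1.0}
bayesian = {frozenset({t}): 1.0 / 3.0 for t in frame}
certain = {frozenset({"a"}): 1.0}

print(generalized_entropy(vacuous, frame))   # 2**3 - 2
print(generalized_entropy(bayesian, frame))  # log(3), the Shannon entropy
print(generalized_entropy(certain, frame))   # 0
```

The three cases illustrate properties 1), 3) and the maximum value $2^{|\Theta|} - 2$ for a frame of three elements.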
V. CROSS-ENTROPY OF TWO BBAS
A. Cross-entropy derived from Deng’s entropy 


Very recently in [12], Gao et al. proposed a definition of
the cross-entropy of two BBAs inspired by the non-effective
Deng's entropy $E_d(m)$ proposed earlier by Deng in [13] and
defined as follows:

$$E_d(m) = -\sum_{X \subseteq \Theta} m(X) \log\left(\frac{m(X)}{2^{|X|} - 1}\right), \tag{12}$$

where $m(X)$ is the mass of belief of any subset $X$ of the
frame of discernment $\Theta$, and where $|X|$ is the cardinality of
$X$. If $m(X) = 0$, the term $m(X) \log\left(\frac{m(X)}{2^{|X|} - 1}\right)$ is set to zero.
Deng's entropy definition is unfortunately not recommended
because it is non-effective. Indeed, we can have $E_d(m) > E_d(m_v)$,
indicating that a non-vacuous BBA $m(\cdot)$ can be more
uncertain than the vacuous BBA $m_v(\cdot)$, which obviously is not
appropriate because the vacuous BBA characterizes the state of
total ignorance. As a simple counterexample for Deng's entropy,
consider $\Theta = \{A, B, C\}$, the vacuous BBA $m_v(\cdot)$ with
$m_v(A \cup B \cup C) = 1$, and the non-vacuous BBA $m(\cdot)$ with
$m(A \cup B) = m(A \cup C) = m(B \cup C) = 1/3$. Clearly, one gets
$E_d(m) > E_d(m_v)$. See the paper [22] for more discussions about other
non-effective entropy proposals. For this counterexample, the
values of Deng's entropies are

$$E_d(m_v) = -m_v(A \cup B \cup C) \log\left(\frac{m_v(A \cup B \cup C)}{2^{|A \cup B \cup C|} - 1}\right) = -1 \cdot \log\left(\frac{1}{2^3 - 1}\right) = -\log\left(\frac{1}{7}\right) \approx 1.9459,$$

$$\begin{aligned}
E_d(m) &= -m(A \cup B) \log\left(\frac{m(A \cup B)}{2^{|A \cup B|} - 1}\right) - m(A \cup C) \log\left(\frac{m(A \cup C)}{2^{|A \cup C|} - 1}\right) - m(B \cup C) \log\left(\frac{m(B \cup C)}{2^{|B \cup C|} - 1}\right) \\
&= -3 \cdot \frac{1}{3} \log\left(\frac{1/3}{2^2 - 1}\right) = -\log\left(\frac{1}{9}\right) \approx 2.1972.
\end{aligned}$$

Based on this non-effective entropy measure, the cross-entropy
defined by Gao et al. [12] between BBAs $m_1$ and $m_2$
mimics the classical cross-entropy definition
using Deng's entropy, that is

$$C(m_1, m_2) = -\sum_{X \subseteq \Theta} m_1(X) \log\left(\frac{m_2(X)}{2^{|X|} - 1}\right).$$

Similarly, the cross-entropy between $m_2$ and $m_1$ is

$$C(m_2, m_1) = -\sum_{X \subseteq \Theta} m_2(X) \log\left(\frac{m_1(X)}{2^{|X|} - 1}\right).$$

Because Deng's entropy is non-effective, we have serious
doubts about the validity of the cross-entropy concept defined
by the $C(m_1, m_2)$ and $C(m_2, m_1)$ formulas. This
justifies the necessity of using a better entropy measure [11]
defined by (10)-(11), and the development of a better cross-entropy
measure. This is what we present in the next section.


B. A new definition of cross-entropy 


Based on the definition (3) of cross-entropy in the probabilistic
framework, and the definition of the effective generalized
entropy $U(m)$ given in (10), it seems quite natural to try
to extend directly the concept of cross-entropy of two pdfs $p$
and $q$ to the cross-entropy of two BBAs $m_1$ and $m_2$ defined
over the same FoD $\Theta$. The extension of the classical
cross-entropy formula (3) applied with generalized entropy $U(m)$
given in (10) suggests directly the following generic formula
of the cross-entropy between two BBAs

$$U(m_1, m_2) = \sum_{X \in 2^\Theta} s_{1,2}(X) \tag{13}$$

with

$$s_{1,2}(X) = -m_1(X)(1 - u_i(X)) \log(m_2(X)) + u_j(X)(1 - m_k(X)) \tag{14}$$

where indexes $i$, $j$ and $k$ have to belong to the set $\{1, 2\}$.
From this generic formulation, one sees that we could a
priori define eight different cross-entropies between two BBAs
depending on the choice of indexes $(i, j, k)$ listed in Table I.


Table I
POSSIBLE TRIPLETS (i, j, k).

(1,1,1), (1,1,2), (1,2,1), (1,2,2), (2,1,1), (2,1,2), (2,2,1), (2,2,2)


It is worth mentioning that if $m_2 = m_1$ the cross-entropy
measure coincides with the entropy measure, that is
$U(m_1, m_2) = U(m_1, m_1) = U(m_1)$.

What is the best definition of the cross-entropy of two
BBAs among the eight possible definitions? Or equivalently,
what is the most suitable triplet of indexes $(i, j, k)$ to plug
in the generic cross-entropy formula (14)? To answer this
important question, we propose to consider as the effective
choice of triplet $(i, j, k)$ the one which allows the information
entropy of a BBA $m_1$ to be less than or equal to its
cross-entropy with any other BBA $m_2$. More precisely, select the
triplet $(i, j, k)$ such that for any BBAs $m_1$ and $m_2$ defined on
the same FoD, the following inequality holds

$$U(m_1, m_2) \geq U(m_1). \tag{15}$$


Actually for the eight a priori possible definitions of
cross-entropy drawn from (13)-(14), one can easily find by
Monte-Carlo simulations of random pairs $(m_1, m_2)$ of BBAs that
the choices of triplets (1,1,1), (1,2,1), (1,2,2), (2,1,1),
(2,1,2), (2,2,1), and (2,2,2) are not judicious because the
inequality (15) can be violated, see some examples in the
appendix. Because our Monte-Carlo analysis based on 100000
random pairs $(m_1, m_2)$ of BBAs revealed that the inequality
(15) was satisfied only for the triplet $(i, j, k) = (1, 1, 2)$ for
different cardinalities of frames of discernment tested up to
$|\Theta| = 10$, we conjectured that the satisfactory definition
of a cross-entropy of two BBAs satisfying inequality (15) is
mathematically defined by (13) with

$$s_{1,2}(X) = -m_1(X)(1 - u_1(X)) \log(m_2(X)) + u_1(X)(1 - m_2(X)). \tag{16}$$

The term $s_{1,2}(X)$ defined in (16) is called the
cross-entropiece of $X$.


Theorem 1: Let $m_1$ and $m_2$ be BBAs defined on the same
frame of discernment. The cross-entropy $U(m_1, m_2)$ defined
by (13) and (16) always satisfies the inequality $U(m_1, m_2) \geq
U(m_1)$, with equality only if $m_1 = m_2$.


Proof: see appendix.


Proposition: If the BBAs $m_1$ and $m_2$ are Bayesian, the
cross-entropy defined by (13) and (16) coincides with the classical
cross-entropy given by (3).


Proof: Since $u(X) = 0$ for all $X \in 2^\Theta$ for any Bayesian
BBA $m(.)$, the proposition is immediate.
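A minimal numerical sketch of definitions (13) and (16) is given below. It assumes that the entropy $U(m)$ is the sum of the entropieces $-m(X)(1-u(X))\log(m(X)) + u(X)(1-m(X))$ with $u(X) = Pl(X) - Bel(X)$. Subsets of the frame are encoded as bitmasks, and the names `bel`, `pl`, `entropy`, `cross_entropy` and `random_bba` are hypothetical helpers introduced only for this illustration. The random generator (normalized exponential weights on all non-empty subsets) is also an assumption, not the sampling scheme of the paper.

```python
import math
import random

def bel(m, X):
    """Belief of X: total mass of the focal elements included in X."""
    return sum(v for Y, v in m.items() if Y & X == Y)

def pl(m, X):
    """Plausibility of X: total mass of the focal elements intersecting X."""
    return sum(v for Y, v in m.items() if Y & X)

def entropy(m, n):
    """U(m) summed over all non-empty subsets (bitmasks) of an n-element frame."""
    total = 0.0
    for X in range(1, 1 << n):
        u = pl(m, X) - bel(m, X)
        mx = m.get(X, 0.0)
        if mx > 0.0:
            total -= mx * (1.0 - u) * math.log(mx)
        total += u * (1.0 - mx)
    return total

def cross_entropy(m1, m2, n):
    """U(m1, m2) from (13) and (16); infinite if m2 misses a focal element of m1."""
    total = 0.0
    for X in range(1, 1 << n):
        u1 = pl(m1, X) - bel(m1, X)
        a, b = m1.get(X, 0.0), m2.get(X, 0.0)
        if a > 0.0:
            if b == 0.0:
                return math.inf
            total -= a * (1.0 - u1) * math.log(b)
        total += u1 * (1.0 - b)
    return total

def random_bba(n, rng):
    """Random BBA with strictly positive mass on every non-empty subset."""
    w = [rng.expovariate(1.0) for _ in range((1 << n) - 1)]
    s = sum(w)
    return {X + 1: wi / s for X, wi in enumerate(w)}

# Bayesian case: the result should match the classical -sum p log q.
p = {0b001: 0.2, 0b010: 0.5, 0b100: 0.3}
q = {0b001: 0.6, 0b010: 0.1, 0b100: 0.3}
classical = -sum(p[X] * math.log(q[X]) for X in p)

# Random check of inequality (15) on a small frame.
rng = random.Random(0)
pairs = [(random_bba(3, rng), random_bba(3, rng)) for _ in range(200)]
gaps = [cross_entropy(a, b, 3) - entropy(a, 3) for a, b in pairs]
print(classical, cross_entropy(p, q, 3), min(gaps))
```

The smallest gap over the random pairs stays non-negative, as Theorem 1 predicts. Such a check illustrates the inequality but does not replace the proof.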


VI. RELATIVE ENTROPY OF TWO BBAS 


It is worth mentioning that the inequality (15) is a
generalization of the well-known Gibbs inequality (2), and it
coincides with Gibbs inequality when the BBAs $m_1$ and $m_2$
are Bayesian BBAs. The generalized relative entropy (GRE)
of two BBAs $m_1$ and $m_2$ that are defined over the same frame
of discernment $\Theta$ is naturally defined by

$$U(m_1 \,\|\, m_2) = U(m_1, m_2) - U(m_1). \tag{17}$$

Because Theorem 1 holds, one always has $U(m_1 \,\|\, m_2) \geq 0$,
with equality if $m_1$ equals $m_2$. As for the classical relative
entropy defined by (4), the GRE is not symmetric under the
interchange of the BBAs $m_1$ and $m_2$, so that in general
$U(m_1 \,\|\, m_2) \neq U(m_2 \,\|\, m_1)$. Therefore GRE must also not
be considered as a distance. This GRE is a direct generalization
of the Kullback-Leibler (KL) divergence measure in the
framework of belief functions. Using expressions (16) and⁵
(11) the mathematical definition of $U(m_1 \,\|\, m_2)$ is

$$U(m_1 \,\|\, m_2) = \sum_{X \in 2^\Theta} \left[ m_1(X)(1 - u_1(X)) \left(\log(m_1(X)) - \log(m_2(X))\right) + u_1(X)(m_1(X) - m_2(X)) \right]. \tag{18}$$

GRE coincides with the KL-divergence formula (4) when the
BBAs $m_1$ and $m_2$ are Bayesian because if focal elements of
$m_1$ and $m_2$ are singletons of $2^\Theta$ then $u_1(X) = 0$ and

$$U(m_1 \,\|\, m_2) = \sum_{X \in \Theta} m_1(X) \left(\log(m_1(X)) - \log(m_2(X))\right) \tag{19}$$

which is equivalent to formula (4) when interpreting the
Bayesian BBA $m_1$ as a probability measure $p$, and the Bayesian
BBA $m_2$ as a probability measure $q$ over the set $\Theta$.

⁵ With $m$ replaced by $m_1$.


VII. EXAMPLE OF APPLICATION 


In this section we present an example of the use of the
entropy, cross-entropy and relative entropy concepts defined
in this paper for the purpose of decision-making under
uncertainty. More precisely, given a BBA $m(.)$ defined over a FoD
$\Theta$, how can we make a decision based on $m(.)$, and how can we select
the most pertinent element $\theta_i$ of $\Theta$?


A. Decision using relative entropy 


Classically the decision-making from a BBA is based on
the max of Pl(.), on the max of Bel(.), or on the max of
pignistic probability depending on the attitude chosen by the
decision-maker (resp. optimistic, pessimistic or in-between
attitudes). Here we propose to make the decision based on the
relative entropy measure. More precisely, from any BBA $m$
defined over a FoD $\Theta = \{\theta_i, i = 1, \ldots, n\}$, we calculate the
divergences $U(m_i \,\|\, m)$ for $i = 1, 2, \ldots, n$, where $m_i$ is the
BBA focused on the element $\theta_i \in \Theta$ such that $m_i(\theta_i) = 1$.
We will take as decision $\delta$ the element $\theta_i$ for which the
divergence between $m$ and $m_i$ is minimal, that is $\delta = \theta_{i^*}$
with $i^* = \arg\min_{i \in \{1, \ldots, n\}} U(m_i \,\|\, m)$.


Example: Consider the FoD $\Theta = \{\theta_1, \theta_2, \theta_3\}$, and after some
fusion processing suppose we obtain the following BBA
$m(.)$ defined by $m(\theta_1) = 0.1$, $m(\theta_2) = 0.2$, $m(\theta_3) = 0.3$,
$m(\theta_1 \cup \theta_2) = 0.01$, $m(\theta_1 \cup \theta_3) = 0.02$, $m(\theta_2 \cup \theta_3) = 0.07$
and $m(\theta_1 \cup \theta_2 \cup \theta_3) = 0.3$. Then we get $U(m_1 \,\|\, m) \approx 2.30$,
$U(m_2 \,\|\, m) \approx 1.60$ and $U(m_3 \,\|\, m) \approx 1.20$. Based on this
result the decision will be $\delta = \theta_3$ because the divergence
$U(m_3 \,\|\, m) = 1.20$ is the least value among the values 2.30,
1.60 and 1.20. This decision is consistent with what we
intuitively expect because $[Bel(\theta_1), Pl(\theta_1)] = [0.10, 0.43]$,
$[Bel(\theta_2), Pl(\theta_2)] = [0.20, 0.58]$ and $[Bel(\theta_3), Pl(\theta_3)] =
[0.30, 0.69]$, showing that $\theta_3$ is the element of $\Theta$ that has the
maximum of belief and also the maximum of plausibility.
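The decision rule of this example can be sketched in Python as follows. The code assumes the entropy $U(m) = \sum_X [-m(X)(1-u(X))\log(m(X)) + u(X)(1-m(X))]$ with $u(X) = Pl(X) - Bel(X)$, encodes subsets as bitmasks ($\theta_1$ = bit 0, $\theta_2$ = bit 1, $\theta_3$ = bit 2), and uses hypothetical helper names (`bel`, `pl`, `entropy`, `cross_entropy`, `relative_entropy`) that do not come from the paper.

```python
import math

def bel(m, X):
    """Belief of X: total mass of the focal elements included in X."""
    return sum(v for Y, v in m.items() if Y & X == Y)

def pl(m, X):
    """Plausibility of X: total mass of the focal elements intersecting X."""
    return sum(v for Y, v in m.items() if Y & X)

def entropy(m, n):
    """U(m) summed over all non-empty subsets of an n-element frame."""
    total = 0.0
    for X in range(1, 1 << n):
        u = pl(m, X) - bel(m, X)
        mx = m.get(X, 0.0)
        if mx > 0.0:
            total -= mx * (1.0 - u) * math.log(mx)
        total += u * (1.0 - mx)
    return total

def cross_entropy(m1, m2, n):
    """U(m1, m2) from (13) and (16)."""
    total = 0.0
    for X in range(1, 1 << n):
        u1 = pl(m1, X) - bel(m1, X)
        a, b = m1.get(X, 0.0), m2.get(X, 0.0)
        if a > 0.0:
            if b == 0.0:
                return math.inf
            total -= a * (1.0 - u1) * math.log(b)
        total += u1 * (1.0 - b)
    return total

def relative_entropy(m1, m2, n):
    """Generalized relative entropy U(m1 || m2) = U(m1, m2) - U(m1)."""
    return cross_entropy(m1, m2, n) - entropy(m1, n)

n = 3
m = {0b001: 0.1, 0b010: 0.2, 0b100: 0.3,
     0b011: 0.01, 0b101: 0.02, 0b110: 0.07, 0b111: 0.3}

# Divergence of each categorical BBA m_i (all mass on theta_i) from m
divergences = [relative_entropy({1 << i: 1.0}, m, n) for i in range(n)]
decision = min(range(n), key=lambda i: divergences[i])

# Belief intervals [Bel, Pl] of the singletons, for comparison
intervals = [(bel(m, 1 << i), pl(m, 1 << i)) for i in range(n)]
print(divergences, "decision: theta_%d" % (decision + 1))
print(intervals)
```

Each divergence reduces to $-\log(m(\theta_i))$ here, so the rule selects the singleton carrying the largest mass.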


Remark 1: We could not use $U(m \,\|\, m_i)$ instead of
$U(m_i \,\|\, m)$. Indeed, we get $U(m \,\|\, m_i) = +\infty$, and thus
cannot decide. But the use of $U(m_i \,\|\, m)$ is however
not completely satisfactory because for $m_i$ we
have $u_i(X) = 0$ for all $X \in 2^\Theta$ and $U(m_i) = 0$, so that
$U(m_i \,\|\, m) = U(m_i, m) - U(m_i) = -\log(m(\theta_i))$. Thus, the
decision is made with only part of the information of $m$
about $\theta_i$ and not with the other mass values of non-singleton
focal elements of $m$ (if any). Subsequently, a pseudo-distance
inspired by Jensen-Shannon is proposed which uses the whole
BBA information.
B. Decision using Jensen-Shannon pseudo-distance


In [23] we proposed a decision-making method based on
the minimum of belief-interval distance that used Wasserstein
distance. We take for decision $\delta$ the element $\theta_i$ for which
the distance $d(m, m_i)$ between $m$ and $m_i$ is minimal, that is
$\delta = \theta_{i^*}$ with $i^* = \arg\min_{i \in \{1, \ldots, n\}} d(m, m_i)$. This method
implicitly assumes the uniform distribution of the probability
$P(X)$ in $[Bel(X), Pl(X)]$ which is disputable because we
cannot check in practice if this assumption is true, or not.
To circumvent this problem, we propose to replace the
belief-interval distance between BBAs by the Jensen-Shannon-alike
pseudo-distance derived from our relative entropy concept,
which would be defined by

$$d(m, m') \triangleq \sqrt{\frac{1}{2}\left[U\left(m \,\Big\|\, \frac{m + m'}{2}\right) + U\left(m' \,\Big\|\, \frac{m + m'}{2}\right)\right]} \tag{20}$$

Note that $d(m, m')$ coincides with Jensen-Shannon distance
(6) when the BBAs $m$ and $m'$ are Bayesian BBAs. One has
also $d(m, m') = d(m', m)$, $d(m, m') \geq 0$ and $d(m, m') = 0$
when $m = m'$ because $U(m \,\|\, \frac{m + m}{2}) = U(m \,\|\, m)$, and
$U(m \,\|\, m) = U(m, m) - U(m) = 0$.

In our example, we obtain the following pseudo-distances:
$d(m, m_1) \approx 0.67$, $d(m, m_2) \approx 0.60$, and $d(m, m_3) \approx 0.55$.
Based on these values we will take the decision $\delta = \theta_3$.

Note also that if $m$ is the vacuous BBA (i.e.
$m = m_v$), then in this particular case we will obtain
$d(m_v, m_1) = d(m_v, m_2) = d(m_v, m_3) = 0.6656$ so that no
clear decision can be drawn from the vacuous BBA since
it does not contain useful information, which makes perfect
sense. Note that the inequality $(d(m, m_1) = 0.6763) >
(d(m_v, m_1) = 0.6656)$ is not surprising because the BBA $m$
is more unfavorable to $\theta_1$ than the vacuous BBA $m_v$ is.
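The pseudo-distance (20) can be sketched in Python as below, under the same assumptions as before: $U(m)$ is the sum of entropieces $-m(X)(1-u(X))\log(m(X)) + u(X)(1-m(X))$, subsets are bitmasks, and all function names (`entropy`, `cross_entropy`, `relative_entropy`, `js_distance`) are hypothetical. The clamp to zero before the square root only guards against rounding noise when the two BBAs are equal.

```python
import math

def bel(m, X):
    """Belief of X: total mass of the focal elements included in X."""
    return sum(v for Y, v in m.items() if Y & X == Y)

def pl(m, X):
    """Plausibility of X: total mass of the focal elements intersecting X."""
    return sum(v for Y, v in m.items() if Y & X)

def entropy(m, n):
    """U(m) summed over all non-empty subsets of an n-element frame."""
    total = 0.0
    for X in range(1, 1 << n):
        u = pl(m, X) - bel(m, X)
        mx = m.get(X, 0.0)
        if mx > 0.0:
            total -= mx * (1.0 - u) * math.log(mx)
        total += u * (1.0 - mx)
    return total

def cross_entropy(m1, m2, n):
    """U(m1, m2) from (13) and (16)."""
    total = 0.0
    for X in range(1, 1 << n):
        u1 = pl(m1, X) - bel(m1, X)
        a, b = m1.get(X, 0.0), m2.get(X, 0.0)
        if a > 0.0:
            if b == 0.0:
                return math.inf
            total -= a * (1.0 - u1) * math.log(b)
        total += u1 * (1.0 - b)
    return total

def relative_entropy(m1, m2, n):
    """Generalized relative entropy U(m1 || m2)."""
    return cross_entropy(m1, m2, n) - entropy(m1, n)

def js_distance(m1, m2, n):
    """Jensen-Shannon-alike pseudo-distance (20)."""
    mid = {X: 0.5 * (m1.get(X, 0.0) + m2.get(X, 0.0)) for X in set(m1) | set(m2)}
    val = 0.5 * (relative_entropy(m1, mid, n) + relative_entropy(m2, mid, n))
    return math.sqrt(max(val, 0.0))

n = 3
m = {0b001: 0.1, 0b010: 0.2, 0b100: 0.3,
     0b011: 0.01, 0b101: 0.02, 0b110: 0.07, 0b111: 0.3}
m_v = {0b111: 1.0}                                  # vacuous BBA
categorical = [{1 << i: 1.0} for i in range(n)]     # m_1, m_2, m_3

d_m = [js_distance(m, mi, n) for mi in categorical]
d_v = [js_distance(m_v, mi, n) for mi in categorical]
decision = min(range(n), key=lambda i: d_m[i])
print([round(d, 4) for d in d_m], [round(d, 4) for d in d_v])
```

Unlike $U(m_i \,\|\, m)$, this quantity depends on all the masses of $m$ through the mixture $(m + m_i)/2$.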


Remark 2: We tested (20) against the triangular inequality
$d(m, m') + d(m', m'') \geq d(m, m'')$. A crude Monte Carlo
analysis based on millions of random BBAs generated
uniformly over different frames of discernment up to cardinality
$|\Theta| = 13$ revealed no counterexample. This indicates that such
counterexamples are rare events. However, we tried a refined
Monte Carlo analysis, where the set of focal elements were
generated prior to the BBA. On the basis of 10000 different 
generated BBAs and near 500- 10° combination cases, we 
have found a rate of 2 - 10~° counterexamples to the triangular 
inequality. This is quite small. More interestingly, the degree
of violation of the triangular inequality was small, since we
found 1.17 as the maximum value for the ratio
$d(m, m'') / (d(m, m') + d(m', m''))$. That
is why we consider $d(m, m')$ only as a pseudo-distance, i.e. a
semimetric. But our simulations suggest that this semimetric
satisfies a sharp $\rho$-relaxed triangle inequality:

$$d(m, m'') \leq \rho \, (d(m, m') + d(m', m'')) \text{ with } \rho > 1.2.$$

In conclusion, the topology induced by this semimetric is
certainly very close to a true metric topology.


Counterexample of triangular inequality:


Consider $\Theta = \{\theta_1, \theta_2, \theta_3\}$ and the three BBAs $m$, $m'$ and
$m''$ as follows:

$m(\theta_3) = 0.25$, $m(\theta_1 \cup \theta_3) = 0.19$, $m(\theta_2 \cup \theta_3) = 0.21$, $m(\Theta) = 0.35$,
$m'(\theta_1 \cup \theta_3) = 0.25$, $m'(\theta_2 \cup \theta_3) = 0.26$, $m'(\Theta) = 0.49$,
$m''(\theta_1 \cup \theta_2) = 0.44$, $m''(\Theta) = 0.56$.

We get $d(m, m') \approx 0.1144$, $d(m', m'') \approx 0.1800$ and
$d(m, m'') \approx 0.3306$. Hence $d(m, m') + d(m', m'') \approx 0.2945$
which is smaller than $d(m, m'') \approx 0.3306$. So there, the
triangular inequality $d(m, m') + d(m', m'') \geq d(m, m'')$ is
violated.
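This counterexample can be checked numerically with the short Python sketch below. It reuses the same assumed definitions as the rest of the section (entropieces $-m(X)(1-u(X))\log(m(X)) + u(X)(1-m(X))$, bitmask subsets with $\theta_1$ = bit 0, $\theta_2$ = bit 1, $\theta_3$ = bit 2), and every function name is a hypothetical helper rather than code from the paper.

```python
import math

def bel(m, X):
    """Belief of X: total mass of the focal elements included in X."""
    return sum(v for Y, v in m.items() if Y & X == Y)

def pl(m, X):
    """Plausibility of X: total mass of the focal elements intersecting X."""
    return sum(v for Y, v in m.items() if Y & X)

def relative_entropy(m1, m2, n):
    """U(m1 || m2) written term by term as in (18); assumes m2 covers m1's focal set."""
    total = 0.0
    for X in range(1, 1 << n):
        u1 = pl(m1, X) - bel(m1, X)
        a, b = m1.get(X, 0.0), m2.get(X, 0.0)
        if a > 0.0:
            total += a * (1.0 - u1) * (math.log(a) - math.log(b))
        total += u1 * (a - b)
    return total

def js_distance(m1, m2, n):
    """Jensen-Shannon-alike pseudo-distance (20)."""
    mid = {X: 0.5 * (m1.get(X, 0.0) + m2.get(X, 0.0)) for X in set(m1) | set(m2)}
    val = 0.5 * (relative_entropy(m1, mid, n) + relative_entropy(m2, mid, n))
    return math.sqrt(max(val, 0.0))

n = 3
m = {0b100: 0.25, 0b101: 0.19, 0b110: 0.21, 0b111: 0.35}
m_p = {0b101: 0.25, 0b110: 0.26, 0b111: 0.49}      # m'
m_pp = {0b011: 0.44, 0b111: 0.56}                  # m''

d1 = js_distance(m, m_p, n)      # d(m, m')
d2 = js_distance(m_p, m_pp, n)   # d(m', m'')
d3 = js_distance(m, m_pp, n)     # d(m, m'')
print(round(d1, 4), round(d2, 4), round(d3, 4), d1 + d2 < d3)
```

The ratio `d3 / (d1 + d2)` obtained this way is about 1.12, below the maximum of 1.17 reported in Remark 2.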


VIII. CONCLUSION 


In this paper we have proposed new measures of
cross-entropy and relative entropy of two basic belief assignments
based on the new effective measure of entropy of belief
functions presented in 2022. These new concepts are mathematically
well-defined and are direct generalizations of their classical
formulations drawn from the probabilistic framework. It is
expected that these new theoretical concepts will become useful
in some applications for decision-making under uncertainty.
As research perspectives, we hope to improve them a bit more
in order to provide a true Jensen-Shannon metric for belief
functions in the near future. Also, applications of these new
concepts are under development and they will be reported in
future publications.


APPENDIX 


A. Counterexamples of inequality (15) 


We consider the FoD $\Theta = \{A, B, C\}$ and we give BBAs⁷
$m_1(.)$ and $m_2(.)$ such that inequality (15) is violated for the
different choices of triplet $(i, j, k)$ used in the formula (14).


• Consider $(i, j, k) = (1, 1, 1)$ and the BBAs of Table II.
We get $U(m_1) = 3.9742$ and $U(m_1, m_2) = 3.9432$. The
inequality (15) is violated because $U(m_1) > U(m_1, m_2)$.

• Consider $(i, j, k) = (1, 2, 1)$ and the BBAs of Table III.
We get $U(m_1) = 3.7447$ and $U(m_1, m_2) = 2.4995$. The
inequality (15) is violated because $U(m_1) > U(m_1, m_2)$.

• Consider $(i, j, k) = (1, 2, 2)$ and the BBAs of Table IV.
We get $U(m_1) = 3.9568$ and $U(m_1, m_2) = 2.5086$. The
inequality (15) is violated because $U(m_1) > U(m_1, m_2)$.

• Consider $(i, j, k) = (2, 1, 1)$ and the BBAs of Table V.
We get $U(m_1) = 3.2115$ and $U(m_1, m_2) = 2.8616$. The
inequality (15) is violated because $U(m_1) > U(m_1, m_2)$.

• Consider $(i, j, k) = (2, 1, 2)$ and the BBAs of Table VI.
We get $U(m_1) = 2.5542$ and $U(m_1, m_2) = 2.2147$. The
inequality (15) is violated because $U(m_1) > U(m_1, m_2)$.

• Consider $(i, j, k) = (2, 2, 1)$ and the BBAs of Table VII.
We get $U(m_1) = 4.5243$ and $U(m_1, m_2) = 3.8714$. The
inequality (15) is violated because $U(m_1) > U(m_1, m_2)$.


⁷ The numerical values entering in the tables have been approximated to
their fourth decimal for convenience.
• Consider $(i, j, k) = (2, 2, 2)$ and the BBAs of Table VIII.
We get $U(m_1) = 3.8858$ and $U(m_1, m_2) = 3.0406$. The
inequality (15) is violated because $U(m_1) > U(m_1, m_2)$.


[Tables II to VIII, BBAS m1(.) AND m2(.): the focal elements and mass values of these tables are not legible in the source.]


B. Proof of the Theorem 1

Subsequently, log is the natural logarithm function to the
base of the mathematical Euler constant e. To prove the
Theorem 1, we first prove the theorem 2 below.

Theorem 2: Let $\mathcal{M}(\Theta)$ be the set of basic belief assignments
over $\Theta$. Then:

$$\arg\min_{m \in \mathcal{M}(\Theta)} U(m_1, m) = \{m_1\}.$$

Proof: Let $F_1 \subset 2^\Theta \setminus \{\emptyset\}$ be the set of focal elements of
$m_1$. First of all, it is noticed that $u_1(X) < 1$ for all $X \in F_1$.
Moreover, if there are $X \in F_1$ and $m \in \mathcal{M}(\Theta)$ such that
$m(X) = 0$, then $U(m_1, m) = +\infty$. As a consequence, if $m$
minimizes $U(m_1, m)$, then its set of focal elements contains
the set of focal elements of $m_1$.

Optimizations. Let $F$ be such that $F_1 \subset F \subset 2^\Theta \setminus \{\emptyset\}$. The
proof is done by solving:

$$\min_{m} f(m) \tag{21}$$

under constraint

$$\sum_{X \in F} m(X) = 1 \tag{22}$$

where

$$f(m) = \sum_{\emptyset \neq X \neq \Theta} u_1(X)(1 - m(X)) + \sum_{X \in F_1} -m_1(X)(1 - u_1(X)) \log(m(X)). \tag{23}$$
It is worth noting that $m(X)$ is nothing but the component
of index $X$ of the unknown map vector $m : F \to \mathbb{R}^+$. The
optimization (21) with equality constraint (22) could typically
be solved by means of the Lagrangian multiplier method.


Because $\log(m(X))$ is a concave function of $m(X)$, the
term $-m_1(X)(1 - u_1(X)) \log(m(X))$ is proportional to
$-\log(m(X))$ and is a convex function of $m(X)$. And because
$u_1(X)(1 - m(X))$ is a linear function of $m(X)$, the term
$-m_1(X)(1 - u_1(X)) \log(m(X)) + u_1(X)(1 - m(X))$ is
a convex function of $m(X)$. Therefore, the function $f(m)$
is a convex function. We are then ensured that the Lagrangian
multiplier condition will point, if it is fulfilled, to the minima
of the function.


The Lagrangian is defined for this problem by:

$$L(m, \lambda) = \sum_{X \in F_1} -m_1(X)(1 - u_1(X)) \log(m(X)) + \sum_{\emptyset \neq X \neq \Theta} u_1(X)(1 - m(X)) + \lambda \Big[1 - \sum_{X \in F} m(X)\Big]. \tag{24}$$

The optimality conditions are:

$$D_{m(X)} L(m, \lambda) = 0 \text{ for all } X \in F,$$

where $D_{m(X)} L(m, \lambda)$ is the differential of $L(m, \lambda)$ with
respect to $m(X)$ given by

$$D_{m(X)} L(m, \lambda) = \frac{-m_1(X)(1 - u_1(X))}{m(X)} - u_1(X) - \lambda,$$

for $X \in F_1$, and:

$$D_{m(X)} L(m, \lambda) = -u_1(X) - \lambda, \text{ for } X \in F \setminus F_1.$$

Then, the optimal solution for (21) is $m^{opt}$ such that:

$$m^{opt}(X) = \frac{m_1(X)(1 - u_1(X))}{-\lambda - u_1(X)} \text{ for all } X \in F_1, \tag{25}$$

with $\lambda$ chosen such that:

$$-\lambda = u_1(X) \text{ for all } X \in F \setminus F_1, \tag{26}$$

$$\sum_{X \in F \setminus F_1} m^{opt}(X) + \sum_{X \in F_1} \frac{m_1(X)(1 - u_1(X))}{-\lambda - u_1(X)} = 1. \tag{27}$$

Notice that (25) implies $-\lambda - u_1(X) > 0$ for all $X \in F_1$.

Case $F \neq F_1$: Condition (26) implies $-\lambda \leq 1$ and then:

$$\sum_{X \in F_1} \frac{m_1(X)(1 - u_1(X))}{-\lambda - u_1(X)} \geq \sum_{X \in F_1} m_1(X) = 1.$$

Then by (27), it comes $m^{opt}(X) = 0$ for $X \in F \setminus F_1$, which
contradicts the hypothesis that $F$ is the set of focal elements of
$m$. There is no solution with more focal elements than $m_1$.

Case $F = F_1$: Choice $\lambda = -1$ is obvious. Therefore, the
unique minimizer $m^{opt} = m_1$ is obtained.

Conclusion. It has been shown that a minimizer of $U(m_1, m)$
only exists if it has the same set of focal elements as $m_1$.
Moreover, it is shown in that case that the only minimizer is
$m_1$. As a consequence:

$$\arg\min_{m \in \mathcal{M}(\Theta)} U(m_1, m) = \{m_1\}.$$

Because Theorem 2 holds, we have $U(m_1, m) > U(m_1)$
when $m \neq m_1$, and $U(m_1, m_1) = U(m_1)$ when $m = m_1$.
Therefore, $U(m_1, m) \geq U(m_1)$ for any BBA $m \in \mathcal{M}(\Theta)$.
Thus the inequality (15) holds, with equality only if $m_1 = m_2$,
which completes the proof of the Theorem 1.
On Monotonicity Desideratum for an Efficient
Entropy Measure of Basic Belief Assignments 
Abstract—In this short paper we discuss the monotonicity
desideratum for defining an efficient entropy measure of basic 
belief assignments. We browse some alternatives of the effective 
entropy measure developed recently, and we show that all these 
new alternatives for an entropy measure are not efficient. Only 
one appears to be quasi-efficient for the frame of discernment of 
dimension 2. 


Keywords: entropy, cross-entropy, belief functions. 


I. INTRODUCTION 


We assume that the formula of the entropy for basic belief 
assignment (BBA) is of the general form, see [1] for details 


$$U(m) = \sum_{X \in 2^\Theta} s(X) \tag{1}$$

$$s(X) = -\alpha(u(X)) \, m(X) \log(m(X)) + \beta(u(X), m(X)) \tag{2}$$

where $u(X) = Pl(X) - Bel(X)$, $\alpha(u(X)) \in [0, 1]$ is a
weighting function of the surprisal $-\log(m(X))$, and
$\beta(u(X), m(X))$ is a function that must increase the entropy
$U(m)$ as soon as there is some imprecision on the unknown
probability $P(X)$ of $X$. $s(X)$ has been called the entropiece
of subset $X$.


In [1], the function U(m) has been defined as effective if 
it satisfies the four natural following desiderata: 


D1: For any non-empty frame of discernment $\Theta$ and for any
BBA $m(\cdot)$ focused on a singleton $X$ of $2^\Theta$ one must have

$$U(m) = 0 \tag{3}$$


D2: The measure of uncertainty of a totally ignorant source of
evidence must increase with the cardinality of the frame of
discernment. That is

$$U(m_v^{\Theta}) < U(m_v^{\Theta'}), \text{ if } |\Theta| < |\Theta'|. \tag{4}$$


D3: The measure of uncertainty $U(m)$ must coincide with
Shannon entropy [2]-[4] if the BBA $m(\cdot)$ is a Bayesian
BBA. This desideratum is mathematically expressed for any
Bayesian BBA $m(\cdot)$ defined on the FoD $\Theta$ by the condition¹

$$U(m) = -\sum_{X \in \Theta} m(X) \log(m(X)) \tag{5}$$


D4: For any non-vacuous BBA $m(\cdot)$ and for the vacuous BBA
$m_v(\cdot)$ defined with respect to the same FoD one must have

$$U(m) < U(m_v) \tag{6}$$

It has been proved in [1] that $U(m)$ is effective in particular
if one takes

$$\alpha(u(X)) = 1 - u(X) \tag{7}$$
$$\beta(u(X), m(X)) = u(X)(1 - m(X)) \tag{8}$$

There is unfortunately no uniqueness for the choice of the $\alpha(u(X))$
and $\beta(u(X), m(X))$ functions, even if this particular choice
has a quite simple interpretation. The interest of this choice is
that it makes it easy to define the cross-entropy $U(m_1, m_2)$ of
BBAs in a simple way satisfying the Gibbs-alike inequality
[5] $U(m_1, m_2) \geq U(m_1)$. However this effective entropy
formula is not entirely satisfying because the triangular
inequality for the Jensen-Shannon-alike distance can be violated
in rare situations for some distributions of BBAs. This
motivates us to search for improved effective entropy
formulas.


For this, we would like the entropy to satisfy a 5th
desideratum D5 about the monotonicity of $U(m)$. More
precisely, we want that a reduction of mass of $X \subset Y$ transferred
to its superset $Y$ increases the entropy value, and we want
that any reduction of mass of $m(Y)$ transferred to one of its
subsets $X$ decreases the entropy value. As a simple example,
for $\Theta = \{A, B\}$ if we consider the BBAs $m(.)$ and $m_\epsilon(.)$
defined by

$m(\emptyset) = 0$, $m(A) = a$, $m(B) = b$, $m(A \cup B) = 1 - a - b$,
$m_\epsilon(\emptyset) = 0$, $m_\epsilon(A) = \epsilon \cdot a$, $m_\epsilon(B) = b$, $m_\epsilon(A \cup B) = 1 - (\epsilon \cdot a) - b$,

with $0 < \epsilon < 1$. We would like to have $U(m) < U(m_\epsilon)$
because the BBA $m_\epsilon$ is less specific than $m$ because the
mass of ambiguity (or disjunction) $A \cup B$ for $m_\epsilon$ is bigger
than for the BBA $m$.


¹ Shannon entropy [2] is given here in nats, and we take $0 \log(0) = 0$
because $\lim_{x \to 0^+} x \log(x) = 0$, which is proved using L'Hôpital's rule.


So we would like the entropy of a BBA to satisfy the
extra desideratum D5 (i.e. the monotonicity desideratum) stated
as follows:


D5 (monotonicity): $U(m)$ must increase when the mass of a
proposition $Y$ increases while the mass of one of its subsets
$X \subset Y$ decreases by the same amount, and vice versa.


Note that D5 is equivalent to the mathematical condition

$$\frac{\partial U(m)}{\partial m(X)} \leq \frac{\partial U(m)}{\partial m(Y)}$$

for all $X$ and $Y$ such that $m(X) > 0$ and $X \subset Y$.


It is worth mentioning that if U(m) satisfies D5 then it 
will satisfy D4, but the converse is unfortunately false for the 
entropy proposed in [1] because it can be easily shown based 
on a simple counter-example that the U(m) based on (7) and 
(8) (which satisfies D4) can occasionally violate D5. 


Example: Consider $\Theta = \{A, B\}$, $\epsilon = 0.5$ and the BBAs

$m(\emptyset) = 0$, $m(A) = 0.0067$, $m(B) = 0.8645$, $m(A \cup B) = 0.1288$,
$m_{\epsilon=0.5}(\emptyset) = 0$, $m_{\epsilon=0.5}(A) = 0.00335$, $m_{\epsilon=0.5}(B) = 0.8645$, $m_{\epsilon=0.5}(A \cup B) = 0.13215$.

We get $U(m) = 0.548244475651207$ and $U(m_\epsilon) =
0.542869517947531$, and we observe that $U(m) > U(m_\epsilon)$.
This proves that D5 is not satisfied in this counter-example
with the effective entropy measure defined in [1] by the
formulas (1) and (2).
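The two entropy values of this counter-example can be reproduced with the Python sketch below, which implements (1) and (2) with the choice (7)-(8). Subsets are encoded as bitmasks ($A$ = 1, $B$ = 2, $A \cup B$ = 3); the function names are hypothetical and only serve the illustration.

```python
import math

def bel(m, X):
    """Belief of X: total mass of the focal elements included in X."""
    return sum(v for Y, v in m.items() if Y & X == Y)

def pl(m, X):
    """Plausibility of X: total mass of the focal elements intersecting X."""
    return sum(v for Y, v in m.items() if Y & X)

def entropy_v0(m, n):
    """U(m) from (1)-(2) with alpha = 1 - u and beta = u (1 - m), in nats."""
    total = 0.0
    for X in range(1, 1 << n):
        u = pl(m, X) - bel(m, X)
        mx = m.get(X, 0.0)
        if mx > 0.0:
            total -= (1.0 - u) * mx * math.log(mx)
        total += u * (1.0 - mx)
    return total

eps = 0.5
a, b = 0.0067, 0.8645
m = {1: a, 2: b, 3: 1.0 - a - b}
m_eps = {1: eps * a, 2: b, 3: 1.0 - eps * a - b}

U_m = entropy_v0(m, 2)
U_m_eps = entropy_v0(m_eps, 2)
print(U_m, U_m_eps, U_m > U_m_eps)
```

Moving half of the mass of $A$ to $A \cup B$ lowers the entropy here, which is the D5 violation described above.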


A measure of uncertainty U(m) that will satisfy desiderata 
D1, D2, D3 and D5 (and thus D4 too) will be named an 
efficient entropy. 


II. ATTEMPTS FOR IMPROVING ENTROPY FORMULATION 


In the spirit of the original effective entropy formula defined
in [1] by the formulas (1) and (2), we explored other
possible formulas by slightly changing the $\alpha(u(X))$ and
$\beta(u(X), m(X))$ functions involved in the derivation of the
entropieces $s(X)$. The Monte-Carlo evaluation of these
modifications with respect to the satisfaction of desiderata D4 and
D5 is reported in the next section. For D5, we only made
the evaluation based on the example presented before when
considering $m$ and $m_\epsilon$ defined only over $\Theta = \{A, B\}$.

To keep the spirit of the principle of entropy we choose positive
functions $\alpha(u(X))$ such that $\alpha(u(X)) = 0$ for $u(X) = 1$,
and $\alpha(u(X)) = 1$ for $u(X) = 0$, so that the first part
of entropy formula (1) remains compatible with Shannon
entropy for Bayesian BBA. We also choose positive functions
$\beta(u(X), m(X))$ having the same behavior as $u(X)(1 - m(X))$
at the limits when $u(X)$ equals zero or one, and when $m(X)$
equals zero or one.


We make a behavior analysis of $U(m)$ with the
change of the functions $\alpha(u(X))$ and $\beta(u(X), m(X))$.
We refer to a particular choice of couple of functions
$(\alpha(u(X)), \beta(u(X), m(X)))$ by a version number $v_i$.
The original version $v_0$ corresponds to the choice used in the
effective definition presented in [1], i.e. $\alpha(u(X)) = 1 - u(X)$
and $\beta(u(X), m(X)) = u(X)(1 - m(X))$.


First, we analyze the behavior of $U(m)$ with respect to the
change of the $\alpha(u(X))$ function as those tested in Table I.


[Table I: functions $\alpha(u(X))$ and $\beta(u(X), m(X))$ of versions $v_0$ to $v_6$. The entries are not legible in the source.]


Table I
FUNCTIONS ANALYZED.


The functions $\alpha(u(X))$ used in versions $v_1$, $v_2$ and $v_3$
under-amplify the discounting because these functions are
under the line $(1 - u(X))$. The functions $\alpha(u(X))$ used in
versions $v_4$, $v_5$ and $v_6$ over-amplify the discounting because
these functions are above the line $(1 - u(X))$.


Secondly, we analyze the behavior of $U(m)$ with respect
to the change of the $u(X)$ function appearing in
$\beta(u(X), m(X))$ of version $v_0$ by those tested in Table II.


[Table II: functions $\beta(u(X), m(X))$ of versions $v_7$ to $v_{12}$, obtained by replacing $u(X)$ in $u(X)(1 - m(X))$. The entries are not legible in the source.]


Table II
FUNCTIONS ANALYZED.


The functions in $v_7$, $v_8$ and $v_9$ under-amplify
$u(X)(1 - m(X))$ because they are below the line
$u(X)(1 - m(X))$, whereas the functions in $v_{10}$, $v_{11}$
and $v_{12}$ over-amplify $u(X)(1 - m(X))$ because they are
above the line $u(X)(1 - m(X))$.


Thirdly, we analyze the behavior of $U(m)$ with respect
to the change of the linear term $1 - m(X)$ entering in the
$\beta(u(X), m(X))$ formula of the original entropy version $v_0$,
and we consider the functions listed in Table III.
[Table III: functions $\beta(u(X), m(X))$ of versions $v_{13}$ to $v_{18}$, obtained by replacing $1 - m(X)$ in $u(X)(1 - m(X))$. The entries are not legible in the source.]


Table III
FUNCTIONS ANALYZED.


III. SIMULATION RESULTS


The following tables present the results of the tests of
satisfaction of D4 and D5 based on $N_{MC} = 500000$ random
generations of BBAs used in our Monte-Carlo simulation.
$N(D4)$ is the number of cases where $U(m) > U(m_v)$ has
occurred, i.e. the number of violations of D4 for the case
where $|\Theta| = 5$, and where $m_v$ is the vacuous BBA for
which $m_v(\Theta) = 1$. $N_\epsilon(D5)$ is the number of cases where
$U(m) > U(m_\epsilon)$ has occurred, i.e. the number of violations of
D5 when working with the frame of discernment $\Theta = \{A, B\}$.
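A reduced version of this Monte-Carlo test for the original version $v_0$ can be sketched as follows. Several points are assumptions made only for the illustration: the sampling scheme (normalized exponential weights, i.e. uniform on the simplex of masses of $A$, $B$ and $A \cup B$), the number of trials, the seed, and all function names. The random generator actually used in the paper is not specified in the text, so the failure counts obtained here are not expected to match the tables exactly.

```python
import math
import random

def bel(m, X):
    """Belief of X: total mass of the focal elements included in X."""
    return sum(v for Y, v in m.items() if Y & X == Y)

def pl(m, X):
    """Plausibility of X: total mass of the focal elements intersecting X."""
    return sum(v for Y, v in m.items() if Y & X)

def entropy_v0(m, n):
    """U(m) with alpha = 1 - u and beta = u (1 - m), over an n-element frame."""
    total = 0.0
    for X in range(1, 1 << n):
        u = pl(m, X) - bel(m, X)
        mx = m.get(X, 0.0)
        if mx > 0.0:
            total -= (1.0 - u) * mx * math.log(mx)
        total += u * (1.0 - mx)
    return total

def count_d5_failures(eps, trials, seed):
    """Count the cases U(m) > U(m_eps) on the frame {A, B} (A = 1, B = 2, A u B = 3)."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        w = [rng.expovariate(1.0) for _ in range(3)]
        s = sum(w)
        a, b = w[0] / s, w[1] / s
        m = {1: a, 2: b, 3: 1.0 - a - b}
        m_eps = {1: eps * a, 2: b, 3: 1.0 - eps * a - b}
        if entropy_v0(m, 2) > entropy_v0(m_eps, 2):
            failures += 1
    return failures

trials = 20000
failures = {eps: count_d5_failures(eps, trials, seed=0) for eps in (0.9, 0.5, 0.2)}

# D4 reference value: entropy of the vacuous BBA for |Theta| = 5
U_vacuous_5 = entropy_v0({(1 << 5) - 1: 1.0}, 5)
print(failures, U_vacuous_5)
```

The vacuous-BBA entropy for $|\Theta| = 5$ evaluates to $2^5 - 2 = 30$.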


Table IV gives the results corresponding to functions listed
in Table I.

Version | (N(D4), U(m_v)) | N_0.9(D5) | N_0.5(D5) | N_0.2(D5)
v0 | (0, 30) | 15380 | 20248 | 26798
v1 | (0, 30) | 6048 | 7715 | (not legible)
v2 | (0, 30) | 6040 | 7699 | 10365
v3 | (0, 30) | 5851 | 7716 | 10474
v4 | (0, 30) | 34488 | 45352 | 63153
v5 | (0, 30) | 25667 | 33673 | 45652
v6 | (0, 30) | 15294 | 19931 | 26900


Table IV
NUMBERS OF FAILURES AMONG N_MC = 500000 RANDOM RUNS.


We observe from the results of Table IV that all expressions
of $U(m)$ satisfy D4, and that the more we discount $m(A)$ (i.e. the
closer the $\epsilon$-factor is to zero), the more failures we get for the D5 test.
We also observe that using an under-amplifying $\alpha(u(X))$
function as in versions $v_1$, $v_2$ and $v_3$ notably reduces (by
almost a factor 3) the number of failures for the D5 test with
respect to the original linear weighting function $1 - u(X)$ used
in the original entropy formula of $v_0$. We observe conversely that
using an over-amplifying $\alpha(u(X))$ function as in versions $v_4$, $v_5$
and $v_6$ drastically increases the number of failures of the D5 test
with respect to the original linear weighting function $1 - u(X)$
used in the original entropy formula of $v_0$. Therefore, the choice
of an under-amplifying $\alpha(u(X))$ function is recommended.


Table V gives the results corresponding to functions listed
in Table II.

Version | (N(D4), U(m_v)) | N_0.9(D5) | N_0.5(D5) | N_0.2(D5)
v0 | (0, 30) | 15380 | 20248 | 26798
v7 | (not legible) | 1452 | 1899 | 2585
v8 | (0, 60) | 3818 | 4954 | 6528
v9 | (0, 75) | 15542 | 20067 | 27014
v10 | (0, 30) | 12653 | 16280 | 22376
v11 | (0, 30) | 12617 | 16396 | 22230
v12 | (0, 30) | 12603 | 15999 | 22090


Table V
NUMBERS OF FAILURES AMONG N_MC = 500000 RANDOM RUNS.


As we observe in Table V, replacing $u(X)$ by an
under-amplifying function as in versions $v_7$, $v_8$ and $v_9$ has an
impact on the number of failures of the D5 test but it changes
substantially the entropy of the vacuous BBA. If we replace
$u(X)$ by an over-amplifying function as in versions $v_{10}$,
$v_{11}$ and $v_{12}$ we do not change the entropy of the vacuous
BBA which remains equal to $U(m_v) = 2^{|\Theta|} - 2$, and
we reduce moderately the number of failures of the D5 test,
and all results with versions $v_{10}$, $v_{11}$ and $v_{12}$ are very similar.


Table VI gives the results corresponding to functions listed
in Table III.

Version | (N(D4), U(m_v)) | N_0.9(D5) | N_0.5(D5) | N_0.2(D5)
v0 | (0, 30) | 15380 | 20248 | 26798
v13 | (0, 30) | 10574 | 13614 | 18423
v14 | (0, 30) | 10645 | 13857 | 18521
v15 | (0, 30) | 10506 | 13645 | 18578
v16 | (0, 30) | 12496 | 16401 | 22062
v17 | (0, 30) | 16226 | 20251 | 28595
v18 | (0, 30) | 15513 | 19848 | 26816


Table VI
NUMBERS OF FAILURES AMONG N_MC = 500000 RANDOM RUNS.


We observe that using an under-amplifying function of
$1 - m(X)$ as with versions $v_{13}$, $v_{14}$ and $v_{15}$ reduces by
about 1/3 the number of D5 failures, and there is not much
difference between the results of these three versions. One sees
that using over-amplifying functions as those in versions
$v_{16}$, $v_{17}$ and $v_{18}$ does not substantially reduce the number
of failures of the D5 test, and it can even be slightly worse than
the result of $v_0$ as we can see when using the functions of $v_{17}$.


Our analysis of the behavior of $U(m)$ done with all
functions tested in Tables I, II and III reveals that the most
important effect on the reduction of the number of failures of
the D5 desideratum is obtained when using an under-amplifying
function $\alpha(u(X))$.


In order to reduce a bit more the number of failures of D5, 
we combine the expressions of a(u(X)) and 6(u(X),m(X)) 
that provide the minimum of failures to obtain the most failure 
reduction of D5 test. Hence if we consider the expression of 
(u(X)) in vo, and the expression of B(u(X),m(X)) in vis, 


we get the following entropiece expression (denoted $v_{19}$)

[Equation (9): entropiece expression $s(X)$ of version $v_{19}$. The expression is not legible in the source.]

Using this entropiece formulation, named $v_{19}$, we get
the results of Table VII.


Version | (N(D4), U(m_v)) | N_0.9(D5) | N_0.5(D5) | N_0.2(D5)
v0 | (0, 30) | 15380 | 20248 | 26798
v19 | (not legible) | 4180 | 5424 | 7250


Table VII
NUMBERS OF FAILURES AMONG N_MC = 500000 RANDOM RUNS.


We see that using this new expression $v_{19}$ for the entropiece,
we drastically reduce the number of failures of D5 to
approximately $(5000/500000) \cdot 100 = 1\%$. Even if this failure
percentage is quite small, it is not equal to zero. Therefore,
the entropy measure based on this version $v_{19}$ of the entropiece
definition is not totally efficient, but almost efficient for the
case with $|\Theta| = 2$. Of course a deeper and more general
analysis of test D5 with random BBAs generated on frames
of discernment $\Theta$ having more than two elements should
be carried out to see if a similarly small percentage of failures is
obtained with the entropiece defined in the version $v_{19}$, and in
this case we could conjecture that this new entropy measure
is quasi-efficient.


IV. CONCLUSION 


The existence of an entropy formulation that would
guarantee the efficiency (i.e. D1, D2, D3 and D5 desiderata)
of the entropy measure of any BBA remains a very challenging
open problem. We hope that a theoretical solution of
this interesting problem exists, and if so the proof of its uniqueness
(if any) would also be very welcome.
Abstract—This paper analyzes the different definitions of a
negator of a probability mass function (pmf) and a basic belief 
assignment (BBA) available in the literature. To overcome their 
limitations we propose an involutory negator of BBA, and we 
present a new indirect information fusion method based on this 
negator which can simplify the conflict management problem. 
The direct and indirect information fusion strategies are analyzed 
for three interesting examples of fusion of two BBAs. We also 
propose two methods for using the whole available information 
(the original BBAs and their negators) for decision-making 
support. The first method is based on the combination of the 
direct and indirect fusion strategies, and the second method 
selects the most reasonable fusion strategy to apply (direct, or 
indirect) based on the maximum entropy principle. 


Keywords: belief functions, BBA negator, information fusion, 
measure of uncertainty, entropy. 


I. INTRODUCTION 


This paper is an extended version of our paper published in 
Cybernetics and Information Technologies (CIT) journal [1]. 
Due to page limit restrictions of the CIT journal, we were not 
able to provide all technical details and the examples, and that 
is why we propose this extended version for the readers and 
researchers interested in this topic. 

This paper deals with basic belief assignments (BBAs) in- 
troduced by Shafer in his mathematical theory of evidence [2] 
known also as Dempster-Shafer Theory (DST) in the literature. 
We focus on the construction of an involutory negator of a 
BBA, and its application for information fusion. The concept
of the complement of a body of evidence (i.e. a negator) was
introduced by Dubois and Prade [3] in 1986, and re-examined
by Yager in [4], whose work has attracted renewed interest from
the research community working with belief functions. The
main disadvantage of these negators (and of the most recent
proposals) is that they are not involutory¹ in general, so that
the information content of the negator of a negator of a BBA
is not equal to the information content of the original BBA.
This is problematic from the informational standpoint because
we naturally expect that working with the negator of the negator


¹ An involutory function (or involution) is a function f that is its own
inverse, that is f(f(x)) = x for all x in the domain of f. This means that
applying f twice produces the original value.
of evidence should be equivalent to working with the original
evidence. The problem we address in this paper can be stated
as follows: consider a frame of discernment (FoD) $\Theta$ of
a problem under concern. Knowing a first expert providing a
BBA $m(\cdot)$ defined on the power set² $2^\Theta$, is it possible to find
a second expert with a BBA $\bar{m}(\cdot)$ defined on the power set
$2^\Theta$ that expresses the opposite (or negation) assessment of the
first expert? How can this be done effectively? Based on which
principle and justifications? The second problem we address is
the use of negators of BBAs for information fusion and their
possible advantages for decision-making support. It is worth
mentioning that the negation of a BBA (i.e. a BBA negator)
must not be confused with negative values for masses of belief,
which are not allowed in Shafer's mathematical theory of
evidence. This work focuses on the search for an involutory
negator of a BBA which can be interpreted as a dual approach
to the characterization of any source of evidence.

This paper is organized as follows. Section II recalls the
basic notions of Belief Functions (BF) and the entropy of
BBAs. Section III gives a detailed review, with examples, of several
negators of probability mass function (pmf) and BBA proposed
in the literature up to now. Section IV introduces
a new involutory negator for BBAs. Section V recalls the
principle of the classical direct fusion approach and describes
the principle of a new indirect fusion approach based on
the new involutory negator for BBA. In Section VI three
interesting examples related to conflicting sources of evidence
are described. The results obtained on the basis of direct
and indirect fusion approaches using Dempster's rule and
Proportional Conflict Redistribution rule No. 6 of combination
are analyzed. Section VII discusses two important remarks
about the indirect fusion based on the conjunctive rule and about
the entropy change due to the use of a BBA's negator. Section
VIII is devoted to the management of direct and indirect
fusions for decision-making support. Section IX concludes
the paper.


² By definition, $2^\Theta = \{X | X \subseteq \Theta\}$, which is the set of all subsets of $\Theta$
(empty set $\emptyset$ and $\Theta$ included). We usually omit the { and } characters when denoting
the elements of the power-set because it makes the notation simpler and
shorter. For instance, if the frame of discernment is $\Theta = \{A, B\}$, the power-set
$2^\Theta$ will be denoted by $\{\emptyset, A, B, A \cup B\}$ with our notation which uses only
11 characters, instead of using the classical notation $\{\emptyset, \{A\}, \{B\}, \{A, B\}\}$
which would require 17 characters. We can make the notation even
shorter by writing $2^\Theta$ as $\{\emptyset, A, B, \Theta\}$ using only 9 characters.
II. BELIEF FUNCTIONS AND ENTROPY


The belief functions were introduced by Shafer [2] for
modeling epistemic uncertainty, for reasoning about uncertainty,
and for combining distinct sources of evidence (SoEs).
The answer to the problem under concern is assumed to
belong to a known finite discrete frame of discernment (FoD)
$\Theta = \{\theta_1, \ldots, \theta_n\}$ where all elements (i.e. members) of $\Theta$ are
exhaustive and exclusive. The set of all subsets of $\Theta$ (including
the empty set $\emptyset$, and $\Theta$) is the power-set of $\Theta$ denoted by $2^\Theta$. The
number of elements (i.e. the cardinality) of the power-set is
$2^{|\Theta|}$.


A. Basic definitions 


A normalized basic belief assignment³ (BBA) associated
with a given source of evidence is a mapping $m^\Theta(\cdot) : 2^\Theta \to [0, 1]$
such that $m^\Theta(\emptyset) = 0$ and $\sum_{X \in 2^\Theta} m^\Theta(X) = 1$. A BBA
$m^\Theta(\cdot)$ characterizes a source of evidence related with a FoD
$\Theta$. For notation shorthand, we can omit the superscript $\Theta$ in
the $m^\Theta(\cdot)$ notation if there is no ambiguity on the FoD we work
with. The quantity $m(X)$ is called the mass of belief of $X$.
The element $X \in 2^\Theta$ is called a focal element (FE) of $m(\cdot)$ if
$m(X) > 0$. The set of all focal elements of $m(\cdot)$ is denoted⁴
by $\mathcal{F}_\Theta(m) \triangleq \{X \in 2^\Theta | m(X) > 0\}$. The belief and the
plausibility of $X$ are respectively defined for any $X \in 2^\Theta$ by [2]

$$Bel(X) = \sum_{Y \in 2^\Theta | Y \subseteq X} m(Y), \tag{1}$$

$$Pl(X) = \sum_{Y \in 2^\Theta | X \cap Y \neq \emptyset} m(Y) = 1 - Bel(\bar{X}), \tag{2}$$

where $\bar{X} \triangleq \Theta \setminus \{X\}$ is the complement of $X$ in $\Theta$.

One has always $0 \leq Bel(X) \leq Pl(X) \leq 1$, see [2]. For
$X = \emptyset$, $Bel(\emptyset) = Pl(\emptyset) = 0$, and for $X = \Theta$ one has
$Bel(\Theta) = Pl(\Theta) = 1$. $Bel(X)$ and $Pl(X)$ are often interpreted
as the lower and upper bounds of the unknown probability
$P(X)$ of $X$, that is $Bel(X) \leq P(X) \leq Pl(X)$.
To quantify the uncertainty (i.e. the imprecision) of
$P(X) \in [Bel(X), Pl(X)]$, we use $u(X) \in [0, 1]$ defined by

$$u(X) \triangleq Pl(X) - Bel(X). \tag{3}$$

The quantity $u(X) = 0$ if $Bel(X) = Pl(X)$, which
means that $P(X)$ is known precisely, and one has
$P(X) = Bel(X) = Pl(X)$. One has $u(\emptyset) = 0$ because
$Bel(\emptyset) = Pl(\emptyset) = 0$, and one has $u(\Theta) = 0$ because
$Bel(\Theta) = Pl(\Theta) = 1$. If all focal elements of $m(\cdot)$ are singletons
of $2^\Theta$ the BBA $m(\cdot)$ is a Bayesian BBA because $\forall X \in 2^\Theta$
one has $Bel(X) = Pl(X) = P(X)$ and $u(X) = 0$. Hence
the belief and plausibility of $X$ coincide with a probability
measure $P(X)$ defined on the FoD $\Theta$. The vacuous BBA
characterizing a totally ignorant source of evidence is defined
by $m_v(X) = 1$ for $X = \Theta$, and $m_v(X) = 0$ for all $X \in 2^\Theta$
different from $\Theta$. This very particular BBA has played a major
role in the establishment of a new effective measure of
uncertainty (i.e. entropy) for BBA in [5].

³ Also referred to as a normal BBA, or a proper BBA, in the literature.
⁴ The symbol $\triangleq$ means equals by definition.


B. Entropy of a BBA 


In [6] we analyzed in detail forty-eight measures of
uncertainty (MoU) of BBAs, covering 40 years of research
work on this topic. Some of these MoUs capture only a particular
aspect of the uncertainty inherent to a BBA (typically,
the non-specificity and the conflict). Other MoUs propose a
total uncertainty measure to capture jointly several aspects of
the uncertainty. Unfortunately, most of these MoUs fail to
satisfy four very simple, reasonable and essential desiderata,
and so they cannot be considered as really effective and
useful. Actually only six MoUs can be considered as effective
in the mathematical sense presented next, but unfortunately
they appear as conceptually defective and disputable, see
discussions in [6]. That is why a better effective measure of
uncertainty (MoU), i.e. a generalized entropy of BBAs, has been
developed and presented in [5]. The mathematical definition
of this new effective entropy is given by

$$U(m) = \sum_{X \in 2^\Theta} s(X), \tag{4}$$

with

$$s(X) \triangleq -m(X)(1 - u(X)) \log(m(X)) + u(X)(1 - m(X)). \tag{5}$$


$s(X)$ is the uncertainty contribution related to $X$ named the
entropiece of $X$. This entropiece $s(X)$ involves $m(X)$ and
the imprecision $u(X) = Pl(X) - Bel(X)$ about the unknown
probability of $X$ in a subtle interwoven manner. Because
$u(X) \in [0, 1]$ and $m(X) \in [0, 1]$ one has $s(X) \geq 0$,
and $U(m) \geq 0$. The quantity $U(m)$ is expressed in nats
because we use the natural logarithm. $U(m)$ can be expressed
in bits by dividing the $U(m)$ value in nats by
$\log(2) \approx 0.69314718$. This measure of uncertainty $U(m)$ is a
continuous function in its basic belief mass arguments because
it is a summation of continuous functions. In formula (5), we
always take $m(X) \log(m(X)) = 0$ when $m(X) = 0$ because
$\lim_{m(X) \to 0^+} m(X) \log(m(X)) = 0$. Note that for any BBA
$m$, one has $s(\emptyset) = 0$ because $m(\emptyset) = 0$ and $u(\emptyset) = 0$. For
the vacuous BBA, one has $s(\Theta) = 0$ because $m_v(\Theta) = 1$
and $u(\Theta) = 0$. This measure of uncertainty $U(m)$ is effective
because it can be proved [5] that it satisfies the following four
essential desiderata:

1) $U(m) = 0$ for any BBA $m(\cdot)$ focused on a singleton $X$
of $2^\Theta$.

2) $U(m_v^\Theta) < U(m_v^{\Theta'})$ if $|\Theta| < |\Theta'|$.

3) $U(m) = -\sum_{X \in \Theta} m(X) \log(m(X))$ if the BBA $m(\cdot)$
is a Bayesian BBA. Hence, $U(m)$ reduces to Shannon
entropy [7] in this case.

4) $U(m) < U(m_v)$ for any non-vacuous BBA $m(\cdot)$ and
for the vacuous BBA $m_v(\cdot)$ defined with respect to the
same FoD $\Theta$.

The maximum entropy value is obtained for the vacuous
BBA $m_v$ over a FoD $\Theta$, because $m_v$ characterizes a source
of evidence with a full lack of information. This maximum
entropy value is $U(m_v^\Theta) = 2^{|\Theta|} - 2$ (see derivation in [5]),
and it represents the sum of all imprecisions of $P(X)$ for all
$X \in 2^\Theta$. Because for all $X \in 2^\Theta \setminus \{\emptyset, \Theta\}$ one has⁵ $u(X) = 1$,
and one has $u(\emptyset) = 0$ and $u(\Theta) = 0$ when considering the
vacuous BBA, the sum of all imprecisions $u(X)$ about $P(X)$
is equal to $2^{|\Theta|} - 2$. It is worth mentioning that one always has
$U(m_v^\Theta) > \log(|\Theta|)$, which means that the vacuous BBA always has
an entropy greater than the maximum of Shannon
entropy $\log(|\Theta|)$ obtained with the uniform probability mass
function (pmf) on the frame of discernment $\Theta$.
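
The definitions (1)-(5) can be checked numerically with a minimal Python sketch. The representation of a BBA as a dictionary mapping frozensets to masses, and the function names `powerset`, `bel`, `pl` and `entropy`, are illustrative assumptions and do not come from [5].

```python
from itertools import combinations
from math import log, isclose

def powerset(frame):
    """All subsets of the frame as frozensets (empty set and frame included)."""
    items = sorted(frame)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def bel(m, X):
    """Bel(X): total mass of the non-empty focal elements included in X, Eq. (1)."""
    return sum(v for Y, v in m.items() if Y and Y <= X)

def pl(m, X):
    """Pl(X): total mass of the focal elements intersecting X, Eq. (2)."""
    return sum(v for Y, v in m.items() if Y & X)

def entropy(m, frame):
    """U(m) of Eqs. (4)-(5), in nats; m maps frozensets to masses."""
    total = 0.0
    for X in powerset(frame):
        mass = m.get(X, 0.0)
        u = pl(m, X) - bel(m, X)          # imprecision u(X), Eq. (3)
        if mass > 0.0:                    # convention: m log m = 0 when m = 0
            total -= mass * (1.0 - u) * log(mass)
        total += u * (1.0 - mass)
    return total

# Hypothetical three-element frame used only for illustration.
frame = frozenset({"t1", "t2", "t3"})
vacuous = {frame: 1.0}
uniform = {frozenset({t}): 1.0 / 3.0 for t in frame}
print(entropy(vacuous, frame))   # vacuous BBA: expected 2^3 - 2
print(entropy(uniform, frame))   # uniform Bayesian BBA: expected log(3)
```

With these assumptions the vacuous BBA reaches the maximum value $2^{|\Theta|} - 2$, and the uniform Bayesian BBA gives the Shannon entropy $\log(3)$, in agreement with the third and fourth desiderata.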


III. NEGATORS OF PMF AND BBA IN THE LITERATURE 


In this section we present several negators proposed in the
literature with some examples, and we comment on them.


A. Dubois and Prade non-involutory negator of a BBA (1986) 


In 1986, Dubois and Prade introduced in [3] (pp. 202-203)
for the first time the concept of negation of a BBA, which
was adopted later by Smets [9] in his transferable belief
model framework. The Dubois and Prade (DP) negator is defined
for any $X \subseteq \Theta$ by

$$\bar{m}(X) = m(\bar{X}), \tag{6}$$

where $\bar{X} \triangleq \Theta \setminus \{X\}$ is the complement of $X$ in the FoD $\Theta$.

This simple definition is quite natural except that it does
not satisfy the involutory property because $\bar{\bar{m}} \neq m$ in general.
Because we consider that this must be a very natural and
desired property to be satisfied by an effective negator, we do not
consider the DP negator as effective. Moreover, it is clear that the
DP negator of the vacuous BBA, given by $\bar{m}_v(\emptyset) = m_v(\Theta) = 1$,
is not a proper BBA.


B. Yager’s non-involutory negator of a pmf (2015) 


Yager introduced in [4] the concept of the negation of a probability
distribution $P$, an issue raised by Zadeh in his Berkeley
Initiative in Soft Computing (BISC) blog. By the term
negation Yager means the representation of the knowledge
we use if we have the statement not $P$. The negation of
a probability mass function (pmf) $P(.)$ over a reference set
$\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$ has been defined by Yager as follows

$$\bar{P}(\theta_i) = \frac{1}{\lambda} P(\bar{\theta}_i), \tag{7}$$

where $\bar{\theta}_i = \Theta \setminus \{\theta_i\}$ is the complement of $\theta_i$ in the set $\Theta$,
$P(\bar{\theta}_i) = 1 - P(\theta_i)$, and where $\lambda$ is a normalization factor
given by

$$\lambda = \sum_{i=1}^{n} P(\bar{\theta}_i) = \sum_{i=1}^{n} (1 - P(\theta_i)) = n - 1. \tag{8}$$

The definition (7) is called Yager's negator in the literature
[10]. Yager's justification for his definition is based on the
maximal entropy principle of the weights associated with each
focal element. As Yager pointed out in [4] the definition
(7) does not satisfy the double negation (i.e. involutory or
involutionary) property in general (when $|\Theta| > 2$ and $P(.)$ is
not the uniform pmf), that is $\bar{\bar{P}}(.) \neq P(.)$. Yager's negator of a
probability distribution is the one that provides the maximum
entropy among all possible negation definitions. The iterative
application of Yager's negator converges towards the uniform
pmf for which the entropy is maximal in the framework of
probability theory.

⁵ Because $[Bel(X), Pl(X)] = [0, 1]$.


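Example 1: Consider the set $\Theta = \{\theta_1, \theta_2, \theta_3\}$ and the pmf
$P(.)$ with $P(\theta_1) = 1$ (i.e. $\theta_1$ is a sure event). Based on (7),
we get $\bar{P}(\theta_1) = 0$, $\bar{P}(\theta_2) = 1/2$ and $\bar{P}(\theta_3) = 1/2$. In this
example the whole probability mass $P(\theta_1) = 1$ is equally
distributed back to the singletons $\theta_2$ and $\theta_3$ of $\bar{\theta}_1 = \theta_2 \cup \theta_3$. The
double Yager's negator of $P(.)$ is

$$\bar{\bar{P}}(\theta_1) = \frac{1 - 0}{3 - 1} = 0.5 \neq P(\theta_1),$$
$$\bar{\bar{P}}(\theta_2) = \frac{1 - 0.5}{3 - 1} = 0.25 \neq P(\theta_2),$$
$$\bar{\bar{P}}(\theta_3) = \frac{1 - 0.5}{3 - 1} = 0.25 \neq P(\theta_3).$$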


Example 2: Consider the set $\Theta = \{\theta_1, \theta_2, \theta_3\}$ and the uniform
pmf $P(.)$ with $P(\theta_1) = P(\theta_2) = P(\theta_3) = 1/3$. Based on (7)
we get also the uniform pmf because

$$\bar{P}(\theta_1) = \bar{P}(\theta_2) = \bar{P}(\theta_3) = \frac{1 - (1/3)}{3 - 1} = \frac{2/3}{2} = 1/3.$$

As a general result, the negation of any uniform pmf defined
over $\Theta$ of cardinality $n > 1$ is always equal to the uniform
pmf, i.e. the negation operator has no impact on the uniform
pmf, and if $P(.)$ is uniform we always have $\bar{P}(.) = P(.)$. The
uniform pmf is the fixed point of Yager's negator.


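Example 3: Consider the set $\Theta = \{\theta_1, \theta_2, \theta_3\}$ and the non-uniform
pmf $P(.)$ with $P(\theta_1) = 0.6$, $P(\theta_2) = 0.3$ and
$P(\theta_3) = 0.1$. Based on (7) we get the negator

$$\bar{P}(\theta_1) = \frac{1 - 0.6}{3 - 1} = 0.20,$$
$$\bar{P}(\theta_2) = \frac{1 - 0.3}{3 - 1} = 0.35,$$
$$\bar{P}(\theta_3) = \frac{1 - 0.1}{3 - 1} = 0.45.$$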


Note that in the very particular case where $\Theta = \{\theta_1\}$ (i.e.
there is only one element in the reference set), we have $n = 1$
and necessarily $P(\theta_1) = 1$. Therefore by applying (7) we
obtain $\bar{P}(\theta_1) = \frac{1 - P(\theta_1)}{n - 1} = 0/0$ which is indeterminate.
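
Examples 1 to 3 and the indeterminate case $n = 1$ can be reproduced with a few lines of Python. This is only an illustrative sketch of definition (7); the function name `yager_negator` and the list representation of a pmf are assumptions made here.

```python
from math import isclose

def yager_negator(p):
    """Yager's negator of a pmf given as a list of probabilities, Eq. (7)."""
    n = len(p)
    if n < 2:
        raise ValueError("Yager's negator is indeterminate (0/0) when n = 1")
    return [(1.0 - pi) / (n - 1) for pi in p]

p1 = [1.0, 0.0, 0.0]                      # pmf of Example 1
p3 = [0.6, 0.3, 0.1]                      # pmf of Example 3
print(yager_negator(p1))                  # negator of Example 1
print(yager_negator(yager_negator(p1)))   # double negator, differs from p1
print(yager_negator(p3))                  # negator of Example 3

# Iterating the negator drives the pmf towards the uniform pmf.
q = p3
for _ in range(50):
    q = yager_negator(q)
print(q)
```

The last loop illustrates the convergence towards the uniform pmf: at each application, the deviation of every probability from $1/n$ is multiplied by $-1/(n-1)$.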

A generalization of Yager's negator has been proposed in
[10] by considering

$$\bar{P}(\theta_i) = \frac{1}{\lambda} (1 - d \cdot P(\theta_i)), \tag{9}$$

where $d \in [0, 1]$ is a tuning parameter, and

$$\lambda = \sum_{i=1}^{n} (1 - d \cdot P(\theta_i)) = n - d. \tag{10}$$

The analysis of the new properties of Yager's negator has
been done by Srivastava et al. in [11], [12]. An extension of
Yager's negator based on Tsallis entropy has been proposed by
Zhang et al. in [13]. More recently a non-involutory exponential
negator for pmf has been proposed by Wu et al. in [14],
which unfortunately does not bring useful advantages w.r.t.
Yager's negator for applications up to now. Wu's exponential
negator has a better convergence rate towards the uniform pmf
under repeated application, but the real interest of
this behaviour for practical applications is questionable and
remains to be demonstrated.


C. Yin’s non-involutory negator of a BBA (2019) 


In 2019, Yin et al. [15] proposed a definition of the
negation of a BBA as a three-step procedure which can be
expressed more concisely as follows for any focal element
$X$ of a BBA $m(.)$ defined over a frame of discernment
$\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$:

$$\bar{m}(X) = \frac{1}{\lambda} (1 - m(X)), \tag{11}$$

where $\lambda$ is the normalization constant defined by

$$\lambda = \sum_{X \in 2^\Theta | m(X) > 0} (1 - m(X)) = N - 1, \tag{12}$$

and where $N$ is the number of focal elements of $m(.)$.

Clearly, Yin's negator is directly inspired by Yager's negator,
but it works with BBA instead of pmf. Yin's negator is
disputable because it is easy to check that it is non-involutory
as shown in Example 4 below.


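Example 4: Consider the set $\Theta = \{\theta_1, \theta_2, \theta_3\}$ and the BBA
$m(.)$ with $m(\theta_1) = 0.1$, $m(\theta_1 \cup \theta_2) = 0.2$, $m(\theta_2 \cup \theta_3) = 0.3$,
$m(\Theta) = 0.4$. Here $N = 4$ because the BBA $m(.)$ has four
focal elements. Based on (11), we get

$$\bar{m}(\theta_1) = \frac{1 - 0.1}{4 - 1} = 0.30,$$
$$\bar{m}(\theta_1 \cup \theta_2) = \frac{1 - 0.2}{4 - 1} \approx 0.27,$$
$$\bar{m}(\theta_2 \cup \theta_3) = \frac{1 - 0.3}{4 - 1} \approx 0.23,$$
$$\bar{m}(\Theta) = \frac{1 - 0.4}{4 - 1} = 0.20.$$

The double Yin's negator of $m(.)$ is

$$\bar{\bar{m}}(\theta_1) = \frac{1 - 0.30}{4 - 1} \approx 0.23 \neq m(\theta_1),$$
$$\bar{\bar{m}}(\theta_1 \cup \theta_2) = \frac{1 - 0.27}{4 - 1} \approx 0.24 \neq m(\theta_1 \cup \theta_2),$$
$$\bar{\bar{m}}(\theta_2 \cup \theta_3) = \frac{1 - 0.23}{4 - 1} \approx 0.26 \neq m(\theta_2 \cup \theta_3),$$
$$\bar{\bar{m}}(\Theta) = \frac{1 - 0.20}{4 - 1} \approx 0.27 \neq m(\Theta).$$

So, we see that $\bar{\bar{m}}(.)$ obtained with Yin's negator is not
equal to the original BBA $m(.)$. This simple counter-example
suffices to prove that Yin's negator is non-involutory.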


More problematic, Yin's negator is indeterminate for the
vacuous BBA $m_v(.)$ for which $m_v(\Theta) = 1$, because in this
case $\Theta$ is the only focal element so that $N = 1$, and from
(11) we get $\bar{m}_v(\Theta) = \frac{1 - m_v(\Theta)}{N - 1} = 0/0$ which is indeterminate.
Actually, this serious problem occurs not only for the vacuous
BBA, but for any BBA focused on only one focal element.
Therefore, Yin's negator is not appropriate for the negation of
a BBA. It is worth mentioning that the iterative application of
Yin's negator converges towards the uniform distribution of
masses on all focal elements of $m(.)$ (assuming $N > 1$) [15],
which does not coincide with the vacuous BBA that must give
the maximum of entropy [5].
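
The behaviour described above (non-involution in Example 4 and indeterminacy with a single focal element) can be sketched in Python as follows. The function name `yin_negator` and the string labels used for the focal elements are illustrative assumptions.

```python
from math import isclose

def yin_negator(m):
    """Yin's negator, Eqs. (11)-(12); m maps focal elements to their masses."""
    focal = {X: v for X, v in m.items() if v > 0.0}
    n_focal = len(focal)                      # N in Eq. (12)
    if n_focal < 2:
        raise ValueError("indeterminate (0/0) when there is one focal element")
    return {X: (1.0 - v) / (n_focal - 1) for X, v in focal.items()}

# BBA of Example 4, with focal elements labelled by plain strings.
m = {"t1": 0.1, "t1 u t2": 0.2, "t2 u t3": 0.3, "Theta": 0.4}
neg = yin_negator(m)
neg2 = yin_negator(neg)
print(neg)
print(neg2)   # differs from m: the negator is not involutory
```

Calling `yin_negator({"Theta": 1.0})` raises the error, which mirrors the 0/0 indeterminacy obtained for the vacuous BBA.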

We mention that Yin's negator has been presented also by
Gao and Deng in [16]. Unfortunately this reference contains
several mistakes, and the authors do not apply Yin's definition
correctly in some of their examples. For instance, in their first
specific example considering $\Theta = \{a\}$ with $m(a) = 1$ (i.e. we
have only $N = 1$ focal element for $m(.)$) the authors claim that
$\bar{m}(a) = 0$. This is obviously wrong because one has $\bar{m}(a) =
(1 - m(a))/(N - 1) = 0/0$ which is indeterminate. Gao and
Deng also write: "Assumed a BPA that contains only one focal
element (e.g. the $m(a) = 0$), the negation of BPA can be
defined by $\bar{m}(a) = 1$." This claim is incorrect because any focal
element $X$ must be such that $m(X) > 0$ by definition [2]. So,
there is no proper BBA that contains only one focal element
satisfying $m(a) = 0$. In their example 1 (see Section IV-A
of [16]) the same authors consider $\Theta = \{a, b\}$ with $m(a) =
m(b) = 0.5$ (i.e. a Bayesian BBA with $N = 2$ focal elements).
Applying (11) and (12) we must obtain $\bar{m}(a) = (1 - 0.5)/(2 - 1) = 0.5$
and $\bar{m}(b) = (1 - 0.5)/(2 - 1) = 0.5$, and not $\bar{m}(a) = 0.25$,
$\bar{m}(b) = 0.25$ and $\bar{m}(a \cup b) = 0.5$ as the authors claim. This
casts doubts on the correctness of the technical content of Gao
and Deng's paper [16].


D. Xie and Xiao non-involutory negator of a BBA (2019) 


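Another non-involutory negator of a BBA has been proposed
by Xie and Xiao in [17]. This negator is defined by

$$\bar{\mathbf{m}} = \mathbf{E} \cdot \mathbf{m}, \tag{13}$$

where $\mathbf{m}$ is the BBA $m(.)$ expressed as a vertical vector of
size $2^{|\Theta|}$, that is $\mathbf{m} = [m(\emptyset), m(\theta_1), \ldots, m(\Theta)]^T$, and $\bar{\mathbf{m}}$ is
the negation vector of the BBA vector $\mathbf{m}$ which characterizes
the negation of $m(.)$. The matrix $\mathbf{E}$ is a negation symmetrical
matrix $\mathbf{E} = [e_{ij}]$ of size $2^{|\Theta|} \times 2^{|\Theta|}$ whose elements $e_{ij}$ are
defined as follows

$$e_{ij} = \begin{cases}
0, & \text{for } i = 1, \ldots, 2^{|\Theta|} \text{ and } j = 1, \\
0, & \text{for } j = 1, \ldots, 2^{|\Theta|} \text{ and } i = 1, \\
0, & \text{for } i = j \text{ and } i \neq 2^{|\Theta|}, \\
1, & \text{for } i = j = 2^{|\Theta|}, \\
\dfrac{|X_i \cap \bar{X}_j|}{\sum_{X_k \in 2^\Theta \setminus \{\emptyset, \Theta\}} |X_k \cap \bar{X}_j|}, & \text{otherwise},
\end{cases} \tag{14}$$

where $X_k$ is an element of the power-set $2^\Theta$ of the FoD
$\Theta$, with $k \in \{1, 2, \ldots, 2^{|\Theta|}\}$. By convention $X_1 = \emptyset$ and
$X_{2^{|\Theta|}} = \Theta$, and $\bar{X}_i = \Theta - \{X_i\}$ is the complement of $X_i$ in
the FoD $\Theta$.

Xie and Xiao's proposal for a BBA negator is based on redistribution
factors defined by (14) which appear rather ad hoc
and counter-intuitive, as the following Example 5 demonstrates.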
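Example 5: Consider the set $\Theta = \{\theta_1, \theta_2, \theta_3\}$ and the BBA
entirely focused on $\theta_1$. Hence we have the BBA vector⁶

$$\mathbf{m} = [m(\emptyset), m(\theta_1), m(\theta_2), m(\theta_3), m(\theta_1 \cup \theta_2), m(\theta_1 \cup \theta_3), m(\theta_2 \cup \theta_3), m(\Theta)]^T = [0, 1, 0, 0, 0, 0, 0, 0]^T.$$

The negation matrix $\mathbf{E} = [e_{ij}]$ is given by (see example 3
in [17])

$$\mathbf{E} = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1/6 & 1/6 & 0 & 0 & 1/3 & 0 \\
0 & 1/6 & 0 & 1/6 & 0 & 1/3 & 0 & 0 \\
0 & 1/6 & 1/6 & 0 & 1/3 & 0 & 0 & 0 \\
0 & 1/6 & 1/6 & 1/3 & 0 & 1/3 & 1/3 & 0 \\
0 & 1/6 & 1/3 & 1/6 & 1/3 & 0 & 1/3 & 0 \\
0 & 1/3 & 1/6 & 1/6 & 1/3 & 1/3 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}$$

Applying (13), we get

$$\bar{\mathbf{m}} = [\bar{m}(\emptyset), \bar{m}(\theta_1), \bar{m}(\theta_2), \bar{m}(\theta_3), \bar{m}(\theta_1 \cup \theta_2), \bar{m}(\theta_1 \cup \theta_3), \bar{m}(\theta_2 \cup \theta_3), \bar{m}(\Theta)]^T = [0, 0, 1/6, 1/6, 1/6, 1/6, 1/3, 0]^T.$$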

This result is very counter-intuitive because this negator
commits some mass of belief to elements that have a non-empty
intersection with $\theta_1$. This behavior does not make sense
because the complement of $\theta_1$ must have an empty intersection
with $\theta_1$, so the mass of $\theta_1$ must be redistributed only to
elements of the power set that have an empty intersection
with $\theta_1$, or possibly to their disjunction. Moreover, Xie and
Xiao present their analysis of their negator using Deng's
entropy concept which is known to be not effective [6]. It
is worth mentioning that Xie and Xiao's negator is of course
not involutory because in this very simple example we get

$$\bar{\bar{\mathbf{m}}} = [\bar{\bar{m}}(\emptyset), \bar{\bar{m}}(\theta_1), \bar{\bar{m}}(\theta_2), \bar{\bar{m}}(\theta_3), \bar{\bar{m}}(\theta_1 \cup \theta_2), \bar{\bar{m}}(\theta_1 \cup \theta_3), \bar{\bar{m}}(\theta_2 \cup \theta_3), \bar{\bar{m}}(\Theta)]^T = [0, 1/6, 1/12, 1/12, 1/4, 1/4, 1/6, 0]^T \neq \mathbf{m}.$$

A variant of Xie and Xiao's approach has been published by
Luo and Deng in [18]. Unfortunately, Luo and Deng's negator
is not involutory and it suffers from the same problems as Xie
and Xiao's negator.


⁶ The elements of $2^\Theta$ are listed as done by Xie and Xiao in [17].


E. Deng and Jiang non-involutory negator of a BBA (2020) 


In 2020, Deng and Jiang proposed a new negator for any
BBA defined as follows [19] over a frame of discernment
$\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$

$$\bar{m}(X) = \sum_{Y \in 2^\Theta | \bigcup_{\theta_i \in Y} (\Theta \setminus \{\theta_i\}) = X} m(Y). \tag{15}$$

As explained in [19] (p. 348) the authors consider that
the negation of a singleton focal element $X = \theta_i$ is $\bar{X} =
\bar{\theta}_i = \Theta \setminus \{\theta_i\}$, and if a focal element $X$ is not a singleton
its negation is equal to $\bar{X} = \bigcup_{\theta_i \in X} (\Theta \setminus \{\theta_i\}) = \Theta$.
This is what we call here the Deng-Jiang complementation
principle. Unfortunately, there is no strong justification for
adopting this complementation principle, which is ad hoc and
very counter-intuitive because the negation of all non-singleton
focal elements will correspond to the same complement element
$\Theta$ which is the whole FoD. This principle is actually
inappropriate. The application of formula (15) is illustrated in
Example 6 drawn from [19].


Example 6: Consider the FoD $\Theta = \{\theta_1, \theta_2, \theta_3\}$ and $m(.)$ with
$m(\theta_1) = 0.7$, $m(\theta_2 \cup \theta_3) = 0.1$, and $m(\theta_1 \cup \theta_2 \cup \theta_3) = 0.2$.

Because the focal element $\theta_1$ is a singleton, its Deng-Jiang
complement is the classical complement $\bar{\theta}_1 = \Theta \setminus \{\theta_1\} =
\theta_2 \cup \theta_3$, and we have

$$\bar{m}(\bar{\theta}_1) = \bar{m}(\theta_2 \cup \theta_3) = m(\theta_1) = 0.7.$$

Because the focal element $\theta_2 \cup \theta_3$ is not a singleton, its Deng-Jiang
complement is (by definition) taken as $\overline{\theta_2 \cup \theta_3} = \Theta$, and
we have

$$\bar{m}(\overline{\theta_2 \cup \theta_3}) = \bar{m}(\Theta) = m(\theta_2 \cup \theta_3) = 0.1.$$

Because the focal element $\Theta = \theta_1 \cup \theta_2 \cup \theta_3$ is not a
singleton, its Deng-Jiang complement is (by definition) taken
as $\overline{\theta_1 \cup \theta_2 \cup \theta_3} = \Theta$, and we have

$$\bar{m}(\overline{\theta_1 \cup \theta_2 \cup \theta_3}) = \bar{m}(\Theta) = m(\theta_1 \cup \theta_2 \cup \theta_3) = 0.2.$$

Finally the two negator masses $\bar{m}(\Theta) = 0.1$ and $\bar{m}(\Theta) = 0.2$
are added together to give the final result $\bar{m}(\theta_1 \cup \theta_2 \cup \theta_3) = 0.3$.
This is how formula (15) works.
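
The mechanics of formula (15) in Example 6 can also be followed with a short Python sketch. The function name `deng_jiang_negator` and the encoding of focal elements as frozensets are assumptions made for illustration only.

```python
from math import isclose

def deng_jiang_negator(m, frame):
    """Negator of Eq. (15); m maps focal elements (frozensets) to masses."""
    out = {}
    for Y, mass in m.items():
        # Deng-Jiang complement: union of the complements of the singletons of Y.
        target = frozenset().union(*(frame - {t} for t in Y))
        out[target] = out.get(target, 0.0) + mass
    return out

frame = frozenset({"t1", "t2", "t3"})
m = {frozenset({"t1"}): 0.7, frozenset({"t2", "t3"}): 0.1, frame: 0.2}
neg = deng_jiang_negator(m, frame)
neg2 = deng_jiang_negator(neg, frame)
print(neg)    # mass 0.7 on t2 u t3 and mass 0.3 (up to rounding) on the frame
print(neg2)   # all the mass on the frame: the vacuous BBA
```

Applying the function twice sends all the mass to $\Theta$, because every focal element of the first negator is a non-singleton.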

Besides its weird complementation principle, Deng and
Jiang's negator is not involutory in general. Indeed, if we
apply this negator to the negator $\bar{m}(.)$ of Example 6 we obtain
$\bar{\bar{m}}(\theta_1 \cup \theta_2 \cup \theta_3) = 1$ which is the vacuous BBA and not
the original BBA $m(.)$ of Example 6. This non-involutory
property also appears in Table 4 of [19]. We also point out a
flaw of Deng and Jiang's negator in the
very special case where $\Theta = \{\theta_1\}$ and $m(\theta_1) = 1$. In this
case, because the focal element $\theta_1$ is a singleton and based on Deng
and Jiang's complementation, we have $\bar{\theta}_1 = \Theta \setminus \{\theta_1\} = \emptyset$,
and applying Deng and Jiang's formula (15) we get
$\bar{m}(\bar{\theta}_1) = \bar{m}(\emptyset) = m(\theta_1) = 1$ which is not a proper BBA
according to Shafer's definition [2]. As for Yin's negator, we
consider that Deng and Jiang's negator is not appropriate, and
not effective from the theoretical standpoint, and we do not
recommend its use for applications.
F. Batyrshin's involutory negator of a pmf (2021)


In [20], [21] Batyrshin proposed an involutory negator of a
pmf $P$ defined over a reference set $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$ by

$$\bar{P}(\theta_i) = \frac{MP - P(\theta_i)}{n \cdot MP - 1}, \tag{16}$$

where $MP = \max P + \min P$.


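Example 1 (continued): Consider the set $\Theta = \{\theta_1, \theta_2, \theta_3\}$ and
the pmf $P(.)$ with $P(\theta_1) = 1$ (i.e. $\theta_1$ is a sure event). Based
on (16), we get $MP = 1$ and Batyrshin's negator $\bar{P}(\theta_1) =
\frac{1 - 1}{3 \cdot 1 - 1} = 0$, $\bar{P}(\theta_2) = \frac{1 - 0}{3 \cdot 1 - 1} = 1/2$ and $\bar{P}(\theta_3) = \frac{1 - 0}{3 \cdot 1 - 1} = 1/2$. This
result coincides with Yager's result based on (7). The double
Batyrshin's negator of $P(.)$ is now given by

$$\bar{\bar{P}}(\theta_1) = \frac{0.5 - 0}{3 \cdot 0.5 - 1} = 0.5/0.5 = 1 = P(\theta_1),$$
$$\bar{\bar{P}}(\theta_2) = \frac{0.5 - 0.5}{3 \cdot 0.5 - 1} = 0/0.5 = 0 = P(\theta_2),$$
$$\bar{\bar{P}}(\theta_3) = \frac{0.5 - 0.5}{3 \cdot 0.5 - 1} = 0/0.5 = 0 = P(\theta_3).$$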


Example 2 (continued): Consider the set $\Theta = \{\theta_1, \theta_2, \theta_3\}$ and
the uniform pmf $P(.)$ with $P(\theta_1) = P(\theta_2) = P(\theta_3) = 1/3$.
Based on (16), we get $MP = \frac{1}{3} + \frac{1}{3} = 2/3$ and Batyrshin's
negator

$$\bar{P}(\theta_1) = \bar{P}(\theta_2) = \bar{P}(\theta_3) = \frac{(2/3) - (1/3)}{3 \cdot (2/3) - 1} = 1/3,$$

which corresponds also to the uniform pmf, and thus the
double Batyrshin's negator of the uniform pmf $P(.)$ is equal to
itself. Actually, the uniform pmf is a fixed point for Batyrshin's
negator (as it is for Yager's negator too), see [20] for details.


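Example 3 (continued): We consider $\Theta = \{\theta_1, \theta_2, \theta_3\}$ and the
non-uniform pmf $P(.)$ with $P(\theta_1) = 0.6$, $P(\theta_2) = 0.3$ and
$P(\theta_3) = 0.1$. Batyrshin's negator of $P$ is given by⁷

$$\bar{P}(\theta_1) = \frac{0.7 - 0.6}{3 \cdot 0.7 - 1} = \frac{0.1}{1.1} \approx 0.09,$$
$$\bar{P}(\theta_2) = \frac{0.7 - 0.3}{3 \cdot 0.7 - 1} = \frac{0.4}{1.1} \approx 0.36,$$
$$\bar{P}(\theta_3) = \frac{0.7 - 0.1}{3 \cdot 0.7 - 1} = \frac{0.6}{1.1} \approx 0.55.$$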


It is worth mentioning that Batyrshin's negator of $P(.)$
equals $P(.)$ in the very special case where $\Theta = \{\theta_1\}$,
because in this case one has $n = 1$ and necessarily $P(\theta_1) = 1$.
Therefore, $\min P = \max P = 1$ and $MP = 2$, and from (16)
we obtain Batyrshin's negator $\bar{P}(\theta_1) = \frac{MP - P(\theta_1)}{n \cdot MP - 1} = \frac{2 - 1}{2 - 1} = 1 =
P(\theta_1)$.
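
The involutory behaviour of definition (16) on Examples 1 to 3 can be checked numerically with the following sketch. The function name `batyrshin_negator` and the list representation of a pmf are illustrative assumptions.

```python
from math import isclose

def batyrshin_negator(p):
    """Batyrshin's negator of a pmf, Eq. (16), with MP = max P + min P."""
    n = len(p)
    mp = max(p) + min(p)
    return [(mp - pi) / (n * mp - 1.0) for pi in p]

p1 = [1.0, 0.0, 0.0]        # Example 1
p3 = [0.6, 0.3, 0.1]        # Example 3
for p in (p1, p3, [1.0]):   # [1.0] is the single-element special case
    neg = batyrshin_negator(p)
    back = batyrshin_negator(neg)
    print(p, neg, back)     # the double negator recovers p up to rounding
```

The denominator $n \cdot MP - 1$ is strictly positive for any pmf, so no indeterminate case arises, in contrast with Yager's negator when $n = 1$.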

Even if this negator is appealing from the theoretical standpoint
when working with probabilities, its real usefulness has
to be shown in real applications. Batyrshin's negator has not
yet been extended to the framework of the theory of belief
functions, and it may be interesting to extend it (if possible)
to the theory of evidence.


⁷ Because $MP = 0.7$.


G. Liu’s non-involutory negator of a BBA (2023) 


In 2023, Liu et al. [22] proposed a new negator of
a BBA $m(.)$ defined over a frame of discernment
$\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$ by
1 gel 1 
m(X)=>- (l1—m(X)), (17) 
A DVye2e|vVzX Fle 


where X © 2°, and X is the normalization constant defined 
b 
. a | 
A= => ———— (1 - 


m(X)). 
X€E2° diye2°|vy4ex aed 


(18) 

This new negator is unfortunately not involutory, as proved
by the authors in [22], and they justify this new negator based
on Deng's entropy concept which is not effective [5], [6]. So,
their justification is flawed. It is also clear that the concept of
complementation used by Liu et al. is inappropriate, as shown
in the following very simple example.


Example 7: Consider the FoD $\Theta = \{A, B\}$ and the vacuous
BBA $m_v(.)$ defined on this FoD by $m_v(\emptyset) = 0$, $m_v(A) = 0$,
$m_v(B) = 0$, and $m_v(A \cup B) = 1$. By applying (17) we will
obtain⁸ the following Liu's negator

$$\bar{m}_v(\emptyset) = 0, \quad \bar{m}_v(A) = 1/2, \quad \bar{m}_v(B) = 1/2, \quad \bar{m}_v(A \cup B) = 0,$$

because $\lambda = (0/4) + (2/4) + (2/4) + (0/2) = 1$.


One sees clearly that Liu's negator is inappropriate because
$A$ and $B$ cannot be considered as valid complements of $A \cup B$
because $A \cap (A \cup B) \neq \emptyset$ and $B \cap (A \cup B) \neq \emptyset$.


IV. A NEW INVOLUTORY NEGATOR FOR BBAS 


We have shown in the previous section that most of the negators
developed previously are not involutory functions, except
Batyrshin's negator which applies only to probabilities, and not
to non-Bayesian BBAs. Consequently, these negators in general
increase the entropy when applied iteratively, and this
iterative application makes the result converge
towards the uniform pmf or BBA, which is not very useful in
practice. In this section we present a new simple definition for
an involutory negator of any BBA $m(.) : 2^\Theta \to [0, 1]$ which
expresses the opposite evidence of any source of evidence
characterized by $m(.)$. The opposite (i.e. the negator) of the
BBA $m(.)$ is denoted by $\bar{m}(.)$, and it is simply defined by

$$\bar{m}(X) = \begin{cases}
0, & \text{if } X = \emptyset, \\
m(\bar{X}), & \forall X \neq \emptyset \subset \Theta, \\
m(\Theta), & \text{if } X = \Theta,
\end{cases} \tag{19}$$

where $\bar{X}$ is the complement of the subset $X$ in the FoD $\Theta$, that is
$\bar{X} = \Theta \setminus \{X\}$.

⁸ Because $|\emptyset| = 0$ and $2^0 = 1$ we have $2^{|\emptyset|} - 1 = 0$.


This new negator defined by (19) is actually a revised
definition of the Dubois and Prade negator (6). This definition may
appear strange at first glance for some readers because of
the conditions $\bar{m}(\emptyset) = 0$ and $\bar{m}(\Theta) = m(\Theta)$. Some readers
may ask why the mass of belief committed to the whole
ignorance proposition $\Theta$ is kept unchanged in the expression
(19) of the negator of the source of evidence. This is a
legitimate question because the (classical) complement $\bar{\Theta}$ of
$\Theta$ in $\Theta$ is equal to the empty set, and because the (classical)
complement $\bar{\emptyset}$ of the empty set in $\Theta$ is equal to $\Theta$. As Dubois
and Prade did, we could consider a priori taking $\bar{m}(\emptyset) = m(\Theta)$
and $\bar{m}(\Theta) = m(\emptyset)$. We think however that this option is
actually not very reasonable because it would mean that the
negation of a BBA will not be a proper BBA (as defined
by Shafer in [2]). In fact, we would have $\bar{m}(\emptyset) > 0$ when
$m(\Theta) > 0$, and we would always have $\bar{m}(\Theta) = 0$ because
$m(\emptyset) = 0$, which is very restrictive. We consider that the most
reasonable solution is to consider that the negation of the BBA
$m(.)$ is better defined by (19). This new very simple definition
for the negator of a BBA presents the great advantage of
preserving the involutory property of the negator concept of
a BBA so that $\bar{\bar{m}}(.) = m(.)$. Note that the negator of any
BBA $m(.)$ defined by $\bar{m}(.)$ in (19) is a proper BBA because
$\bar{m}(X) \in [0, 1]$, $\bar{m}(\emptyset) = 0$ and $\sum_{X \in 2^\Theta} \bar{m}(X) = 1$ because
the focal elements of $\bar{m}(.)$ belong to $2^\Theta$ and they correspond
to the complements of the focal elements of $m(.)$ which is a
proper BBA.


We mention that the negator of a Bayesian BBA is not a
Bayesian BBA in general as soon as the FoD $\Theta$ has more
than two elements. This is normal because if a focal element,
say $\theta_i \in \Theta$, of $m(.)$ is a singleton with $m(\theta_i) > 0$, then its
complement $\bar{\theta}_i = \Theta \setminus \{\theta_i\}$ is not a singleton of $\Theta$ if $|\Theta| > 2$, and
we have $\bar{m}(\bar{\theta}_i) = m(\theta_i) > 0$ with $|\bar{\theta}_i| > 1$ which indicates
that the BBA $\bar{m}(.)$ is not Bayesian. We also mention that the
negator of the vacuous BBA $m_v(.)$ is equal to itself, which
indicates that the vacuous source of evidence plays a neutral
role with respect to this new negator concept. This is not very
surprising because from no useful information (characterized
by a fully ignorant source whose BBA is the vacuous BBA) we
cannot draw any conclusion for making a decision in favor of
one hypothesis or its opposite. This makes the definition (19)
coherent with the intuition when working with the vacuous BBA
and the negator concept.


Of course, it is always possible to approximate any non-Bayesian
BBA (or any non-Bayesian negator of a BBA) by a
pmf (if we want or need to for any reason) thanks to different
techniques of approximation, for instance using the BetP or DSmP
transformations [23], [24]. As a simple example consider the
FoD $\Theta = \{A, B, C\}$ and the Bayesian BBA $m(.)$ defined by⁹

$$m(A) = 0.9, \quad m(B) = 0, \quad m(C) = 0.1.$$

Its negator is the non-Bayesian BBA defined as

$$\bar{m}(B \cup C) = 0.9, \quad \bar{m}(A \cup B) = 0.1.$$

If one approximates $\bar{m}(.)$ by a probability measure thanks
to the BetP transformation for instance, we will obtain the
Bayesian negation of $m(.)$ denoted either as $\bar{m}_{Bayesian}(.)$ or
more concisely as $\overline{BetP}(.)$ which is given by

$$\overline{BetP}(A) = \frac{1}{2} \bar{m}(A \cup B) = 0.05,$$
$$\overline{BetP}(B) = \frac{1}{2} \bar{m}(B \cup C) + \frac{1}{2} \bar{m}(A \cup B) = 0.50,$$
$$\overline{BetP}(C) = \frac{1}{2} \bar{m}(B \cup C) = 0.45.$$

This result is quite reasonable because based on the fact
that $\bar{m}(B \cup C) = 0.9$ and $\bar{m}(A \cup B) = 0.1$ (when considering
the negator of $m(.)$ as valid input information) it makes sense
that $B$ has the best chance to occur among $A$, $B$ and $C$, and
$C$ has the second best chance to occur. This is what
the $\overline{BetP}(.)$ distribution reflects for this example.
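
The negator (19) and the pignistic approximation used in this example can be sketched in Python as follows. The functions `negator` and `betp`, and the dictionary-of-frozensets encoding of a BBA, are illustrative assumptions; the input is assumed to be a proper BBA (no mass on the empty set).

```python
from math import isclose

def negator(m, frame):
    """Involutory negator of Eq. (19); m maps frozensets to masses."""
    out = {}
    for X, mass in m.items():
        if mass == 0.0:
            continue
        # The mass of the whole frame is kept; other masses go to the complement.
        target = X if X == frame else frame - X
        out[target] = out.get(target, 0.0) + mass
    return out

def betp(m):
    """Pignistic transformation: each mass is split equally among its elements."""
    p = {}
    for X, mass in m.items():
        for t in X:
            p[t] = p.get(t, 0.0) + mass / len(X)
    return p

frame = frozenset("ABC")
m = {frozenset("A"): 0.9, frozenset("C"): 0.1}
m_bar = negator(m, frame)
print(m_bar)                        # masses on B u C and on A u B
print(negator(m_bar, frame) == m)   # the double negator gives back m
print(betp(m_bar))                  # pignistic probabilities of A, B and C
print(negator({frame: 1.0}, frame)) # the vacuous BBA is its own negator
```

The third line of output ranks $B$ first and $C$ second, which matches the $\overline{BetP}(.)$ values of the example.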


V. DIRECT AND INDIRECT FUSION APPROACHES 


In this section we recall the principle of the classical direct 
fusion approach, and we describe also the principle of the 
indirect fusion approach based on the involutory negator of 
the BBAs described in the previous section. 


A. Direct fusion approach 


Before presenting the application of the BBA negator for
information fusion, we recall several well-known fusion rules
used to combine directly distinct bodies of evidence represented
by the BBAs $m_1, m_2, \ldots, m_S$ defined over the same
FoD $\Theta$. This is what we call the direct fusion approach.

To make this presentation simple, we present the formulas
for the combination of two BBAs only (i.e. $S = 2$). General
formulas for more than two BBAs can be found in the
literature, see [2], [25], [26] for instance. A survey of more
fusion rules can be found in [27].


• Conjunctive rule of combination: $\forall X \in 2^{\Theta}$,

$$m^{\cap}_{1,2}(X) = \sum_{\substack{X_1, X_2 \in 2^{\Theta} \\ X_1 \cap X_2 = X}} m_1(X_1)\, m_2(X_2). \tag{20}$$

• Disjunctive rule of combination: $\forall X \in 2^{\Theta}$,

$$m^{\cup}_{1,2}(X) = \sum_{\substack{X_1, X_2 \in 2^{\Theta} \\ X_1 \cup X_2 = X}} m_1(X_1)\, m_2(X_2). \tag{21}$$

• Dempster-Shafer (DS) rule of combination [2]:
$m^{DS}_{1,2}(\emptyset) = 0$ and $\forall X \in 2^{\Theta} \setminus \{\emptyset\}$

$$m^{DS}_{1,2}(X) = \frac{\sum_{\substack{X_1, X_2 \in 2^{\Theta} \\ X_1 \cap X_2 = X}} m_1(X_1)\, m_2(X_2)}{1 - m^{\cap}_{1,2}(\emptyset)}. \tag{22}$$

• Proportional Conflict Redistribution Rule no. 6 (PCR6)
[26]: $m^{PCR6}_{1,2}(\emptyset) = 0$ and $\forall X \in 2^{\Theta} \setminus \{\emptyset\}$

$$m^{PCR6}_{1,2}(X) = m^{\cap}_{1,2}(X) + \sum_{\substack{Y \in 2^{\Theta} \\ X \cap Y = \emptyset}} \left[ \frac{m_1(X)^2\, m_2(Y)}{m_1(X) + m_2(Y)} + \frac{m_2(X)^2\, m_1(Y)}{m_2(X) + m_1(Y)} \right]. \tag{23}$$

In this paper we consider only the PCR6 rule because we
use examples for the fusion of only two BBAs to keep
the presentation as simple as possible. If one needs to
combine three BBAs (or more) altogether, we recommend
using the improved PCR6 rule (denoted by PCR6⁺),
which is presented in detail¹⁰ in [26] with Matlab codes.

9 Here we voluntarily indicate that $m(B) = 0$ for convenience to point
out that we have three elements in the frame of discernment. The notation
$m(B) = 0$ could be omitted of course because only A and C are the focal
elements of the BBA $m(.)$. In fact all elements of the power set $2^{\Theta}$ have
masses equal to zero except A and C in this example. In general, only masses
of focal elements need to be listed because all other masses equal zero.


If the sources of evidence are considered fully reliable, the
conjunctive fusion rule applies, but the sources to combine
may be conflicting, that is $m^{\cap}_{1,2}(\emptyset) > 0$. In this case $m^{\cap}_{1,2}(.)$
is not strictly a proper BBA. To overcome this problem,
the Dempster-Shafer (DS) rule of combination or the PCR6 fusion
rule can be used to obtain a normalized combined BBA.
DS rule offers the main advantage of being associative, which makes
it quite easy to use in applications, and DS preserves
the neutrality¹¹ of the vacuous BBA $m_v$, which is generally
considered a good property for a fusion rule. Because DS rule
is associative, the sequential DS fusion of many sources
of evidence is independent of the sequence order, which is
appealing. Unfortunately, DS rule exhibits counter-intuitive
dictatorial behavior in both high and low conflict situations
[28], [29]. This is one of the main reasons¹² why DS rule has
been abandoned by many researchers and engineers working
with belief functions. If the two sources are in total conflict
(i.e. $m^{\cap}_{1,2}(\emptyset) = 1$), DS rule does not work because of the
division by zero in (22). PCR6 rule provides more reasonable
fusion results, and it works in both low and high conflict
situations. PCR6 does not behave dictatorially. The
main disadvantage of PCR6 rule is its high complexity: because
it is not associative, all the sources of evidence
must be combined altogether (not sequentially) with this rule.
PCR6 does not preserve the neutrality of the vacuous BBA
$m_v$ when combining more than two BBAs altogether, but an
improved version denoted by PCR6⁺ preserves the neutrality
of $m_v$, see [26] for details. If one of the sources of evidence
is not reliable and we do not know which one, the disjunctive
fusion rule applies.
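The two-source rules (20), (22) and (23) can be sketched in a few lines of Python. This is an illustrative sketch only, not the Matlab code referenced in [26]: BBAs are represented as dictionaries mapping `frozenset` focal elements to masses, and the function names are hypothetical. The data are the two BBAs of Zadeh's example discussed in Section VI-A.

```python
from itertools import product


def conjunctive(m1, m2):
    """Conjunctive rule (20); the mass of the empty set is the conflict."""
    out = {}
    for (X1, a), (X2, b) in product(m1.items(), m2.items()):
        X = X1 & X2
        out[X] = out.get(X, 0.0) + a * b
    return out


def dempster_shafer(m1, m2):
    """DS rule (22): drop the conflicting mass and renormalize."""
    fused = conjunctive(m1, m2)
    conflict = fused.pop(frozenset(), 0.0)
    if conflict >= 1.0:
        raise ZeroDivisionError("total conflict: DS rule is undefined")
    return {X: v / (1.0 - conflict) for X, v in fused.items()}


def pcr6(m1, m2):
    """PCR6 rule (23) for two BBAs: each partial conflict m1(X)m2(Y) with
    X∩Y empty is sent back to X and Y proportionally to m1(X) and m2(Y)."""
    out = conjunctive(m1, m2)
    out.pop(frozenset(), None)
    for (X, a), (Y, b) in product(m1.items(), m2.items()):
        if not (X & Y) and a + b > 0.0:
            out[X] = out.get(X, 0.0) + a * a * b / (a + b)
            out[Y] = out.get(Y, 0.0) + b * b * a / (a + b)
    return out


A, B, C = frozenset("A"), frozenset("B"), frozenset("C")
m1 = {A: 0.9, C: 0.1}
m2 = {B: 0.9, C: 0.1}
m_ds = dempster_shafer(m1, m2)     # all the mass goes to C
m_pcr6 = pcr6(m1, m2)              # mass shared mainly between A and B
```

The sketch handles only two sources; for three or more sources the formulas of [26] apply and are not reproduced here.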


10 Note that PCR6⁺ and PCR6 rules coincide for the fusion of two BBAs.

11 The vacuous BBA $m_v$ has no impact on the fusion result when combined
with a BBA $m \neq m_v$ using DS rule.

12 The second main reason is that Shafer's BBA conditioning based on DS
rule is not consistent with lower and upper bounds of conditional probability
[30].


Finally the direct fusion approach of S BBAs $m_1, m_2, \ldots, m_S$
defined over the same FoD $\Theta$ is denoted symbolically by

$$m^{DF}_{1,2,\ldots,S} = DF(m_1, m_2, \ldots, m_S), \tag{24}$$

where DF means the chosen Direct Fusion (DF) rule used
for the combination of the S sources of evidence. Typically
DF = DS if we use Dempster-Shafer rule for making the direct
fusion of the S BBAs, or DF = PCR6 if we use the PCR6
fusion rule for making the direct fusion, etc.


B. Indirect fusion using the involutory negator of BBAs 


In some information fusion situations the combination of
BBAs $m_1(.), m_2(.), \ldots, m_S(.)$ (with $S > 1$) is problematic
if there exist some conflicts between the sources of
evidence. This means that $m_1(X_1)\, m_2(X_2) \ldots m_S(X_S) > 0$
when $X_1 \cap X_2 \cap \ldots \cap X_S = \emptyset$ for some focal elements $X_1$,
$X_2$, ..., and $X_S$. When conflicts occur the simple conjunctive
rule of combination (20) is not able to provide an acceptable
fusion result because it commits a strictly positive mass of
belief to the impossible event (i.e. to the empty set), that
is $m^{\cap}_{1,2,\ldots,S}(\emptyset) > 0$, and this conflicting mass must be managed
to obtain what we consider as a reasonable fusion result for
decision-making support. That is the reason why many fusion
rules of combination have been developed and proposed in the
literature during the last decades [25], [27].

In this work, we propose a new generic approach to combine 
the sources of evidence thanks to their involutory negator of 
the BBAs, which is what we call the indirect fusion approach. 
The idea behind the indirect fusion approach is rather simple. 
Instead of combining directly the original BBAs by some 
fusion rules (typically Dempster-Shafer (DS) rule [2], PCR6 
rule [25], [26], Dubois-Prade rule [3], etc) to directly obtain 
the fusion result, we propose to compute the fusion result 
indirectly using the negators of BBAs. This new indirect fusion 
approach is based on the following three simple steps: 


• Step-1 (Calculation of the BBAs negators):

Calculate the involutory negators of the BBAs $m_1(.), m_2(.), \ldots, m_S(.)$
using (19) to get $\bar{m}_1(.), \bar{m}_2(.), \ldots, \bar{m}_S(.)$.

• Step-2 (Combination of the negators):

Combine (i.e. fuse) the $S > 1$ BBAs $\bar{m}_1(.), \bar{m}_2(.), \ldots, \bar{m}_S(.)$
by a chosen fusion rule denoted symbolically by
DF to get the direct fusion of negators, that is

$$\bar{m}^{DF}_{1,2,\ldots,S}(.) = DF(\bar{m}_1, \bar{m}_2, \ldots, \bar{m}_S). \tag{25}$$

The choice of the direct fusion rule DF for combining the
negators is left to the fusion system designer. Proponents
of DST will prefer Dempster-Shafer's rule of combination
(22), while opponents of DS rule will use other fusion
rules (typically PCR6 rule (23), etc).


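• Step-3 (Negation of the fused negators):

Once the fusion result $\bar{m}^{DF}_{1,2,\ldots,S}(.)$ is obtained, one calculates
its negator to get the final Indirect Fusion (IF)
result of the original BBAs thanks to the definition (19)
where $m(.)$ is replaced by $\bar{m}^{DF}_{1,2,\ldots,S}(.)$, that is

$$m^{IF}_{1,2,\ldots,S}(X) = \begin{cases} 0, & \text{if } X = \emptyset, \\ \bar{m}^{DF}_{1,2,\ldots,S}(\bar{X}), & \forall X \neq \emptyset,\ X \subset \Theta, \\ \bar{m}^{DF}_{1,2,\ldots,S}(\Theta), & \text{if } X = \Theta, \end{cases} \tag{26}$$

where $\bar{X}$ denotes the complement of X in $\Theta$.

More concisely, we will write steps 1, 2 and 3 by the
symbolic expression

$$m^{IF}_{1,2,\ldots,S} = \overline{DF(\bar{m}_1, \bar{m}_2, \ldots, \bar{m}_S)}, \tag{27}$$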


where the negator operator used in (27) (represented by a bar
symbol) is the involutory negator defined in (19).

As will be discussed in Section VIII, in general we
have $m^{IF}_{1,2,\ldots,S} \neq m^{DF}_{1,2,\ldots,S}$. This means that the direct and
indirect fusion methods provide different results depending on
the fusion rule chosen, and on the distribution of masses of
belief to focal elements. This is because the fusion rules do
not satisfy De Morgan's law when a conflict exists between
the sources of evidence. Only in the case where $S = 2$ and
$m_1(.) = m_v$, or $m_2(.) = m_v$, one has $m^{IF}_{1,2} = m^{DF}_{1,2}$ because
there is no conflict between the two sources of evidence to
deal with in this very particular case.
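The three steps of the indirect fusion approach can be sketched as follows for two sources. This is a hedged illustration rather than a reference implementation: the `negator` function encodes the assumed reading of definition (19) (complement every focal element except $\Theta$), the fusion rule is passed as a parameter, and only the conjunctive rule is included to keep the block short. All function names are hypothetical.

```python
from itertools import product


def negator(m, theta):
    """Assumed form of the involutory negator (19)."""
    theta = frozenset(theta)
    out = {}
    for X, mass in m.items():
        if not X and mass > 0.0:
            raise ValueError("negator expects a BBA with no mass on the empty set")
        Y = X if X == theta else theta - X
        out[Y] = out.get(Y, 0.0) + mass
    return out


def conjunctive(m1, m2):
    """Conjunctive rule (20)."""
    out = {}
    for (X1, a), (X2, b) in product(m1.items(), m2.items()):
        X = X1 & X2
        out[X] = out.get(X, 0.0) + a * b
    return out


def indirect_fusion(m1, m2, theta, rule):
    """Steps 1-3: negate the inputs, fuse the negators, negate the result."""
    fused_negators = rule(negator(m1, theta), negator(m2, theta))
    return negator(fused_negators, theta)


theta = frozenset("ABC")
m1 = {frozenset("A"): 0.9, frozenset("C"): 0.1}
m2 = {frozenset("B"): 0.9, frozenset("C"): 0.1}
m_if = indirect_fusion(m1, m2, theta, conjunctive)

# Particular case m1 = vacuous BBA: direct and indirect fusion coincide.
m_v = {theta: 1.0}
m_if_vacuous = indirect_fusion(m_v, m2, theta, conjunctive)
```

Any normalized two-source rule (for instance DS or PCR6) can be passed as `rule`; the conjunctive rule is adequate here only because the negators of these two BBAs are not in conflict.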


VI. SOME INTERESTING EXAMPLES 


In this section we examine three interesting examples where 
a conflict exists between two sources of evidence, and we 
compare the result based on indirect fusion method with the 
result obtained with the classical direct fusion approach using 
DS and PCR6 rules of combination. 


A. Zadeh’s example (two Bayesian BBAs) 


Consider the famous Zadeh's example [28] where $\Theta =
\{A, B, C\}$ represents three hypotheses about the origin of the
disease of a patient, and two Bayesian BBAs $m_1(.)$ and $m_2(.)$
are expressed by two doctors after the examination of the same
patient. These BBAs are given as follows

$m_1(A) = 0.9$, $m_1(C) = 0.1$;
$m_2(B) = 0.9$, $m_2(C) = 0.1$.

• Direct fusion with DS rule:


By applying DS rule (22), we obtain the Bayesian BBA
$m^{DS}_{1,2}(C) = 1$, which is considered a counter-intuitive
result by Zadeh and by many authors because it
means that the hypothesis C is diagnosed for sure as the
origin of the disease by DS rule even though both doctors agree
in committing a low belief to the origin C. This example
is important because it served Zadeh as a starting point to
question the validity of DS rule in Shafer's theory of
belief functions. This result has however been
justified by a first school of proponents of DS theory
by the fact that the two sources of evidence are highly
conflicting in this example because

$$m^{\cap}_{1,2}(\emptyset) = m_1(A)m_2(B) + m_1(A)m_2(C) + m_1(C)m_2(B) = 0.81 + 0.09 + 0.09 = 0.99.$$

Therefore, the proponents of this rule argue that DS
rule should not be applied without preprocessing (i.e.
discounting) the sources of evidence in this situation.
Other proponents of DS rule, belonging to a second
school, argue that the DS result
makes perfect sense in Zadeh's example. Both schools
of proponents defend DS rule but their conclusions are
based on very different, contradictory arguments, which
amplifies the suspicion about the validity of DS rule as
pointed out by Zadeh in [28]. Actually the two types of
arguments used to defend DS rule are flawed because DS
rule behaves dictatorially even in low conflict situations,
as reported by the authors in [29]. This will be shown
in the problematic example of Section VI-B.


• Direct fusion with PCR6 rule:

By applying PCR6 fusion rule (23) to combine $m_1(.)$ and
$m_2(.)$ we get

$m^{PCR6}_{1,2}(A) = 0.486$,
$m^{PCR6}_{1,2}(B) = 0.486$,
$m^{PCR6}_{1,2}(C) = 0.028$.

This Bayesian PCR6 result is more reasonable than the DS
result because it clearly points out the difficulty of making
a choice between hypotheses A and B due to the
disagreement of the two doctors, while both reject the third
hypothesis C.


• Indirect fusion approach:

By applying the indirect fusion approach, after step 1 we
get the following BBAs negators

$\bar{m}_1(B \cup C) = 0.9$, $\bar{m}_1(A \cup B) = 0.1$,
$\bar{m}_2(A \cup C) = 0.9$, $\bar{m}_2(A \cup B) = 0.1$.

We observe that there is no conflict between these two
negators so that the conjunctive fusion rule can be used,
and there is no need to adopt a specific management of
conflicting masses either by DS rule, or by PCR6 rule,
because results from both rules are equal to the result
computed with the conjunctive rule when no conflict
occurs.

At step 2, we use the conjunctive fusion of $\bar{m}_1$ and $\bar{m}_2$
because there is no conflict between these negators, and
we get

$\bar{m}^{\cap}_{1,2}(A) = \bar{m}_1(A \cup B)\,\bar{m}_2(A \cup C) = 0.09$,
$\bar{m}^{\cap}_{1,2}(B) = \bar{m}_1(B \cup C)\,\bar{m}_2(A \cup B) = 0.09$,
$\bar{m}^{\cap}_{1,2}(C) = \bar{m}_1(B \cup C)\,\bar{m}_2(A \cup C) = 0.81$,
$\bar{m}^{\cap}_{1,2}(A \cup B) = \bar{m}_1(A \cup B)\,\bar{m}_2(A \cup B) = 0.01$.

At step 3, we take the negator of $\bar{m}^{\cap}_{1,2}(.)$ as the final
indirect fusion result. We obtain¹³

$m^{\text{IF-}\cap}_{1,2}(A \cup B) = \bar{m}^{\cap}_{1,2}(C) = 0.81$,
$m^{\text{IF-}\cap}_{1,2}(A \cup C) = \bar{m}^{\cap}_{1,2}(B) = 0.09$,
$m^{\text{IF-}\cap}_{1,2}(B \cup C) = \bar{m}^{\cap}_{1,2}(A) = 0.09$,
$m^{\text{IF-}\cap}_{1,2}(C) = \bar{m}^{\cap}_{1,2}(A \cup B) = 0.01$.


This non-Bayesian indirect fusion result is more acceptable
than the DS result because it clearly reveals the uncertainty
between hypotheses A and B, while reinforcing
the disbelief in hypothesis C as we intuitively expect. We
observe that $m^{\text{IF-}\cap}_{1,2} \neq m^{DS}_{1,2}$ and $m^{\text{IF-}\cap}_{1,2} \neq m^{PCR6}_{1,2}$. However,
if we approximate $m^{\text{IF-}\cap}_{1,2}$ by a probability measure thanks
to the BetP transform we obtain

$$BetP^{\text{IF-}\cap}(A) = \frac{m^{\text{IF-}\cap}_{1,2}(A \cup B)}{2} + \frac{m^{\text{IF-}\cap}_{1,2}(A \cup C)}{2} = 0.45,$$
$$BetP^{\text{IF-}\cap}(B) = \frac{m^{\text{IF-}\cap}_{1,2}(A \cup B)}{2} + \frac{m^{\text{IF-}\cap}_{1,2}(B \cup C)}{2} = 0.45,$$
$$BetP^{\text{IF-}\cap}(C) = m^{\text{IF-}\cap}_{1,2}(C) + \frac{m^{\text{IF-}\cap}_{1,2}(A \cup C)}{2} + \frac{m^{\text{IF-}\cap}_{1,2}(B \cup C)}{2} = 0.1.$$

This $BetP^{\text{IF-}\cap}(.)$ result is without doubt closer to the direct
PCR6 fusion result than to the DS result, although not strictly
equal to it. One observes that the $BetP^{\text{IF-}\cap}(.)$ distribution
obtained from this indirect fusion result coincides with the
result of the simple averaging fusion rule, which is a common fusion
rule adopted by users not familiar with belief functions.
This behavior is, we think, another argument against the
direct fusion result provided by DS rule.


B. Dezert-Tchamova example (two non-Bayesian BBAs) 


Here we consider another problematic example presented
by Dezert and Tchamova in [29] to show the dictatorial
behavior of DS rule of combination in both high and low conflict
situations. An infinity of problematic examples like this
one can be defined, see [32] for more examples. We consider
the FoD $\Theta = \{A, B, C\}$ with the following two (generic)
non-Bayesian BBAs

$m_1(A) = a$, $m_1(A \cup B) = 1 - a$,
$m_2(A \cup B) = b_1$, $m_2(C) = 1 - b_1 - b_2$, $m_2(A \cup B \cup C) = b_2$,

with $0 < a, b_1, b_2 < 1$ and $b_1 + b_2 < 1$.
The conflict of these two BBAs is actually independent of
the BBA $m_1(.)$ because

$$m^{\cap}_{1,2}(\emptyset) = m_1(A)m_2(C) + m_1(A \cup B)m_2(C) = m_2(C) = 1 - b_1 - b_2.$$

One can easily verify that the direct Dempster-Shafer
fusion of these two BBAs gives $m^{DS}_{1,2}(A) = m_1(A) = a$ and
$m^{DS}_{1,2}(A \cup B) = m_1(A \cup B) = 1 - a$, which indicates that the
fusion result is actually independent of the BBA $m_2(.)$ even
if $m_2(.)$ is not the vacuous BBA, and the conflict degree can
be taken as high or as low as we want. This behavior of DS
rule is of course counter-intuitive and dictatorial, and that is
why we do not recommend its use in applications.

13 We use the notation $m^{\text{IF-}\cap}_{1,2}(.)$ to explicitly specify that the indirect fusion
(IF) has been done with the conjunctive rule (symbolized by the $\cap$ symbol).
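The dictatorial behavior described above is easy to reproduce numerically. The following minimal Python sketch (illustrative names, dictionary-of-frozensets representation assumed) applies DS rule (22) to the generic BBAs of this example for several arbitrary choices of $b_1$ and $b_2$, with $a = 0.3$ held fixed.

```python
from itertools import product


def dempster_shafer(m1, m2):
    """DS rule (22) for two BBAs given as {frozenset: mass} dictionaries."""
    fused, conflict = {}, 0.0
    for (X1, a), (X2, b) in product(m1.items(), m2.items()):
        X = X1 & X2
        if X:
            fused[X] = fused.get(X, 0.0) + a * b
        else:
            conflict += a * b          # mass that would go to the empty set
    return {X: v / (1.0 - conflict) for X, v in fused.items()}


A, AB = frozenset("A"), frozenset("AB")
C, ABC = frozenset("C"), frozenset("ABC")

a = 0.3
m1 = {A: a, AB: 1.0 - a}

# Arbitrary illustrative parameter choices: the conflict 1 - b1 - b2
# ranges from very low (0.01) to very high (0.90).
results = []
for b1, b2 in [(0.2, 0.3), (0.05, 0.05), (0.6, 0.39)]:
    m2 = {AB: b1, C: 1.0 - b1 - b2, ABC: b2}
    results.append(dempster_shafer(m1, m2))
```

In every case the fused BBA equals $m_1(.)$ up to rounding, whatever the content of $m_2(.)$.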


Example 8: We consider here the Dezert-Tchamova example with
parameters $a = 0.3$, $b_1 = 0.2$ and $b_2 = 0.3$. Hence,

$m_1(A) = 0.3$, $m_1(A \cup B) = 0.7$,
$m_2(A \cup B) = 0.2$, $m_2(C) = 0.5$, $m_2(A \cup B \cup C) = 0.3$.

For this numerical example, using the conjunctive fusion
rule, we obtain

$m^{\cap}_{1,2}(\emptyset) = 0.3 \cdot 0.5 + 0.7 \cdot 0.5 = 0.50$,
$m^{\cap}_{1,2}(A) = 0.3 \cdot 0.2 + 0.3 \cdot 0.3 = 0.15$,
$m^{\cap}_{1,2}(A \cup B) = 0.7 \cdot 0.2 + 0.7 \cdot 0.3 = 0.35$.

One sees that there exists a positive conflict $m^{\cap}_{1,2}(\emptyset)$ between
these two sources of evidence that needs to be redistributed in
order to obtain a proper resulting BBA.


• Direct fusion with DS rule:

By applying DS rule (22), we obtain

$m^{DS}_{1,2}(\emptyset) = 0$,
$m^{DS}_{1,2}(A) = \dfrac{0.3 \cdot 0.2 + 0.3 \cdot 0.3}{0.5} = 0.3 = m_1(A)$,
$m^{DS}_{1,2}(A \cup B) = \dfrac{0.7 \cdot 0.2 + 0.7 \cdot 0.3}{0.5} = 0.7 = m_1(A \cup B)$.

The same dictatorial DS fusion result would be obtained for
other numerical values of the positive parameters $a$, $b_1$ and
$b_2$ with $b_1 + b_2 < 1$.


• Direct fusion with PCR6 rule:

By applying PCR6 rule¹⁴ (23), we obtain

$m^{PCR6}_{1,2}(\emptyset) = 0$,
$m^{PCR6}_{1,2}(A) = 0.2062$,
$m^{PCR6}_{1,2}(A \cup B) = 0.5542$,
$m^{PCR6}_{1,2}(C) = 0.2396$.

We see that the PCR6 fusion rule does not behave
dictatorially. It can be easily verified that the PCR6 fusion
result changes with different values of $b_1$ and $b_2$.


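• Indirect fusion with DS rule:

If we apply the indirect fusion approach, the negators of
$m_1(.)$ and $m_2(.)$ are given by

$\bar{m}_1(B \cup C) = 0.3$, $\bar{m}_1(C) = 0.7$,
$\bar{m}_2(C) = 0.2$, $\bar{m}_2(A \cup B) = 0.5$, $\bar{m}_2(A \cup B \cup C) = 0.3$.

14 See formulas (12)-(14) in [31] for details. Note that PCR5 and PCR6
formulas for the fusion of two BBAs provide the same result.

Hence the conjunctive fusion of negators gives

$\bar{m}^{\cap}_{1,2}(\emptyset) = 0.7 \cdot 0.5 = 0.35$,
$\bar{m}^{\cap}_{1,2}(B) = 0.3 \cdot 0.5 = 0.15$,
$\bar{m}^{\cap}_{1,2}(C) = 0.3 \cdot 0.2 + 0.7 \cdot 0.2 + 0.7 \cdot 0.3 = 0.41$,
$\bar{m}^{\cap}_{1,2}(B \cup C) = 0.3 \cdot 0.3 = 0.09$.

Note that $\bar{m}^{\cap}_{1,2}(.) \neq m^{\cap}_{1,2}(.)$. Applying DS rule of combination
to these negators we obtain

$\bar{m}^{DS}_{1,2}(\emptyset) = 0$,
$\bar{m}^{DS}_{1,2}(B) = 0.15/0.65 \approx 0.23$,
$\bar{m}^{DS}_{1,2}(C) = 0.41/0.65 \approx 0.63$,
$\bar{m}^{DS}_{1,2}(B \cup C) = 0.09/0.65 \approx 0.14$.

After taking the negator of $\bar{m}^{DS}_{1,2}(.)$ we obtain, using the
indirect DS (i.e. IF-DS) fusion approach, the following
final result

$m^{\text{IF-DS}}_{1,2}(\emptyset) = \bar{m}^{DS}_{1,2}(\emptyset) = 0$,
$m^{\text{IF-DS}}_{1,2}(A \cup B) = \bar{m}^{DS}_{1,2}(C) \approx 0.63$,
$m^{\text{IF-DS}}_{1,2}(A \cup C) = \bar{m}^{DS}_{1,2}(B) \approx 0.23$,
$m^{\text{IF-DS}}_{1,2}(A) = \bar{m}^{DS}_{1,2}(B \cup C) \approx 0.14$.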

This result appears a bit more acceptable than the direct
DS fusion result, and it is not dictatorial because
$m^{\text{IF-DS}}_{1,2}(.) \neq m_1(.)$ and $m^{\text{IF-DS}}_{1,2}(.) \neq m_2(.)$. It is worth
mentioning that this new indirect DS fusion approach
does not, in general, circumvent the bad dictatorial behavior
of DS rule thanks to the negators and their DS
fusion. To validate this remark, it is easy to build another
(dual) Dezert-Tchamova example where it is the fusion of
the negators of the BBAs, instead of the direct DS fusion,
that exhibits a dictatorial behavior. For instance, consider
$\Theta = \{A, B, C\}$ and the following BBAs

$m_1(B \cup C) = a$, $m_1(C) = 1 - a$,
$m_2(C) = b_1$, $m_2(A \cup B) = 1 - b_1 - b_2$, $m_2(A \cup B \cup C) = b_2$.

Then we have

$\bar{m}_1(A) = a$, $\bar{m}_1(A \cup B) = 1 - a$,
$\bar{m}_2(A \cup B) = b_1$, $\bar{m}_2(C) = 1 - b_1 - b_2$, $\bar{m}_2(A \cup B \cup C) = b_2$.

The fusion of $\bar{m}_1(.)$ and $\bar{m}_2(.)$ with Dempster-Shafer's
rule exhibits a dictatorial behavior because one gets
$\bar{m}^{DS}_{1,2}(A) = a$ and $\bar{m}^{DS}_{1,2}(A \cup B) = 1 - a$, and the final
result based on these negators and indirect DS fusion is
dictatorial and given by

$m^{\text{IF-DS}}_{1,2}(B \cup C) = \bar{m}^{DS}_{1,2}(A) = a$,
$m^{\text{IF-DS}}_{1,2}(C) = \bar{m}^{DS}_{1,2}(A \cup B) = 1 - a$.

Hence, we get $m^{\text{IF-DS}}_{1,2}(B \cup C) = a = m_1(B \cup C)$ and
$m^{\text{IF-DS}}_{1,2}(C) = 1 - a = m_1(C)$. So the use of Dempster-Shafer's
rule in the information fusion method based on
the negators of BBAs also remains disputable in this case.

That is why, whatever the fusion strategy chosen (direct
or indirect), we cannot seriously recommend the Dempster-Shafer
rule of combination because of its potential dictatorial
behavior.


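• Indirect fusion with PCR6 rule:

If we apply the PCR6 fusion rule to the
negators of $m_1(.)$ and $m_2(.)$ we obtain

$\bar{m}^{PCR6}_{1,2}(\emptyset) = 0$,
$\bar{m}^{PCR6}_{1,2}(B) = \bar{m}^{\cap}_{1,2}(B) = 0.1500$,
$\bar{m}^{PCR6}_{1,2}(C) = \bar{m}^{\cap}_{1,2}(C) + \dfrac{\bar{m}_1(C)^2\, \bar{m}_2(A \cup B)}{\bar{m}_1(C) + \bar{m}_2(A \cup B)} = 0.6142$,
$\bar{m}^{PCR6}_{1,2}(B \cup C) = \bar{m}^{\cap}_{1,2}(B \cup C) = 0.0900$,
$\bar{m}^{PCR6}_{1,2}(A \cup B) = \dfrac{\bar{m}_1(C)\, \bar{m}_2(A \cup B)^2}{\bar{m}_1(C) + \bar{m}_2(A \cup B)} = 0.1458$.

After taking the negator of $\bar{m}^{PCR6}_{1,2}(.)$, the indirect PCR6
(IF-PCR6) fusion approach gives the following final
result

$m^{\text{IF-PCR6}}_{1,2}(\emptyset) = \bar{m}^{PCR6}_{1,2}(\emptyset) = 0$,
$m^{\text{IF-PCR6}}_{1,2}(A \cup B) = \bar{m}^{PCR6}_{1,2}(C) = 0.6142$,
$m^{\text{IF-PCR6}}_{1,2}(A \cup C) = \bar{m}^{PCR6}_{1,2}(B) = 0.1500$,
$m^{\text{IF-PCR6}}_{1,2}(A) = \bar{m}^{PCR6}_{1,2}(B \cup C) = 0.0900$,
$m^{\text{IF-PCR6}}_{1,2}(C) = \bar{m}^{PCR6}_{1,2}(A \cup B) = 0.1458$.

We observe that direct and indirect PCR6-based fusion
methods provide distinct results because $m^{\text{IF-PCR6}}_{1,2}(.) \neq
m^{PCR6}_{1,2}(.)$. It is also worth noting that the indirect fusion
results based on DS rule (IF-DS) and PCR6 rule (IF-PCR6)
provide quite similar maximal mass values for the
same focal element $A \cup B$ because $m^{\text{IF-DS}}_{1,2}(A \cup B) \approx 0.63$
and $m^{\text{IF-PCR6}}_{1,2}(A \cup B) = 0.6142$. However, we observe that
the sets of focal elements of $m^{\text{IF-DS}}_{1,2}(.)$ and $m^{\text{IF-PCR6}}_{1,2}(.)$
are not the same because IF-PCR6 commits a mass
specifically to the element C, which is not a focal element
of $m^{\text{IF-DS}}_{1,2}(.)$. Therefore the structures (i.e. the sets of
focal elements) of the BBAs $m^{\text{IF-DS}}_{1,2}(.)$ and $m^{\text{IF-PCR6}}_{1,2}(.)$ are
different.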


C. Blackman’s example (Bayesian and non-Bayesian BBAs) 


This simple example has been introduced by Blackman in
[33] and analyzed by the authors in [34]. We consider the FoD
$\Theta = \{A, B\}$ and the following two BBAs

$m_1(A) = 0.5$, $m_1(B) = 0.5$, $m_1(A \cup B) = 0$,
$m_2(A) = 0.1$, $m_2(B) = 0.1$, $m_2(A \cup B) = 0.8$.

We see that there is no way to decide either A or B in this
particular example because neither source of evidence
brings useful information to help decision-making. Each
BBA $m_1(.)$ and $m_2(.)$ is completely symmetrical in A and B.
So intuitively, there is no reason to expect an improvement in
the decision-making based on the fusion of these two BBAs.
We mention that $m_1(.)$ is a Bayesian BBA, and $m_2(.)$ is a
non-Bayesian BBA in Blackman's example.

The conjunctive fusion of $m_1(.)$ and $m_2(.)$ yields

$m^{\cap}_{1,2}(\emptyset) = m_1(A)m_2(B) + m_1(B)m_2(A) = 0.10$,
$m^{\cap}_{1,2}(A) = m_1(A)m_2(A) + m_1(A)m_2(A \cup B) = 0.45$,
$m^{\cap}_{1,2}(B) = m_1(B)m_2(B) + m_1(B)m_2(A \cup B) = 0.45$,
$m^{\cap}_{1,2}(A \cup B) = 0$.

We see that the conflicting mass $m^{\cap}_{1,2}(\emptyset) = 0.10$ must be
redistributed to some elements of $2^{\Theta} \setminus \{\emptyset\}$ in order to get a
proper fused BBA.


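• Direct fusion with DS rule:

By applying DS rule (22), we obtain

$m^{DS}_{1,2}(\emptyset) = 0$,
$m^{DS}_{1,2}(A) = \dfrac{m^{\cap}_{1,2}(A)}{1 - m^{\cap}_{1,2}(\emptyset)} = 0.45/0.9 = 0.5$,
$m^{DS}_{1,2}(B) = \dfrac{m^{\cap}_{1,2}(B)}{1 - m^{\cap}_{1,2}(\emptyset)} = 0.45/0.9 = 0.5$,
$m^{DS}_{1,2}(A \cup B) = \dfrac{m^{\cap}_{1,2}(A \cup B)}{1 - m^{\cap}_{1,2}(\emptyset)} = 0.00/0.9 = 0$.

• Direct fusion with PCR6 rule:

By applying PCR6 rule (23), we obtain

$m^{PCR6}_{1,2}(\emptyset) = 0$,
$m^{PCR6}_{1,2}(A) = m^{\cap}_{1,2}(A) + \dfrac{m_1(A)^2 m_2(B)}{m_1(A) + m_2(B)} + \dfrac{m_2(A)^2 m_1(B)}{m_2(A) + m_1(B)} = 0.5$,
$m^{PCR6}_{1,2}(B) = m^{\cap}_{1,2}(B) + \dfrac{m_1(B)^2 m_2(A)}{m_1(B) + m_2(A)} + \dfrac{m_2(B)^2 m_1(A)}{m_2(B) + m_1(A)} = 0.5$,
$m^{PCR6}_{1,2}(A \cup B) = m^{\cap}_{1,2}(A \cup B) = 0$.

As intuitively expected, the direct fusion results based on
DS rule and on PCR6 rule do not help to make a rational
decision in favor of A or B.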


• Indirect fusion with DS and PCR6 rules:

Applying the BBA negator defined by (19), we obtain

$\bar{m}_1(B) = 0.5$, $\bar{m}_1(A) = 0.5$, $\bar{m}_1(A \cup B) = 0$,
$\bar{m}_2(B) = 0.1$, $\bar{m}_2(A) = 0.1$, $\bar{m}_2(A \cup B) = 0.8$.

Because $|\Theta| = 2$, we observe that we have in this
example $\bar{m}_1(.) = m_1(.)$ and $\bar{m}_2(.) = m_2(.)$. Therefore,
we will get the same result with the conjunctive fusion of
$\bar{m}_1(.)$ and $\bar{m}_2(.)$ as for the direct conjunctive fusion of
$m_1(.)$ and $m_2(.)$. The direct and indirect fusion methods
based on DS and PCR6 rules will actually yield the
same fusion result, that is

$m^{DS}_{1,2}(A) = m^{\text{IF-DS}}_{1,2}(A) = m^{PCR6}_{1,2}(A) = m^{\text{IF-PCR6}}_{1,2}(A) = 0.5$,
$m^{DS}_{1,2}(B) = m^{\text{IF-DS}}_{1,2}(B) = m^{PCR6}_{1,2}(B) = m^{\text{IF-PCR6}}_{1,2}(B) = 0.5$.

This example is interesting because it clearly shows that
there exist some situations where there is no advantage in
using direct fusion w.r.t. indirect fusion, and vice versa.


Extension of Blackman's example

We extend Blackman's example using a bigger FoD as
follows. We consider $\Theta = \{A, B, C\}$ with the two following
BBAs

$m_1(A) = m_1(B) = m_1(C) = 1/3$,
$m_2(A) = m_2(B) = m_2(C) = 0.1$, $m_2(A \cup B \cup C) = 0.7$.

As in the previous Blackman's example we see that there is no
way to decide either A, B or C because each BBA $m_1(.)$ and
$m_2(.)$ is completely symmetrical in A, B and C. So there is
no rational reason to expect an improvement in the decision-making
based on the fusion of these two BBAs. Here also
$m_1(.)$ is a Bayesian BBA, and $m_2(.)$ is a non-Bayesian BBA.

The conjunctive fusion of $m_1(.)$ and $m_2(.)$ yields

$m^{\cap}_{1,2}(\emptyset) = 0.6/3$, $m^{\cap}_{1,2}(A) = m^{\cap}_{1,2}(B) = m^{\cap}_{1,2}(C) = 0.8/3$.

The direct DS fusion and the direct PCR6 fusion give the same
result which is

$m^{DS}_{1,2}(\emptyset) = m^{PCR6}_{1,2}(\emptyset) = 0$,
$m^{DS}_{1,2}(A) = m^{PCR6}_{1,2}(A) = 1/3$,
$m^{DS}_{1,2}(B) = m^{PCR6}_{1,2}(B) = 1/3$,
$m^{DS}_{1,2}(C) = m^{PCR6}_{1,2}(C) = 1/3$.


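If we want to apply the indirect fusion methods, the negators
of $m_1(.)$ and $m_2(.)$ are the non-Bayesian BBAs $\bar{m}_1(.)$ and
$\bar{m}_2(.)$ given by

$\bar{m}_1(B \cup C) = \bar{m}_1(A \cup C) = \bar{m}_1(A \cup B) = 1/3$,
$\bar{m}_2(B \cup C) = \bar{m}_2(A \cup C) = \bar{m}_2(A \cup B) = 0.1$,
$\bar{m}_2(A \cup B \cup C) = 0.7$.

The conjunctive fusion of $\bar{m}_1(.)$ and $\bar{m}_2(.)$ gives

$\bar{m}^{\cap}_{1,2}(\emptyset) = 0$,
$\bar{m}^{\cap}_{1,2}(A) = \bar{m}^{\cap}_{1,2}(B) = \bar{m}^{\cap}_{1,2}(C) = 0.2/3$,
$\bar{m}^{\cap}_{1,2}(A \cup B) = \bar{m}^{\cap}_{1,2}(A \cup C) = \bar{m}^{\cap}_{1,2}(B \cup C) = 0.8/3$.

Because there is no conflicting mass to redistribute, there is
no need to apply indirect DS fusion of $\bar{m}_1(.)$ and $\bar{m}_2(.)$, nor
indirect PCR6 fusion of $\bar{m}_1(.)$ and $\bar{m}_2(.)$. More precisely, if
we apply DS rule or PCR6 rule we will obtain

$\bar{m}^{DS}_{1,2}(.) = \bar{m}^{PCR6}_{1,2}(.) = \bar{m}^{\cap}_{1,2}(.)$.

After taking the negator of $\bar{m}^{\cap}_{1,2}(.)$ we obtain, using the indirect
fusion approaches, the following final result

$m^{\text{IF-DS}}_{1,2}(.) = m^{\text{IF-PCR6}}_{1,2}(.) = m^{\text{IF-}\cap}_{1,2}(.)$,

where

$m^{\text{IF-}\cap}_{1,2}(\emptyset) = \bar{m}^{\cap}_{1,2}(\emptyset) = 0$,
$m^{\text{IF-}\cap}_{1,2}(B \cup C) = \bar{m}^{\cap}_{1,2}(A) = 0.2/3$,
$m^{\text{IF-}\cap}_{1,2}(A \cup C) = \bar{m}^{\cap}_{1,2}(B) = 0.2/3$,
$m^{\text{IF-}\cap}_{1,2}(A \cup B) = \bar{m}^{\cap}_{1,2}(C) = 0.2/3$,
$m^{\text{IF-}\cap}_{1,2}(C) = \bar{m}^{\cap}_{1,2}(A \cup B) = 0.8/3$,
$m^{\text{IF-}\cap}_{1,2}(B) = \bar{m}^{\cap}_{1,2}(A \cup C) = 0.8/3$,
$m^{\text{IF-}\cap}_{1,2}(A) = \bar{m}^{\cap}_{1,2}(B \cup C) = 0.8/3$.

As we can see in this extended Blackman's example, the
indirect fusion approach eliminates the problem of conflict
management because there is no conflict to deal with when
working with the negators. Of course the fusion result is
less specific than the one we obtain using the direct fusion
approaches (based on DS and on PCR6 rules), and the $m^{\text{IF-}\cap}_{1,2}(.)$
result is no more helpful from the decision-making standpoint, which
is normal in such a situation, but this indirect fusion approach
could be interesting to use if other sources of evidence
become available in the fusion system.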


VII. Two IMPORTANT REMARKS 


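Remark 1: As shown in Zadeh's example of Section VI-A
the indirect fusion method gives

$m^{\text{IF-}\cap}_{1,2}(A \cup B) = \bar{m}^{\cap}_{1,2}(C) = 0.81$,
$m^{\text{IF-}\cap}_{1,2}(A \cup C) = \bar{m}^{\cap}_{1,2}(B) = 0.09$,
$m^{\text{IF-}\cap}_{1,2}(B \cup C) = \bar{m}^{\cap}_{1,2}(A) = 0.09$,
$m^{\text{IF-}\cap}_{1,2}(C) = \bar{m}^{\cap}_{1,2}(A \cup B) = 0.01$.

It is interesting to observe that this result coincides with the
fusion result obtained with the disjunctive rule of combination
(21). Indeed, we have

$m^{\cup}_{1,2}(A \cup B) = m_1(A)m_2(B) = 0.81$,
$m^{\cup}_{1,2}(A \cup C) = m_1(A)m_2(C) = 0.09$,
$m^{\cup}_{1,2}(B \cup C) = m_1(C)m_2(B) = 0.09$,
$m^{\cup}_{1,2}(C) = m_1(C)m_2(C) = 0.01$.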


We may question whether the equality $m^{\text{IF-}\cap}_{1,2}(.) = m^{\cup}_{1,2}(.)$ is a
general property, or just a coincidence. In fact
this equality does not hold in general; it is due to the very
particular structure of focal elements of the BBAs involved
in Zadeh's example. This equality may fail even if there
is no conflict between the negators (as is the case in Zadeh's
example). As a simple counter-example, consider again the
extended Blackman's example of Section VI-C where no
conflict exists between the negators $\bar{m}_1(.)$ and $\bar{m}_2(.)$. The
indirect fusion approach gives the final result

$m^{\text{IF-}\cap}_{1,2}(\emptyset) = 0$,
$m^{\text{IF-}\cap}_{1,2}(A) = 0.8/3$,
$m^{\text{IF-}\cap}_{1,2}(B) = 0.8/3$,
$m^{\text{IF-}\cap}_{1,2}(C) = 0.8/3$,
$m^{\text{IF-}\cap}_{1,2}(A \cup B) = 0.2/3$,
$m^{\text{IF-}\cap}_{1,2}(A \cup C) = 0.2/3$,
$m^{\text{IF-}\cap}_{1,2}(B \cup C) = 0.2/3$,
$m^{\text{IF-}\cap}_{1,2}(A \cup B \cup C) = 0$.

The fusion result obtained with the disjunctive rule of
combination (21) for this extended Blackman's example is

$m^{\cup}_{1,2}(\emptyset) = 0$,
$m^{\cup}_{1,2}(A) = m_1(A)m_2(A) = 0.1/3$,
$m^{\cup}_{1,2}(B) = m_1(B)m_2(B) = 0.1/3$,
$m^{\cup}_{1,2}(C) = m_1(C)m_2(C) = 0.1/3$,
$m^{\cup}_{1,2}(A \cup B) = m_1(A)m_2(B) + m_1(B)m_2(A) = 0.2/3$,
$m^{\cup}_{1,2}(A \cup C) = m_1(A)m_2(C) + m_1(C)m_2(A) = 0.2/3$,
$m^{\cup}_{1,2}(B \cup C) = m_1(B)m_2(C) + m_1(C)m_2(B) = 0.2/3$,
$m^{\cup}_{1,2}(A \cup B \cup C) = m_1(A)m_2(A \cup B \cup C) + m_1(B)m_2(A \cup B \cup C) + m_1(C)m_2(A \cup B \cup C) = 0.7$.


We see clearly that $m^{\text{IF-}\cap}_{1,2}(.) \neq m^{\cup}_{1,2}(.)$ in this example,
so the property $m^{\text{IF-}\cap}_{1,2}(.) = m^{\cup}_{1,2}(.)$ is not always satisfied.
This means that De Morgan's law does not hold in general
in information fusion. More precisely, the direct disjunctive
fusion of BBAs is not necessarily equivalent to the negator
of the conjunctive fusion of negators. Similarly, the direct
conjunctive fusion of BBAs is not necessarily equivalent to
the negator of the disjunctive fusion of the negators.
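Remark 1 can be checked numerically on both examples. The sketch below is illustrative only: it assumes the reading of the involutory negator (19) used throughout the examples (complement every focal element except $\Theta$), represents BBAs as dictionaries of `frozenset` focal elements, and uses hypothetical helper names. It compares the direct disjunctive fusion with the negator of the conjunctive fusion of the negators.

```python
from itertools import product
from operator import and_, or_


def combine(m1, m2, op):
    """Conjunctive rule (op = and_) or disjunctive rule (op = or_)."""
    out = {}
    for (X1, a), (X2, b) in product(m1.items(), m2.items()):
        X = op(X1, X2)
        out[X] = out.get(X, 0.0) + a * b
    return out


def negator(m, theta):
    """Assumed form of the involutory negator (19)."""
    out = {}
    for X, mass in m.items():
        Y = X if X == theta else theta - X
        out[Y] = out.get(Y, 0.0) + mass
    return out


def same(ma, mb, tol=1e-12):
    """True if two BBAs assign the same mass to every subset."""
    keys = set(ma) | set(mb)
    return all(abs(ma.get(X, 0.0) - mb.get(X, 0.0)) < tol for X in keys)


def indirect_conjunctive(m1, m2, theta):
    """Negator of the conjunctive fusion of the negators."""
    return negator(combine(negator(m1, theta), negator(m2, theta), and_), theta)


theta = frozenset("ABC")
A, B, C = frozenset("A"), frozenset("B"), frozenset("C")

# Zadeh's example: the two results coincide.
z1, z2 = {A: 0.9, C: 0.1}, {B: 0.9, C: 0.1}
zadeh_equal = same(indirect_conjunctive(z1, z2, theta), combine(z1, z2, or_))

# Extended Blackman's example: they differ.
b1 = {A: 1 / 3, B: 1 / 3, C: 1 / 3}
b2 = {A: 0.1, B: 0.1, C: 0.1, theta: 0.7}
bl_indirect = indirect_conjunctive(b1, b2, theta)
bl_disjunctive = combine(b1, b2, or_)
blackman_equal = same(bl_indirect, bl_disjunctive)
```

Here `zadeh_equal` evaluates to `True` and `blackman_equal` to `False`, in agreement with the remark.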


Remark 2: The negation of a BBA does not necessarily increase
the entropy, contrary to what is claimed in the literature
in some papers cited in Section III. To prove this, just consider
the FoD $\Theta = \{A, B, C\}$ and the BBA $m(.)$ given by

$m(A \cup B) = 0.7$, $m(A \cup C) = 0.2$, $m(A \cup B \cup C) = 0.1$.

It can be easily verified that the entropy of the BBA $m(.)$
obtained by the formula (4) of the effective entropy definition is
(expressed in nats)

$U(m) \approx 4.299$.

The negator of $m(.)$ based on the definition (19) is

$\bar{m}(C) = 0.7$, $\bar{m}(B) = 0.2$, $\bar{m}(A \cup B \cup C) = 0.1$,

whose entropy is (expressed in nats)

$U(\bar{m}) \approx 1.254$.

One sees that $U(\bar{m}) < U(m)$ in this simple example.
Therefore, the negation of a BBA $m(.)$ does not
necessarily increase the entropy. This really depends on the
distribution of the mass of belief committed to focal elements
of the original BBA $m(.)$.
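The two entropy values of Remark 2 can be reproduced with a short Python sketch. Formula (4) is not restated in this part of the paper, so the form used below is an assumption: $U(m) = \sum_{X \in 2^{\Theta}} \left[ -m(X)(1 - u(X)) \log m(X) + u(X)(1 - m(X)) \right]$ with $u(X) = Pl(X) - Bel(X)$ and the natural logarithm. Under this assumption the code returns values matching those quoted above; the function names are illustrative.

```python
from itertools import combinations
from math import log


def nonempty_subsets(theta):
    """All non-empty subsets of the frame of discernment."""
    items = sorted(theta)
    return [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]


def effective_entropy(m, theta):
    """Assumed form of formula (4), in nats:
    U(m) = sum over X of -m(X)(1-u(X)) log m(X) + u(X)(1-m(X)),
    with u(X) = Pl(X) - Bel(X)."""
    total = 0.0
    for X in nonempty_subsets(theta):
        bel = sum(v for Y, v in m.items() if Y <= X)   # Y included in X
        pl = sum(v for Y, v in m.items() if Y & X)     # Y intersects X
        u = pl - bel
        mx = m.get(X, 0.0)
        if mx > 0.0:
            total += -mx * (1.0 - u) * log(mx)
        total += u * (1.0 - mx)
    return total


theta = frozenset("ABC")
m = {frozenset("AB"): 0.7, frozenset("AC"): 0.2, theta: 0.1}
m_bar = {frozenset("C"): 0.7, frozenset("B"): 0.2, theta: 0.1}   # negator of m

u_m = effective_entropy(m, theta)          # close to 4.299 nats
u_m_bar = effective_entropy(m_bar, theta)  # close to 1.254 nats
u_vacuous = effective_entropy({theta: 1.0}, theta)
```

With this assumed formula the vacuous BBA over three elements has entropy 6, i.e. $2^{|\Theta|} - 2$.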


VIII. MANAGEMENT OF DIRECT AND INDIRECT FUSIONS 


As shown in the examples of Section VI, the results
obtained with the direct fusion approach and the indirect fusion
approach do not coincide except in very particular cases. In general,
we have $m^{DF}_{1,2,\ldots,S} \neq m^{IF}_{1,2,\ldots,S}$. Therefore, at this stage of our
research work we are facing a new problem: what to do
with these two fusion results $m^{DF}_{1,2,\ldots,S}(.)$ and $m^{IF}_{1,2,\ldots,S}(.)$ for
decision-making support? This section provides two possible
answers to this important question.


A. Answer 1: Fuse $m^{DF}_{1,2,\ldots,S}$ with $m^{IF}_{1,2,\ldots,S}$


The first intuitive answer to the aforementioned question
would consist in fusing (i.e. combining) the two fusion results
$m^{DF}_{1,2,\ldots,S}$ and $m^{IF}_{1,2,\ldots,S}$ by some appropriately chosen rule of
combination, typically the PCR6 rule (or the PCR6⁺ rule
if $S > 2$, see [26]). This first answer is unfortunately not
very satisfactory and not recommended from the theoretical
standpoint because the fusion results $m^{DF}_{1,2,\ldots,S}$ and $m^{IF}_{1,2,\ldots,S}$
are actually based on exactly the same original inputs
corresponding to BBAs $m_1(.), m_2(.), \ldots, m_S(.)$. Therefore,
the inputs $m^{DF}_{1,2,\ldots,S}$ and $m^{IF}_{1,2,\ldots,S}$ cannot be considered as
(cognitively) independent and their fusion is not recommended
because of redundant information which may generate some
biases in the final result, and induce decision-making mistakes.

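If this approach is nevertheless used in applications by some
users, we suggest at least taking into account the quality of
each source $m^{DF}_{1,2,\ldots,S}$ and $m^{IF}_{1,2,\ldots,S}$, characterized somehow by
their entropies $U(m^{DF}_{1,2,\ldots,S})$ and $U(m^{IF}_{1,2,\ldots,S})$.

A very simple fusion method would consist for instance
in applying the weighted averaging fusion of $m^{DF}_{1,2,\ldots,S}$ with
$m^{IF}_{1,2,\ldots,S}$ defined for any $X \in 2^{\Theta}$ by

$$m(X) = w^{DF}\, m^{DF}_{1,2,\ldots,S}(X) + w^{IF}\, m^{IF}_{1,2,\ldots,S}(X), \tag{28}$$

where the importance weighting factors $w^{DF}$ and $w^{IF}$ belong to
$[0, 1]$ and satisfy $w^{DF} + w^{IF} = 1$. These factors should depend
on the quality of the BBAs $m^{DF}_{1,2,\ldots,S}$ and $m^{IF}_{1,2,\ldots,S}$, which is
characterized by their entropies, for instance

$$w^{DF} = \frac{U_{\max} - U(m^{DF}_{1,2,\ldots,S})}{(U_{\max} - U(m^{DF}_{1,2,\ldots,S})) + (U_{\max} - U(m^{IF}_{1,2,\ldots,S}))},$$

$$w^{IF} = \frac{U_{\max} - U(m^{IF}_{1,2,\ldots,S})}{(U_{\max} - U(m^{DF}_{1,2,\ldots,S})) + (U_{\max} - U(m^{IF}_{1,2,\ldots,S}))},$$

where $U_{\max}$ is the maximum value of the entropy, corresponding
to the vacuous BBA $m_v(.)$. This max value is given by (see
[5] for details)

$$U_{\max} = 2^{|\Theta|} - 2.$$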


Other fusion methods based on discounting techniques and
entropies could possibly be developed as well, but fundamentally
we do not recommend combining $m^{DF}_{1,2,\ldots,S}$ with
$m^{IF}_{1,2,\ldots,S}$ for the aforementioned reason: the underlying dependency
on the original BBAs that have been used to generate the direct
and indirect fusion results $m^{DF}_{1,2,\ldots,S}$ and $m^{IF}_{1,2,\ldots,S}$.


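B. Answer 2: Select either $m^{DF}_{1,2,\ldots,S}$ or $m^{IF}_{1,2,\ldots,S}$


Because we consider that the previous intuitive answer is
not satisfactory, we need to seriously consider a second option
for the management of the direct and indirect fusion results
$m^{DF}_{1,2,\ldots,S}$ and $m^{IF}_{1,2,\ldots,S}$. This second option consists in selecting only
one BBA, $m^{DF}_{1,2,\ldots,S}$ or $m^{IF}_{1,2,\ldots,S}$, for decision-making support.
But which one to select? How?

For selecting the BBA $m^{DF}_{1,2,\ldots,S}$ or $m^{IF}_{1,2,\ldots,S}$ we propose
to adopt the maximum entropy principle, which states that we
should select the BBA which leaves us the largest remaining
uncertainty. More precisely, we will select $m^{DF}_{1,2,\ldots,S}$ if
$U(m^{DF}_{1,2,\ldots,S}) > U(m^{IF}_{1,2,\ldots,S})$, and we will select $m^{IF}_{1,2,\ldots,S}$ if
$U(m^{IF}_{1,2,\ldots,S}) > U(m^{DF}_{1,2,\ldots,S})$. In the very rare cases where
$U(m^{DF}_{1,2,\ldots,S}) = U(m^{IF}_{1,2,\ldots,S})$, either BBA can be selected; this
happens in particular when the
two BBAs $m^{DF}_{1,2,\ldots,S}$ and $m^{IF}_{1,2,\ldots,S}$ coincide. This maximum
entropy principle is rather simple to use in practice because
we need only to calculate the entropies $U(m^{DF}_{1,2,\ldots,S})$ and
$U(m^{IF}_{1,2,\ldots,S})$.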


We now provide more details on how to proceed in the 
interesting examples considered in Section VI. 


For Zadeh's example (two Bayesian BBAs)

• With direct fusion using DS rule:

We obtain the Bayesian BBA $m^{DS}_{1,2}(C) = 1$. The entropy
of $m^{DS}_{1,2}$ is $U(m^{DS}_{1,2}) = 0$ nats. This stipulates that there is
no uncertainty carried by this very specific BBA, which
is a counter-intuitive result as explained in [28].

• With direct fusion using PCR6 rule:

We obtain

$m^{PCR6}_{1,2}(A) = m^{PCR6}_{1,2}(B) = 0.486$, $m^{PCR6}_{1,2}(C) = 0.028$.

The entropy of this Bayesian BBA $m^{PCR6}_{1,2}$ based on the
formula (4) is $U(m^{PCR6}_{1,2}) \approx 0.8014$ nats.

• With indirect fusion approach:

We obtain (see Section VI)

$m^{\text{IF-}\cap}_{1,2}(A \cup B) = 0.81$,
$m^{\text{IF-}\cap}_{1,2}(A \cup C) = 0.09$,
$m^{\text{IF-}\cap}_{1,2}(B \cup C) = 0.09$,
$m^{\text{IF-}\cap}_{1,2}(C) = 0.01$.

The entropy of this non-Bayesian BBA $m^{\text{IF-}\cap}_{1,2}$ based on
the formula (4) is $U(m^{\text{IF-}\cap}_{1,2}) \approx 3.8714$ nats.


Clearly, the BBA to use for decision-making support corresponds
to the indirect fusion result $m^{\text{IF-}\cap}_{1,2}$ because $U(m^{\text{IF-}\cap}_{1,2}) >
U(m^{PCR6}_{1,2})$. From the selected BBA $m^{\text{IF-}\cap}_{1,2}$ the final decision can
be made thanks to different techniques that are detailed in [35].
In Zadeh's example the hypothesis C will be rejected, even
if there is a tie between A and B. This tie can be eliminated
arbitrarily (if we want), or randomly by a uniform random
draw (i.e. perfect coin tossing) between A and B.
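The selection step for Zadeh's example can be sketched as follows. As before, the entropy function encodes an assumed form of formula (4), namely $U(m) = \sum_{X} \left[ -m(X)(1 - u(X)) \log m(X) + u(X)(1 - m(X)) \right]$ with $u(X) = Pl(X) - Bel(X)$, which is not restated in this section; with this form the code gives values in agreement with the two entropies quoted above. The input BBAs are the direct PCR6 result and the indirect fusion result listed above, and all names are illustrative.

```python
from itertools import combinations
from math import log


def effective_entropy(m, theta):
    """Assumed form of the entropy formula (4), in nats."""
    items = sorted(theta)
    total = 0.0
    for r in range(1, len(items) + 1):
        for c in combinations(items, r):
            X = frozenset(c)
            bel = sum(v for Y, v in m.items() if Y <= X)
            pl = sum(v for Y, v in m.items() if Y & X)
            u = pl - bel
            mx = m.get(X, 0.0)
            if mx > 0.0:
                total += -mx * (1.0 - u) * log(mx)
            total += u * (1.0 - mx)
    return total


def select_by_max_entropy(candidates, theta):
    """Return the label of the candidate BBA with the largest entropy."""
    return max(candidates, key=lambda name: effective_entropy(candidates[name], theta))


theta = frozenset("ABC")
candidates = {
    "DF-PCR6": {frozenset("A"): 0.486, frozenset("B"): 0.486, frozenset("C"): 0.028},
    "IF": {frozenset("AB"): 0.81, frozenset("AC"): 0.09,
           frozenset("BC"): 0.09, frozenset("C"): 0.01},
}
u_df = effective_entropy(candidates["DF-PCR6"], theta)
u_if = effective_entropy(candidates["IF"], theta)
selected = select_by_max_entropy(candidates, theta)
```

The function `select_by_max_entropy` breaks ties by keeping the first candidate, which is one arbitrary convention among others.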


For Dezert-Tchamova example (two non-Bayesian BBAs) 


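We consider Example 8 given in Section VI-B.

• For direct fusion with DS rule:

We have

$m^{DS}_{1,2}(A) = 0.3 = m_1(A)$,
$m^{DS}_{1,2}(A \cup B) = 0.7 = m_1(A \cup B)$.

The entropy of $m^{DS}_{1,2}$ is $U(m^{DS}_{1,2}) \approx 0.6108$ nats.

• For direct fusion with PCR6 rule:

We have

$m^{PCR6}_{1,2}(A) = 0.2062$,
$m^{PCR6}_{1,2}(A \cup B) = 0.5542$,
$m^{PCR6}_{1,2}(C) = 0.2396$.

The entropy of $m^{PCR6}_{1,2}$ is $U(m^{PCR6}_{1,2}) \approx 2.917$ nats.

• For indirect fusion with DS rule:

We have

$m^{\text{IF-DS}}_{1,2}(A \cup B) \approx 0.63$,
$m^{\text{IF-DS}}_{1,2}(A \cup C) \approx 0.23$,
$m^{\text{IF-DS}}_{1,2}(A) \approx 0.14$.

The entropy of $m^{\text{IF-DS}}_{1,2}$ is $U(m^{\text{IF-DS}}_{1,2}) \approx 3.4175$ nats.

• For indirect fusion with PCR6 rule:

We have

$m^{\text{IF-PCR6}}_{1,2}(A) = 0.0900$,
$m^{\text{IF-PCR6}}_{1,2}(C) = 0.1458$,
$m^{\text{IF-PCR6}}_{1,2}(A \cup B) = 0.6142$,
$m^{\text{IF-PCR6}}_{1,2}(A \cup C) = 0.1500$.

The entropy of $m^{\text{IF-PCR6}}_{1,2}$ is $U(m^{\text{IF-PCR6}}_{1,2}) \approx 3.4358$ nats.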


One sees that if DS rule is used by the user (for his
own reasons), then because $U(m^{\text{IF-DS}}_{1,2}) > U(m^{DS}_{1,2})$ it will be
more reasonable for the user to select $m^{\text{IF-DS}}_{1,2}$ rather than
$m^{DS}_{1,2}$ to draw the final decision. Because we do not recommend
the DS fusion rule in general due to its bad dictatorial
behavior, we will actually select $m^{\text{IF-PCR6}}_{1,2}$ for decision-making
because $U(m^{\text{IF-PCR6}}_{1,2}) > U(m^{PCR6}_{1,2})$. For this example
and based on $m^{\text{IF-PCR6}}_{1,2}$ we will finally decide A because
$m^{\text{IF-PCR6}}_{1,2}$ is closer to the sure BBA defined by $m_A(A) = 1$
than to the sure BBAs defined by $m_B(B) = 1$ and by
$m_C(C) = 1$. More precisely, for this numerical example we
get $d_{BI}(m^{\text{IF-PCR6}}_{1,2}, m_A) = 0.5019$, $d_{BI}(m^{\text{IF-PCR6}}_{1,2}, m_B) = 0.6456$
and $d_{BI}(m^{\text{IF-PCR6}}_{1,2}, m_C) = 0.7093$, where $d_{BI}(.,.)$ is the Euclidean
belief interval distance between two BBAs, see [35]
for details. Note that the same decision A will incidentally be drawn
from $m^{\text{IF-DS}}_{1,2}$.


For Blackman’s example (Bayesian and non-Bayesian BBAs) 


For the simple Blackman’s example of Section VI-C we
have


$m^{DS}_{12}(A) = m^{PCR6}_{12}(A) = m^{I,DS}_{12}(A) = m^{I,PCR6}_{12}(A) = 0.5$,
$m^{DS}_{12}(B) = m^{PCR6}_{12}(B) = m^{I,DS}_{12}(B) = m^{I,PCR6}_{12}(B) = 0.5$.


Therefore there is no BBA selection to do because all
coincide and we have


$U(m^{DS}_{12}) = U(m^{PCR6}_{12}) \approx 0.6931$ nats,


and
$U(m^{I,DS}_{12}) = U(m^{I,PCR6}_{12}) \approx 0.6931$ nats.


Because all the masses of belief of A and B are equal there
is no way to make a rational decision towards A, or towards B.
The final decision-making in this situation (where there is a tie)
can be based either on an arbitrary choice between A and
B, or on a (uniform) random choice between A and B based
on a perfect coin tossing experiment. In a given
practical fusion problem (for instance in a tracking application)
where a tie occurs, we would eventually estimate the main consequences
generated by each arbitrary (or random) decision (in
terms of costs and benefits for instance) to select the best one.
This tie elimination method of course needs extra knowledge
about the problem under concern. This goes beyond the scope
of this paper.

For the extended Blackman’s example of Section VI-C the
direct DS fusion and the direct PCR6 fusion give the same
following result


$m^{DS}_{12}(A) = m^{PCR6}_{12}(A) = 1/3$,
$m^{DS}_{12}(B) = m^{PCR6}_{12}(B) = 1/3$,
$m^{DS}_{12}(C) = m^{PCR6}_{12}(C) = 1/3$.
Therefore,
$U(m^{DS}_{12}) = U(m^{PCR6}_{12}) = 1.0986$ nats.


If we apply the indirect fusion approach, we obtain for this
extended Blackman’s example


$m^{I,DS}_{12}(\cdot) = m^{I,PCR6}_{12}(\cdot) = m^{I}_{12}(\cdot)$,
where

$m^{I}_{12}(A) = 0.8/3$,

$m^{I}_{12}(B) = 0.8/3$,

$m^{I}_{12}(C) = 0.8/3$,

$m^{I}_{12}(A \cup B) = 0.2/3$,

$m^{I}_{12}(A \cup C) = 0.2/3$,

$m^{I}_{12}(B \cup C) = 0.2/3$.
Therefore,


$U(m^{I,DS}_{12}) = U(m^{I,PCR6}_{12}) = 2.0524$ nats.

We observe that $U(m^{I,PCR6}_{12}) > U(m^{PCR6}_{12})$. This inequality
indicates that the BBA $m^{I,PCR6}_{12}(\cdot)$ (which also coincides with
$m^{I,DS}_{12}(\cdot)$ and $m^{I}_{12}(\cdot)$) is selected for the decision-making
support. Because the same mass of belief is committed to each of
A, B and C, and the same mass to each of their disjunctions, there is no
way to make a rational decision towards A, B, or C in
this very particular tied situation unless an arbitrary or random
decision strategy is adopted. So, there is no real advantage
of using indirect fusion w.r.t. direct fusion in Blackman’s
example. However, things could obviously become different
if a third source of evidence enters in the fusion problem.
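
The entropy values quoted above for the Bayesian fusion results can be cross-checked numerically. The sketch below assumes that, for a Bayesian BBA, the entropy of formula (4) reduces to Shannon entropy computed with the natural logarithm; this assumption is ours, and it is supported only by the fact that it reproduces the values reported in this section. The function name is illustrative.

```python
import math

def shannon_entropy_nats(masses):
    """Shannon entropy (natural log) of a list of masses; zero masses are skipped."""
    return -sum(p * math.log(p) for p in masses if p > 0.0)

# Bayesian fusion results quoted in this section
zadeh_direct = [0.486, 0.486, 0.028]     # masses of A, B, C (direct fusion, Zadeh's example)
blackman_simple = [0.5, 0.5]             # masses of A, B (simple Blackman's example)
blackman_extended = [1/3, 1/3, 1/3]      # masses of A, B, C (extended Blackman's example)

for name, bba in [("Zadeh, direct", zadeh_direct),
                  ("Blackman, simple", blackman_simple),
                  ("Blackman, extended", blackman_extended)]:
    print(f"{name}: U = {shannon_entropy_nats(bba):.4f} nats")
```

The three values agree with 0.8014, 0.6931 and 1.0986 nats up to rounding of the last digit. The entropies of the non-Bayesian results require the full formula (4) and are not covered by this check.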


IX. CONCLUSION 


In this paper we have analyzed different definitions of 
a negator of a probability mass function, and of a basic 
belief assignment (BBA) existing so far in the literature. In 
order to overcome their limitations we have introduced a new 
involutory negator of BBA. Based on it, a new indirect in- 
formation fusion method was proposed which can circumvent 
the conflict management problem in difficult fusion situations. 
The classical direct and the new indirect information fusion 
strategies were analyzed for three interesting examples of 
fusion of two BBAs. In order to manage properly these two 
types of fusion strategies, two methods for using the whole 
available information (the original BBAs and their negators) 
for decision-making support were presented. The first method 
is based on the combination of the direct and indirect fusion 
strategies. The second one selects the most reasonable fusion 
strategy (direct, or indirect) to apply based on the maximum 
entropy principle. A deep analysis of the advantages and 
drawbacks of these two methods was made. We will evaluate 
these new fusion approaches in different fields of applica- 
tions (multi-sensor data association for tracking, multi-criteria 
decision-making under uncertainty, perception in robotics, risk 
assessment, etc) in our future research works. We also invite 
the users of belief functions and the fusion system designers 
to share and report their evaluation of this new approach on 
their own applications in future publications.

Abstract: This short paper points out an erroneous claim
about a new rule of combination of basic belief assignments
presented recently by Kenn et al. in [1], referred to as Kenn’s rule
of combination (or just KRC for short). We prove, thanks to a
very simple counter-example, that Kenn’s rule is not associative.
Consequently, the results of the method proposed by Kenn et al.
highly depend on the ad-hoc sequential order chosen for the
fusion process as proposed by the authors. This serious problem
casts doubt on the interest of this method and its real ability to
provide trustworthy results and to make good decisions to help
precise breast cancer therapy.


Keywords: belief functions, rule of combination, Kenn’s rule. 


I. INTRODUCTION 


Recently a paper devoted to Breast Cancer Precision
Therapy by Kenn et al. [1] attracted our attention for two
main reasons: 1) this application of information fusion is very
interesting and important; 2) the method of Kenn et al. is based on
a new rule of combination of basic belief assignments (BBAs).
Because we have made some theoretical contributions in this field [2]
we wanted to examine this new combination rule in detail. So,
we have read with interest the paper of Kenn et al. Unfortunately
we quickly discovered a serious erroneous claim about Kenn’s
rule of combination (KRC) and this has strong consequences
on the methodology presented by Kenn et al. In this short technical
note we warn the readers of the risk of potential therapy errors
if such a method is used in practice. We clearly explain the
problem of the method presented by Kenn et al. To make the
paper self-contained, we recall briefly the basics of belief
functions in the next section, and the KRC in Section
III. In Section IV we prove thanks to a very simple numerical
counter-example that KRC is not associative. Conclusion and
recommendations are given in the last section of this note.


II. BELIEF FUNCTIONS 


Belief functions (BF) were introduced by Shafer [3] for
modeling epistemic uncertainty, reasoning about uncertainty
and combining distinct sources of evidence. The answer of
the problem under concern is assumed to belong to a known
finite discrete frame of discernment (FoD) $\Theta = \{\theta_1, \ldots, \theta_n\}$
where all elements (i.e. members) of $\Theta$ are exhaustive and
exclusive. The set of all subsets of $\Theta$ (including the empty set
$\emptyset$, and $\Theta$) is the power-set of $\Theta$ denoted by $2^\Theta$. The number
of elements (i.e. the cardinality) of the power-set is $2^{|\Theta|}$. A
(normalized) basic belief assignment (BBA) associated with a
given source of evidence is a mapping $m^\Theta(\cdot) : 2^\Theta \to [0,1]$
such that $m^\Theta(\emptyset) = 0$ and $\sum_{X \in 2^\Theta} m^\Theta(X) = 1$. A BBA $m^\Theta(\cdot)$
characterizes a source of evidence related with a FoD $\Theta$. For
notation shorthand, we can omit the superscript $\Theta$ in the $m^\Theta(\cdot)$
notation if there is no ambiguity on the FoD we work with.
The quantity $m(X)$ is called the mass of belief of $X$. The
element $X \in 2^\Theta$ is called a focal element (FE) of $m(\cdot)$ if
$m(X) > 0$. The set of all focal elements of $m(\cdot)$ is denoted¹
by $\mathcal{F}_\Theta(m) \triangleq \{X \in 2^\Theta \,|\, m(X) > 0\}$. The belief and the
plausibility of $X$ are respectively defined for any $X \in 2^\Theta$ by
[3]

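$$Bel(X) = \sum_{Y \in 2^\Theta | Y \subseteq X} m(Y) \qquad (1)$$

$$Pl(X) = \sum_{Y \in 2^\Theta | X \cap Y \neq \emptyset} m(Y) = 1 - Bel(\bar{X}) \qquad (2)$$

where $\bar{X} = \Theta \setminus \{X\}$ is the complement of $X$ in $\Theta$.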

One always has $0 \le Bel(X) \le Pl(X) \le 1$, see [3]. For
$X = \emptyset$, $Bel(\emptyset) = Pl(\emptyset) = 0$, and for $X = \Theta$ one has
$Bel(\Theta) = Pl(\Theta) = 1$. $Bel(X)$ and $Pl(X)$ are often interpreted
as the lower and upper bounds of the unknown probability
$P(X)$ of $X$, that is $Bel(X) \le P(X) \le Pl(X)$.
To quantify the uncertainty (i.e. the imprecision) of
$P(X) \in [Bel(X), Pl(X)]$, we use $u(X) \in [0,1]$ defined by

$$u(X) \triangleq Pl(X) - Bel(X) \qquad (3)$$

If $u(X) = 0$, $Bel(X) = Pl(X)$ and therefore $P(X)$ is
known precisely because $P(X) = Bel(X) = Pl(X)$. One has
$u(\emptyset) = 0$ because $Bel(\emptyset) = Pl(\emptyset) = 0$, and one has $u(\Theta) = 0$
because $Bel(\Theta) = Pl(\Theta) = 1$. If all focal elements of $m(\cdot)$
are singletons of $2^\Theta$ the BBA $m(\cdot)$ is a Bayesian BBA because
$\forall X \in 2^\Theta$ one has $Bel(X) = Pl(X) = P(X)$ and $u(X) = 0$.
Hence the belief and plausibility of $X$ coincide with a probability
measure $P(X)$ defined on the FoD $\Theta$. The vacuous
BBA characterizing a totally ignorant source of evidence is
defined by $m_v(X) = 1$ for $X = \Theta$, and $m_v(X) = 0$ for all
$X \in 2^\Theta$ different from $\Theta$.
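
The definitions of belief, plausibility and imprecision in formulas (1) to (3) can be illustrated with a minimal Python sketch. It assumes a BBA is stored as a dictionary mapping frozensets (subsets of the FoD) to masses; the function names and the numerical BBA are hypothetical and serve only as an illustration.

```python
def bel(m, X):
    """Formula (1): total mass of the non-empty focal elements included in X."""
    return sum(v for Y, v in m.items() if Y and Y <= X)

def pl(m, X):
    """Formula (2): total mass of the focal elements that intersect X."""
    return sum(v for Y, v in m.items() if Y & X)

def u(m, X):
    """Formula (3): width of the belief interval [Bel(X), Pl(X)]."""
    return pl(m, X) - bel(m, X)

# Hypothetical BBA on the FoD {A, B} (illustrative values only)
A, B = frozenset("A"), frozenset("B")
THETA = A | B
m = {A: 0.5, B: 0.3, THETA: 0.2}

for name, X in [("A", A), ("B", B), ("A u B", THETA)]:
    print(f"{name}: Bel = {bel(m, X):.2f}, Pl = {pl(m, X):.2f}, u = {u(m, X):.2f}")
```

For this BBA the belief interval of A is [0.5, 0.7], and one can check that Pl(A) = 1 - Bel(B), in agreement with formula (2).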


¹ $\triangleq$ means equal by definition.

In the Mathematical Theory of Evidence of Glenn Shafer,
the combination of two BBAs $m_1(\cdot)$ and $m_2(\cdot)$ defined over
the same FoD $\Theta$ is obtained with Dempster’s rule. More
precisely² by $m^{DS}_{12}(\emptyset) = 0$, and for any $X \in 2^\Theta \setminus \{\emptyset\}$ by

$$m^{DS}_{12}(X) = \frac{\sum_{X_1,X_2 \subseteq \Theta | X_1 \cap X_2 = X} m_1(X_1) m_2(X_2)}{1 - \sum_{X_1,X_2 \subseteq \Theta | X_1 \cap X_2 = \emptyset} m_1(X_1) m_2(X_2)}. \qquad (4)$$

The value $K_{12} = \sum_{X_1,X_2 \subseteq \Theta | X_1 \cap X_2 = \emptyset} m_1(X_1) m_2(X_2)$ is classically
interpreted as the degree of conflict between the BBAs $m_1(\cdot)$
and $m_2(\cdot)$. When the degree of conflict is maximum one
has $K_{12} = 1$, and in this particular case Dempster-Shafer
rule cannot be applied because of division by zero in the
formula (4). This rule can be easily directly generalized for
the combination of more than two BBAs.
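
As an illustration of formula (4), here is a minimal Python sketch of Dempster's rule for two BBAs. It assumes BBAs are dictionaries mapping frozensets to masses; the function names and the two example BBAs are hypothetical.

```python
from itertools import product

def conjunctive(m1, m2):
    """Conjunctive products of formula (4): returns (unnormalized masses, conflict K12)."""
    out, conflict = {}, 0.0
    for (X1, v1), (X2, v2) in product(m1.items(), m2.items()):
        X = X1 & X2
        if X:
            out[X] = out.get(X, 0.0) + v1 * v2
        else:
            conflict += v1 * v2          # mass falling on the empty set
    return out, conflict

def dempster(m1, m2):
    """Dempster's rule: conjunctive masses normalized by 1 - K12."""
    out, K = conjunctive(m1, m2)
    if 1.0 - K < 1e-12:
        raise ZeroDivisionError("total conflict (K12 = 1): DS rule cannot be applied")
    return {X: v / (1.0 - K) for X, v in out.items()}

# Hypothetical BBAs on the FoD {A, B} (illustrative values only)
A, B = frozenset("A"), frozenset("B")
THETA = A | B
m1 = {A: 0.6, THETA: 0.4}
m2 = {B: 0.5, THETA: 0.5}

fused = dempster(m1, m2)
print({"".join(sorted(X)): round(v, 4) for X, v in fused.items()})
```

Here the conflict is K12 = 0.6 x 0.5 = 0.3, and the three conjunctive products 0.3, 0.2 and 0.2 are divided by 0.7, so the fused masses sum to one.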

The DS rule has had a great success during the past decades 
in expert systems and artificial intelligence mainly because 
it is a commutative and associative rule of combination able 
to deal with (possibly epistemic) uncertainty and incomplete 
information based on an appealing mathematical framework. 
This makes its use very attractive from the implementation 
standpoint in decision-making support systems. Indeed, be- 
cause of its associativity we can apply DS rule sequentially 
when we have more than two sources of evidence to fuse, and 
the sequence order will not impact the DS fusion result. Un- 
fortunately, DS rule of combination generates counter-intuitive 
results due to the normalization step in DS formula (not only 
in high conflicting situations but also in low conflicting situ- 
ations as well), and it generates very controversial dictatorial 
behaviors, see [4], [5] for discussions with examples. That is 
why many alternatives of DS rule have been proposed in the 
literature to circumvent these serious problems of DS rule. 
Unfortunately, there is no general consensus in the scientific 
community about the choice of the best rule of combination 
of belief functions to make for the applications. 


III. KENN’S RULE OF COMBINATION 


Kenn’s rule of combination (KRC) proposed in [1] is a slight
modification of DS rule introducing a tuning parameter $\lambda \in (0, 1]$.
The KRC of two BBAs $m_1(\cdot)$ and $m_2(\cdot)$ defined over
the same FoD $\Theta$ is denoted symbolically $m_1 \oplus_\lambda m_2$ in [1].
Its mathematical expression is given by³ $m^{KRC}_{12}(\emptyset) = 0$, for
$X \in 2^\Theta \setminus \{\emptyset, \Theta\}$ by

$$m^{KRC}_{12}(X) = [m_1 \oplus_\lambda m_2](X) = \frac{\sum_{X_1,X_2 \subseteq \Theta | X_1 \cap X_2 = X} m_1(X_1) m_2(X_2)}{1 - \lambda \sum_{X_1,X_2 \subseteq \Theta | X_1 \cap X_2 = \emptyset} m_1(X_1) m_2(X_2)} \qquad (5)$$

and for $X = \Theta$, by

$$m^{KRC}_{12}(\Theta) = [m_1 \oplus_\lambda m_2](\Theta) = 1 - \sum_{X \subset \Theta} m^{KRC}_{12}(X) \qquad (6)$$


² The DS upper index in formula (4) stands for Dempster-Shafer because this
rule is often also referred to as the Dempster-Shafer rule of combination in the
literature.

³ See formula (4) in [1].

For $\lambda = 1$, KRC coincides with Dempster-Shafer rule and
consequently it suffers from the same problems as DS rule in this
particular case. According to Kenn et al., the parameter $\lambda$ in
the formula (5) provides flexibility to adapt to circumstances
and the restriction to $\lambda \le 1$ is motivated by restricting the
authors to an interpolation type evidential combination rule.
KRC is claimed associative and commutative by Kenn et al.
(see page 5 of [1]). We prove in the next section that KRC is
in fact not associative. Because of the non-associativity of KRC,
the methodology proposed in [1] becomes very disputable
and doubtful, and potentially very dangerous for the breast cancer
therapy application addressed by Kenn et al., and for any other
application using sequential fusion of sources of evidence
based on KRC.


IV. COUNTER-EXAMPLE OF ASSOCIATIVITY OF KRC 


To prove that KRC is not associative it suffices to verify
that

$$(m_1 \oplus_\lambda m_2) \oplus_\lambda m_3 \neq m_1 \oplus_\lambda (m_2 \oplus_\lambda m_3) \qquad (7)$$


To prove (7) when $\lambda < 1$, just consider for instance $\lambda = 0.2$,
the FoD $\Theta = \{A, B\}$ and the three BBAs given in Table I.


Table I
THREE BASIC BELIEF ASSIGNMENTS.

Focal element    m_1(.)    m_2(.)    m_3(.)
∅                0         0         0
A                0.2       0.8       0.4
B                0.7       0.1       0.3
A ∪ B            0.1       0.1       0.3


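A. Derivation of $(m_1 \oplus_{0.2} m_2) \oplus_{0.2} m_3$


For the combination of $m_1$ with $m_2$ we have the degree of
conflict

$$K_{12} = m_1(A) m_2(B) + m_1(B) m_2(A) = 0.2 \cdot 0.1 + 0.7 \cdot 0.8 = 0.58$$

The results of the conjunctive fusion of $m_1$ with $m_2$ for A
and B are

$$m_{12}(A) = m_1(A) m_2(A) + m_1(A) m_2(A \cup B) + m_2(A) m_1(A \cup B) = 0.2 \cdot 0.8 + 0.2 \cdot 0.1 + 0.8 \cdot 0.1 = 0.26$$

$$m_{12}(B) = m_1(B) m_2(B) + m_1(B) m_2(A \cup B) + m_2(B) m_1(A \cup B) = 0.7 \cdot 0.1 + 0.7 \cdot 0.1 + 0.1 \cdot 0.1 = 0.15$$

For KRC of $m_1$ with $m_2$ we get (taking $\lambda = 0.2$) $m^{KRC}_{12}(\emptyset) = 0$
and

$$m^{KRC}_{12}(A) = [m_1 \oplus_\lambda m_2](A) = \frac{m_{12}(A)}{1 - \lambda K_{12}} = \frac{0.26}{1 - 0.2 \cdot 0.58} \approx 0.2941$$

$$m^{KRC}_{12}(B) = [m_1 \oplus_\lambda m_2](B) = \frac{m_{12}(B)}{1 - \lambda K_{12}} = \frac{0.15}{1 - 0.2 \cdot 0.58} \approx 0.1697$$

$$m^{KRC}_{12}(A \cup B) = [m_1 \oplus_\lambda m_2](A \cup B) = 1 - \frac{0.26}{1 - 0.2 \cdot 0.58} - \frac{0.15}{1 - 0.2 \cdot 0.58} \approx 0.5362$$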

For the combination of $m^{KRC}_{12} = m_1 \oplus_\lambda m_2$ with $m_3$ we
have the degree of conflict

$$K_{(12)3} = m^{KRC}_{12}(A) m_3(B) + m^{KRC}_{12}(B) m_3(A) \approx 0.2941 \cdot 0.3 + 0.1697 \cdot 0.4 \approx 0.1561$$

The results of the conjunctive fusion of $m^{KRC}_{12}$ with $m_3$ for
A and B are

$$m_{(12)3}(A) = m^{KRC}_{12}(A) m_3(A) + m^{KRC}_{12}(A) m_3(A \cup B) + m_3(A) m^{KRC}_{12}(A \cup B) \approx 0.2941 \cdot 0.4 + 0.2941 \cdot 0.3 + 0.4 \cdot 0.5362 \approx 0.4204$$

$$m_{(12)3}(B) = m^{KRC}_{12}(B) m_3(B) + m^{KRC}_{12}(B) m_3(A \cup B) + m_3(B) m^{KRC}_{12}(A \cup B) \approx 0.1697 \cdot 0.3 + 0.1697 \cdot 0.3 + 0.3 \cdot 0.5362 \approx 0.2627$$


KRC 


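Therefore, the KRC of $m^{KRC}_{12}$ with $m_3$ yields $m^{KRC}_{(12)3}(\emptyset) = 0$ and

$$m^{KRC}_{(12)3}(A) = [m^{KRC}_{12} \oplus_\lambda m_3](A) = \frac{m_{(12)3}(A)}{1 - \lambda K_{(12)3}} = \frac{0.4204}{1 - 0.2 \cdot 0.1561} \approx 0.4339$$

$$m^{KRC}_{(12)3}(B) = [m^{KRC}_{12} \oplus_\lambda m_3](B) = \frac{m_{(12)3}(B)}{1 - \lambda K_{(12)3}} = \frac{0.2627}{1 - 0.2 \cdot 0.1561} \approx 0.2711$$

$$m^{KRC}_{(12)3}(A \cup B) = [m^{KRC}_{12} \oplus_\lambda m_3](A \cup B) = 1 - 0.4339 - 0.2711 \approx 0.2950$$


Hence for the fusion $(m_1 \oplus_{0.2} m_2) \oplus_{0.2} m_3$ we get finally

$$m^{KRC}_{(12)3}(A) = [(m_1 \oplus_{0.2} m_2) \oplus_{0.2} m_3](A) \approx 0.4339 \qquad (8)$$

$$m^{KRC}_{(12)3}(B) = [(m_1 \oplus_{0.2} m_2) \oplus_{0.2} m_3](B) \approx 0.2711 \qquad (9)$$

$$m^{KRC}_{(12)3}(A \cup B) = [(m_1 \oplus_{0.2} m_2) \oplus_{0.2} m_3](A \cup B) \approx 0.2950 \qquad (10)$$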

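B. Derivation of $m_1 \oplus_{0.2} (m_2 \oplus_{0.2} m_3)$

For the combination of $m_2$ with $m_3$ we have

$$K_{23} = m_2(A) m_3(B) + m_2(B) m_3(A) = 0.8 \cdot 0.3 + 0.1 \cdot 0.4 = 0.28$$

The results of the conjunctive fusion of $m_2$ with $m_3$ for A
and B are

$$m_{23}(A) = m_2(A) m_3(A) + m_2(A) m_3(A \cup B) + m_3(A) m_2(A \cup B) = 0.8 \cdot 0.4 + 0.8 \cdot 0.3 + 0.4 \cdot 0.1 = 0.60$$

$$m_{23}(B) = m_2(B) m_3(B) + m_2(B) m_3(A \cup B) + m_3(B) m_2(A \cup B) = 0.1 \cdot 0.3 + 0.1 \cdot 0.3 + 0.3 \cdot 0.1 = 0.09$$

For KRC of $m_2$ with $m_3$ we get (taking $\lambda = 0.2$)
$m^{KRC}_{23}(\emptyset) = 0$ and

$$m^{KRC}_{23}(A) = [m_2 \oplus_\lambda m_3](A) = \frac{m_{23}(A)}{1 - \lambda K_{23}} = \frac{0.60}{1 - 0.2 \cdot 0.28} \approx 0.6356$$

$$m^{KRC}_{23}(B) = [m_2 \oplus_\lambda m_3](B) = \frac{m_{23}(B)}{1 - \lambda K_{23}} = \frac{0.09}{1 - 0.2 \cdot 0.28} \approx 0.0953$$

$$m^{KRC}_{23}(A \cup B) = [m_2 \oplus_\lambda m_3](A \cup B) = 1 - 0.6356 - 0.0953 = 0.2691$$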


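For the combination of $m_1$ with $m^{KRC}_{23} = m_2 \oplus_\lambda m_3$ we
have the degree of conflict

$$K_{1(23)} = m^{KRC}_{23}(A) m_1(B) + m^{KRC}_{23}(B) m_1(A) \approx 0.6356 \cdot 0.7 + 0.0953 \cdot 0.2 \approx 0.4640$$

The results of the conjunctive fusion of $m_1$ with $m^{KRC}_{23}$ for
A and B are

$$m_{1(23)}(A) = m^{KRC}_{23}(A) m_1(A) + m^{KRC}_{23}(A) m_1(A \cup B) + m_1(A) m^{KRC}_{23}(A \cup B) \approx 0.6356 \cdot 0.2 + 0.6356 \cdot 0.1 + 0.2 \cdot 0.2691 \approx 0.2445$$

$$m_{1(23)}(B) = m^{KRC}_{23}(B) m_1(B) + m^{KRC}_{23}(B) m_1(A \cup B) + m_1(B) m^{KRC}_{23}(A \cup B) \approx 0.0953 \cdot 0.7 + 0.0953 \cdot 0.1 + 0.7 \cdot 0.2691 \approx 0.2646$$

Therefore, the KRC of $m_1$ with $m^{KRC}_{23}$ yields $m^{KRC}_{1(23)}(\emptyset) = 0$ and

$$m^{KRC}_{1(23)}(A) = [m_1 \oplus_\lambda m^{KRC}_{23}](A) = \frac{m_{1(23)}(A)}{1 - \lambda K_{1(23)}} = \frac{0.2445}{1 - 0.2 \cdot 0.4640} \approx 0.2695$$

$$m^{KRC}_{1(23)}(B) = [m_1 \oplus_\lambda m^{KRC}_{23}](B) = \frac{m_{1(23)}(B)}{1 - \lambda K_{1(23)}} = \frac{0.2646}{1 - 0.2 \cdot 0.4640} \approx 0.2917$$

$$m^{KRC}_{1(23)}(A \cup B) = [m_1 \oplus_\lambda m^{KRC}_{23}](A \cup B) \approx 1 - 0.2695 - 0.2917 = 0.4388$$


Hence for the fusion $m_1 \oplus_{0.2} (m_2 \oplus_{0.2} m_3)$ we get finally

$$m^{KRC}_{1(23)}(A) = [m_1 \oplus_{0.2} (m_2 \oplus_{0.2} m_3)](A) \approx 0.2695 \qquad (11)$$

$$m^{KRC}_{1(23)}(B) = [m_1 \oplus_{0.2} (m_2 \oplus_{0.2} m_3)](B) \approx 0.2917 \qquad (12)$$

$$m^{KRC}_{1(23)}(A \cup B) = [m_1 \oplus_{0.2} (m_2 \oplus_{0.2} m_3)](A \cup B) \approx 0.4388 \qquad (13)$$


We see clearly that KRC is not associative because
$(m_1 \oplus_\lambda m_2) \oplus_\lambda m_3 \neq m_1 \oplus_\lambda (m_2 \oplus_\lambda m_3)$, as reported in Table II.


Table II
COUNTER-EXAMPLE OF ASSOCIATIVITY OF KRC WITH λ = 0.2.

Focal element    (m_1 ⊕ m_2) ⊕ m_3    m_1 ⊕ (m_2 ⊕ m_3)
∅                0                    0
A                0.4339               0.2695
B                0.2711               0.2917
A ∪ B            0.2950               0.4388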

C. Comment on decision-making method used by Kenn et al.

For our simple example we get with the sequential fusion
$(m_1 \oplus_{0.2} m_2) \oplus_{0.2} m_3$ the following belief intervals

$[Bel_{(12)3}(\emptyset), Pl_{(12)3}(\emptyset)] = [0, 0]$

$[Bel_{(12)3}(A), Pl_{(12)3}(A)] = [0.4339, 0.7289]$

$[Bel_{(12)3}(B), Pl_{(12)3}(B)] = [0.2711, 0.5661]$

$[Bel_{(12)3}(A \cup B), Pl_{(12)3}(A \cup B)] = [1, 1]$

and with the sequential fusion $m_1 \oplus_{0.2} (m_2 \oplus_{0.2} m_3)$

$[Bel_{1(23)}(\emptyset), Pl_{1(23)}(\emptyset)] = [0, 0]$

$[Bel_{1(23)}(A), Pl_{1(23)}(A)] = [0.2695, 0.7083]$

$[Bel_{1(23)}(B), Pl_{1(23)}(B)] = [0.2917, 0.7305]$

$[Bel_{1(23)}(A \cup B), Pl_{1(23)}(A \cup B)] = [1, 1]$


Based on these results and the decision-making method
used by Kenn et al. (see section 3.4 of [1]) it is clear that
no decision for A or for B can be made using the sequential
fusion $(m_1 \oplus_{0.2} m_2) \oplus_{0.2} m_3$ because we have neither
$Bel_{(12)3}(A) > Pl_{(12)3}(B)$, nor $Bel_{(12)3}(B) > Pl_{(12)3}(A)$.
Similarly, no decision can be drawn for A or for B from
the result of the sequential fusion $m_1 \oplus_{0.2} (m_2 \oplus_{0.2} m_3)$
because we have neither $Bel_{1(23)}(A) > Pl_{1(23)}(B)$, nor
$Bel_{1(23)}(B) > Pl_{1(23)}(A)$. In fact, based on Kenn’s decision-making
method we could always take the whole frame of discernment as
final decision because one always has
$(Bel_{(12)3}(A \cup B) = 1) > (Pl_{(12)3}(\emptyset) = 0)$ and
$(Bel_{1(23)}(A \cup B) = 1) > (Pl_{1(23)}(\emptyset) = 0)$, but such a decision
is obviously not useful at all for the applications because
it does not help to make a clear choice between A and B.
So, the decision-making method used by Kenn et al. does not
work for all cases of BBA distributions as shown in this very
simple example, and that is why it is not judicious and not
recommended for applications.
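
The interval comparison used in this comment can be sketched in a few lines of Python. The function below encodes only the test described in this section (decide X when Bel(X) exceeds the plausibility of every competing singleton); it is our reading of that test and not the implementation of [1], and the function name is hypothetical.

```python
def interval_dominance_decision(intervals):
    """Return the hypotheses whose Bel exceeds the Pl of every other listed hypothesis."""
    return [x for x, (bel_x, _) in intervals.items()
            if all(bel_x > pl_y for y, (_, pl_y) in intervals.items() if y != x)]

# Belief intervals [Bel, Pl] of the singletons for the two fusion orders
left = {"A": (0.4339, 0.7289), "B": (0.2711, 0.5661)}    # (m1 + m2) + m3
right = {"A": (0.2695, 0.7083), "B": (0.2917, 0.7305)}   # m1 + (m2 + m3)

print(interval_dominance_decision(left))    # []
print(interval_dominance_decision(right))   # []
```

Both calls return an empty list, which is the situation described above: the belief intervals of A and B overlap for both fusion orders, so no singleton can be selected.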


V. CONCLUSION 


The non-associativity of the method presented
by Kenn et al. in [1] can have a strong impact on
the results and on decision-making in general if the KRC is
applied sequentially for information fusion as it is proposed
by the authors in their paper (see formula (9) page 7 of [1]).
Because of this problem, we have a serious concern about the
interest and the effectiveness of the method presented by Kenn
et al. We warn the potential users of this approach about the
high risk of wrong decisions (when they are possible, which is
not always the case as shown in our counter-example) based on
this method. This could have dramatic consequences for therapy.
If the authors want to use this KRC-based approach we think
they should at least consider a global information fusion
processing rather than a sequential one, and they should adopt a
better decision-making strategy. They should also compare
their results with other advanced rules of combination and
use the same decision strategy to make comparisons to show
the real advantages of this approach, if any. Measuring
the performance of the method on real open data sets
for the breast cancer therapy application with ground truth is also
recommended.

Abstract: Human Activity Recognition (HAR) based on wearable
devices has become a hot topic of research due to its wide
range of applications in health-care, fitness and smart homes.
However, the classification of some activities with similar sensor
readings, such as standing and sitting, is usually more challenging
for the design of efficient activity recognition algorithms.
Considering the inconsistent performance of different classifiers,
which can provide complementary information for each individual
classifier, we propose a novel multi-classifier fusion method based
on belief functions (BFs) theory for HAR. Specifically, at first,
four classifiers are trained using time-domain and frequency-domain
features to obtain basic belief assignments (BBAs) of
activity, respectively. Then, three assessment criteria are utilized
to evaluate the reliability of the classifiers and a scoring matrix
is constructed. Next, the Belief Function based Technique for
Order Preference by Similarity to Ideal Solution
(BF-TOPSIS) algorithm is employed to calculate the weighting
coefficients for each classifier. Finally, the discounting and
Dempster’s rules are adopted to combine the multiple classifiers
and make the final decision. Several experiments were conducted to
illustrate the performance of the proposed method using the UCI
smartphone dataset, and the results show that the proposed method
is more accurate than the state-of-the-art methods.

Index Terms—Belief functions theory, multiple classifiers fu- 
sion, BF-TOPSIS, human activity recognition. 


I. INTRODUCTION 


With the booming development of micro-sensor technol- 
ogy, Human Activity Recognition (HAR) based on wearable 
sensors has become one of the hot research topics [1], [2]. 
Data of daily activities can be well collected in an all- 
round and non invasive discrete manner using accelerometers, 
gyroscopes and other such portable wearable devices, so as to 
accomplish the work of assisted living and health monitoring 
while effectively protecting the privacy of users [3]. Obviously, 
it has certain advantages compared to traditional vision-based 
methods. However, the accuracy of HAR based on wearable 
devices is affected by many factors, such as the number 
and the deployment location of sensors, the complexity of 
activities [4], and so on. Due to the uncertainty, diversity 
and individual differences of activities [5], many scholars 
took the perspective of multi-sensor information fusion to 
achieve higher accuracy of HAR. For example, Dong et al. 
[6] developed the kernel density estimation models to fit 
the multi-sensor data to obtain the basic belief assignments
(BBAs), and then Dezert-Smarandache theory (DSmT) was
adopted to combine the acquired BBAs. Uddin et al. [7] 
fused data from different multimodal sensors with statistical 
features of different orders and then trained a deep recurrent 
neural network (RNN) for activity recognition. Although they 
achieve good accuracy, it is still difficult to accurately identify 
some activities with high similarity of sensor readings such as 
sitting and standing. Furthermore, the reliability of activity 
recognition can be significantly compromised when sensor 
readings are missing or disturbed by noise without additional 
sensor information. 

Recently, the multi-classifier fusion has been applied in 
pattern recognition [8], information fusion [9], [10] and other 
fields, especially for classification problems in complex envi- 
ronments. Different classifiers can learn different feature in- 
formation, and multiple classifiers can provide complementary 
information compared with any individual classifier, which 
can help identify similar human activities such as sitting 
and standing. By using multi-classifier fusion, we expect the 
improvement of the classification accuracy, which brings the 
possibility of high precision HAR. On the other hand, multiple 
classifiers can be seen as multiple sources of evidence, and we 
fuse the basic belief assignments (BBAs) of the human activity 
categories output by the classifiers. 

The multi-classifier fusion usually consists of generating
membership classifiers, applying combination rules, and making
a decision about the positioning of the patient. Various ap-
proaches have been proposed for membership classifier gener- 
ation, for example, using different training samples, different 
features and different types of classifiers [11]. Common clas- 
sifier fusion methods include voting method [12], naive Bayes 
[13], Dempster-Shafer (DS) rule in Dempster-Shafer theory 
(DST) [14], and so on. In the fusion process, the classifiers 
may have different reliabilities (weights) and their decision 
results may be contradicting, which inevitably brings conflict 
issues. In order to improve classification accuracy, it becomes 
particularly important to evaluate the reliability of classifiers 
before combining them. For instance, Liu et al. employed 
contextual reliability evaluation based on inner reliability and 
relative reliability concepts [10]. Dong et al. [15] took two 
classes of criteria into account to evaluate the classifiers. The
first class is the conflict between the classifiers and the second
class is the imprecision of the information provided by each 
classifier. The effective evaluation of the reliability of multiple 
classifiers and their fusion is a challenging problem for HAR 
tasks. 

In this article, we propose a novel Weighted Fusion of 
Multiple Classifiers (WFMC) method for HAR based on BFs 
theory. Our main contributions are summarized as follows: 

• Four classifiers including support vector machine (SVM),
random forest (RF), multi-layer perceptron (MLP) and
logistic regression (LR) are trained on the same training dataset
for acquiring BBAs of human activities. To improve the
multi-classifier fusion accuracy, Belief Jensen-Shannon
(BJS) divergence, interval distance function and belief
entropy are considered to measure the reliability of the
classifiers and a scoring matrix is constructed.

• The BF-TOPSIS¹ multi-criteria decision-making algorithm
is employed to calculate the weighting coefficients
for each classifier, and multiple classifiers are fused using
the discounting technique and DS rule. The final
decision is made based on the maximum belief mass of
all involved singleton focal elements.

• We evaluate the performance of our proposed method on
the widely used UCI Smart-phone public dataset.

The rest of this article is organized as follows. Section 
II presents the basic concepts of BFs theory, discounting 
technique and pignistic probability transformation. Section III 
provides a detailed description of the new proposed multi- 
classifier fusion strategy for HAR. Section IV presents the 
detailed experimental results and discussions. The final section 
V gives concluding remarks with some perspectives of this 
work. 


II. PRELIMINARIES 
A. Belief Functions Theory 


BFs theory (also known as DST) has been widely used in
multi-sensor information fusion due to its ability to deal with
uncertain and imprecise information [17]. The basic concepts
are introduced in this section based on [14]. Let $\Theta$ be a finite
set of elements denoted by

$$\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}. \qquad (1)$$

The set $\Theta$ is called a frame of discernment (FoD), which
consists of exhaustive and exclusive hypotheses. Information
sources distribute mass of belief to elements of the power set
of the FoD, denoted by $2^\Theta$. For example, if $\Theta = \{\theta_1, \theta_2\}$,
then

$$2^\Theta = \{\emptyset, \theta_1, \theta_2, \theta_1 \cup \theta_2\}. \qquad (2)$$

A BBA, also called a mass function, is defined by the mapping
$m(\cdot) : 2^\Theta \to [0,1]$, which satisfies $m(\emptyset) = 0$ and

$$\sum_{A \in 2^\Theta} m(A) = 1. \qquad (3)$$


¹ BF-TOPSIS is an extension of the technique for order preference by
similarity to ideal solution (TOPSIS) based on belief functions (BF) [16].


For a proposition $A \subseteq \Theta$, the belief function is defined as:

$$Bel(A) = \sum_{B \subseteq A, B \in 2^\Theta} m(B). \qquad (4)$$

The plausibility function is defined as:

$$Pl(A) = \sum_{B \cap A \neq \emptyset, B \in 2^\Theta} m(B). \qquad (5)$$

If the focal elements of a BBA are all singletons, the BBA
is called a Bayesian BBA [14]. In pattern classification, $m(A)$
represents the degree of support for associating the object with
class $A$. For example, if $A$ is a set of classes (e.g., $A = \{\theta_1, \theta_2\}$),
$m(A)$ denotes the possibility of classification among the classes
$\theta_1$ and $\theta_2$ with respect to the object. In DST, the classical
Dempster’s rule (also called Dempster-Shafer rule, or just DS
rule) is used to combine two (or more²) independent Sources
of Evidence (SoEs), which is denoted as $m_1 \oplus m_2$ and defined
as follows [14]: for $\forall A \in 2^\Theta$, $A \neq \emptyset$,

$$(m_1 \oplus m_2)(A) = \frac{\sum_{B,C \in 2^\Theta | B \cap C = A} m_1(B) m_2(C)}{1 - K} \qquad (6)$$

with

$$K = \sum_{B,C \in 2^\Theta | B \cap C = \emptyset} m_1(B) m_2(C) \qquad (7)$$

where $K$ represents the total conflict degree. If $K = 1$, it implies
that the two SoEs are in total conflict, and the DS rule cannot
be applied because of division by zero.


B. Classical Discounting Technique 


The SoEs may have varying degrees of reliability due to
their different abilities of classification. The discounting
operations are frequently conducted by using a discounting factor $\alpha$
for each source of evidence. A particular discounting operation
has been introduced by Shafer [14] for the combination of
SoEs with different degrees of reliability. Shafer discounts
the masses of all focal elements by a discounting (weighting)
factor $\alpha \in [0,1]$ to the total ignorance. Each discounted BBA
characterizing each discounted source of evidence is used in
the fusion process. More precisely, for $\forall A \in 2^\Theta \setminus \{\Theta\}$, the
discounted mass of the discounted source of evidence is defined
as follows:

$$\begin{cases} m^\alpha(A) = \alpha \cdot m(A) \\ m^\alpha(\Theta) = 1 - \alpha + \alpha \cdot m(\Theta) \end{cases} \qquad (8)$$

where $\alpha = 1$ means that the SoE is completely reliable, and
$\alpha = 0$ means that the SoE is completely unreliable.
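
A minimal Python sketch of the discounting operation (8) follows. It assumes a BBA is stored as a dictionary mapping frozensets to masses; the function name, the three activity labels and the numerical values are hypothetical and are not taken from the experiments of this article.

```python
def discount(m, alpha, theta):
    """Formula (8): scale every mass by alpha and transfer 1 - alpha to the whole frame theta."""
    out = {X: alpha * v for X, v in m.items() if X != theta}
    out[theta] = 1.0 - alpha + alpha * m.get(theta, 0.0)
    return out

# Hypothetical Bayesian BBA produced by one classifier over three activities
SIT, STAND, WALK = frozenset({"sit"}), frozenset({"stand"}), frozenset({"walk"})
THETA = SIT | STAND | WALK
m = {SIT: 0.6, STAND: 0.3, WALK: 0.1}

m_disc = discount(m, alpha=0.8, theta=THETA)
for X, v in m_disc.items():
    print(sorted(X), round(v, 4))
```

With alpha = 0.8 the singleton masses become 0.48, 0.24 and 0.08, and the remaining 0.2 is committed to the total ignorance. With alpha = 0 the result is the vacuous BBA, and with alpha = 1 the BBA is unchanged.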


C. Pignistic Probability Transformation 


When multi-source information is combined, there may
be disjunctive focal elements with strictly positive mass of
belief. It is worth noting that the final decision is made only
among singleton focal elements. Classically, a BBA is usually
transformed into a (possibly subjective) probability measure
for decision making. The Pignistic Probability Transformation
(PPT, or BetP transform) proposed by Smets in [18], [19]
is generally considered as a reasonable in-between decisional
attitude between the max of $Bel(\cdot)$ (pessimistic attitude) and
the max of $Pl(\cdot)$ (optimistic attitude). The betting probability
$BetP(\theta_i)$ of any singleton focal element $\theta_i$ of the FoD is
defined by

$$BetP(\theta_i) = \sum_{\theta_i \in X, X \in 2^\Theta} \frac{m(X)}{|X|} \qquad (9)$$

where $|X|$ refers to the cardinality of a subset $X$. One
clearly sees that the BetP transform evenly distributes the belief
assignment of a disjunctive focal element to the singleton focal
elements it contains.


² To keep the presentation as simple as possible, we present DS rule for
only two BBAs, see [14] for its generalization.
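
The BetP transform of formula (9) can be sketched as follows in Python, again assuming a BBA stored as a dictionary mapping frozensets to masses with no mass on the empty set. The function name, the activity labels and the numerical BBA are hypothetical.

```python
def betp(m):
    """Formula (9): split the mass of each focal element evenly among its singletons."""
    p = {}
    for X, v in m.items():
        for theta_i in X:
            p[theta_i] = p.get(theta_i, 0.0) + v / len(X)
    return p

# Hypothetical non-Bayesian BBA over three activities
SIT, STAND, WALK = frozenset({"sit"}), frozenset({"stand"}), frozenset({"walk"})
THETA = SIT | STAND | WALK
m = {SIT: 0.48, STAND: 0.24, WALK: 0.08, THETA: 0.20}

p = betp(m)
decision = max(p, key=p.get)   # singleton with the largest betting probability
print({k: round(v, 4) for k, v in p.items()}, decision)
```

Here the mass 0.20 of the total ignorance is shared equally among the three singletons, so the betting probability of "sit" is 0.48 + 0.20/3, and the resulting probabilities sum to one.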


III. WEIGHTED FUSION OF MULTIPLE CLASSIFIERS 
A. Classifiers for HAR 


In this article, we use classical machine learning classifiers 
[20] such as SVM, RF, MLP and LR to generate BBAs 
of human activity, and these classifiers can only give the 
mass of belief for singleton focal elements (i.e. we work 
with Bayesian BBAs). In the fusion of multiple classifiers, 
a BBA can be represented by the output of each classifier. 
It is worth noting that we should choose different types of 
classifiers as far as possible. In general, when the diversity 
between the multiple classifiers is larger, the advantages will 
be more obvious. At the same time, we need to guarantee 
the individual prediction accuracy of each classifier, which 
is the basis for the high accuracy of our WFMC algorithm. 
Furthermore, we train each classifier separately using the same 
training dataset. Once multiple classifiers are trained, we can 
obtain the corresponding BBAs for each category of human 
activity. 


B. Assessment Criteria 


After acquiring multiple BBAs of the human activity to
be identified, we can use DS rule to fuse these BBAs and
further make decisions. In this article we work with DS rule
mainly because of its simplicity, even though we are aware of its
well-known disputable dictatorial behavior in some cases, and that
is why we use discounting techniques. We will evaluate the
performances of alternative fusion rules in our future works.
From the perspective of conflicts between multiple BBAs
or uncertain information, the reliability of multiple BBAs
should be evaluated before combination. Its goal is to eliminate
or reduce the negative influence of unreliable BBAs on the
final recognition accuracy. For this reason, appropriate
assessment criteria need to be chosen in advance. In this
article, we have selected three assessment criteria, described
as follows:

a) Divergence degree: The Belief Jensen-Shannon (BJS)
divergence measure was presented by Xiao [21] to measure
the divergence between belief functions in DST. It is the
generalization of the Jensen-Shannon divergence [22] in which
the probability distributions are replaced with belief mass functions.


Let $m_1$ and $m_2$ be two BBAs on the same FoD, containing
$m$ mutually exclusive and exhaustive hypotheses. The BJS
divergence between $m_1$ and $m_2$ is defined as:

$$BJS(m_1, m_2) = \frac{1}{2}\left[\sum_i m_1(A_i)\log\left(\frac{2 m_1(A_i)}{m_1(A_i)+m_2(A_i)}\right) + \sum_i m_2(A_i)\log\left(\frac{2 m_2(A_i)}{m_1(A_i)+m_2(A_i)}\right)\right] \tag{10}$$

where $A_i$ is a non-empty element of the power-set $2^{\Theta}$,
and $\sum_i m_1(A_i) = 1$, $\sum_i m_2(A_i) = 1$. The lower and
upper bounds of the BJS divergence measure are respectively
equal to zero and one. When $m_1$ and $m_2$ are identical,
the BJS divergence between $m_1$ and $m_2$ is 0. When the two BBAs
are completely different, the BJS divergence value is 1. In this
article, the average BJS divergence of a BBA can be calculated
by


$$\overline{BJS}(m) = \frac{1}{N}\sum_{j=1}^{N} BJS(m, m_j) \tag{11}$$

where $N$ indicates the number of classifiers.
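
The divergence and its average can be sketched in a few lines of Python. This is only an illustration under stated assumptions: BBAs are dictionaries from focal elements to masses, the logarithm is taken in base 2 (which is what makes the upper bound equal to 1), and the average is normalized by the number of classifiers $N$, the self-divergence term being zero. The function names are hypothetical.

```python
import math


def bjs(m1, m2):
    """Belief Jensen-Shannon divergence between two BBAs ({focal: mass})."""
    total = 0.0
    for a in set(m1) | set(m2):
        p, q = m1.get(a, 0.0), m2.get(a, 0.0)
        mid = (p + q) / 2.0
        # p * log(2p / (p + q)) with the convention 0 * log(0) = 0
        if p > 0:
            total += p * math.log2(p / mid)
        if q > 0:
            total += q * math.log2(q / mid)
    return total / 2.0


def average_bjs(bbas, i):
    """Average divergence of bbas[i] to all BBAs (assumed 1/N normalization)."""
    return sum(bjs(bbas[i], m) for m in bbas) / len(bbas)


# Two of the Bayesian BBAs of Table I (only theta_4 and theta_5 carry mass).
svm = {frozenset({4}): 0.844, frozenset({5}): 0.156}
rf = {frozenset({4}): 0.573, frozenset({5}): 0.427}
d_svm_rf = bjs(svm, rf)
```

Identical BBAs give a divergence of 0, and two BBAs whose focal elements do not overlap at all give 1, which matches the bounds stated above.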

b) Distance degree: The smaller the distance between 
a pair of BBAs, the closer their belief values are, and the 
better for our decision-making. In this article, the interval 
distance [23] is an excellent metric, as it considers the belief 
intervals using the belief and plausibility functions of each 
focal element to describe the closeness between BBAs. The 
interval distance is defined as follows: 


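$$d_{BI}^{E}(m_1, m_2) = \sqrt{n_c \sum_{i=1}^{2^{n}-1} \left[d^{I}\left(BI_1(A_i), BI_2(A_i)\right)\right]^2} \tag{12}$$

with
$$BI(A_i) = [Bel(A_i), Pl(A_i)] \tag{13}$$

$$d^{I}([a_1, b_1], [a_2, b_2]) = \sqrt{\left[\frac{a_1+b_1}{2} - \frac{a_2+b_2}{2}\right]^2 + \frac{1}{3}\left[\frac{b_1-a_1}{2} - \frac{b_2-a_2}{2}\right]^2} \tag{14}$$

where, following [23], $n = |\Theta|$ and $n_c = 1/2^{n-1}$ is a normalization factor.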
The average interval distance of a BBA with respect to the set of BBAs can be
calculated by

$$\overline{d}_{BI}^{E}(m) = \frac{1}{N}\sum_{j=1}^{N} d_{BI}^{E}(m, m_j) \tag{15}$$


where N indicates the number of classifiers. The larger the 
value of the interval distance, the greater the degree of conflict 
between the current BBA and other BBAs, the less reliable it 
will be, and vice versa. 
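
A minimal Python sketch of the belief-interval distance is given below. It follows the definition of [23], with the normalization $n_c = 1/2^{n-1}$ and a sum over all non-empty subsets of the FoD; this normalization is an assumption here and may differ from the one used in the authors' implementation. All function names are hypothetical.

```python
import math
from itertools import combinations


def subsets(frame):
    """Yield every non-empty subset of the frame as a frozenset."""
    items = sorted(frame)
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def belief_interval(bba, x):
    """Return [Bel(x), Pl(x)] for a BBA given as {frozenset: mass}."""
    bel = sum(v for a, v in bba.items() if a <= x)   # focal sets inside x
    pl = sum(v for a, v in bba.items() if a & x)     # focal sets meeting x
    return bel, pl


def interval_dist_sq(i1, i2):
    """Squared distance between two intervals (midpoint and half-width terms)."""
    (a1, b1), (a2, b2) = i1, i2
    mid = (a1 + b1) / 2 - (a2 + b2) / 2
    half = (b1 - a1) / 2 - (b2 - a2) / 2
    return mid ** 2 + half ** 2 / 3


def d_bi(m1, m2, frame):
    """Euclidean belief-interval distance with n_c = 1 / 2**(n - 1)."""
    n_c = 1 / 2 ** (len(frame) - 1)
    total = sum(
        interval_dist_sq(belief_interval(m1, x), belief_interval(m2, x))
        for x in subsets(frame)
    )
    return math.sqrt(n_c * total)


frame = frozenset({1, 2})
certain_1 = {frozenset({1}): 1.0}
certain_2 = {frozenset({2}): 1.0}
vacuous = {frame: 1.0}
```

With this normalization, two categorical BBAs committed to different singletons are at distance 1, while the vacuous BBA lies at distance $1/\sqrt{3}$ from either of them on a two-element frame.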

c) Uncertainty degree: A novel effective measure of uncertainty
(i.e. entropy) of BBAs was proposed by Dezert [24]. This
new continuous measure is effective in the sense that it satisfies
a small number of very natural and essential desiderata. The
new entropy measure is defined by

$$U(m) = \sum_{X \in 2^{\Theta}} s(X) \tag{16}$$
with
$$s(X) \triangleq -(1 - u(X))\, m(X) \log(m(X)) + u(X)(1 - m(X)) \tag{17}$$
$$u(X) \triangleq Pl(X) - Bel(X). \tag{18}$$


$s(X)$ is the uncertainty contribution of $X$ in $U(m)$. This
measure of uncertainty coincides with Shannon entropy for any
Bayesian BBA, so it can also be interpreted as an effective
generalization of Shannon entropy. We always have $U(m) \geq 0$, and
$U(m) < U(m_v)$ if the BBA $m(\cdot)$ is different from the vacuous
BBA $m_v(\cdot)$ defined by $m_v(\Theta) = 1$. It is worth noting that
a non-Bayesian BBA can have an entropy value
$U(m)$ smaller than the maximum of Shannon entropy given
by $\log(|\Theta|)$. When $X$ is a singleton focal element and satisfies
$m(X) = 1$, $U(m)$ reaches its minimum value of 0, which indicates
that the source of evidence is completely certain and that it plays
an important role in the final combination.
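
The entropy measure can be checked numerically with the short sketch below. It assumes a base-2 logarithm (the choice that reproduces the $U(\cdot)$ row of Table II from the BBAs of Table I), the convention $0 \log 0 = 0$, and a dictionary representation of BBAs; the function name `entropy_u` is hypothetical.

```python
import math
from itertools import combinations


def entropy_u(bba, frame):
    """Entropy U(m) of (16)-(18) for a BBA given as {frozenset: mass}."""
    items = sorted(frame)
    total = 0.0
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            x = frozenset(combo)
            m = bba.get(x, 0.0)
            bel = sum(v for a, v in bba.items() if a <= x)
            pl = sum(v for a, v in bba.items() if a & x)
            u = pl - bel                       # width of the belief interval
            s = u * (1.0 - m)
            if m > 0:                          # 0 * log(0) is taken as 0
                s -= (1.0 - u) * m * math.log2(m)
            total += s
    return total


frame6 = frozenset(range(1, 7))
rf = {frozenset({4}): 0.573, frozenset({5}): 0.427}      # RF row of Table I
u_rf = entropy_u(rf, frame6)

frame2 = frozenset({1, 2})
u_vacuous = entropy_u({frame2: 1.0}, frame2)
u_certain = entropy_u({frozenset({1}): 1.0}, frame2)
```

For the Bayesian BBA of RF every belief interval has zero width, so $U(m)$ reduces to the Shannon entropy, about 0.985 bits, in agreement with the value 0.984 of Table II. On a two-element frame the vacuous BBA gives $U(m_v) = 2$ and a categorical singleton BBA gives 0.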


C. Reliability Evaluation of Classifiers 


In this article, each classifier can be regarded as an evidence
source. We obtain the reliability of a classifier by evaluating
its output, as follows:

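a) Construction of scoring matrix: Suppose that there
are $N$ classifiers over the same FoD, and that their BBAs
are arranged as follows:

$$\begin{array}{c|cccc}
 & A_1 & A_2 & \cdots & A_m \\ \hline
C_1 & m_1(A_1) & m_1(A_2) & \cdots & m_1(A_m) \\
C_2 & m_2(A_1) & m_2(A_2) & \cdots & m_2(A_m) \\
\vdots & \vdots & \vdots & & \vdots \\
C_N & m_N(A_1) & m_N(A_2) & \cdots & m_N(A_m)
\end{array} \tag{19}$$

where $A_i \in 2^{\Theta}$, and $C_j$, $j = 1, 2, \ldots, N$ represents the $j$th
classifier. Then we calculate the scores of each classifier
according to the assessment criteria $Crit_n$, $n = 1, 2, \ldots, q$, and
the scoring matrix $S$ can be generated as follows:

$$\begin{array}{c|cccccc}
 & C_1 & C_2 & \cdots & C_j & \cdots & C_N \\ \hline
Crit_1 & S_{11} & S_{12} & \cdots & S_{1j} & \cdots & S_{1N} \\
Crit_2 & S_{21} & S_{22} & \cdots & S_{2j} & \cdots & S_{2N} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
Crit_q & S_{q1} & S_{q2} & \cdots & S_{qj} & \cdots & S_{qN}
\end{array} \tag{20}$$

In this article, $q = 3$, that is: $Crit_1 \triangleq \overline{BJS}(\cdot)$, $Crit_2 \triangleq \overline{d}_{BI}^{E}(\cdot)$
and $Crit_3 \triangleq U(\cdot)$.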

b) Construction of local BBAs for classifiers: Considering
the assessment criteria and their corresponding evaluation
vectors, we can calculate the positive support degree
$Sup_n(C_j)$ and the negative support degree $Inf_n(C_j)$ of each
classifier by the following equations (see [16] for details):

$$Sup_n(C_j) \triangleq \sum_{k \in \{1,\ldots,N\} \mid S_{nk} \leq S_{nj}} |S_{nj} - S_{nk}| \tag{21}$$

$$Inf_n(C_j) \triangleq -\sum_{k \in \{1,\ldots,N\} \mid S_{nk} \geq S_{nj}} |S_{nj} - S_{nk}| \tag{22}$$

Then, the maximum value $C_{\max}^{n}$ and the minimum value $C_{\min}^{n}$
over the classifiers $C_j$ under the assessment criterion $Crit_n$ can be
obtained by the following equations:

$$C_{\max}^{n} \triangleq \max_j Sup_n(C_j) \tag{23}$$
$$C_{\min}^{n} \triangleq \min_j Inf_n(C_j). \tag{24}$$

Next, the construction of local BBAs is based on the method
presented in [16] and defined as follows:

$$\begin{cases}
m_{jn}(C_j) \triangleq Bel_n(C_j) \\
m_{jn}(\bar{C}_j) \triangleq Bel_n(\bar{C}_j) = 1 - Pl_n(C_j) \\
m_{jn}(C_j \cup \bar{C}_j) \triangleq Pl_n(C_j) - Bel_n(C_j)
\end{cases} \tag{25}$$
with
$$\begin{cases}
Bel_n(C_j) \triangleq Sup_n(C_j) / C_{\max}^{n} \\
Bel_n(\bar{C}_j) \triangleq Inf_n(C_j) / C_{\min}^{n} \\
Pl_n(C_j) \triangleq 1 - Inf_n(C_j) / C_{\min}^{n}
\end{cases} \tag{26}$$

where $m_{jn}(C_j)$, $m_{jn}(\bar{C}_j)$ and $m_{jn}(C_j \cup \bar{C}_j)$ respectively
represent the positive support belief, negative support
belief and uncertainty belief of the classifier $C_j$ based on the
assessment criterion $Crit_n$.
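
The construction of the support degrees and local BBAs can be sketched as follows, using the scoring matrix of Table II as input. One point is an assumption made for this example: all three criteria are treated as cost criteria (a lower score is better), so the positive support of a classifier sums its gaps to the classifiers with higher scores. This orientation is the one that reproduces Tables III to V; for benefit-type criteria the two inequality conditions are swapped. The function names are hypothetical.

```python
# Scoring matrix of Table II; columns are SVM, RF, MLP, LR.
scores = {
    "BJS": [0.218, 0.043, 0.056, 0.088],
    "dBI": [0.113, 0.068, 0.068, 0.084],
    "U": [0.628, 0.984, 0.935, 0.8162],
}


def sup_inf(row):
    """Positive and negative support degrees for one criterion.

    Assumption: lower scores are better (cost criterion).
    """
    sup = [sum(s_k - s_j for s_k in row if s_k >= s_j) for s_j in row]
    inf = [-sum(s_j - s_k for s_k in row if s_k <= s_j) for s_j in row]
    return sup, inf


def local_bbas(row):
    """Local BBAs (m(C_j), m(not C_j), m(C_j or not C_j)) per classifier."""
    sup, inf = sup_inf(row)
    c_max, c_min = max(sup), min(inf)
    out = []
    for s, i in zip(sup, inf):
        bel = s / c_max if c_max != 0 else 0.0       # Bel(C_j)
        bel_not = i / c_min if c_min != 0 else 0.0   # Bel(not C_j) = 1 - Pl(C_j)
        out.append((bel, bel_not, 1.0 - bel - bel_not))
    return out


sup_bjs, inf_bjs = sup_inf(scores["BJS"])
bbas_bjs = local_bbas(scores["BJS"])
```

Because the entries of Table II are rounded to three decimals, the values obtained this way agree with Tables III, IV and V only up to a few thousandths.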

c) Calculation of weight factors: We employ the BF-TOPSIS
algorithm [16] to calculate the weight factor of each
classifier. The specific steps are as follows.

• Step 1: Calculate the local BBAs $m_{jn}(C_j)$, $m_{jn}(\bar{C}_j)$
and $m_{jn}(C_j \cup \bar{C}_j)$ of each classifier according to the
scoring matrix.

• Step 2: For each classifier, calculate $d_{BI}^{E}(m_{jn}, m^{best})$
and $d_{BI}^{E}(m_{jn}, m^{worst})$, where $m^{best}$ and $m^{worst}$ represent
the best and the worst ideal BBAs based on the assessment
criterion $Crit_n$, defined by $m^{best}(C_j) = 1$
and $m^{worst}(\bar{C}_j) = 1$.

• Step 3: Calculate the weighted average distances $d^{best}(C_j)$
and $d^{worst}(C_j)$ of each classifier, where

$$d^{best}(C_j) \triangleq \sum_{n=1}^{q} v(Crit_n) \cdot d_{BI}^{E}(m_{jn}, m^{best}) \tag{27}$$

$$d^{worst}(C_j) \triangleq \sum_{n=1}^{q} v(Crit_n) \cdot d_{BI}^{E}(m_{jn}, m^{worst}) \tag{28}$$

and $v(Crit_n)$ represents the weight of the assessment
criterion $Crit_n$. In this article, $v(Crit_1) = v(Crit_2) =
v(Crit_3) = 1/3$.

• Step 4: The final weight of the classifier $C_j$ is defined as
follows:

$$w(C_j) \triangleq \frac{d^{worst}(C_j)}{d^{worst}(C_j) + d^{best}(C_j)} \tag{29}$$

In the proposed WFMC algorithm, a classifier that is in
complete conflict with the other classifiers is supported only
to a small degree. According to the reliability evaluation
algorithm, such a classifier receives a small weighting factor,
which discounts the masses of all its focal elements toward total
ignorance. This reduces the total conflict between classifiers
in the fusion process, making the total conflict in the proposed
WFMC algorithm always less than 1, and thus improves the
reliability of the fusion results.
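
A small worked example may help to see why the conflict stays below 1. The sketch first applies (29) to the distances of Table VIII, then considers two hypothetical classifiers that are each fully committed to a different class (total conflict 1 without discounting) and computes the conflict that remains once both are discounted with Shafer's rule. The variable and function names are illustrative assumptions.

```python
# Distances of Table VIII.
d_best = {"SVM": 0.471, "RF": 0.236, "MLP": 0.238, "LR": 0.334}
d_worst = {"SVM": 0.236, "RF": 0.471, "MLP": 0.506, "LR": 0.461}

# Weight (29): relative closeness to the best ideal solution.
weights = {c: d_worst[c] / (d_worst[c] + d_best[c]) for c in d_best}


def conflict_after_discounting(w1, w2):
    """Conflict between two discounted, fully contradictory classifiers.

    Before discounting: m1(a) = 1 and m2(b) = 1 with a != b (conflict 1).
    After discounting:  m1'(a) = w1, m1'(Theta) = 1 - w1 (same for m2'),
    so only the product m1'(a) * m2'(b) falls on the empty set.
    """
    m1_a, m2_b = w1 * 1.0, w2 * 1.0
    return m1_a * m2_b


k = conflict_after_discounting(weights["SVM"], weights["LR"])
```

With the weights of SVM and LR the residual conflict is about 0.19 instead of 1, so the DS rule remains well defined. Since (29) gives $w(C_j) = 1$ only when $d^{best}(C_j) = 0$, the conflict can reach 1 only if two such perfectly supported classifiers contradict each other.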

After obtaining the weight factor of each classifier, the
classifiers can be fused using the classical (i.e. Shafer's)
discounting technique and DS rule, and the decision can be
made based on the maximum BetP probability value. For
convenience of implementation, the framework of the
WFMC method is summarized in Fig. 1.


[Figure: flowchart. Classifier-1, Classifier-2, ..., Classifier-N produce BBAs-1, BBAs-2, ..., BBAs-N; a scoring matrix is constructed using the assessment criteria; weight factors are calculated using the BF-TOPSIS method.]

Fig. 1. The framework of WFMC method.


As we can see in Fig. 1, the proposed WFMC algorithm
includes four main steps:

• Step 1 (Classifier training): Multiple classifiers of different
types are trained on the same training dataset in order to
produce BBAs.

• Step 2 (Classifier evaluation): For each BBA generated
by a classifier, a reliability evaluation is performed
using the three criteria, and a scoring matrix is constructed.

• Step 3 (Calculation of weight factors): The BF-TOPSIS
algorithm is employed to calculate the weight factor of
each classifier based on the scoring matrix.

• Step 4 (Discounting fusion): The classifiers are
combined sequentially using the classical discounting
technique and DS rule, and the final decision is made
based on the maximum BetP probability.


IV. EXPERIMENTS AND DISCUSSIONS 
A. UCI Smartphone Dataset 


In this article, the UCI Smartphone dataset is used
for experimental verification. The data were collected from
a group of 30 volunteers aged 19 to 48 years. Each person
performed six activities (walking, walking upstairs, walking
downstairs, sitting, standing and laying) while wearing a smartphone
on the waist. Three-axial linear acceleration and three-axial
angular velocity were captured at a constant rate of 50 Hz
using the embedded accelerometer and gyroscope. The sensor
signals were pre-processed by applying noise filters and then
sampled in fixed-width sliding windows of 2.56 s with 50%
overlap (128 readings per window). More details on the UCI
Smartphone dataset can be found in [25].


B. Example 


In order to show how our WFMC method works, an example
is given to illustrate its specific procedure. Firstly, the focal
elements of belief functions (BFs) theory are used to represent
human activities mathematically. Specifically, $\theta_1 \triangleq$ walking,
$\theta_2 \triangleq$ walking upstairs, $\theta_3 \triangleq$ walking downstairs,
$\theta_4 \triangleq$ sitting, $\theta_5 \triangleq$ standing, $\theta_6 \triangleq$ laying.
For the 480th sample of the test dataset, whose true label is
$\theta_5$ (standing), the BBAs generated by the four classifiers are
shown in Table I. According to the principle of maximum
probability, SVM and RF support $\theta_4$ (sitting) while MLP and LR
support $\theta_5$ (standing), which makes the decision ambiguous. If we
use DS rule to combine the four classifiers directly, the fusion
result gives the maximum belief value of 0.567 to $\theta_4$
(sitting), which is the wrong decision.

Next, we apply the WFMC algorithm. The scoring
matrix is obtained from (11), (15) and (16), as shown in
Table II. Then, we get the positive and negative
support degrees of each classifier according to (21) and (22),
which are given in Table III and Table IV. It can be seen that
$\overline{BJS}(\cdot)$ has the highest support for RF, while $U(\cdot)$ has the
highest support for SVM, and $\overline{d}_{BI}^{E}(\cdot)$ supports both RF and
MLP. After that, the local BBAs of each classifier are
obtained using (25), as shown in Table V, Table VI and
Table VII. Then, by using Step 2 and Step 3 of the BF-TOPSIS
algorithm, we obtain the distances $d^{best}(C_j)$ and $d^{worst}(C_j)$
of the classifiers. The weight coefficients of each classifier are
then obtained from (29), as shown in Table VIII.
SVM acquires the smallest weighting factor,
while MLP gets the largest one and RF has a weighting factor
similar to that of MLP, which indicates that MLP
has the highest reliability for the current activity. Finally, the four
classifiers are discounted and combined using DS rule (6) generalized$^3$ for
four BBAs, and the probability values for each category of
activity are obtained based on (9), as shown in Table IX. We
can see that $\theta_5$ (standing) has the maximum BetP probability
value, which is consistent with the true label.

$^3$ Because DS rule is associative, the four BBAs can also be fused sequentially,
and the sequential order of DS fusion does not impact the final result.
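
The last step of this example can be reproduced with a short script. The sketch below is an illustration, not the authors' code: it represents BBAs as dictionaries over frozensets, implements Shafer's discounting and the two-source DS rule, and fuses the four BBAs of Table I sequentially, first without weights and then with the weights of Table VIII. The helper names (`discount`, `ds_combine`, `bayesian`) are hypothetical.

```python
from functools import reduce

FRAME = frozenset(range(1, 7))   # theta_1..theta_6 encoded as 1..6


def discount(bba, w):
    """Shafer's discounting: scale the masses by w, move the rest to FRAME."""
    out = {a: w * v for a, v in bba.items() if a != FRAME}
    out[FRAME] = 1.0 - w + w * bba.get(FRAME, 0.0)
    return out


def ds_combine(m1, m2):
    """Dempster's rule for two BBAs given as {frozenset: mass}."""
    joint, conflict = {}, 0.0
    for b, vb in m1.items():
        for c, vc in m2.items():
            inter = b & c
            if inter:
                joint[inter] = joint.get(inter, 0.0) + vb * vc
            else:
                conflict += vb * vc        # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: DS rule is undefined")
    return {a: v / (1.0 - conflict) for a, v in joint.items()}


def bayesian(p4, p5):
    """BBA of Table I: only theta_4 and theta_5 carry mass."""
    return {frozenset({4}): p4, frozenset({5}): p5}


bbas = [bayesian(0.844, 0.156), bayesian(0.573, 0.427),
        bayesian(0.349, 0.651), bayesian(0.251, 0.749)]   # SVM, RF, MLP, LR
weights = [0.333, 0.667, 0.680, 0.580]                    # Table VIII

plain = reduce(ds_combine, bbas)
fused = reduce(ds_combine, [discount(m, w) for m, w in zip(bbas, weights)])
```

Up to rounding, `plain` gives about 0.566 to $\theta_4$ (the "DS rule" row of Table I), while `fused` gives about 0.388 to $\theta_4$, 0.548 to $\theta_5$ and 0.063 to the whole FoD, as in Table IX. Discounting the overconfident SVM output is what reverses the decision.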


TABLE I
BBAS OF THE 480TH TEST SAMPLE.

        | $\theta_1$ | $\theta_2$ | $\theta_3$ | $\theta_4$ | $\theta_5$ | $\theta_6$ | Result
SVM     | 0.0 | 0.0 | 0.0 | 0.844 | 0.156 | 0.0 | $\theta_4$
RF      | 0.0 | 0.0 | 0.0 | 0.573 | 0.427 | 0.0 | $\theta_4$
MLP     | 0.0 | 0.0 | 0.0 | 0.349 | 0.651 | 0.0 | $\theta_5$
LR      | 0.0 | 0.0 | 0.0 | 0.251 | 0.749 | 0.0 | $\theta_5$
DS rule | 0.0 | 0.0 | 0.0 | 0.567 | 0.433 | 0.0 | $\theta_4$


TABLE II
SCORING MATRIX OF FOUR CLASSIFIERS.

                               | SVM   | RF    | MLP   | LR
$\overline{BJS}(\cdot)$        | 0.218 | 0.043 | 0.056 | 0.088
$\overline{d}_{BI}^{E}(\cdot)$ | 0.113 | 0.068 | 0.068 | 0.084
$U(\cdot)$                     | 0.628 | 0.984 | 0.935 | 0.8162


TABLE III
POSITIVE SUPPORT DEGREE $Sup_n(\cdot)$ OF FOUR CLASSIFIERS.

$Sup_n(\cdot)$                 | SVM   | RF    | MLP   | LR
$\overline{BJS}(\cdot)$        | 0.0   | 0.232 | 0.194 | 0.130
$\overline{d}_{BI}^{E}(\cdot)$ | 0.0   | 0.061 | 0.061 | 0.029
$U(\cdot)$                     | 0.852 | 0.0   | 0.05  | 0.287


TABLE IV
NEGATIVE SUPPORT DEGREE $Inf_n(\cdot)$ OF FOUR CLASSIFIERS.

$Inf_n(\cdot)$                 | SVM    | RF     | MLP    | LR
$\overline{BJS}(\cdot)$        | -0.466 | 0.0    | -0.013 | -0.077
$\overline{d}_{BI}^{E}(\cdot)$ | -0.119 | 0.0    | 0.0    | -0.033
$U(\cdot)$                     | 0.0    | -0.575 | -0.425 | -0.188


TABLE V
LOCAL BBAS OF FOUR CLASSIFIERS ON $\overline{BJS}(\cdot)$.

                                                 | SVM | RF  | MLP   | LR
$m_{\overline{BJS}(\cdot)}(C_j)$                 | 0.0 | 1.0 | 0.835 | 0.560
$m_{\overline{BJS}(\cdot)}(\bar{C}_j)$           | 1.0 | 0.0 | 0.027 | 0.164
$m_{\overline{BJS}(\cdot)}(C_j \cup \bar{C}_j)$  | 0.0 | 0.0 | 0.138 | 0.276


TABLE VI
LOCAL BBAS OF FOUR CLASSIFIERS ON $\overline{d}_{BI}^{E}(\cdot)$.

                                                        | SVM | RF  | MLP | LR
$m_{\overline{d}_{BI}^{E}(\cdot)}(C_j)$                 | 0.0 | 1.0 | 1.0 | 0.470
$m_{\overline{d}_{BI}^{E}(\cdot)}(\bar{C}_j)$           | 1.0 | 0.0 | 0.0 | 0.273
$m_{\overline{d}_{BI}^{E}(\cdot)}(C_j \cup \bar{C}_j)$  | 0.0 | 0.0 | 0.0 | 0.257


C. Measure of Performances


The classical Accuracy is applied to measure the performance
of our proposed method. It is defined as follows:

$$Accuracy = \frac{1}{n}\sum_{i=1}^{n} \frac{TP_i + TN_i}{TP_i + TN_i + FP_i + FN_i} \tag{30}$$


TABLE VII
LOCAL BBAS OF FOUR CLASSIFIERS ON $U(\cdot)$.

                                    | SVM | RF  | MLP   | LR
$m_{U(\cdot)}(C_j)$                 | 1.0 | 0.0 | 0.059 | 0.337
$m_{U(\cdot)}(\bar{C}_j)$           | 0.0 | 1.0 | 0.740 | 0.327
$m_{U(\cdot)}(C_j \cup \bar{C}_j)$  | 0.0 | 0.0 | 0.202 | 0.336


TABLE VIII
WEIGHTED COEFFICIENTS OF FOUR CLASSIFIERS.

    | $d^{best}(C_j)$ | $d^{worst}(C_j)$ | $w(C_j)$
SVM | 0.471 | 0.236 | 0.333
RF  | 0.236 | 0.471 | 0.667
MLP | 0.238 | 0.506 | 0.680
LR  | 0.334 | 0.461 | 0.580


TABLE IX
RESULTS OF THE WFMC METHOD.

              | $\theta_1$ | $\theta_2$ | $\theta_3$ | $\theta_4$ | $\theta_5$ | $\theta_6$ | $\Theta$
Fusion        | 0.0  | 0.0  | 0.0  | 0.389 | 0.548 | 0.0  | 0.063
$BetP(\cdot)$ | 0.01 | 0.01 | 0.01 | 0.399 | 0.558 | 0.01 | 0.0


where $i$ denotes the class index and $n$ is the number of classes.
$TP_i$, $TN_i$, $FP_i$ and $FN_i$ are respectively the True Positives, True
Negatives, False Positives and False Negatives of class $i$.
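
The per-class counts can be read directly from a confusion matrix. The sketch below is an illustration under assumptions: rows are true classes, columns are predicted classes, the accuracy is averaged over the one-vs-rest counts of each class, and the 3-class matrix is made up for the example. The function name is hypothetical.

```python
def one_vs_rest_accuracy(cm):
    """Average of (TP_i + TN_i) / (TP_i + TN_i + FP_i + FN_i) over classes.

    cm[r][c] is the number of samples of true class r predicted as class c.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    acc = 0.0
    for i in range(n):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                          # class i missed
        fp = sum(cm[r][i] for r in range(n)) - tp     # others taken for class i
        tn = total - tp - fn - fp
        acc += (tp + tn) / (tp + tn + fp + fn)
    return acc / n


# Made-up confusion matrix with 10 samples and 3 classes.
cm = [[3, 0, 0],
      [0, 2, 1],
      [0, 1, 3]]
acc = one_vs_rest_accuracy(cm)
```

Note that with more than two classes this one-vs-rest average is not the same quantity as the fraction of correctly classified samples: for the matrix above the former is about 0.867 while 8 of the 10 samples (0.8) lie on the diagonal.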


D. Experimental Results and Analysis 


Following the steps described in Fig. 1, we
first train four classifiers (an SVM, an RF, an MLP and an LR)
using 7352 samples. For SVM,
the sigmoid function is selected as the kernel function and the
penalty parameter is set to 1.0. For RF,
the number of trees in the forest is set to 150. For
MLP, the number of hidden layers is set to
300. For LR, the penalty is set to L1.
Default values are used for the remaining parameters
of the four classifiers. Features are extracted from the
raw sensor data for model training, including 11 time-domain
and 6 frequency-domain features, as shown in Table X. The
four trained classifiers are then used to predict the testing
dataset containing 2947 samples. Furthermore, we fuse the
four classifiers using DS rule and the proposed WFMC
algorithm, respectively. The results are shown in Table XI,
and the related confusion matrices are shown in Fig. 2 and
Fig. 3. LR has the highest accuracy among
the individual classifiers with 93.52%, which is still lower than
that of the DS rule approach (94.43%). This indicates that an
individual classifier has limited classification ability. Moreover,
the proposed WFMC method reaches 96.20%, which is
better than all the other methods mentioned and shows
the effectiveness of our strategy.

Compared to the approach of traditional DS rule, the pro- 
posed method effectively improves the recognition accuracy. 
The misclassification where sitting was incorrectly recognized 
as standing is reduced from 12.4% to 8.4% and the misclassi- 
fication where walking downstairs was incorrectly recognized 
TABLE X
FEATURE EXTRACTION

Domain    | Features
Time      | Mean value, Standard deviation, Median absolute value, Maximum, Minimum, Signal magnitude area, Average sum of the squares, Interquartile range, Signal entropy, Autoregression coefficients, Correlation
Frequency | Largest frequency component, Weighted average, Skewness, Kurtosis, Energy of a frequency interval, Angle between two vectors

[Figure: confusion matrix; both axes list Walking, Upstairs, Downstairs, Sitting, Standing, Laying.]

Fig. 2. Confusion matrix on UCI smartphone dataset by DS rule.


[Figure: confusion matrix; both axes list Walking, Upstairs, Downstairs, Sitting, Standing, Laying.]

Fig. 3. Confusion matrix on UCI smartphone dataset by the WFMC method.


as walking upstairs is reduced from 6.7% to 4.3%. This is
because the three evaluation criteria we use are
a good measure of the conflict between multiple classifiers
and of their own uncertainty, and because the BF-TOPSIS algorithm
efficiently calculates the weight coefficient of each classifier,
which improves the accuracy of the multi-classifier fusion.
Furthermore, we compare with some state-of-the-art approaches
from the literature to demonstrate the superiority of our
method, including Activity Graph Based Convolutional Neural
Network [26], DSmT-Based Kernel Density Estimation [6],
Sensor fusion and deep recurrent neural network-based [7],
Two-stream Transformer Network [27], Hesitant Fuzzy Belief

TABLE XI
COMPARISON OF WFMC METHOD WITH TRADITIONAL METHODS ON THE
UCI SMARTPHONE DATASET.

Method  | Accuracy | Time (s)
SVM     | 91.75%   | 9.08
RF      | 92.94%   | 2.06
MLP     | 92.53%   | 1.95
LR      | 93.52%   | 1.97
DS rule | 94.43%   | 26.01
WFMC    | 96.20%   | 33.10


Structure Based Fused Extreme Learning Machine [28]. As
we can see in Table XII, our method outperforms these
state-of-the-art methods in terms of accuracy.


TABLE XII
COMPARISON OF WFMC METHOD WITH STATE-OF-THE-ART METHODS
ON THE UCI SMARTPHONE DATASET.

Method                                    | Accuracy | Time (s)
Activity Graph CNN-Based [26]             | 90.17%   | 11.34
DSmT-Based Kernel Density Estimation [6]  | 93.05%   | 24.46
Sensor fusion and deep RNN-based [7]      | 94.27%   | 15.95
Two-stream Transformer Network [27]       | 94.12%   | 20.19
Hesitant Fuzzy Belief Based ELM [28]      | 95.20%   | 23.78
WFMC                                      | 96.20%   | 33.10


In terms of time consumption, our WFMC method was
programmed in Python 3.7 and run on an Intel Core
i7-8700 CPU at 3.20 GHz with 16 GB of RAM. We used the 2947 test
samples and measured the total time consumed by each method.
As can be seen, traditional machine learning algorithms have
the advantage of being fast. As our WFMC algorithm is
developed based on DST, it inevitably increases the computational
burden. Nevertheless, the average elapsed time per test sample
is about 11 ms (33.10 s for 2947 samples), which is sufficient for
practical applications.


V. CONCLUSION 


In this article, we have proposed a novel weighted fusion
of multiple classifiers based on belief functions theory for
human activity recognition. Firstly, we train four classical
machine learning classifiers using time-domain and
frequency-domain features to obtain basic belief assignments of human
activities. Secondly, we evaluate the outputs of the four classifiers
using three criteria and construct a scoring matrix. Thirdly, we
use the multi-criteria BF-TOPSIS algorithm to calculate the
weight coefficient of each classifier. Finally, we adopt a
discounting technique and DS rule to combine the four classifiers,
and make decisions based on the pignistic probability values.
Several experiments have been conducted on the UCI
Smartphone dataset. The experimental results show that our
WFMC approach improves the classification
accuracy with respect to several classical and state-of-the-art
methods, reaching 96.20% against 94.43% for the plain DS rule.

In our future works, we will evaluate a better measure of
divergence between belief functions based on a more effective
definition of relative entropy and cross-entropy. We will also
explore the possibility of adapting our Stable Preference Ordering
Towards Ideal Solution (SPOTIS) multi-criteria method, which is
free of rank reversal, to HAR instead of using the BF-TOPSIS method, which
is not robust to rank reversal. We will test and compare
another decision-making technique based on belief-interval
distance, and work on how to reduce the complexity of
multi-classifier fusion for HAR in order to apply it to an online real
activity recognition system.


Abstract: Imbalanced data classification is an important research
topic, and there are multiple techniques to deal with this
problem. Each method has its own advantages for handling
imbalanced data. To improve the classification accuracy, these
strategies are combined at the decision level in an appropriate way,
so as to take full advantage of the complementary information among
the different methods. We thus propose a new method called imbalanced
data classification based on belief functions theory (IDCBF). The
classification results generated by different strategies (i.e.,
hybrid-sampling, over-sampling, or under-sampling) may have different
reliabilities for query patterns. An appropriate quality evaluation
rule is therefore introduced to estimate the credibility of each classification
result based on its close neighborhood. The revised classification
results from the different strategies are then combined by Dempster's
rule to reduce the ignorant information and to generate the final
classification result. Multiple experiments are used to test the
performance of the IDCBF method, and the results show that
IDCBF can efficiently improve the classification accuracy with
respect to other related methods.


Keywords: pattern classification, belief functions, evidence
theory, imbalanced data.


I. INTRODUCTION 


Traditional classification methods [1]-[3] usually assume
that each category in a dataset contains the same number of
samples and that the misclassification costs are equal. However,
real-world data may have imbalanced distributions
[4]-[7]. A class with fewer instances is known as a positive
class or a minority class, and a class with more instances is
called a negative class or a majority class. In real-world problems
the minority class is often more important than the majority class,
and its cost of misclassification is also higher. Nowadays,
imbalanced classification is widely used in information security
[8] and software prediction [9]. Imbalanced
data classification has therefore attracted extensive interest from many
researchers. This paper is an extension of our works presented
in [10] and [11].

Imbalanced data classification methods are divided
into three kinds: data preprocessing level [12]-[14], feature
selection level [15]-[17], and classification method improvement
level [18], [19]. In this work, we attempt to solve
the problem at the data preprocessing level, which decreases
the imbalance ratio of the dataset by creating minority
data or deleting majority data. It focuses on under-sampling
[20], over-sampling [21] and hybrid-sampling [22] methods
to minimize the imbalance ratio by redistributing the data.
Under-sampling techniques, such as the NearMiss [20] method,
delete majority data to increase the classification accuracy of
minority classes. Over-sampling methods, such as the Synthetic
Minority Oversampling Technique (SMOTE) [21], create minority
data based on the Euclidean distance to balance the sample ratio.
Hybrid-sampling methods, such as the SmoteTomek [23] method,
combine under-sampling and over-sampling techniques.


These methods have their own advantages and drawbacks
when they are used for imbalanced data classification.
Over-sampling methods generate minority
data but may cause over-fitting problems.
Under-sampling techniques remove majority data, which may discard
potentially important information. Hybrid-sampling algorithms
combine under-sampling and
over-sampling methods. Each technique has its own particular
benefits. To further improve the classification accuracy, we
propose a new method at the decision level that combines these
three algorithms by making full use of their complementary
information.


Belief functions theory provides an essential decision-level
information fusion tool, and it is well suited to combining
uncertain information. It has already been applied in data
fusion and pattern classification [24], [25]. In this paper,
we propose a new method called imbalanced data
classification based on belief functions theory (IDCBF). The
classification results generated by different methods
(i.e., hybrid-sampling, over-sampling, and under-sampling)
may have different qualities/reliabilities. A reliability matrix
built from the neighborhood of the object is proposed to make
a refined reliability evaluation. The classification outputs of the
different techniques are cautiously revised using the
reliability matrix. Finally, the corrected classification results
are combined by the evidence combination rule to make
the final decision.


The remainder of this paper is organized as follows. Section
II describes the proposed method in detail. Experimental
applications are presented in Section III to test the performance
of IDCBF. Section IV provides the conclusions.


II. IMBALANCED DATA CLASSIFICATION BASED ON BELIEF
FUNCTIONS THEORY


The three imbalanced data classification methods (hybrid-sampling,
over-sampling, and under-sampling) have their own
advantages and drawbacks. To further improve the classification
accuracy, these three methods are combined in an appropriate
way to take full advantage of their complementary
information. Belief functions theory, also
called evidence theory, provides an efficient tool to
combine uncertain information at the decision level, so
it is used here to combine
these three techniques. A new method called imbalanced data
classification based on belief functions theory (IDCBF) is
proposed to revise the classifier outputs. We obtain three
classification results, represented as pieces of evidence, from three
classifiers (i.e., hybrid-sampling, over-sampling, and
under-sampling), and we combine these classification results
efficiently under the belief functions framework.

The classification results of the different data sampling methods
may have different reliabilities, and a result with low
reliability may be harmful for the combination. It is therefore
essential to evaluate the reliability of each classification output
properly, and then to revise the result based on this evaluation
in order to improve the combination performance. We propose to
estimate a refined reliability matrix that represents the quality
of each classification result. This reliability matrix is
estimated from the neighborhood of the object in the training
data space, and it expresses the possibility that the object
is misclassified into other classes. After that, the classification
results can be revised according to this matrix in a cautious
way under the belief functions framework. The three corrected
classification results are finally combined by belief functions theory
to predict the class of the object.


A. Basics of belief functions theory 


Belief functions theory, also called evidence theory or
Dempster-Shafer Theory (DST), provides an efficient tool
to combine uncertain information at the decision level. In
belief functions theory, the mass function $m$, also called the
basic belief assignment (BBA), is defined over the frame of
discernment denoted by $\Omega = \{\omega_i, i = 1, 2, \ldots, c\}$,
consisting of $c$ exhaustive and exclusive hypotheses (classes) $\omega_i$,
$i \in \{1, 2, \ldots, c\}$. The power-set $2^{\Omega}$ is composed of all the
subsets of $\Omega$. A BBA is a mapping $m(\cdot)$ from $2^{\Omega}$ to $[0, 1]$
which satisfies $m(\emptyset) = 0$ and

$$\sum_{A \in 2^{\Omega}} m(A) = 1. \tag{1}$$

A subset $A$ is called a focal element of $m(\cdot)$ if it satisfies $m(A) > 0$.
The BBA is called a Bayesian BBA if its focal elements
are all singleton classes. In this paper, the classification
results to be combined are expressed in the form of BBAs.
Dempster's rule (DS rule) is usually used to combine
multiple classification results represented by BBAs. DS rule for
the combination of two BBAs, denoted $m = m_1 \oplus m_2$ over $2^{\Omega}$, is
defined by $m(\emptyset) = 0$ and, $\forall A \neq \emptyset \in 2^{\Omega}$, by the following
formula:

$$m(A) = [m_1 \oplus m_2](A) =
\begin{cases}
\dfrac{\sum_{B \cap C = A} m_1(B)\, m_2(C)}{1 - m_{12}(\emptyset)}, & \forall A \in 2^{\Omega} \setminus \{\emptyset\} \\
0, & \text{if } A = \emptyset
\end{cases} \tag{2}$$

where $m_{12}(\emptyset) \triangleq \sum_{B, C \in 2^{\Omega} \mid B \cap C = \emptyset} m_1(B)\, m_2(C)$ is the total
conjunctive conflicting mass. DS rule is commutative and associative, so the
combination result is not influenced by the combination
order of multiple BBAs.
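
The rule (2) and its order-independence can be illustrated with a few lines of Python. The sketch below is an illustration only: BBAs are dictionaries over frozensets, the two-class frame and the mass values are made up, and the function name `dempster` is hypothetical.

```python
def dempster(m1, m2):
    """Combine two BBAs ({frozenset: mass}) with Dempster's rule (2).

    Returns the combined BBA and the conflicting mass m12(empty set).
    """
    combined, conflict = {}, 0.0
    for b, vb in m1.items():
        for c, vc in m2.items():
            a = b & c
            if a:
                combined[a] = combined.get(a, 0.0) + vb * vc
            else:
                conflict += vb * vc
    if conflict >= 1.0:
        raise ValueError("the two BBAs are in total conflict")
    return {a: v / (1.0 - conflict) for a, v in combined.items()}, conflict


# Made-up example on a two-class frame Omega = {w1, w2}.
w1, w2 = frozenset({"w1"}), frozenset({"w2"})
omega = w1 | w2
m1 = {w1: 0.6, omega: 0.4}
m2 = {w2: 0.5, omega: 0.5}
m3 = {w1: 0.2, w2: 0.3, omega: 0.5}

m12, k12 = dempster(m1, m2)
```

Here the conflicting mass is $0.6 \times 0.5 = 0.3$, and after normalization by $1 - 0.3$ the combined BBA gives $3/7$ to $\omega_1$ and $2/7$ each to $\omega_2$ and $\Omega$. Combining `m12` with `m3` gives the same result as combining `m1` with the fusion of `m2` and `m3`.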

In reality, the classification results of the different
classification methods (hybrid-sampling, over-sampling, and
under-sampling) may have different reliabilities. It is essential to
evaluate the reliabilities of the classification results and to revise
the results based on this evaluation before combination.


B. Evidence reliability evaluation 


The classification results of different classifiers may have
different qualities. The under-sampling method deletes
majority data, which may change the distribution of the data
and affect the classification accuracy. The over-sampling technique
generates synthetic instances for the minority class, which may also
have a bad influence on the classification result. The classification
result of each method can be seen as a piece of evidence (BBA).
The three classifiers (hybrid-sampling, over-sampling, and
under-sampling) are denoted by $C_1$, $C_2$, $C_3$
here. The object is classified over the frame of discernment
$\Omega = \{\omega_1, \omega_2, \ldots, \omega_c\}$, where $\omega_j$ represents a class label.
Assume that a training set of $S$ labelled patterns is available.
For each classifier $C_l$, $l \in \{1, 2, 3\}$, the classification result
of the training pattern $x_i$, $i \in \{1, 2, \ldots, S\}$, is denoted by
$\mathbf{p}_i = \{p_{i,1}, p_{i,2}, \ldots, p_{i,c}\}$, where $p_{i,j}$ represents the
probability that $x_i$ belongs to $\omega_j$. The true classification result of a
training pattern is $t_i(\omega_j) = 1$ and $t_i(\omega_k) = 0$ for $\omega_k \neq \omega_j$ when the
true label of $x_i$ is $\omega_j$. Given a test pattern $y$, the classification
result of $y$ by each classifier can be expressed as a BBA
$m_l$, $l \in \{1, 2, 3\}$. The final label of $y$ is obtained from the
combination of these BBAs.

For each classifier $C_l$, $l \in \{1, 2, 3\}$, the performance is usually
similar on patterns that are close to each other, so the close neighbors
of an object in the dataset can be used to evaluate the quality of
each classification result [26], [27]. The classification results of the
training data are given by $\mathbf{p}_i$, and the true labels of the training data
are also known. The bias error of the classifier can thus be computed
by comparing the classifier output with the true label, and we
can estimate the quality of the classification result of $y$
based on these neighbors.

How to select suitable neighbors is an essential issue in the
reliability evaluation of each classification result. If we seek
the close neighbors according to the attribute data only, the selected
neighbors are near the object in attribute space, but their classification
results $\mathbf{p}_i$ may not be close to the classification result $m_l$ of
the object. Such neighbors are not very useful for efficiently evaluating
the quality of the classification result.

However, if we seek the close neighbors according to the
distance between classification results only, the selected neighbors,
whose classification results are similar to that of the object, may be quite
different from the object in the attribute space. In that case, these neighbors
cannot provide the knowledge needed for the reliability evaluation
of the classification result of the object. Thus, we propose
a new way to select the neighbors of the object based on both
the attribute data and the classifier output. This ensures that the
selected neighbors are close to the object in both attributes and
classification results.

We first seek the $N$ nearest neighbors of the object $y$ using the
attribute information, and the selected ones are represented
by $x_1, x_2, \ldots, x_N$. Among these $N$ selected neighbors,
there may exist some whose classification results
$p_i$ are quite different from the classification result $m_l$ of the
object. These neighbors may not be beneficial, and may even be
harmful, for the refined reliability evaluation of $m_l$.
Thus, we choose $K$ neighbors from the previous $N$ ones
according to the distance of classification results between $p_i$
and $m_l$. Eventually, the classification results of the $K$ selected
neighbors are given by $p_1, p_2, \ldots, p_K$, and the corresponding
class labels are known as $t_1, t_2, \ldots, t_K$. These $K$ neighbors can
provide important prior knowledge for reliability evaluation.
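
The two-stage neighbor selection described above can be sketched as follows. This is only an illustrative sketch: the function names (`euclidean`, `select_neighbors`) and the list-based data layout are assumptions made for the example, not part of the original method description.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def select_neighbors(y, m_y, train_x, train_p, n_attr, k_out):
    """Return the indices of the K neighbors chosen in two stages.

    y       : attribute vector of the object
    m_y     : classifier output for the object
    train_x : attribute vectors of the training patterns
    train_p : classifier outputs for the training patterns
    n_attr  : N, number of neighbors kept after the attribute stage
    k_out   : K, number of neighbors kept after the output stage
    """
    # Stage 1: the N nearest training patterns in attribute space.
    by_attr = sorted(range(len(train_x)),
                     key=lambda i: euclidean(y, train_x[i]))[:n_attr]
    # Stage 2: among them, the K whose classifier outputs are closest to m_y.
    return sorted(by_attr, key=lambda i: euclidean(m_y, train_p[i]))[:k_out]

# Toy usage with four training patterns and two classes.
train_x = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.1], [0.9, 0.9]]
train_p = [[0.9, 0.1], [0.2, 0.8], [0.8, 0.2], [0.9, 0.1]]
chosen = select_neighbors([0.04, 0.0], [0.88, 0.12], train_x, train_p, 3, 2)
```

In this toy case the second training pattern is close in attribute space but is discarded in the second stage because its classifier output differs strongly from that of the object.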

Since these chosen neighbors are not totally similar to 
the object, they cannot be completely trusted during the 
reliability evaluation. The confidence factor mainly depends on 
the difference between the object and the selected neighbors, 
and both distance of attribute as well as the classifier output 
are considered to compute the difference. The beliefs in the 
classification result are divided into two parts. One will enter 
the correction process based on reliability evaluation, and the 
other will be preserved in the original result. 

The attribute data is normalized by the general linear
normalization method of eq. (3) so that the values lie in [0, 1]:

$$\tilde{a}_j = \frac{a_j - a_{\min}}{a_{\max} - a_{\min}} \quad (3)$$

where $a_j$ represents the attribute value in dimension $j$, and $\tilde{a}_j$
is the normalized value.

The confidence factor $\alpha_l$, $l \in \{1,2,3\}$, is computed from the
average distance between the object and these neighbors. The
neighbors are very similar to the object when their distance is
small, and the confidence in the reliability evaluation is then high,
and vice versa.


ae, (4) 
Perea lee eae 
dy = uaa Soa =p a): (5) 
k=1 k=1 
1 it. 1 S K 
TA __ A 7P __ P 


For each classifier $C_l$, $\beta_l$ is a parameter used to adjust the
influence of the attribute distance and the classifier output distance
on the confidence factor. $d_l$ is the average distance between
the object and its neighbors with regard to the attribute
and the classification result. $d^A_{lk} = \|y - x_k\|$ represents the
Euclidean distance between the object and the $k$-th of the $K$ neighbors.
$d^P_{lk} = \|m_l - p_k\|$ represents the Euclidean distance between
the classification result of the object and that of the $k$-th neighbor. $\bar{d}^A$
is the mean value of the average distance from each training
pattern to its $K$ neighbors. $\bar{d}^P$ represents the mean value of the
average distance from the classification result of each training
pattern to those of its $K$ neighbors.

If the confidence factor is high, it means that these neighbors 
are quite similar to the object, and we are likely to get 
important knowledge from these neighbors to correct the 
classification result of the object. In such case, a large amount 
of beliefs will be allowed to enter the correction process. 
However, if the confidence factor is low, it means that these 
neighbors are not quite close to the object, and we are not very 
confident of the reliability evaluation from these neighbors. So 
most beliefs will be kept in the original classification results, 
and only a few will be redistributed in the sequel. 

The beliefs on the classification results of the object are
divided into two parts. One part will be redistributed in the
correction process on the basis of the reliability evaluation, and
the amount of beliefs to be redistributed, $m_{lr}$, is determined
by eq. (7). The other part is preserved on each class as in the
original classification result, and the amount of beliefs in this
part, $m_{lo}$, is given by eq. (8).

$$m_{lr} = \alpha_l m_l, \quad (7)$$
$$m_{lo} = (1 - \alpha_l) m_l. \quad (8)$$


C. Classification result correction 


The quality of the classification result of the object will be 
evaluated in a refined way based on K neighbors, and then the 
classifier output will be revised according to the evaluation. 

A reliability matrix $\Phi$ reflects the information about the
misclassification error of the object, and the element $\phi_{ij}$ is
the probability that the object is classified to $\omega_i$ while the ground
truth is $\omega_j$. Now we will see how to calculate the value of $\phi_{ij}$
using these $K$ selected neighbors.

We have the class labels $t_k$ of these neighbors as training data,
and the classification results $p_k$ of these neighbors by the given
classifier are also known. So we can estimate the possibility
(i.e., $w_{ij}$) of the object being classified to $\omega_i$ when it truly belongs to
$\omega_j$. It is defined by the sum of the probabilities committed to
$\omega_i$ for the neighbors with the ground truth $\omega_j$ (i.e., $t_{kj} = 1$).


K ~ 
—ddp 
w= y ee (tag), (9) 
k=1|tyj=1 
7 1 di dik 
— 1 ye ye 1 1 1 
where d;, = al min, a4, at a | is the relative distance 


between the object and the $K$ neighbors. $\lambda_l > 0$ is a tuning
parameter to control the influence of the distance here.

In eq. (9), one can see that the neighbors close to the object
play an essential role in the calculation of $w_{ij}$, because
these neighbors can provide more useful prior knowledge on
classification for the object. A neighbor far from the object
has little influence on the reliability evaluation. Therefore,
our proposed method is robust to the number $K$ of selected
neighbors to a certain extent.

The probability of the object being classified to $\omega_i$ when it belongs
to $\omega_j$ can be calculated by Bayesian rule,

$$\phi_{ij} = \frac{w_{ij}}{\sum_{g=1}^{c} w_{gj}}, \quad (10)$$

where $\sum_{i=1}^{c} \phi_{ij} = 1$, and $c$ is the number of the classes. We can
similarly calculate the reliability matrix for each object with
the given classification method (i.e., hybrid-sampling, over-sampling,
and under-sampling).

The classification result of the object can be revised by this
matrix. The reliability matrix provides the prior knowledge
about the conditional probability of the object belonging to
one class when it is classified to another class. We can
get the belief of the object belonging to each class $\omega_j$,
$j \in \{1,2,\ldots,c\}$, as follows,

$$\hat{m}_{lr}(\omega_j) = \sum_{i=1}^{c} \phi_{ij} m_{lr}(\omega_i). \quad (11)$$

Thus we can obtain the evidence as

$$\hat{m}_l(\omega_j) = m_{lo}(\omega_j) + \hat{m}_{lr}(\omega_j) = (1 - \alpha_l) m_l(\omega_j) + \sum_{i=1}^{c} \phi_{ij} m_{lr}(\omega_i). \quad (12)$$
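
The belief split and the correction step can be illustrated with a short sketch. It assumes that the confidence factor and the reliability matrix have already been computed and are simply passed in; the function name `correct_bba` and the indexing convention `phi[i][j]` for $\phi_{ij}$ are choices made for this example only.

```python
def correct_bba(m, alpha, phi):
    """Revise a classifier output m (one mass per class).

    m     : list of masses m_l(w_1..w_c) from one classifier
    alpha : confidence factor in [0, 1] (assumed already computed)
    phi   : reliability matrix, phi[i][j] standing for phi_ij (assumed given)
    """
    c = len(m)
    m_r = [alpha * v for v in m]            # part entering the correction, eq. (7)
    m_o = [(1.0 - alpha) * v for v in m]    # part preserved as is, eq. (8)
    # Redistribute the first part through the reliability matrix, eq. (11).
    m_hat_r = [sum(phi[i][j] * m_r[i] for i in range(c)) for j in range(c)]
    # Add the preserved part back, eq. (12).
    return [m_o[j] + m_hat_r[j] for j in range(c)]

# Toy usage: two classes, half of the belief enters the correction.
revised = correct_bba([0.6, 0.4], 0.5, [[0.8, 0.2], [0.2, 0.8]])
```

With `alpha = 0` the classifier output is returned unchanged, and with an identity reliability matrix the redistribution has no effect, which matches the intent that the correction only acts when the neighbors are both trusted and informative.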

In our method, each piece of evidence produced by the different
imbalanced data classification methods (i.e., hybrid-sampling,
over-sampling, and under-sampling) can be corrected
as explained above. The DS rule of eq. (2) is employed here to
combine these updated pieces of evidence to acquire the final
classification result,

$$m^f = \hat{m}_1 \oplus \hat{m}_2 \oplus \hat{m}_3, \quad (13)$$

where $\oplus$ denotes the DS combination operation.


D. Parameter Optimization 


Our proposed method requires two tuning parameters: $\beta$
and $\lambda$. The parameter $\beta$ is used to determine the confidence
factor $\alpha$ by eq. (4). It controls the influence of distance
on the confidence factor: the bigger $\beta$, the smaller the confidence
factor $\alpha$. If $\beta$ is too big, only a few beliefs enter
the correction process, which is not efficient for improving the
classification performance. If $\beta$ is too small, most beliefs will
be redistributed even when the neighbors are not very close to
the object, which may increase the risk of belief redistribution.
The parameter $\lambda$ is involved in calculating the reliability matrix
$\Phi$ by eq. (9). Because the normalization operation is used
to calculate the reliability matrix by eq. (10), this matrix is
usually not very sensitive to the tuning of $\lambda$.

The optimal parameters are sought by minimizing an error
criterion defined by the sum of distances between the combined
classifier result $m^f$ and the true label $t$. In Matlab™, the
function fmincon is used to deal with this optimization problem,

$$\{\beta, \lambda\} = \arg\min_{\beta, \lambda} \sum_{i=1}^{S} \|m_i^f - t_i\|, \quad \beta \in [0.5, 1.5],\ \lambda > 0, \quad (14)$$

where $\|\cdot\|$ is the Euclidean distance, and $S$ is the number of
training patterns. $m_i^f$ is the result of combining the evidence
concerning the $i$-th training pattern, and $t_i = [t_{i1}, t_{i2}, \ldots, t_{ic}]$.
$t_{ij} = 1$ means the true label of $x_i$ is $\omega_j$.


III. EXPERIMENTAL APPLICATION 


In this section, we will test the performance of our proposed 
IDCBF method with some benchmark datasets by comparing 
with other related imbalanced data classification methods 
and information fusion methods such as Smote, Nearmiss, 
SmoteTomek, and averaging fusion (AF) [28]. 


A. Base classifier 


In our experiments, the Random Forest (RF) [29] and the
K-nearest neighbors (KNN) [30], [31] are employed as base
classifiers to classify the imbalanced datasets. The RF is an
ensemble method built from decision trees. It creates multiple trees
and merges them to obtain a better prediction result through
majority voting, like a panel of independent judges. The
KNN predicts the class of a pattern by majority rule among
its $k$ most similar training patterns in the feature space. In the
KNN, the weight of distance is set to “distance”. In all these
base classifiers, the optimal parameter values (the number of
trees, the maximum number of features, the minimum sample
leaf size in RF, the distance in KNN) can be determined by
grid search on the training data.


B. Benchmark datasets 


Some imbalanced datasets are selected from the UCI¹ and
KEEL² dataset repositories. The basic information of these
datasets, including the numbers of instances, attributes, classes,
majority instances and minority instances, and the imbalanced ratio,
is shown in Table I. The imbalanced ratio is the ratio of the sample
size of the majority data to that of the minority data.
Unlike existing methods that handle only binary imbalanced
data, this work includes multi-class datasets
with up to ten classes in the case of the
Penbased dataset. Classes that include only one example have
been removed from the datasets because the Smote method
cannot generate instances from a single example.


C. Performance evaluation metrics 


We evaluate the model sensitivity towards the minority
class using the Area under Curve (AUC) [32] method. In our
experiment, we use the one-vs-rest strategy [33], which is also
widely applied in multi-class classification problems. This method
computes the AUC value of each class against the
rest of the classes and averages them. Each class generates one AUC value,
and the weight of each AUC is given by the prevalence of the reference
class in the data set. A large AUC means that
the classifier has high accuracy.
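
A prevalence-weighted one-vs-rest AUC of this kind can be sketched in plain Python as below. The pairwise-ranking formulation of the binary AUC and the function names are assumptions for illustration; they are not taken from the experiments of this work, which may rely on a library implementation.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 1/2)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def weighted_ovr_auc(y_true, y_score, classes):
    """One-vs-rest AUC, each class weighted by its prevalence in y_true.

    y_score[i][j] is the score given to classes[j] for the i-th sample.
    Every class is assumed to have at least one positive and one negative sample.
    """
    total = 0.0
    for j, c in enumerate(classes):
        labels = [1 if y == c else 0 for y in y_true]   # class c against the rest
        scores = [row[j] for row in y_score]
        prevalence = sum(labels) / len(labels)
        total += prevalence * binary_auc(labels, scores)
    return total

# Toy usage: three classes, every sample ranked correctly.
y_true = [0, 0, 1, 2]
y_score = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.1, 0.2, 0.7]]
auc = weighted_ovr_auc(y_true, y_score, [0, 1, 2])
```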


"http://archive.ics.uci.edu/ml 
² https://sci2s.ugr.es/keel/datasets.php

Table I
IMBALANCED DATASET DESCRIPTION OF THE UCI AND KEEL.

Columns: Dataset | Instances | Attributes | Classes | Majority instances | Minority instances | Imbalanced ratio
Datasets: Abalone, Dermatology, Ecoli, Genus, Pageblocks0, Penbased, Pima, Shuttle, Thyroid, Yeast, Yeast4, Yeast5


Table II
THE AUC VALUES USING RF CLASSIFIER.

Datasets: Abalone, Dermatology, Ecoli, Genus, Pageblocks0, Penbased, Pima, Shuttle, Thyroid, Yeast, Yeast4, Yeast5


Table III
THE AUC VALUES USING KNN CLASSIFIER.

Datasets: Dermatology, Ecoli, Genus, Pageblocks0, Penbased, Pima, Shuttle, Thyroid, Yeast, Yeast4, Yeast5


D. Experimental results and evaluation 


The AUC values of imbalanced datasets using different
classifiers are summarized in Tables II-III. The maximum
AUC value is marked in boldface type. The imbalanced data
classification based on belief functions theory (IDCBF) method
generally produces higher accuracy than single data sampling
methods. This indicates that the complementary information
among different techniques is very useful for improving
classification performance. We can also find that the proposed
IDCBF method typically yields the highest accuracy compared
with the other combination methods. In the IDCBF method,
the reliability is evaluated based on the close neighbors in a
refined way, and then the classifier output is cautiously revised
to improve the quality. Moreover, the parameters involved in
IDCBF are automatically optimized by minimizing an error
criterion. Thus, IDCBF is able to produce the best classification
accuracy in general.


IV. CONCLUSION 


In this paper, we have proposed a new method for the combination
of classifiers to address imbalanced data classification.
The imbalanced data classification based on belief functions theory
(IDCBF) method is able to take advantage of the essential
complementary information among different data sampling
techniques to improve classification performance. Multiple
imbalanced datasets are used to validate the performance of
the proposed method. The experimental results show that
the IDCBF method is able to improve the classification results
compared with other data sampling techniques and fusion
methods.


Abstract: Self-driving cars face important challenges in
perception tasks. The driving surroundings
are often chaotic and weather conditions vary
significantly. Sensor capacities have increased,
raising interest in big data fields such as artificial intelligence.
Alongside, fusion techniques allow accurate coupling of
information from different sources. Neural networks have shown good
performance, but limitations in complex situations still occur.
In the decision-making world, evidence theory increases the
robustness of decisions and handles conflict management. In this
research, a deep learning architecture based on a camera-lidar
cross-fusion technique is proposed for semantic segmentation.
The model is coupled with evidence theory: its
decision-making part relies on the decision belief interval and on
the information that can be represented through belief functions.
Evidence theory is versatile and contributes to a better
understanding of imprecise data and to more efficient predictions.
The KITTI dataset is used in this work. The results highlight the
interest in integrating belief functions into a deep learning
architecture fusing information from two heterogeneous sensors.


Keywords: intelligent vehicles, environment perception, evi- 
dence theory, belief functions, deep learning. 


I. INTRODUCTION 


A. Self-driving cars 


Autonomous cars have improved greatly thanks to
approaches based on neural networks. The goal of self-driving
cars is to offer safe and efficient driving, minimizing the
routine tasks of humans and putting forward better transportation.
To reach higher levels of autonomy faster, the cars are equipped
with various sensors such as cameras, point cloud devices, and
more. In this way, perception can benefit from multi-modal
sensor fusion of different sources of information to increase
the robustness of the decisions. In addition, self-driving
cars rely on both reference generation (path and trajectory
planning) and control theory techniques [1]. Intelligent
vehicles therefore provide localization and an understanding of
the environment, including objects and traffic participants, so that
the navigable area can be projected and followed using control
algorithms.


B. Motivation 


This paper focuses on the first hierarchical step of
autonomous driving: perception, and particularly surrounding
environment perception. In scene analysis, there are various
particularities to be considered. For example, on the path
planning side, the distance between a car and a
sidewalk does not have the same sensitivity as the distance
between two cars. In the risk analysis state of the art, rules
for safe minimal distance are referred to. These protocols
differ with respect to the situation; for instance, at least 1
meter distance between two cars longitudinally on the lane is
required. However, there is no specified rule for the
tolerated lateral distance between a car and a sidewalk, where the
environment is prone to involve pedestrians, or for the interval
between two cars, where there is a risk of crashing as well
[2]. In these situations, imprecise information from the
perception system could result in inaccurate control actions.
Consequently, decision-making approaches that can represent
information efficiently are required, so that the perception and
path-planning control chain is well-assured.

Neural network approaches, particularly prominent deep
learning methods [3]-[5], have been designed for
environment perception to bring value in detection,
classification, and segmentation tasks. Besides neural
networks, fusion methods combine meaningful information from
various sensors to strengthen the robustness of decisions.

The benefits introduced by deep learning facilitate percep- 
tion tasks in the environment and help understand the self- 
driving car needs. However, since the field is sensitive to 
the quality of decisions, confidence in decision-making is 
required. A relevant way to both represent and trust infor- 
mation is to use the belief theory. This approach is a well- 
known framework utilized in the world of probabilities and 
information reasoning. 


C. Belief functions theory 


The theory of evidence, also known as Belief Functions
(BF) theory or Dempster-Shafer Theory (DST), was proposed
by Shafer in 1976 [6] based on previous works of Dempster
[7]. It represents evidence elements (i.e. beliefs) for uncertain
models. The key features of evidence theory are the following:
generality (it extends both propositional logic and probabilistic
reasoning), operationality (it works with elementary pieces of
evidence coupled with Dempster-Shafer’s rule¹ of combination),
and scalability (evidential reasoning can be applied to
more complicated problems in terms of decisions and
uncertainties from different sources of information), making it richer
than the theory of probabilities for dealing with epistemic
uncertainty. In autonomous driving tasks, such as obstacle
avoidance, BF theory delivers accurate results, for
example in the occupancy grid map of a LiDAR sensor, by
expressing conflict in more representative ways [8]. In
pedestrian detection tasks, evidential combination rules
achieved considerable performance gains over the Bayesian approach
[9].

Moreover, in multi-modal perception, evidential theory
handles missing information, imprecision, and ignorance. In [10],
KITTI semantic segmentation images from various sensors
(cameras and different layers of LiDAR) are combined,
which allows for enlarging the set of object classes or the number
of sensors. The approach seeks to improve performance for
a better understanding of the drivable area.

In the deep-learning field, Capellier et al. [11] propose
a neural network architecture based on an MLP (Multi-Layer
Perceptron) to classify arbitrary LiDAR objects for perception.
Their model replaces the probabilistic output with an
evidential inference method, inspired by the generalized logistic
classifier of Denœux [12].

Thus, frameworks based on belief functions evince promis- 
ing results in perception systems for both road segmentation 
and multi-object detection tasks, which are the main topics 
addressed in this work. The main goal is then to provide an 
evidential deep-learning model that fuses information from 
different sensors to achieve autonomous driving capabilities. 

The approaches based on belief functions delivered sig- 
nificant improvements for different axes of research such as 
decision-making, conflict management, and fusion [13]. The
ability of evidence theory to represent partial or total ignorance is
very appealing for modeling uncertainties. However, combining
efficiently several independent belief functions and agreeing 
on a final decision from a belief function are challenging tasks, 
especially the decision-making under uncertainties for defense 
and security applications, like autonomous vehicles. In [14], 
Dezert et al. proposed a new decision-making method based 
on a belief interval distance that helps the decision-making 
process under uncertainty. This decision-making approach 
provides a judgment by selecting the best focal element (i.e. 
object class) for which the minimal distance with respect to 
the piece of evidence under concern is obtained. This decision- 
making approach also provides the calculation of a quality (or 
confidence factor) characterizing the quality of decision (i.e. 
the final judgment) for a future action. 


II. BASICS OF BELIEF FUNCTIONS 


Evidence theory is a formalism for reasoning and making a
decision with uncertainty. The classical approach of evidence
theory is based on the Dempster-Shafer rule of combination. A
more detailed discussion can be found in [6], [15], which is
adopted in this work.

¹ Also referred to as the DS rule in the literature.

Let $\Theta = \{\theta_1, \theta_2, \theta_3, \ldots, \theta_n\}$ be a finite set of mutually
exclusive elements, called the frame of discernment (FoD),
and the mutually exclusive elements of single cardinality are
called singletons. A basic belief assignment (BBA), or mass
function $m(\cdot)$, is a mapping $m : 2^\Theta \to [0,1]$ such that:

$$m(\emptyset) = 0, \quad (1)$$
$$\sum_{X \subseteq \Theta} m(X) = 1. \quad (2)$$

The quantity $m(X)$, known as the mass of element (i.e. subset)
$X$ of $\Theta$, measures the belief that one commits exactly to $X$;
and (1) indicates the closed world assumption. The subset $X$ is
called a focal element of $m(\cdot)$ if and only if $m(X) > 0$.

Given a BBA $m(\cdot)$, two concepts can be defined, a belief
function (Bel) and a plausibility function (Pl), using the
following expressions:

$$Bel(X) = \sum_{Y \subseteq X} m(Y) \quad (3)$$
$$Pl(X) = \sum_{Y \cap X \neq \emptyset} m(Y) = 1 - Bel(\bar{X}) \quad (4)$$

where $\bar{X}$ is the complement of $X$ in $\Theta$.
$Bel(X)$ can be interpreted as the degree of total support to
$X$, whereas $Pl(X)$ is the degree to which one fails to doubt $X$.

If the BBA $m(\cdot)$ is only focalized on the whole set $\Theta$, i.e.
$m(\Theta) = 1$, the BBA $m(\cdot)$ is called the vacuous BBA, which
models the total ignorance.
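
As a small illustration of the belief and plausibility functions, the sketch below stores a BBA as a dictionary from `frozenset` focal elements to masses. This representation and the function names `bel` and `pl` are choices made for the example and are not prescribed by the text.

```python
def bel(m, x):
    """Belief of x: total mass of the non-empty focal elements included in x."""
    return sum(v for a, v in m.items() if a and a <= x)

def pl(m, x):
    """Plausibility of x: total mass of the focal elements intersecting x."""
    return sum(v for a, v in m.items() if a & x)

# Toy BBA on the frame {R, V, B} (road, vehicle, background).
theta = frozenset({"R", "V", "B"})
m = {frozenset({"R"}): 0.5, frozenset({"R", "V"}): 0.3, theta: 0.2}

bel_r = bel(m, frozenset({"R"}))   # 0.5: only the mass committed exactly to R
pl_r = pl(m, frozenset({"R"}))     # 1.0: every focal element intersects R
```

For the vacuous BBA, every proper non-empty subset receives a belief of 0 and a plausibility of 1, which is how total ignorance shows up in the interval [Bel, Pl].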

In DST, two BBAs $m_1$ and $m_2$ representing independent
pieces of evidence are combined by Dempster’s rule defined
by:

$$(m_1 \oplus m_2)(X) = \frac{1}{1 - K} \sum_{Y \cap Z = X} m_1(Y) m_2(Z) \quad (5)$$

for all $X \subseteq \Theta$, $X \neq \emptyset$, and $(m_1 \oplus m_2)(\emptyset) = 0$. The degree
of conflict between the two BBAs, denoted by $K$, is given by:

$$K \triangleq \sum_{Y \cap Z = \emptyset} m_1(Y) m_2(Z) \quad (6)$$

This DS rule of combination can be easily generalized for 
the combination of more than two sources of evidence. DS 
rule is commutative and associative which is very appealing 
for its implementation because the fusion of several sources 
can be done sequentially and the sequential fusion order does 
not matter. In the vehicle perception applications developed 
so far in the IRIMAS lab, the DS rule produces generally 
good outcomes, but because the DS rule is not always exempt 
from leading to some decision issues due to its dictatorial 
behavior in some cases [16], [17], alternative research works 
and comparative analysis with other fusion rules are also under 
consideration.


Figure 1: Evidential Lite-CF architecture


III. PROPOSED METHOD FOR AUTOMATIC PERCEPTION 
A. Evidential Cross-Fusion Architecture 


In a previous work, an innovative camera-lidar fusion
method known as Lite Cross-fusion was proposed, featuring
a fully convolutional neural network for road recognition as
detailed in [18]. This cross-fusion architecture exhibited
significant results compared with early or late fusion approaches.
The network was subsequently integrated in
another work [19], with a 15% reduction in computational
complexity. The architecture is founded upon an
encoder-decoder model that leverages dilated convolution to
better preserve image resolution. This architecture
represents the baseline of this work in terms of neural networks
and multimodal fusion. Regarding evidence theory, [11], [20]
propose the evidential classifier for classification tasks. The
decision-making approach is based on the distance to
prototypes method as a substitute for the conventional softmax
decision layer.


Taking into account the two previous methods, specifically
the cross-fusion road detection (Lite-CF) and the architecture
based on evidential classifiers, this work introduces a
combination of these two methodologies. The resulting architecture,
referred to as Lite CF-Evi, combines the strengths of the
two frameworks designed for semantic segmentation tasks. An
overview of the complete architecture for evidential Lite-CF
is illustrated in Fig. 1.


The standard neural network architecture produces
probability distributions from logits using a softmax layer. The
evidential Lite-CF produces mass functions (BBAs) rather than
probabilities to represent the prediction of imprecise data. It
has an encoder-decoder-based network, an evidential formulation
layer, and a decision-making unit. The encoding section has
two processing pipelines of 13 layers each, one fed with the
LiDAR input and the other with the camera frames. At each
layer level, information from one modality is combined with
the corresponding layer from the other modality through a
trainable weighted sum operation (with weights $b_i$ and $a_i$ respectively,
where $i$ is the layer number). These fusion weights are
adaptable, allowing the position and the extent of the fusion to be
determined by the data.
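
The trainable weighted sum between the two pipelines can be pictured with the following sketch. It is only a schematic under stated assumptions: feature maps are flattened to plain lists, `a` and `b` stand for the scalar fusion weights of one layer, and which weight scales which modality is a guess made for the example rather than a detail confirmed by the text.

```python
def cross_fuse(lidar_feat, camera_feat, a, b):
    """Mix the features of two modalities at one layer level.

    Each branch receives its own features plus a weighted copy of the
    features of the other branch (assumed form of the weighted sum).
    """
    lidar_in = [l + a * c for l, c in zip(lidar_feat, camera_feat)]
    camera_in = [c + b * l for l, c in zip(lidar_feat, camera_feat)]
    return lidar_in, camera_in

# With both weights at zero the two pipelines stay independent.
lidar_next, camera_next = cross_fuse([1.0, 2.0], [0.5, 0.5], 0.0, 0.0)
```

In a trained network, `a` and `b` would be learned per layer, so a weight near zero at a given depth means that little fusion takes place there.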

Once the LiDAR and camera inputs are transformed into
Basic Belief Assignments (BBAs) within the evidential
formulation layer, decisions can be rendered concerning specific
elements within the power set² denoted by $2^\Theta$. In the
semantic segmentation task, the elements considered are
“road”, “vehicle”, and “background” in the
probabilistic version, with an additional “unknown” area in the
evidential formulation. Consequently, the evidential approach
allows an uncertain prediction to be represented by the
disjunction of single classes and interpreted as an “unknown”
class.


B. Decision-Making. Distance to Prototypes 


The evidential formulation layer takes as its input the feature
maps generated by the decoding section. When the decoder
reaches its maximum resolution, BBAs are generated by
computing the distances between the corresponding feature
maps (i.e., L18 in Fig. 1) and the propagated prototypes, which
are learned automatically. The technique is called distance to
prototypes. An illustration of the approach is shown in Fig. 2.


Figure 2: Distance to Prototypes [21].


² The power set of $\Theta$ is the set of all the subsets of $\Theta$, the empty set $\emptyset$ and
$\Theta$ included.


The distance to prototypes methodology can be described
as a three-step procedure as follows (more details in [21]):


Step 1: Distance to prototype: Let $x$ be a feature vector
representing the features of a pixel to be classified
possibly as class 1 ($\theta_1$) or class 2 ($\theta_2$) (i.e., the
FoD $\Theta = \{\theta_1, \theta_2\}$). The Euclidean distance $d^i$ is
determined between $x$ and each prototype $p^i$:

$$d^i = \|x - p^i\|, \quad i = 1, \ldots, n. \quad (7)$$

Step 2: Establish the correspondence of mass functions to
prototypes and their interference: Each prototype
$p^i$ has a degree of membership $u^i_j$ to each class
$\theta_j$, with the constraint $u^i_1 + u^i_2 = 1$. Using the class
membership $u^i_j$ and the distance $d^i$, a BBA $m^i$ is
constructed as:

$$m^i(\{\theta_j\}) = \alpha^i u^i_j \phi^i(d^i), \quad j = 1, 2$$
$$m^i(\Theta) = 1 - \alpha^i \phi^i(d^i), \quad (8)$$

ensuring that the cumulative mass sum equals 1, as
indicated in the subsequent formula:

$$\sum_{X \subseteq \Theta} m^i(X) = \sum_{j=1}^{2} m^i(\{\theta_j\}) + m^i(\Theta) = 1 \quad (9)$$

where, in the expression (8), $0 < \alpha^i < 1$, and the
decreasing function $\phi^i$ is defined as:

$$\phi^i(d^i) = \exp(-\gamma^i (d^i)^2), \quad \gamma^i > 0 \quad (10)$$

Step 3: Combination: The BBAs from step 2 are combined
using Dempster’s rule (5). The resulting combined
BBA represents the evidence to make a decision on
the pixel class.


The parameters associated with the prototype $p^i$ (i.e., $\alpha^i$, $u^i_j$,
and $\gamma^i$) are incorporated into the evidential deep learning-based
architectures as weighting factors.
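
Steps 1 and 2 can be sketched for a single prototype as below. The function name `prototype_bba`, the dictionary keys and the toy numbers are assumptions introduced for illustration; the decreasing function is taken as the exponential of the negative scaled squared distance, as in the description above.

```python
import math

def prototype_bba(x, prototype, alpha, gamma, u):
    """Build the BBA induced by one prototype for a feature vector x.

    alpha : scalar in (0, 1)
    gamma : positive scale of the decreasing function
    u     : membership degrees of the prototype to each class (summing to 1)
    """
    d = math.dist(x, prototype)            # Step 1: Euclidean distance
    phi = math.exp(-gamma * d ** 2)        # decreasing function of the distance
    # Step 2: masses on the singletons, the rest goes to the whole frame.
    masses = {"theta%d" % (j + 1): alpha * u_j * phi for j, u_j in enumerate(u)}
    masses["Theta"] = 1.0 - alpha * phi
    return masses

# Toy usage: a pixel at distance 5 from the prototype.
bba = prototype_bba([0.0, 0.0], [3.0, 4.0], alpha=0.5, gamma=0.1, u=[0.8, 0.2])
```

The farther the feature vector is from the prototype, the more mass is left on the whole frame, so a distant prototype says almost nothing about the pixel class.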

However, the learnable weights are not inherently restricted.
Therefore, they are redefined and implemented in terms of
some real-valued variables $\eta^i$, $\xi^i$, and $\beta^i_j$:

$$\gamma^i = (\eta^i)^2 \quad (11)$$
$$\alpha^i = \frac{1}{1 + \exp(-\xi^i)} \quad (12)$$
$$u^i_j = \frac{(\beta^i_j)^2 + \epsilon}{\sum_{k} \left((\beta^i_k)^2 + \epsilon\right)} \quad (13)$$

Equation (13) is slightly modified from the expression given
in [21]. To prevent the membership values $u^i_j$ from becoming
zero, a small positive term denoted as $\epsilon$ is introduced. This
precautionary measure avoids the situation of total conflict in
Dempster’s rule (i.e. it prevents the case with
K=1).
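
The effect of this reparametrization can be checked with a short sketch: unconstrained real numbers go in, and parameters satisfying the required constraints come out. The function name, the default value of `eps` and the exact place where the small term is added in the membership formula are assumptions of this example.

```python
import math

def constrained_parameters(eta, xi, beta, eps=1e-6):
    """Map unconstrained real variables to valid prototype parameters."""
    gamma = eta ** 2                          # non-negative scale
    alpha = 1.0 / (1.0 + math.exp(-xi))       # logistic function, strictly in (0, 1)
    raw = [b ** 2 + eps for b in beta]        # strictly positive thanks to eps
    total = sum(raw)
    u = [r / total for r in raw]              # memberships summing to 1
    return gamma, alpha, u

# Toy usage: one membership weight is exactly zero before the mapping.
gamma, alpha, u = constrained_parameters(eta=-0.5, xi=0.0, beta=[0.0, 2.0])
```

Even with a raw weight of zero, the corresponding membership stays strictly positive, which is the role given to the small positive term.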

Figure 3: Evidential Formulation


Following the previous steps and Fig. 2, a simple case
with only two classes can be exemplified. A frame of
discernment $\Theta = \{R, V\}$ (where R stands
for road, V for vehicle) is considered. Thus, for a feature vector
$x$, the neural network model outputs some pixel characteristic
values corresponding to road and vehicle. Firstly, in level 1
(L1), distances to prototypes ($p^i$) are calculated. In this case,
the number of prototypes has to be defined and is tunable.
Moreover, it directly influences the model’s complexity and
it has to be at least equal to the number of classes. The
more prototypes are injected, the more complex the model
is. The example with the three-step approach is
reproduced as:

• Step 1 (L1): Distance to prototype
p' = [-15 -—2.5 —3.5|7, 71 = 0.1, at = 0.5 and 
p= 221531", 47 —01, 2 =05 
Then, applying equation (7), the following distances are
computed:

$d^1 = 5.10$ and $d^2 = 3.70$

Next, with respect to expression (10), the
activations $s^i = \alpha^i \phi^i(d^i)$ have the following values:

$s^1 = 0.04$ and $s^2 = 0.13$

Once the Euclidean distances to the prototypes and their
activations are set, the mass functions are constructed
in level 2:

• Step 2 (L2): BBA construction
Classes {R, V} have the membership degrees (13) $u^i$:
$u^1 = [0.8\ 0.2]^T$ and $u^2 = [0.3\ 0.7]^T$

The BBAs $m^i$ are computed following expression (8):

$m^1$: $m^1(R) = 0.03$, $m^1(V) = 0.01$, $m^1(\Theta) = 0.96$

$m^2$: $m^2(R) = 0.04$, $m^2(V) = 0.09$, $m^2(\Theta) = 0.87$

Next, for the last level, the mass functions from the
previous step are combined with the DS rule (5):

• Step 3 (L3): DS combination of $m^1$ with $m^2$. Mass
function values after computing the conjunctive rule:
$m(R) = 0.0657$, $m(V) = 0.096$, $m(\Theta) = 0.8352$,
$m(\emptyset) = 0.0031$ (= K)


- Decision making using BBAs. 


Finally, the BBAs after Shafer’s normalization by $1 - K$,
$m_{DS}(\cdot) = m(\cdot)/(1 - K)$, are:

$m_{DS}(R) = 0.0659$, $m_{DS}(V) = 0.0963$, $m_{DS}(\Theta) = 0.8378$,
$m_{DS}(\emptyset) = 0$, and their total sum is 1.

In this case, the conflict is very low and the highest value of
the mass function is on the whole frame, $m(\Theta)$, meaning that the model
does not have enough evidence to support road or vehicle.
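
The numbers of this example can be retraced with the short script below. It starts from the stated distances, rounds the intermediate values to two decimals as the text does, and is restricted to the two-class frame {R, V}; the helper names are assumptions of this sketch.

```python
import math

def activation(d, alpha, gamma):
    """Activation of a prototype: alpha * exp(-gamma * d^2)."""
    return alpha * math.exp(-gamma * d ** 2)

def build_bba(s, u):
    """Masses on R, V and the whole frame, rounded to two decimals as in the text."""
    return {"R": round(s * u[0], 2), "V": round(s * u[1], 2),
            "Theta": round(1.0 - s, 2)}

def conjunctive(m1, m2):
    """Unnormalized conjunctive combination on {R, V}; returns (masses, conflict K)."""
    r = m1["R"] * m2["R"] + m1["R"] * m2["Theta"] + m1["Theta"] * m2["R"]
    v = m1["V"] * m2["V"] + m1["V"] * m2["Theta"] + m1["Theta"] * m2["V"]
    theta = m1["Theta"] * m2["Theta"]
    k = m1["R"] * m2["V"] + m1["V"] * m2["R"]   # mass sent to the empty set
    return {"R": r, "V": v, "Theta": theta}, k

s1 = round(activation(5.10, 0.5, 0.1), 2)
s2 = round(activation(3.70, 0.5, 0.1), 2)
m1 = build_bba(s1, (0.8, 0.2))
m2 = build_bba(s2, (0.3, 0.7))
m, k = conjunctive(m1, m2)
m_ds = {x: v / (1.0 - k) for x, v in m.items()}   # Shafer's normalization
```

Running the steps in this order gives back the activations, the two BBAs, the conflict and the normalized masses quoted in the example.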

A scheme that illustrates how the Distance to Prototypes 
approach works is presented in Fig. 3. Since this illustration 
represents the perception application, it shows two options for 
the decision-making, both evidential and probabilistic for a 
semantic segmentation task with three classes: road, vehicle, 
and background. 

The probabilistic approach relies on Bayesian formulation 
for decision-making and is based on the classical pignistic 
transformation. The evidential formulation part is based on 
interval dominance for the decision-making before the final 
prediction, but it can be substituted with another method such 
as Jousselme’s distance [22] or the Decision based on Belief 
Interval ($d_{BI}$) method [14]. The latter is the main 
baseline for the contribution of this work. The method is 
implemented and applied to semantic segmentation tasks to 
help the decision-making part of the cross-fused evidential 
deep learning model. 


C. Decision based on Belief Interval 

Han et al. present different decision-making methods using belief 
functions in [23], and they propose a Euclidean belief interval 
distance $d_{BI}(m_1, m_2)$ between two mass functions $m_1(\cdot)$ and 
$m_2(\cdot)$ defined on the power set of the aforementioned 
FoD $\Theta = \{\theta_1, \theta_2, \theta_3, \ldots, \theta_n\}$. 

| $X \in 2^\Theta$ | $R$ | $V$ | $R \cup V$ | $B$ | $R \cup B$ | $V \cup B$ | $R \cup V \cup B$ | 
| $m(X)$ | 0.83 | 0.17 | 0 | 0 | 0 | 0 | 0 | 
| $d_{BI}(m, m_X)$ | 0.1700 | 0.8300 | 0.4384 | 0.9268 | 0.5265 | 0.779 | 0.5991 | 
| $\min d_{BI}(m, m_X)$ | 0.1700 | - | - | - | - | - | - | 
| $q(X)$ | 0.9118 | 0.5692 | 0 | 0.5190 | 0 | 0 | 0 | 

Table I: Example decision belief interval and quality of decision $q(X)$. 

This $d_{BI}$ distance is defined by: 


$$d_{BI}(m_1, m_2) \triangleq \sqrt{N_c \cdot \sum_{X \in 2^\Theta} d_W^2\big(BI_1(X), BI_2(X)\big)} \qquad (14)$$ 

$N_c = 1/2^{n-1}$ represents a normalization factor to have 
$d_{BI}(m_1, m_2) \in [0, 1]$, and $d_W(BI_1(X), BI_2(X))$ is 
the Wasserstein distance [24] between the belief intervals 
$BI_1(X) \triangleq [Bel_1(X), Pl_1(X)] = [a_1, b_1]$ and $BI_2(X) \triangleq 
[Bel_2(X), Pl_2(X)] = [a_2, b_2]$. 

The Wasserstein distance is concisely noted as: 

$$d_W \triangleq d_W([a_1, b_1], [a_2, b_2]) \qquad (15)$$ 

and it explicitly has the following expression: 

$$d_W \triangleq \sqrt{\left[\frac{a_1 + b_1}{2} - \frac{a_2 + b_2}{2}\right]^2 + \frac{1}{3}\left[\frac{b_1 - a_1}{2} - \frac{b_2 - a_2}{2}\right]^2} \qquad (16)$$ 

Next, $m_X$ denotes the categorical BBA that contains only $X$ 
as its focal element, and $\hat{X}$ is the final decision. The final 
decision is the element $X \in 2^\Theta \setminus \{\emptyset\}$ whose categorical 
BBA $m_X$ is at minimum distance from the mass function $m(\cdot)$. 
$\hat{X}$ is defined as: 

$$\hat{X} = \arg\min_{X \in 2^\Theta \setminus \{\emptyset\}} d_{BI}(m, m_X) \qquad (17)$$ 

where $d_{BI}(m, m_X)$ is calculated according to expression 
(14). Here, $m(\cdot)$ represents the mass function under consideration, 
and $m_X(\cdot)$ is the categorical BBA focused on a chosen 
focal element $X$. 
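
The distance (14) and the decision rule (17) can be illustrated with a short sketch. It assumes BBAs stored as dictionaries keyed by `frozenset` and uses the Wasserstein expression with the factor 1/3 on the half-width term; helper names are hypothetical. The masses are those of the example of Table I.

```python
from itertools import combinations
from math import sqrt

def powerset(frame):
    """All non-empty subsets of the frame, as frozensets."""
    items = sorted(frame)
    return [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]

def belief_interval(m, x):
    """[Bel(X), Pl(X)] for a BBA given as dict: frozenset -> mass."""
    bel = sum(v for a, v in m.items() if a and a <= x)  # focal sets inside X
    pl = sum(v for a, v in m.items() if a & x)          # focal sets meeting X
    return bel, pl

def wasserstein(i1, i2):
    """Distance between intervals [a1, b1] and [a2, b2] (midpoint/half-width form)."""
    (a1, b1), (a2, b2) = i1, i2
    mid = (a1 + b1) / 2 - (a2 + b2) / 2
    rad = (b1 - a1) / 2 - (b2 - a2) / 2
    return sqrt(mid ** 2 + rad ** 2 / 3)

def d_bi(m1, m2, frame):
    """Euclidean belief-interval distance; the empty set contributes zero."""
    n_c = 1 / 2 ** (len(frame) - 1)
    total = sum(wasserstein(belief_interval(m1, x), belief_interval(m2, x)) ** 2
                for x in powerset(frame))
    return sqrt(n_c * total)

def decide(m, frame, candidates=None):
    """Pick the candidate X whose categorical BBA m_X is closest to m."""
    candidates = candidates or powerset(frame)
    dist = {x: d_bi(m, {x: 1.0}, frame) for x in candidates}
    return min(dist, key=dist.get), dist

frame = frozenset({"R", "V", "B"})
m = {frozenset({"R"}): 0.83, frozenset({"V"}): 0.17}
best, dist = decide(m, frame)
print(sorted(best), {tuple(sorted(x)): round(d, 4) for x, d in dist.items()})
```

Passing `candidates=[frozenset({"R"}), frozenset({"V"}), frozenset({"B"})]` restricts the decision to singletons.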

The decision-making methodology is scalable: the model can 
consider a decision among all the possibilities of the 
power set $2^\Theta$. Consequently, the decision space can 
encapsulate more elements than the singletons of the frame 
of discernment, including unions (i.e. disjunctions) of these 
($\text{singleton}_1 \cup \text{singleton}_2$). More details can be found in [14]. 
Alongside the final decision $\hat{X}$, the method proposed in [14] 
is also able to assess how good (or trustable) the decision $\hat{X}$ 
is, considering the other alternatives. Thus, the quality indicator 
(confidence factor) $q(\hat{X})$ is defined by: 

$$q(\hat{X}) \triangleq 1 - \frac{d_{BI}(m, m_{\hat{X}})}{\sum_{X \in 2^\Theta \setminus \{\emptyset\}} d_{BI}(m, m_X)} \qquad (18)$$ 

The value of the quality indicator $q(\hat{X})$ is 
larger (and closer to 1) when the model is more confident 
in making the decision $\hat{X}$. 
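
As a small numerical illustration of the quality indicator, the sketch below applies it to the distances of the three singletons listed in Table I. One assumption is made explicit: the sum in the denominator is taken over the candidate decisions passed to the function (here the singletons only), which is the setting that reproduces the $q(X)$ row of Table I.

```python
def quality(distances):
    """Quality indicator for every candidate decision.

    `distances` maps each candidate X to d_BI(m, m_X); the denominator
    sums over the same candidates (an assumption of this sketch).
    """
    total = sum(distances.values())
    return {x: 1.0 - d / total for x, d in distances.items()}

# Distances to the singleton categorical BBAs, taken from Table I.
d_singletons = {"R": 0.1700, "V": 0.8300, "B": 0.9268}
q = quality(d_singletons)
print({x: round(v, 4) for x, v in q.items()})
```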

Once the BBAs representing the pieces of evidence in the 
corresponding pixels are evaluated, a final task remains: to 
decide the class of each pixel. Therefore, given the previous 
statement (17) as presented in [14], the decision rule for the 
three-class semantic segmentation task can be expressed 
as follows: 

Case i) The decision is constrained to singletons: The possible 
judgment elements are $\theta_1$ (road), $\theta_2$ (vehicle) 
and $\theta_3$ (background). In this situation, the expression 
(17) becomes: 

$$\hat{X} = \arg\min_{X \in \{R, V, B\}} d_{BI}(m, m_X) \qquad (19)$$ 

Case ii) The decision is not constrained: It might be interesting 
to allow the assignment of ambiguous 
pixels to imprecise classes like $\Theta$. This can reduce 
classification errors by avoiding decisions that are 
largely arbitrary. 

Recalling the example of the distance to prototypes approach 
with two classes, road and vehicle, an example with 
the FoD $\Theta = \{R, V, B\}$ is presented here. These three 
elements are representative of the goal of this work: 
$R$ stands for road, $V$ for vehicle, $B$ for background, and the 
decision space includes all singletons and their disjunctions, 
that is $2^\Theta \setminus \{\emptyset\} = \{R, V, R \cup V, B, R \cup B, V \cup B, R \cup V \cup B\}$, 
which appears in formula (17). 

In this case, and considering Fig. 3, the decision based on the 
belief interval distance is positioned in the middle 
phase, within the BBA construction, and it provides the final 
decision for the prediction. The $d_{BI}$ approach reflects another 
way of efficiently combining the mass functions to handle 
uncertain predictions. 

For the given example, both the decision based on belief interval 
and its quality indicator are explained and highlighted in 
Table I above. In this case, the mass function assigns 
the following values: 
- road: $m(R) = 0.83$, vehicle: $m(V) = 0.17$ 
- background: $m(B) = 0$ 
- unions of classes: 
  - road-vehicle: $m(R \cup V) = 0$ 
  - road-background: $m(R \cup B) = 0$ 
  - vehicle-background: $m(V \cup B) = 0$ 
  - road-vehicle-background: $m(R \cup V \cup B) = 0$. 

Next, the distances based on the $d_{BI}$ method are 
calculated. As expected in this situation, the distance corresponding 
to the element $R$ (road) is minimal (0.17), because 
$R$ has the highest mass. The element with the 
minimal distance is taken as the final decision. Quality 
indicators are computed accordingly. As highlighted in 
Table I, the $q(X)$ values show that the model is quite confident 
when deciding for the element $R$, with a confidence of 91.18%. 
In decision-making, it is important 
to obtain accurate judgments and to assess how efficient 
and trustworthy the decision made is for the application under 
concern. 

Perception systems based on neural networks are 
sensitive to uncertainty when making a decision; 
therefore, confidence in a judgment is crucial. 
Accordingly, this work addresses perception applications 
that use multiple sensors, fusion, and neural networks, with an 
evidence-theory-based formulation that provides decision-making 
features and assesses the quality of decisions. 


IV. EXPERIMENTAL RESULTS 


This section presents the dataset used for the applica- 
tion and the obtained results. The Lite-CF-Evi architecture 
is assessed for scene segmentation tasks against the KITTI 
semantic segmentation database. First, the probabilistic and 
evidential results are exemplified. Then, the decision based on 
belief interval and the quality indicators are computed for a set of 
frames. 


A. Dataset 


The KITTI semantic dataset originally has only 200 camera 
images. The dataset is similar to the KITTI Stereo and KITTI Flow 
2012/2015 datasets. Since the KITTI semantic dataset has no LiDAR 
frames (unlike the road dataset, for instance), the 
3D point clouds corresponding to the existing camera frames have to 
be identified in the large original KITTI raw dataset [25], which 
contains the data for all tasks. For 127 out of the 
200 camera images, LiDAR frames have been successfully 
projected and up-sampled to create dense depth images. A 
3D LiDAR point $x$ is mapped into a point $y$ in the camera 
plane according to the KITTI projection $P$, rectification $R$ and 
translation $T$ matrices: 


$$y = P\,R\,T\,x \qquad (20)$$ 


As the projected LiDAR scan is sparse, up-sampling is 
employed to generate a dense depth map, as depicted in Fig. 4. 
The up-sampling process is implemented following the method 
outlined in [19] and [26]. 
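
The projection of equation (20) can be sketched as follows. The code assumes homogeneous coordinates (a 3×4 projection matrix and 4×4 rectification and translation matrices) followed by a perspective division; the matrix values are placeholders chosen for illustration, not KITTI calibration data, and the helper names are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def project(point_xyz, P, R, T):
    """Map a 3D LiDAR point to pixel coordinates via y = P R T x."""
    x = [[point_xyz[0]], [point_xyz[1]], [point_xyz[2]], [1.0]]  # homogeneous
    y = matmul(P, matmul(R, matmul(T, x)))
    u, v, w = y[0][0], y[1][0], y[2][0]
    return u / w, v / w  # perspective division (assumed, not shown in (20))

# Hypothetical matrices for illustration only.
P = [[700.0, 0.0, 600.0, 0.0],
     [0.0, 700.0, 180.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
R = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # identity
T = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # identity

u, v = project((2.0, 1.0, 10.0), P, R, T)
print(u, v)
```

Only points with positive depth `w` fall in front of the camera; a full pipeline would also discard projections outside the image bounds before up-sampling.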


B. Semantic Segmentation 


After the up-sampling process, the newly constructed dense 
depth images from LiDAR are integrated into the Lite-CF- 
Evi model in parallel with the camera images to feed the two 
pipeline inputs of the architecture. 

Concerning the ground truth, the masks are simplified to 3 
classes: road (magenta), vehicle (dark blue), and background 
(blue), according to the original annotation. The road class is 
preserved, whereas the vehicle class incorporates annotations 
such as car, truck, and bus of the original ground truth. In 
turn, the background class encapsulates all the other classes. 


Fig. 5a shows an illustration with an example of the original 
ground truth, while Fig. 5b describes the simplified ground 
truth. 


(b) LiDAR up-sampling. 


Figure 4: LiDAR pre-processing. 


(a) Original ground truth. 


(b) Simplified ground truth, 3 classes: road, vehicle, background. 


Figure 5: Dataset pre-processing. 


The dataset consists of 127 images: 114 for training and 13 
for validation. This method has been assessed exclusively on 
the specially reconstructed KITTI semantic dataset, which 
includes the added LiDAR frames for the evidential cross-fusion 
architecture. To the best of the authors’ knowledge, 
this dataset has not yet been examined by any other method 
apart from their own work, since the point clouds were identified 
in the raw dataset and added. The ground-truth masks are 
one-hot encoded and class weights are applied to address the 
class imbalance. The model is then trained for 500 
epochs using the mean squared error loss and the Adam optimizer. 

To measure the performance, the model is evaluated using 
the intersection-over-union metric, denoted as $IoU$, in accordance 
with the PASCAL VOC benchmark [27]: 


$$IoU = \frac{TP}{TP + FP + FN} \qquad (21)$$ 


with TP, FP, and FN denoting, respectively, true positives, false positives, 
and false negatives. 
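
A minimal sketch of the per-class intersection-over-union computation is given below. It works on flat lists of integer labels; the toy labels and the function name are illustrative assumptions, not data from the experiments.

```python
def iou_per_class(pred, truth, num_classes):
    """IoU = TP / (TP + FP + FN) for each class, from flat label lists."""
    scores = []
    for c in range(num_classes):
        tp = sum(p == c and t == c for p, t in zip(pred, truth))
        fp = sum(p == c and t != c for p, t in zip(pred, truth))
        fn = sum(p != c and t == c for p, t in zip(pred, truth))
        denom = tp + fp + fn
        # A class absent from both prediction and ground truth has no IoU.
        scores.append(tp / denom if denom else float("nan"))
    return scores

# Toy example with 3 classes (0, 1, 2) and 8 pixels.
truth = [0, 0, 1, 1, 2, 2, 2, 2]
pred = [0, 1, 1, 1, 2, 2, 2, 0]
scores = iou_per_class(pred, truth, 3)
mean_iou = sum(scores) / len(scores)
print(scores, mean_iou)
```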
The Lite-CF-Evi is evaluated for 3 classes and compared with 
the probabilistic model. It can be observed that the global mean $IoU$ 
of the evidential architecture, 92.707%, is higher than the 92.384% 
of the probabilistic model. For each individual class, the 
evidential model also outperforms the probabilistic one (Table II), 
and visually the results are better for the Lite-CF-Evi. 


| IoU | Probab. Lite-CF | Lite-CF-Evi | 
| mean IoU | 92.384% | 92.707% | 
| mean IoU (road) | 92.713% | 93.163% | 
| mean IoU (vehicle) | 87.118% | 87.446% | 
| mean IoU (background) | 97.322% | 97.513% | 


Table II: Mean Intersection over Union/class. 


One interesting aspect of the evidential formulation is that the 
decision space can be adapted, ranging from a fixed number 
of classes (equal to the number of singletons) to the maximum 
number of acts, $|2^\Theta| - 1$. However, the desired decision 
elements are often only the singletons and the uncertain 
predictions. 

The image of Fig. 6a represents the predicted image with 
the probabilistic model. The image of Fig. 6b represents the 
predicted image with the evidential model (Lite-CF-Evi). It 
can be observed that classes road, vehicle, and background 
exhibit slightly higher accuracy in their predictions, with road 
class being notably precise. Furthermore, an additional class, 
denoted as the “unknown” (depicted in white) area, effectively 
captures pixels associated with uncertain predictions. This 
approach prevents the misclassification of uncertain pixels into 
incorrect categories, a scenario that may arise when utilizing 
a probabilistic approach. 

The “unknown” primarily manifests itself at the class 
boundaries, where the model frequently provides errors in its 
predictions. Likewise, pixels from distant objects often lack 
sufficient information, suggesting that the model encounters 
challenges in classifying them due to data uncertainty. Con- 
sequently, these pixels are classified as “unknown”, offering 
improved comprehension and demonstrating the effectiveness 
of evidential reasoning in managing uncertainties. 

The image of Fig. 6c represents an illustration, where the 
so-called partial ignorance is highlighted. That means that the 
previous class “unknown” is now distributed into the unions. 
This occurs as a result of changing the decision space. 

In the previous examples, results with predictions have been 
shown, when considering different decision space elements. 

For the considered perception task in autonomous driving, 
the frame of discernment containing road 
($R$), vehicle ($V$) and background ($B$) is considered: 

• FoD = $\{R, V, B\}$ 

while the decision space is represented by more elements, 
singletons, or the disjunctions (i.e. unions) of singletons: 

• decision = $\{R, V, R \cup V, B, R \cup B, V \cup B, R \cup V \cup B\}$ 

Therefore, the scalability of Dempster-Shafer theory is 
highlighted: the model is flexible, in that changing the 
set of decision elements directly changes the number 
of predicted classes. Here, classes are represented by 
the non-empty elements of the power set. Similarly to 
the case of only one additional class, unknown, in the case 
of unions (Fig. 6c, the last image of the group of three), 
the disjunctions occur predominantly in the regions 
with boundaries between classes, where the model is more 
vulnerable to wrong predictions. 


(a) Probabilistic prediction with Lite-CF: road, vehicle, background. 

(b) Evidential prediction with Lite-CF-Evi: road, vehicle, background, 
unknown (white). 

(c) Partial ignorance: singletons + unions (road ∪ background (green), car ∪ 
background (red), etc.). 


Figure 6: Semantic segmentation results. 


C. Quality of Decision 

To evaluate the decision-making part, the quality-of-decision 
indicator is computed over the predictions. In this way, 
the confidence in making decisions can be evaluated both 
for each pixel of the image (each decision made 
to define the prediction) and 
as an average over each class (e.g. the 
average quality of decision for the road class). 

The quality indicator recalled in equation (18) is used. 
Thus, $\hat{X}$ represents the final decision and $q(\hat{X})$ the quality 
indicator, computed whenever the model decides which class should be 
predicted based on the minimal belief interval distance of 
expression (14). Two aspects are illustrated in the next images: 
the first is the quality indicator for each pixel, and 
the second is the average quality indicator for each class. 
Two situations are considered: the best solution, with the minimal 
distance, and the second best solution, with the second smallest distance. 

The first image, Fig. 7a, shows $q(\hat{X})$ for each 
decision of the model, that is, how confident the judgment 
is for each pixel. Most of these values are close to 1, which 
represents high confidence in decision-making. The colorbar 
represents how confident the model is: the greener the pixel, 
the more certain the prediction, and the redder, the less certain. 
The edges between classes and the boundaries 
are more sensitive when making a decision; 
the values for these decisions are around 
0.94, between green and red. The second image, Fig. 7b, is 
produced in the same way, but it illustrates the average 
quality indicator for each class. Hence, the judgment parameter 
for this distribution is reduced to 7 values, one for each element 
of the decision space, and this value represents the 
quality indicator of each class. 


(a) $q(X)$ for each pixel (panel title: Quality Indicator 1st min distance for each decision; colorbar: Quality decision). 

(b) Average $q(X)$ for each class (panel title: Average Quality Indicator 1st min distance for each class; colorbar: Quality decision). 


Figure 7: Quality Indicator. 1st best solution. 

Similarly to the previous images, a second group of two 
predictions is assessed with quality indicators. As 
in the first set, Fig. 8a shows the quality of each decision 
of the model, while the second image, Fig. 8b, shows the 
average quality indicator, both for the second best solution. 

The range of confidence has slightly changed, and the quality 
indicator values have decreased as well. The figures resemble 
the first set of images but with a lower confidence. The next 
table gives a short comparison between the two solutions. 
It shows the average quality indicator for each element 
of the decision space in the two scenarios: when the minimum 
distance is used (first best solution), and when the second 
minimum belief interval distance is considered. As expected, 
the model is more confident in the first case. 


(a) $q(X)$ for each pixel (panel title: Quality Indicator 2nd min distance for each decision; colorbar: Quality decision). 

(b) Average $q(X)$ for each class (panel title: Average Quality Indicator 2nd min distance for each class; colorbar: Quality decision). 


Figure 8: Quality Indicator. 2nd best solution. 


| Element | Average $q(X)$, 1st solution | Average $q(X)$, 2nd solution | 
| $R$ | 0.9970 | 0.8802 | 
| $V$ | 0.9928 | 0.8816 | 
| $R \cup V$ | 0.8806 | 0.8797 | 
| $B$ | 0.9950 | 0.8807 | 
| $R \cup B$ | 0.9149 | 0.8865 | 
| $V \cup B$ | 0.9085 | 0.8855 | 
| $R \cup V \cup B$ | 0 | 0 | 


Table III: 1st best solution vs 2nd best solution. 
Average quality indicator/class. 


The uncertain predictions are spread between unions of two 
singletons; therefore, the last line of Table III, for the union 
of all the singletons, is equal to zero. 


V. CONCLUSION 


In this work, a camera-lidar fusion has been proposed by 
using a deep learning architecture combined with evidence 
theory for intelligent vehicle perception. The combination is 
realized at the very last level, replacing the softmax decision 
with a decision calculation by the distance to prototypes 
approach. The introduction of an unknown class as a decision 
element further improves efficiency. Hence, distant points 
and ambiguous features can be categorized as “unknown” 
rather than being erroneously assigned to specific predictions. 
The conducted experiments show different results depending 
on the chosen decision space, which can include unions of 
singletons (i.e. disjunctions). For the decision-making part, 
the application adopts the decision based on the belief interval 
distance, which avoids a loss of information with respect to the 
classical maximum of pignistic probability decision-making 
approach. Moreover, the quality of the predictions is evaluated 
thanks to the quality-decision indicator (i.e. confidence factor) 
presented in the results section. The results of the performed 
simulations show that the obtained confidence factors are very 
high (within the [0.9, 1.0] interval) for all decisions taken, which 
shows a good behavior of the decision-making method used 
for this application. As perspectives, the plan is to enhance 
the Lite-CF-Evi model for various class configurations and 
more intricate tasks while maintaining the computational 
efficiency needed for real-time applications. Additionally, a 
more in-depth examination of the distribution and impact of 
“unknown” predictions is planned, as well 
as the study of advanced rules of combination based on the proportional 
conflict redistribution principle. 
Abstract—We introduce a novel deep neural network architecture 
based on Dempster-Shafer theory capable of handling large 
image datasets with numerous classes, such as ImageNet. Our 
approach involves analyzing images through multiple experts, 
composed of convolutional deep neural networks that predict 
mass functions. The outputs of these experts are then merged using Dempster’s 
rule, and a set of potential classes is returned by selecting 
the best expected utility based on the previously computed 
mass functions. Our algorithm can identify the best 
set of classes among the $2^K$ possible sets for $K$ classes while 
maintaining a complexity of $O(K \log(K))$. To illustrate our 
approach, we apply it to an out-of-distribution example search 
problem, demonstrating its efficiency. 


Keywords: Dempster-Shafer theory, evidence theory, belief 
function, deep learning, out-of-distribution. 


I. INTRODUCTION 


In recent years, image classification has made remarkable 
strides with the advent of deep neural networks (DNNs). 
However, high ambiguity in the feature vector may lead 
to misclassification because multiple classes 
share similar predicted probabilities. Moreover, a model 
trained only for precise classification may struggle to detect 
out-of-distribution (OOD) data. 

One promising solution to this problem is set-valued 
classification [1], [2]. This method allows the model to assign 
a new sample to a non-empty set of classes, particularly when 
uncertainty is high and precise classification is challenging. 

In the context of Out-of-Distribution (OOD) detection, a 
prevalent approach is the utilization of a classification method 
with a reject option [3], [4], which can be seen as a special case 
of set-valued classification. Rejection is defined as assigning 
a sample to the set of all possible classes, indicating a state of 
high uncertainty. 

Recently, several works have sought to integrate 
Dempster-Shafer theory (DST) into deep neural networks, 
aiming to leverage the power of evidential reasoning [5]-[7]. 
However, these attempts have been confined to relatively small 
and well-structured datasets such as MNIST or 
CIFAR-10 [9]. The primary impediment has been the algorithmic 
complexity of DST, which scales exponentially with the size 
of the frame of discernment $\Omega$, whose power set contains $2^K$ subsets, where 
$K = |\Omega|$. 


Based on [10], the authors of [11] proposed an end-to-end deep 
evidential neural network that allocates mass values only to 
singletons and $\Omega$. This method addresses the computational 
bottleneck, effectively reducing the spatial complexity from 
$O(2^K)$ to $O(K + 1)$ for the training phase. Nevertheless, the 
decision-making process for set-valued classification during 
the evaluation phase remains computationally expensive, 
requiring an exhaustive selection from all possible subsets of 
$\Omega$, still operating at $O(2^K)$ complexity. Thus, they selected 
the possible subsets of $\Omega$ based on the distance between the 
classes derived from the confusion matrix. 

We propose in this work an algorithmic solution to mitigate 
the $O(2^K)$ complexity, making set-valued decisions derived 
from a mass function output by a Convolutional Neural 
Network (CNN) feasible with linear complexity, without 
intermediate steps to restrict the number of subsets. Additionally, we 
introduce mathematical optimizations to enhance numerical 
computations, enabling scalable implementation of set-valued 
classification evidential models. These contributions pave the 
way for the application of the DST theoretical framework 
to high-dimensional real-world datasets with many classes. 
They offer significant potential for improving the reliability 
of deep learning models in various applications such as OOD 
detection. 

The remaining parts of this work are organized as follows. 
In Section II, we recall the basics of Dempster-Shafer theory. In 
Section III, we present the evidential neural network architecture 
we use and the algorithmic solution we propose to make 
set-valued decisions in linear complexity. The experiments and 
preliminary results on large datasets are presented in Section 
IV. Finally, we conclude in Section V. 


II. BELIEF THEORY 
A. Background on belief functions 


Belief function theory, also called Evidence theory or 
Dempster-Shafer theory [12], [13], is able to model and reason 
about imprecise and uncertain problems, and offers clear 
advantages in the representation and combination of uncertain 
information. 

To represent partial knowledge in belief function theory, 
let us consider the frame of discernment $\Omega$ as a finite set of 
elements $\omega$ which refer to the $K$ elementary events of a given 
problem ($\Omega = \{\omega_1, \omega_2, \ldots, \omega_K\}$). 

The power set of $\Omega$ is the set of all its $2^K$ possible subsets. 
It is given by: 


$$2^\Omega = \{\emptyset, \{\omega_1\}, \ldots, \{\omega_K\}, \{\omega_1, \omega_2\}, \{\omega_1, \omega_3\}, \ldots, \Omega\}, \qquad (1)$$ 


where the $\{\omega_i\}$ elements are called singletons and $\emptyset$ denotes 
the empty set. 

The key notion of Dempster-Shafer theory is the basic belief 
assignment (bba), which represents the partial knowledge about 
the value of $\omega$. A bba is a function from $2^\Omega$ to $[0, 1]$ defined 
as follows: 

$$m : 2^\Omega \to [0, 1], \quad A \mapsto m(A) \qquad (2)$$ 


where $m$ satisfies the following constraint: 


$$\sum_{A \subseteq \Omega} m(A) = 1. \qquad (3)$$ 


An element $A$ of $2^\Omega$ is called a focal element when 
$m(A) > 0$, and the set containing all these elements is called 
a body of evidence (BOE). When each element in the BOE is a 
singleton, $m$ is named a Bayesian bba. On the other hand, 
when the BOE contains only $\Omega$ as a focal element, we are in the 
complete ignorance situation and $m$ is called the vacuous belief 
function. Finally, when it contains only one singleton of $\Omega$ 
as a focal element, $m$ is called a certain mass function. 
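
These definitions translate directly into code. The sketch below assumes a bba stored as a dictionary mapping `frozenset` subsets of the frame to masses; the function names and the label "general" for the remaining case are illustrative choices.

```python
def focal_elements(m):
    """Subsets carrying strictly positive mass (the body of evidence)."""
    return {a for a, v in m.items() if v > 0}

def is_valid_bba(m, tol=1e-9):
    """Masses must lie in [0, 1] and sum to one."""
    return all(0 <= v <= 1 for v in m.values()) and abs(sum(m.values()) - 1) < tol

def kind(m, frame):
    """Classify a bba as vacuous, certain, Bayesian, or general."""
    focal = focal_elements(m)
    if focal == {frame}:
        return "vacuous"
    if len(focal) == 1 and len(next(iter(focal))) == 1:
        return "certain"
    if all(len(a) == 1 for a in focal):
        return "Bayesian"
    return "general"

frame = frozenset({"w1", "w2", "w3"})
w1, w2 = frozenset({"w1"}), frozenset({"w2"})
examples = {
    "vacuous": {frame: 1.0},
    "certain": {w1: 1.0},
    "Bayesian": {w1: 0.6, w2: 0.4},
    "general": {w1: 0.5, frame: 0.5},
}
for expected, m in examples.items():
    print(expected, is_valid_bba(m), kind(m, frame))
```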

A bba is normalized when the mass given to 
the empty set is constrained to be zero ($m(\emptyset) = 0$). This 
corresponds to the closed-world assumption [13]. 
Conversely, the frame of discernment $\Omega$ 
can be incomplete, and the value of $\omega$ can lie outside 
$\Omega$. Accordingly, the mass of belief that is not linked to $\Omega$ 
can be allowed to be strictly positive ($m(\emptyset) > 0$). That case 
corresponds to the open-world assumption [14]. 


B. Information fusion 


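The most common way to combine two bbas $m_1$ and $m_2$ 
defined on the same frame of discernment $\Omega$ is Dempster’s 
rule [13], denoted as $\oplus$. It is defined by $m_{DS}(\emptyset) = 0$ and, 
$\forall A \in 2^\Omega \setminus \{\emptyset\}$, by 

$$m_{DS}(A) = (m_1 \oplus m_2)(A) = \frac{1}{1 - \kappa} \sum_{\substack{B \cap C = A \\ B, C \in 2^\Omega}} m_1(B)\, m_2(C) \qquad (4)$$ 

where $\kappa$ is the degree of conflict between the two sources of 
evidence, defined by: 


$$\kappa = \sum_{\substack{B \cap C = \emptyset \\ B, C \in 2^\Omega}} m_1(B)\, m_2(C).$$ 


This fusion can be seen as the normalized version of the 
conjunctive rule, which is defined by: 


$$m_\cap(A) = \sum_{\substack{B \cap C = A \\ B, C \in 2^\Omega}} m_1(B)\, m_2(C). \qquad (5)$$ 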

C. Decision-making 


The most common way of making decisions with belief 
functions is to apply the pignistic transformation to 
obtain a probability vector of size $K$; the predicted class 
then corresponds to the argmax of this vector. However, such a 
strategy does not allow the model to predict a set of classes. 
To this end, the upper and lower expected utilities 
of selecting $A \subseteq \Omega$ are defined as follows: 


$$\overline{E}(f_A) = \sum_{B \subseteq \Omega} m(B) \max_{\omega_j \in B} u_{A,j} \qquad (6)$$ 

$$\underline{E}(f_A) = \sum_{B \subseteq \Omega} m(B) \min_{\omega_j \in B} u_{A,j} \qquad (7)$$ 


where $u_{A,j} \in [0, 1]$ designates the utility of the act of 
selecting $A \subseteq \Omega$, denoted as $f_A$, when the ground truth is $\omega_j$. 
The utility matrix $U = (u_{A,j})$ is computed following [17], 
with a parameter $\gamma \in [0.5, 1]$ that represents the imprecision 
tolerance degree. If the true class is $\omega_j$, the utility of assigning 
a sample to set $A$ is calculated as an Ordered Weighted 
Average (OWA) aggregation of the individual utilities 
associated with each precise assignment within $A$ as follows: 

UA,j = HA L{u;€A} (8) 


where $g \in \mathbb{R}^{|A|}$ is a weight vector whose elements represent 
the decision-making strategy’s tolerance to imprecision, and 
$\mathbb{1}_{\{\omega_j \in A\}} = 1$ if $\omega_j \in A$ for $A \subseteq \Omega$, and 0 otherwise. For 
example, if $g = (1, 0, \ldots, 0)$, then the decision-making 
strategy will be totally intolerant to imprecision, thus forcing 
the model to output only one class. 

Following [19], this weight vector is obtained by 
maximizing the following entropy: 


$$Ent(g) = -\sum_{k=1}^{|A|} g_k \log g_k \qquad (9)$$ 


subject to the constraints $\sum_{k=1}^{|A|} g_k = 1$, $\sum_{k=1}^{|A|} \frac{|A| - k}{|A| - 1}\, g_k = \gamma$ and 
$g_k \geq 0$, where $\gamma$ is a parameter representing the tolerance 
to imprecision. An example of a utility matrix with $\gamma = 0.9$ 
and $\Omega = \{\omega_1, \omega_2, \omega_3\}$ is shown in Table I. As we can see, 
the values in the utility matrix depend only on 
the cardinality of the selected set. This means that instead 
of computing every value of the utility matrix, we only need 
to compute a value $U_k$ for each possible cardinality $k$ of the 
subsets of $\Omega$. In this example, we have $U_1 = 1$, $U_2 = 0.9$ and 
$U_3 = 0.8263$. 
Since we have: 

$$\min_{\omega_j \in \Omega} u_{A,j} = \begin{cases} U_{|A|} & \text{if } A = \Omega \\ 0 & \text{else} \end{cases} \qquad (10)$$ 

and 

$$\max_{\omega_j \in \Omega} u_{A,j} = U_{|A|} \qquad (11)$$ 

the equations (6) and (7) can be simplified as illustrated in 
Section III-C. 
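
The values $U_2 = 0.9$ and $U_3 = 0.8263$ can be recovered numerically. The sketch below rests on two assumptions that should be checked against [17] and [19]: the tolerance constraint has the form $\sum_k \frac{|A| - k}{|A| - 1} g_k = \gamma$, and $U_{|A|}$ equals the largest weight $g_1$. It uses the known property that maximum-entropy OWA weights form a geometric progression and finds the ratio by bisection; the function name is hypothetical.

```python
def owa_weights(n, gamma, iters=200):
    """Maximum-entropy OWA weights of length n with tolerance degree gamma.

    Uses the geometric form g_k proportional to r**(k-1) and finds r by
    bisection; intended for 0.5 <= gamma <= 1.
    """
    if n == 1:
        return [1.0]

    def tolerance(r):
        w = [r ** k for k in range(n)]
        return sum((n - 1 - k) / (n - 1) * wk for k, wk in enumerate(w)) / sum(w)

    lo, hi = 0.0, 1.0  # tolerance(0) = 1 and tolerance(1) = 0.5
    for _ in range(iters):
        mid = (lo + hi) / 2
        if tolerance(mid) > gamma:
            lo = mid  # tolerance decreases with r, so r must grow
        else:
            hi = mid
    r = (lo + hi) / 2
    w = [r ** k for k in range(n)]
    return [wk / sum(w) for wk in w]

# One utility value per subset cardinality, ASSUMING U_k = g_1.
U = {k: owa_weights(k, 0.9)[0] for k in (1, 2, 3)}
print({k: round(v, 4) for k, v in U.items()})
```

Under these assumptions the three values agree with those quoted for $\gamma = 0.9$.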
Figure 1. Architecture of an evidential deep neural network. 
TABLE I: UTILITY MATRIX WITH $\gamma = 0.9$ AND $K = 3$. 


The expected utility is then obtained using the generalized 
Hurwicz decision criterion [20], as follows: 


$$E(f_A) = \nu\, \underline{E}(f_A) + (1 - \nu)\, \overline{E}(f_A), \qquad (12)$$ 


where $\nu \in [0, 1]$ is the pessimism index. 

When $\gamma = 0.5$, the decision-making strategy is totally 
intolerant to imprecision, so that $u_{i,j} = 1$ if $\omega_i = \omega_j$, else 
$u_{A,j} = 0$. In this sense, we can see the expected utility as 
a generalized accuracy. The other extreme strategy is totally 
tolerant, which is achieved when $\gamma = 1$, where $u_{A,j} = 1$ if 
$\omega_j \in A$, else $u_{A,j} = 0$, so that a model that always outputs $\Omega$ 
will get an expected utility of 1. 

We chose this decision-making strategy among those 
proposed since it is the most general form of decision 
criterion resulting from Jaffray’s axioms [21]. Moreover, the 
expression of the expected utility leads to interesting 
simplifications in the restricted framework where we only consider 
the singletons and $\Omega$. 
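
To make the criterion concrete, the following sketch evaluates the Hurwicz expected utility of a candidate set for a bba whose focal elements are the singletons and $\Omega$. It assumes that the utility of selecting $A$ when the truth is $\omega_j$ equals a cardinality-dependent value $U_{|A|}$ if $\omega_j \in A$ and 0 otherwise; the masses, the values of $U$ and the function name are illustrative.

```python
def expected_utility(m_single, m_omega, subset, U, nu):
    """Hurwicz expected utility of selecting `subset` (set of class indices).

    m_single: masses on the K singletons; m_omega: mass on Omega.
    U: dict cardinality -> utility value; nu: pessimism index.
    ASSUMES u_{A,j} = U[|A|] if class j is in A, else 0.
    """
    K = len(m_single)
    u = [U[len(subset)] if j in subset else 0.0 for j in range(K)]
    on_singletons = sum(m_single[j] * u[j] for j in range(K))
    lower = on_singletons + m_omega * min(u)  # worst case inside Omega
    upper = on_singletons + m_omega * max(u)  # best case inside Omega
    return nu * lower + (1 - nu) * upper

m_single, m_omega = [0.5, 0.3, 0.0], 0.2
U = {1: 1.0, 2: 0.9, 3: 0.8263}
e_single = expected_utility(m_single, m_omega, {0}, U, nu=0.5)
e_pair = expected_utility(m_single, m_omega, {0, 1}, U, nu=0.5)
e_all = expected_utility(m_single, m_omega, {0, 1, 2}, U, nu=0.5)
print(e_single, e_pair, e_all)
```

In this toy case the whole frame obtains the highest expected utility, slightly above the pair of the two most supported classes, which shows how a tolerant $\gamma$ favors imprecise answers when the masses are spread.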


III. SCALABLE EVIDENTIAL NEURAL NETWORK 


In this section, we present how the DST framework can be 
incorporated into a deep neural network architecture. Based 
on some assumptions on the structure of bbas, we propose 
an algorithmic solution to make set-valued decision in linear 
complexity along with mathematical optimizations for a more 
scalable implementation. 


A. Evidential deep neural network 


As depicted in Figure the proposed evidential neural 
network architecture is very similar to a probabilistic one. Our 
architecture is based on the evidential deep neural network 
architecture introduced in [11]. The main difference resides 
in the construction of the mass function. The given image 


of size (C x H x W) first passes through the backbone of a 
convolutional neural network, resulting in a feature map of 
size (C’ x 1 x 1). This feature map captures the data’s latent 
representation. 

In the work presented in [11], the construction of mass 
functions involves the use of a distance-based layer. The 
classifier is composed of p prototypes t; in R’, where P 
is the dimension of the feature map. In their method, the 
first step is to compute the distance-based support between 
the feature map x of a data and each prototype ¢;. For the 
second step, the mass function m, associated to t; is computed 
by multiplying the distance-based support s; by a weight hj; 
which characterizes the degree of membership of prototype ¢; 
to the class w;. 

Our method for constructing the mass functions is more 
computer vision oriented and is inspired by mixture of experts 
approaches [22]. Instead of considering prototypes, we consider 
$p$ experts that see the feature map of a data sample from different 
points of view. For this purpose, the classical fully connected 
layer is replaced by a depthwise convolution with a 
kernel of size (1 x 1) and $p$ groups. For a given feature map 
and a given number of experts $p$, the depthwise convolution 
outputs a vector of size $p \times (K + 1)$, namely one mass 
function per expert. Each mass function holds $|\Omega| + 1$ values, 
with one value dedicated to each singleton and another 
one for the entire set $\Omega$. This vector is then reshaped into a 
matrix of experts of size $p \times (|\Omega| + 1)$. We apply a softmax 
activation to satisfy equation (3). In this matrix, the $i$-th 
row represents the mass function associated with expert $p_i$. 
The bbas of this matrix are then fused with Dempster’s rule, 
presented in the next section, to obtain a final bba of size 
$|\Omega| + 1$. 
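
The construction above can be sketched in pure Python. This is only a minimal illustration under stated assumptions: the grouped (1 x 1) convolution is taken to split the C' channels evenly into p groups, each group emitting K + 1 logits that a softmax turns into one mass function. The function name `expert_masses`, the toy sizes and the random weights are hypothetical and do not come from the paper.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expert_masses(features, weights, biases):
    """Return one mass function per expert.

    features: flat feature map of length C' (assumed divisible by p).
    weights:  weights[e][o] is the weight row of output o of expert e,
              of length C'/p (a grouped 1x1 convolution with p groups).
    biases:   biases[e][o] is the matching bias.
    Each returned row holds K singleton masses followed by the mass of
    Omega, and sums to 1 because of the softmax.
    """
    p = len(weights)
    group = len(features) // p
    masses = []
    for e in range(p):
        chunk = features[e * group:(e + 1) * group]  # channels seen by expert e
        logits = [
            sum(w * x for w, x in zip(row, chunk)) + b
            for row, b in zip(weights[e], biases[e])
        ]
        masses.append(softmax(logits))
    return masses


# Toy setting (hypothetical sizes): C' = 8 features, p = 4 experts, K = 3 classes.
rng = random.Random(0)
C_PRIME, P, K = 8, 4, 3
features = [rng.uniform(-1, 1) for _ in range(C_PRIME)]
weights = [[[rng.uniform(-1, 1) for _ in range(C_PRIME // P)]
            for _ in range(K + 1)] for _ in range(P)]
biases = [[0.0] * (K + 1) for _ in range(P)]
masses = expert_masses(features, weights, biases)  # p rows of K + 1 masses
```

Each row of `masses` is strictly positive and sums to one, so it is a valid mass function whose only focal elements are the singletons and $\Omega$.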

B. Computational optimization of Dempster’s rule 


As seen in the previous section, since our network only 
considers the masses assigned to singletons and $\Omega$, the 
expression of the conjunctive rule simplifies, for every singleton $A$, to 

$$m_\cap(A) = \sum_{\substack{B \cap C = A \\ B, C \in 2^\Omega}} m_1(B)\, m_2(C) = m_1(A)\, m_2(A) + m_1(A)\, m_2(\Omega) + m_1(\Omega)\, m_2(A) \tag{13}$$ 
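
As a sanity check of this simplification, the following sketch compares the three-term formula with a generic sum over all pairs of focal sets. It assumes a layout in which each mass function is a list holding the K singleton masses followed by the mass of $\Omega$; the function names are illustrative.

```python
from itertools import product


def conjunctive_restricted(m1, m2):
    """Unnormalized conjunctive rule when focal sets are singletons and Omega.

    m1, m2: lists of length K + 1, singleton masses first, Omega last.
    The mass sent to the empty set (the conflict) is not stored.
    """
    k = len(m1) - 1
    out = [m1[j] * m2[j] + m1[j] * m2[k] + m1[k] * m2[j] for j in range(k)]
    out.append(m1[k] * m2[k])  # only Omega with Omega intersects to Omega
    return out


def conjunctive_generic(m1, m2):
    """Reference version: sum m1(B) * m2(C) over all pairs with B & C = A."""
    out = {}
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        a = b & c
        out[a] = out.get(a, 0.0) + mb * mc
    return out


def to_focal_sets(m):
    """Convert the list layout to a {frozenset: mass} dictionary."""
    k = len(m) - 1
    focal = {frozenset({j}): m[j] for j in range(k)}
    focal[frozenset(range(k))] = m[k]
    return focal


m1 = [0.6, 0.3, 0.1]
m2 = [0.2, 0.5, 0.3]
omega = frozenset({0, 1})
fused = conjunctive_restricted(m1, m2)
reference = conjunctive_generic(to_focal_sets(m1), to_focal_sets(m2))
```

Both versions agree on the singletons and on $\Omega$; the generic one additionally reports the conflict as the mass of the empty set.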
This brings us to an iterative algorithm for performing 
Dempster’s rule, as shown in Algorithm 1. We define 
$\mu_1 = m_1$ and $\mu_{i+1} = m_\cap(\mu_i, m_{i+1})$, where $\mu_i$ represents the 
mass function obtained by fusing the first $i$ experts’ 
mass functions with the conjunctive rule. 


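Algorithm 1 Iterative Dempster’s rule 

Require: $p$ mass functions $m_1, \dots, m_p$ 
$\mu_1 \leftarrow m_1$ 
for $i = 2, \dots, p$ do 
    for $j = 1, \dots, K$ do 
        $\mu_i(\{\omega_j\}) = \mu_{i-1}(\{\omega_j\})\, m_i(\{\omega_j\}) + \mu_{i-1}(\{\omega_j\})\, m_i(\Omega) + \mu_{i-1}(\Omega)\, m_i(\{\omega_j\})$ 
    end for 
    $\mu_i(\Omega) = \mu_{i-1}(\Omega)\, m_i(\Omega)$ 
end for 
return $\mu_p / Z$, where $Z$ is a normalization term. 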
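
A direct Python transcription of this iterative scheme might look as follows. It is a sketch only: the list layout (K singleton masses followed by the mass of $\Omega$) and the choice of $Z$ as the sum of the remaining masses, that is one minus the total conflict, are assumptions made for the example.

```python
def dempster_iterative(masses):
    """Fuse p mass functions one after the other.

    masses: list of p lists of length K + 1 (singletons first, Omega last).
    Returns the normalized fused mass function in the same layout.
    """
    k = len(masses[0]) - 1
    mu = list(masses[0])  # mu_1 = m_1
    for m in masses[1:]:
        fused = [mu[j] * m[j] + mu[j] * m[k] + mu[k] * m[j] for j in range(k)]
        fused.append(mu[k] * m[k])
        mu = fused
    z = sum(mu)  # assumed normalization: 1 minus the total conflict
    if z == 0.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return [v / z for v in mu]


fused = dempster_iterative([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
```

The outer loop is inherently sequential, since step $i$ needs the result of step $i - 1$.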


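The expression of $\mu_i(\{\omega_j\})$ can be rewritten as follows: 

$$\begin{aligned} \mu_i(\{\omega_j\}) &= \mu_{i-1}(\{\omega_j\})\, m_i(\{\omega_j\}) + \mu_{i-1}(\{\omega_j\})\, m_i(\Omega) + \mu_{i-1}(\Omega)\, m_i(\{\omega_j\}) \\ &= \bigl(\mu_{i-1}(\{\omega_j\}) + \mu_{i-1}(\Omega)\bigr) \times \bigl(m_i(\{\omega_j\}) + m_i(\Omega)\bigr) - \mu_{i-1}(\Omega)\, m_i(\Omega) \end{aligned} \tag{14}$$ 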


which leads to an improved algorithm that only iterates on 
the number of classes $K$, as presented in Algorithm 2. 
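
The intermediate step can be made explicit. Since the unnormalized recursion gives $\mu_i(\Omega) = \mu_{i-1}(\Omega)\, m_i(\Omega)$, moving that term to the left-hand side of the factored form turns the recursion into a pure product, which unrolls from $\mu_1 = m_1$:

```latex
\begin{align*}
\mu_i(\{\omega_j\}) + \mu_i(\Omega)
  &= \bigl(\mu_{i-1}(\{\omega_j\}) + \mu_{i-1}(\Omega)\bigr)
     \bigl(m_i(\{\omega_j\}) + m_i(\Omega)\bigr), \\
\mu_p(\{\omega_j\}) + \mu_p(\Omega)
  &= \prod_{i=1}^{p} \bigl(m_i(\{\omega_j\}) + m_i(\Omega)\bigr), \\
\mu_p(\{\omega_j\})
  &= \prod_{i=1}^{p} \bigl(m_i(\{\omega_j\}) + m_i(\Omega)\bigr)
     - \prod_{i=1}^{p} m_i(\Omega).
\end{align*}
```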


Algorithm 2 Scalable Dempster’s rule 

Require: $p$ mass functions $m_1, \dots, m_p$ 
$\mu_p(\Omega) = \prod_{i=1}^{p} m_i(\Omega)$ 
for $j = 1, \dots, K$ do 
    $\mu_p(\{\omega_j\}) = \prod_{i=1}^{p} \bigl(m_i(\{\omega_j\}) + m_i(\Omega)\bigr) - \mu_p(\Omega)$ 
end for 
return $\mu_p / Z$, where $Z$ is a normalization term. 


Algorithm 2 is highly parallelizable: each element 
of the loop can be calculated independently of the others, 
unlike Algorithm 1, where each element depends on the 
previous iteration. In practice, this second algorithm provides 
a very fast implementation of Dempster’s rule in the restricted 
framework chosen, where we only consider singletons and $\Omega$ 
as focal elements. 
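
The equivalence of the two schemes can be checked numerically. The sketch below implements the product form next to a sequential reference, under the same assumptions as before (list layout with $\Omega$ last, $Z$ taken as the sum of the unnormalized masses); the function names and the toy masses are illustrative.

```python
import math


def dempster_product(masses):
    """Closed-form fusion: one independent product per class."""
    k = len(masses[0]) - 1
    mu_omega = math.prod(m[k] for m in masses)
    mu = [math.prod(m[j] + m[k] for m in masses) - mu_omega for j in range(k)]
    mu.append(mu_omega)
    z = sum(mu)  # assumed normalization: 1 minus the total conflict
    if z <= 0.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return [v / z for v in mu]


def dempster_sequential(masses):
    """Reference version fusing the experts one at a time."""
    k = len(masses[0]) - 1
    mu = list(masses[0])
    for m in masses[1:]:
        mu = [mu[j] * m[j] + mu[j] * m[k] + mu[k] * m[j] for j in range(k)] \
            + [mu[k] * m[k]]
    z = sum(mu)
    return [v / z for v in mu]


experts = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]]
fast = dempster_product(experts)
slow = dempster_sequential(experts)
```

In a tensor framework, the per-class products become a single reduction over the expert axis, which is where the parallelism comes from.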


C. Scalable decision making 


Since we only consider the singletons and $\Omega$ for the construction 
of the mass function, we can simplify equations 
(6) and (7) as follows: 

$$\overline{E}(f_A) = \sum_{i=1}^{K} m(\{\omega_i\})\, u_{A,i} + m(\Omega) \max_{\omega_k \in \Omega} u_{A,k}, \tag{15}$$ 

$$\underline{E}(f_A) = \sum_{i=1}^{K} m(\{\omega_i\})\, u_{A,i} + m(\Omega) \min_{\omega_k \in \Omega} u_{A,k}. \tag{16}$$ 


During the training phase, we want $f_A$ to be a singleton act, 
that is to say $u_{ii} = 1$ and $u_{ij} = 0$ $\forall i \neq j$, which can be seen 
as the classical accuracy metric. Under those hypotheses, we 
can simplify equations (15) and (16) as follows: 

$$\overline{E}(f_{\omega_i}) = m(\{\omega_i\}) + m(\Omega) \tag{17}$$ 

$$\underline{E}(f_{\omega_i}) = m(\{\omega_i\}) \tag{18}$$ 

leading to this simplified expression of the expected utility: 

$$E_\nu(f_{\omega_i}) = \nu\, m(\{\omega_i\}) + (1 - \nu) \bigl(m(\{\omega_i\}) + m(\Omega)\bigr) = m(\{\omega_i\}) + (1 - \nu)\, m(\Omega). \tag{19}$$ 

This expression can be considered as a rewriting of the 
pignistic transformation in our restricted framework. Indeed, 
taking $\nu = 1 - \frac{1}{|\Omega|}$ in equation (19) leads to the pignistic 
probability expression when $m(A) = 0$ $\forall A \subset \Omega$ such that 
$|A| \geq 2$. 

We propose to use the cross-entropy loss on the expected 
utilities vector for training our network: 

$$-\sum_{i=1}^{n} \sum_{k=1}^{K} y_{ik} \log \bigl(E_\nu(f_{\omega_k}(x_i))\bigr) \tag{20}$$ 

where $n$ is the size of the training dataset and $y_{ik}$ is 1 if the label of 
example $x_i$ is $\omega_k$ and 0 otherwise. 
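
The training objective can be illustrated with a short sketch. It assumes that each mass function is a list of K singleton masses followed by the mass of $\Omega$, and that labels are integer class indices; the function names and the `eps` guard against a zero argument in the logarithm are additions made for the example, not part of the paper.

```python
import math


def singleton_expected_utilities(mass, nu):
    """Expected utility of each singleton act: m({w_i}) + (1 - nu) * m(Omega)."""
    k = len(mass) - 1
    return [mass[i] + (1.0 - nu) * mass[k] for i in range(k)]


def cross_entropy(batch_masses, labels, nu, eps=1e-12):
    """Sum over the batch of -log(expected utility of the true class).

    eps is a hypothetical numerical guard, not part of the source formula.
    """
    loss = 0.0
    for mass, label in zip(batch_masses, labels):
        utilities = singleton_expected_utilities(mass, nu)
        loss -= math.log(max(utilities[label], eps))
    return loss


mass = [0.5, 0.2, 0.1, 0.2]            # K = 3 classes, m(Omega) = 0.2
nu_pignistic = 1.0 - 1.0 / 3.0          # nu = 1 - 1/|Omega|
pignistic = singleton_expected_utilities(mass, nu_pignistic)
loss_pessimistic = cross_entropy([mass], [0], nu=1.0)
```

With $\nu = 1 - 1/|\Omega|$ the utilities share $m(\Omega)$ equally among the classes and sum to one, which is the pignistic probability; with $\nu = 1$ the mass on $\Omega$ is ignored entirely.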

For decision-making during the evaluation and test phases, 
we want our network to be able to output a subset of $\Omega$. 
The main obstacle is the algorithmic complexity, since it 
would require computing $2^{|\Omega|}$ expected utilities to choose the 
subset that maximizes it. To solve this issue, previous work proposes to 
compute the confusion matrix from the training set generated 
by an evidential deep neural network as explained above. 
Based on the distance between the classes, only the classes and 
groups of classes that are similar enough are kept, by 
thresholding. Although in practice this strategy reduces the 
number of expected utilities to be computed, it remains in $2^{|\Omega|}$ 
in the worst case (when the result is to be attributed to the set 
$\Omega$). Furthermore, we are not convinced that this strategy is 
sufficient to scale to databases with a large number of classes 
such as ImageNet, where $|\Omega| = 1000$. Moreover, it requires 
a costly intermediate step between the training phase and the 
evaluation and test phases. 

To this end, we propose a very simple and computationally 
efficient iterative algorithm to determine the argmax 
over all subsets of $\Omega$ without any a priori knowledge about the 
correlation between the classes nor any intermediate step to restrict 
the number of subsets of $\Omega$. The first step is to compute 
the expected utilities of the singletons using equation (19) 
and to sort them in decreasing order. We then compare 
the highest singleton expected utility with the expected utility 
of the subset composed of the two best singletons using 
equations (12), (15) and (16), and so on until adding a new singleton 
to the subset decreases the expected utility. For example, consider $\Omega = 
\{\omega_1, \omega_2, \omega_3, \omega_4\}$ with $E_\nu(\omega_1) > E_\nu(\omega_2) > E_\nu(\omega_3) > E_\nu(\omega_4)$. We 
first compute $E_\nu(\{\omega_1, \omega_2\})$ and compare it with $E_\nu(\omega_1)$. Suppose 
that $E_\nu(\{\omega_1, \omega_2\})$ is indeed higher than $E_\nu(\omega_1)$; we 
now have to compute $E_\nu(\{\omega_1, \omega_2, \omega_3\})$. If 
$E_\nu(\{\omega_1, \omega_2\}) > E_\nu(\{\omega_1, \omega_2, \omega_3\})$, we obtain $A^* = \{\omega_1, \omega_2\}$. If 
$E_\nu(A^*) > E_\nu(\Omega)$ then the model outputs $A^*$, else it outputs $\Omega$. 


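Algorithm 3 Argmax of the Expected Utility 

Require: sorted singleton expected utilities 
$E_\nu(\{\omega_{a_1}\}) \geq E_\nu(\{\omega_{a_2}\}) \geq \dots \geq E_\nu(\{\omega_{a_K}\})$. 
$A^* \leftarrow \{\omega_{a_1}\}$ 
for $i = 2, \dots, K$ do 
    $A_{temp} \leftarrow A^* \cup \{\omega_{a_i}\}$ 
    if $E_\nu(A_{temp}) > E_\nu(A^*)$ then 
        $A^* \leftarrow A_{temp}$ 
    end if 
end for 
return $A^*$ 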

This strategy allows the model to output a set of classes 
among all the possible subsets of $\Omega$ while maintaining a 
complexity of $O(K \log(K))$, without requiring any limitation 
on the number of subsets of $\Omega$ whose expected utilities 
are compared. 
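
The greedy search can be sketched as follows. The sketch rests on a strong assumption that the paper does not state in this section: the utility of predicting a set is taken to depend only on the cardinality of that set when it contains the true class, and to be zero otherwise. The table `utility_by_size` stands in for the utilities of equation (12), which are not reproduced here, and its values are made up for the example.

```python
def expected_utility(mass_in_set, size, mass_omega, n_classes, nu, utility_by_size):
    """Hurwicz-style mix of lower and upper expected utility of a set.

    Assumes utility u(size) if the true class is in the set, 0 otherwise.
    Lower bound: m(Omega) contributes only when the set is Omega itself.
    Upper bound: m(Omega) always contributes u(size).
    """
    u = utility_by_size[size - 1]
    if size == n_classes:
        return u * (mass_in_set + mass_omega)
    return u * (mass_in_set + (1.0 - nu) * mass_omega)


def argmax_expected_utility(mass, nu, utility_by_size):
    """Greedy set-valued decision; mass lists K singletons then Omega."""
    k = len(mass) - 1
    m_omega = mass[k]
    # Sorting by mass equals sorting by singleton expected utility,
    # because the two differ by the constant (1 - nu) * m(Omega).
    order = sorted(range(k), key=lambda i: mass[i], reverse=True)
    best = [order[0]]
    best_mass = mass[order[0]]
    best_value = expected_utility(best_mass, 1, m_omega, k, nu, utility_by_size)
    for idx in order[1:]:
        cand_mass = best_mass + mass[idx]
        value = expected_utility(cand_mass, len(best) + 1, m_omega, k, nu,
                                 utility_by_size)
        if value > best_value:  # keep the class only if it improves the utility
            best.append(idx)
            best_mass, best_value = cand_mass, value
    value_omega = expected_utility(sum(mass[:k]), k, m_omega, k, nu,
                                   utility_by_size)
    return set(best) if best_value > value_omega else set(range(k))


UTILITIES = [1.0, 0.8, 0.6, 0.5]  # hypothetical u(1), ..., u(4)
ambiguous = argmax_expected_utility([0.45, 0.40, 0.05, 0.05, 0.05], 0.75, UTILITIES)
confident = argmax_expected_utility([0.90, 0.05, 0.03, 0.01, 0.01], 0.75, UTILITIES)
ignorant = argmax_expected_utility([0.10, 0.10, 0.10, 0.10, 0.60], 0.75, UTILITIES)
```

A running sum of the selected masses makes each candidate evaluation constant-time under this assumption, so the sort dominates the cost. In the toy run, two close classes give a pair, a peaked mass gives a singleton, and a large $m(\Omega)$ gives $\Omega$.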


IV. EXPERIMENTS 


To demonstrate the relevance of our model, we conducted 
several experiments. Firstly, we carried out a study of the 
impact of the various parameters on our model. Secondly, we 
sought to demonstrate the ability of our model to process large 
databases containing a large number of classes, and compared 
our model with a standard probabilistic model on a classification 
problem. Finally, we demonstrated the superiority of our 
approach over the standard probabilistic model on an OOD 
detection task. 

In all our experiments, the backbone used 
is a ResNeXt50 [25]. This applies both to our model 
and to the probabilistic models against which it is 
compared. 


A. Datasets 


We conducted our experiments using the following 3 
databases: CIFAR-100, ImageNet and SVHN. 

CIFAR-100 is a database of low-resolution 32 x 32 
images. It contains 60,000 images divided into 100 classes 
with 600 images per class. 

ImageNet contains 1.5 million images of 224 x 224 
resolution, manually annotated in 1,000 categories. The 
annotation is based on the WordNet hierarchical object 
categorization structure (augmented by 120 dog categories). 

The SVHN (Street View House Numbers) database is a 
collection of 32 x 32 digital images of digits cropped 
from photos of house numbers taken in street scenes. 
The database contains 10 classes, corresponding to digits from 
0 to 9. 


B. Ablation study 


In this section, we present some experiments designed to 
measure the impact of the various parameters of our approach 
on its performance. We measure two metrics: expected utility 
and average cardinality. 

Given that the accuracy is obtained by fixing the imprecision 
tolerance degree $\gamma$ to 0.5 while computing the expected utility, 
we propose to evaluate the expected utilities across a range of 
$\gamma$ values from 0.5 to 0.95. 

We compute the average cardinality of the predictions 
according to $\gamma$ as follows: 

$$AC(T) = \frac{1}{|T|} \sum_{i=1}^{|T|} |A(i)| \tag{21}$$ 

where $T = \{x_1, \dots, x_{|T|}\}$ is the test set and $A(i)$ is the set-valued 
output for the data $x_i \in T$. It is clear that for $\gamma = 1$, the 
model will always output $f_\Omega$ since $E_\nu(\Omega) = 1$, and the average 
cardinality will be equal to the number of elements in $\Omega$. 
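
As a small illustration of this metric, assuming that each prediction is represented as a Python set of class indices (a representation chosen for the example):

```python
def average_cardinality(predictions):
    """Mean size of the set-valued predictions over a test set."""
    if not predictions:
        raise ValueError("predictions must not be empty")
    return sum(len(subset) for subset in predictions) / len(predictions)


# Three toy predictions over a frame of 4 classes: sizes 1, 2 and 4.
avg = average_cardinality([{0}, {0, 1}, {0, 1, 2, 3}])
```

A value of 1 means the model always commits to a single class, while a value equal to the number of classes means it always answers $\Omega$.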
Figure 2. Expected Utility according to the number of experts on CIFAR-100. 


Firstly, we need to determine the hyperparameters of our 
model, namely the number of experts $p$ and the degree of 
pessimism $\nu$. Since this search process is quite time-intensive, 
we restrict it to the CIFAR-100 dataset. To identify the optimal 
number of experts, we fix $\nu$ to 0.99 so that equation (19) 
corresponds to the pignistic probability. As shown in Figure 2, 
the impact of the number of experts does not appear to 
be significant. This is mainly because there is no guarantee 
that the experts simulated by the fully connected layer will be 
independent. We therefore choose $p = 4$, as there is no need for a 
large number of experts. We then search for the optimal $\nu$ with the 
number of experts set to $p = 4$. As depicted in Figure 3, the model 
learns in a similar way, independently of $\nu$. Indeed, the model 
always outputs a value very close to zero for $m(\Omega)$ on the precise 
classification task, so the impact of $\nu$ is not significant during 
the training phase. Consequently, we have selected $\nu = 1 - \frac{1}{|\Omega|}$, 
namely $\nu = 0.99$ for CIFAR-100 and $\nu = 0.999$ for ImageNet. 

Figure 3. Expected Utility according to $\nu$ on CIFAR-100. 


C. Comparison with probabilistic approaches for image classification 


Now that we have fixed the model hyperparameters, we can 
compare the evidential neural network with the probabilistic 
one on precise classification. As mentioned previously, the 
probabilistic model uses a ResNeXt50 backbone 
followed by a fully connected layer and 
a softmax. 

For a fair comparison between our method and the probabilistic 
approach, we have to allow the probabilistic network 
to output set-valued predictions in order to compute the 
expected utility. To do so, we consider the probability vector 
output by the model as a mass function with $m(\Omega) = 0$ and 
$m(\{\omega_j\}) = p(\omega_j)$ $\forall j = 1, \dots, K$. 

The Expected Utility and Cardinality curves over 10 runs 
on CIFAR-100 are presented in Figure 4 and 
Figure 5, respectively. The Expected Utility and Cardinality curves on 
ImageNet are presented in Figure 6 and Figure 7, respectively. 
Due to the size of the database, we limited the ImageNet 
experiments to a single run and were therefore unable to 
calculate standard deviations. For both experiments, we can 
see that there is almost no difference between the two models 
from $\gamma = 0.5$ to $\gamma = 0.7$, where the decision-making strategy 
is quite intolerant to uncertainty, forcing the model to output 
one or two classes. For $\gamma = 0.75$ to $\gamma = 0.95$, the evidential 
model is less confident than the probabilistic one and outputs 
sets with a higher cardinality, which decreases the Expected 
Utility. On ImageNet, the accuracy of the probabilistic 
model is 77.77% against 77.65% for the evidential model. The difference 
in performance is relatively small. 


Figure 4. Expected Utility on CIFAR-100. 

Figure 5. Average Cardinality on CIFAR-100. 


D. OOD detection 


For the OOD detection task, we want to evaluate the capability 
of the network to output $\Omega$ if, and only if, the data does not 
belong to the classes from the training set. For this purpose, 
we evaluate the rate of $f_\Omega$ by varying $\gamma$ from 0.5 to 0.95. A 
good model has to get a high rate of $f_\Omega$ on out-of-distribution 
data and a low rate of $f_\Omega$ on in-distribution data. For $\gamma = 1$, 
the model will always predict $\Omega$ since all the non-zero values 
in the utility matrix will be equal to 1, so the $f_\Omega$ rate will 
always be equal to 100%. 
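
The rate used here is simply the share of test samples for which the decision is the whole frame. A minimal sketch, again assuming predictions stored as sets of class indices and using made-up toy predictions:

```python
def omega_rate(predictions, n_classes):
    """Fraction of set-valued predictions equal to the whole frame Omega."""
    if not predictions:
        raise ValueError("predictions must not be empty")
    hits = sum(1 for subset in predictions if len(subset) == n_classes)
    return hits / len(predictions)


# Hypothetical predictions over 3 classes.
in_distribution = [{0}, {1, 2}, {2}, {0}]
out_of_distribution = [{0, 1, 2}, {0, 1, 2}, {1}, {0, 1, 2}]
rate_in = omega_rate(in_distribution, n_classes=3)
rate_ood = omega_rate(out_of_distribution, n_classes=3)
```

A detector behaving as desired would yield a low `rate_in` and a high `rate_ood`.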


The results on the OOD detection task for the models trained 
on CIFAR-100 and ImageNet are presented in 
Figure 8 and Figure 9, respectively. As expected, the $f_\Omega$ rate is very low for 
the evidential and the probabilistic models on the in-distribution 
test set. However, it is clear that the evidential network 
outperforms the probabilistic network on the OOD detection task 
when we evaluate them on the SVHN dataset. 

Figure 6. Expected Utility on ImageNet. 

Figure 7. Average Cardinality on ImageNet. 


V. DISCUSSIONS AND CONCLUSIONS 


In this work, we have presented a novel deep neural 
network based on Dempster-Shafer theory capable of handling 
large datasets for image classification. Furthermore, we have 
introduced mathematical optimizations to improve numerical 
computations, facilitating a scalable implementation of evidential 
models for set-valued classification. This approach makes 
it possible to obtain results on databases with a large number 
of classes, while avoiding the problem of traversing the $2^K$ 
subsets of possible classes. 

The proposed evidential neural network shows results similar 
to those of the probabilistic one on the precise classification task. 
One way to improve it would be to ensure the independence of 
the experts with a Deep Ensemble approach [28], [29]. 

However, our network clearly outperforms the probabilistic 
one on the OOD detection task regarding the $f_\Omega$ rate. This 
illustrates that the proposed method overcomes one of the 
main problems of neural networks, namely overconfidence 
even when the data is out-of-distribution. Of course, the scope of 
our method is not limited to image classification. We 
can adapt it to other computer vision tasks such as semantic 
segmentation and instance segmentation. 

Figure 8. $f_\Omega$ rate for OOD detection, CIFAR-100. 

Figure 9. $f_\Omega$ rate for OOD detection, ImageNet. 

Another way of improving our method would be to also take 
into account the partial ignorance of the experts when fusing 
the mass functions and making a decision. This would require 
overcoming computational bottlenecks but would open the 
door to other decision-making strategies and better-suited 
fusion rules. 